This is the code repository with the implementations of our submission. You can find our unified label set under data/unified_labels.json.
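If you just want a quick look at the label set, you can load the JSON directly. The snippet below is only a convenience and makes no assumptions beyond the file being valid JSON:

```python
import json

# Load the unified label set; its exact internal structure is not assumed
# here, so we only report the type and a few top-level entries.
with open("data/unified_labels.json", encoding="utf-8") as f:
    unified_labels = json.load(f)

print(type(unified_labels))
if isinstance(unified_labels, dict):
    print(list(unified_labels)[:10])  # first few top-level keys
elif isinstance(unified_labels, list):
    print(unified_labels[:10])        # first few entries
```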
Discourse understanding is essential for many NLP tasks, yet most existing work remains constrained by framework-dependent discourse representations.
This work investigates whether large language models (LLMs) capture discourse knowledge that generalizes across languages and frameworks. We address this question along two dimensions: (1) developing a unified discourse relation label set to facilitate cross-lingual and cross-framework discourse analysis, and (2) probing LLMs to assess whether they encode generalizable discourse abstractions.
Using multilingual discourse relation classification as a testbed, we examine a comprehensive set of
To reproduce our results, you first need to download the DISRPT 2023 dataset (see the next section). Furthermore, you need to install a few non-standard Python packages:
pip install torch transformers pandas numpy plotly scikit-learn
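As an optional sanity check (not part of our pipeline), you can confirm that the packages are importable:

```python
# Optional sanity check: verify that the required packages are importable.
import torch, transformers, pandas, numpy, plotly, sklearn

print("torch", torch.__version__)
print("transformers", transformers.__version__)
```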
We then provide a script to obtain .csv tables of the relation classification datasets by running
python discourse/read_disrpt.py
which will create new partitions in the directory where you put your DISRPT files. You should copy these datasets to a new folder, let's say rel_embeddings_aya23_35b/.
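To sanity-check the generated tables, you can inspect one of them with pandas. The file name below is only a placeholder; the actual names depend on the DISRPT corpora you downloaded:

```python
import pandas as pd

# Placeholder path: substitute one of the .csv tables created by read_disrpt.py.
df = pd.read_csv("rel_embeddings_aya23_35b/eng.rst.gum_train.csv")
print(df.shape)
print(df.columns.tolist())
print(df.head())
```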
Finally, you can start computing attention representations using the following script:
python discourse/encode_att.py results/rel_embeddings_aya23_35b/ CohereForAI/aya-23-35B
where the last argument refers to the Hugging Face identifier of the model you want to use. The script will create the necessary subdirectories and store a checkpoint every 100 documents; on restart, existing checkpoints are skipped. Once the encoding has finished, you can use our probing scripts to reproduce our probing results:
python discourse/run_probes.py discourse/rel_embeddings_aya23_35b/ unified # probes using all layers of the model
python discourse/run_probes.py discourse/rel_embeddings_aya23_35b/ layer-wise # layer-wise probes
python discourse/run_probes.py discourse/rel_embeddings_aya23_35b/ store-full-preds # probe storing all prediction data and probe weights
# equivalent to all above:
python discourse/run_probes.py discourse/rel_embeddings_aya23_35b/ all
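For illustration only, the following sketch shows what a layer-wise linear probe amounts to. It is not the code in discourse/run_probes.py; the representation shape (one feature vector per layer and example) and the use of scikit-learn's LogisticRegression are assumptions made to keep the example self-contained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def layerwise_probe(X_train, y_train, X_test, y_test):
    """Fit one linear probe per layer and return per-layer test accuracies.

    X_* are assumed to have shape (num_examples, num_layers, feature_dim);
    y_* contain one unified relation label per example.
    """
    accuracies = []
    for layer in range(X_train.shape[1]):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[:, layer, :], y_train)
        accuracies.append(accuracy_score(y_test, clf.predict(X_test[:, layer, :])))
    return accuracies

if __name__ == "__main__":
    # Dummy data only, to show the expected shapes.
    rng = np.random.default_rng(0)
    X_tr, X_te = rng.normal(size=(200, 4, 32)), rng.normal(size=(50, 4, 32))
    y_tr, y_te = rng.integers(0, 3, size=200), rng.integers(0, 3, size=50)
    print(layerwise_probe(X_tr, y_tr, X_te, y_te))
```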
Once the probes have been calculated, you can run our plotting script to generate the plots and tables seen in the paper. They will be stored under results/disco_probe_results/.
python discourse/plots/make_plots.py
For your convenience, we have already added the results of our own run there.
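If you prefer to build a quick custom plot from your own probe results instead, a minimal plotly sketch for a layer-wise accuracy curve could look like this (the accuracy values are dummy numbers and the output file name is arbitrary):

```python
import plotly.express as px

# Dummy layer-wise accuracies for illustration; replace with your own probe results.
accuracies = [0.41, 0.45, 0.52, 0.55, 0.54, 0.53]
fig = px.line(
    x=list(range(len(accuracies))),
    y=accuracies,
    labels={"x": "Layer", "y": "Accuracy"},
    title="Layer-wise probing accuracy (dummy data)",
)
fig.write_html("layerwise_accuracy.html")
```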
In order to run our experiments, you need to download a copy of the DISRPT 2023 dataset; you can find instructions in their repository. The resulting dataset should be placed in data/disrpt_private/ to work seamlessly with our code.
We obtained the results of the DisCoDisCo reference using the code provided in their repository; instructions on how to run it can be found there. In discodisco/ you can find the configs and bash scripts used to run our specific experiments. Here, you first need to map the DISRPT dataset to our unified label set by running the following command, which updates the original DISRPT data with our proposed unified label set and creates the corresponding dataset directories under discodisco/data_unified/:
python discodisco/scripts/data_modify.py
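Conceptually, this mapping step boils down to something like the sketch below. The paths, column names, and assumed structure of unified_labels.json (original label to unified label) are illustrative only; the actual logic is in discodisco/scripts/data_modify.py:

```python
import json
import os
import pandas as pd

# All paths and column names below are placeholders/assumptions: we assume
# unified_labels.json maps original labels to unified labels and that DISRPT
# .rels files are tab-separated with a "label" column.
with open("data/unified_labels.json", encoding="utf-8") as f:
    label_map = json.load(f)

src = "data/disrpt_private/eng.rst.gum/eng.rst.gum_train.rels"      # placeholder
dst = "discodisco/data_unified/eng.rst.gum/eng.rst.gum_train.rels"  # placeholder

rels = pd.read_csv(src, sep="\t")
rels["label"] = rels["label"].map(lambda lab: label_map.get(lab, lab))

os.makedirs(os.path.dirname(dst), exist_ok=True)
rels.to_csv(dst, sep="\t", index=False)
```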
Then, the following command will create the combined DISRPT data under discodisco/output/mul.dis.all/:
python discodisco/scripts/combine_data.py
Then you can start training the model using our config and their script as follows:
bash run_single_flair_clone_mul.sh mul.dis.all
where mul.dis.all contains the combined .rels and .conllu files. To obtain dataset-wise accuracy for all datasets, use the following script:
bash run_all_flair_clone_mul_test.sh
The code in this repository has been written with the support of code completions from an AI coding assistant, namely GitHub Copilot. Completions were mostly single lines up to a few lines of code and were always carefully checked to ensure their functionality and safety. Furthermore, we did our best to avoid accepting code completions that would be incompatible with the license of our code or that could be regarded as plagiarism.