Code for Transforming Decoder-Only Models into Encoder-Only Models with Improved Understanding Capabilities (Under Review).
-
We propose Dec2Enc, which transforms decoder-only models into encoder-only models by recovering bidirectional attention, thereby improving their understanding potential.
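As a rough illustration of what "recovering bidirectional attention" means, here is a minimal PyTorch sketch; the function name and mask handling are our own illustration, not the released code:

import torch

def bidirectional_attention_mask(padding_mask: torch.Tensor) -> torch.Tensor:
    # padding_mask: [batch, seq], 1 for real tokens, 0 for padding.
    # A decoder-only model would additionally apply a lower-triangular
    # (causal) constraint; here every token may attend to every token.
    batch, seq_len = padding_mask.shape
    full = padding_mask[:, None, None, :].expand(batch, 1, seq_len, seq_len)
    return full.bool()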
-
In particular, Dec2Enc uses a zero-initialization strategy: fine-tuning begins with the original causal attention mechanism and gradually learns bidirectional attention during training. This mitigates the training disruption caused by the mismatch between the attention mechanisms used in pre-training and fine-tuning.
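Conceptually, the zero initialization can be viewed as a learnable gate that starts at zero, so the model initially reproduces its pre-trained causal attention and only gradually opens up to bidirectional context. The sketch below is our own interpretation under that assumption; the class, parameter names, and exact blending rule are hypothetical, not the paper's implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedBidirectionalAttention(nn.Module):
    # Zero-initialized per-head gate that blends causal and bidirectional
    # attention. With the gate at 0, the output equals the original causal
    # attention of the pre-trained decoder; during fine-tuning the gate can
    # grow and admit bidirectional context. (Illustrative sketch only.)
    def __init__(self, num_heads: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_heads))

    def forward(self, q, k, v):
        # q, k, v: [batch, heads, seq, head_dim]
        seq_len = q.size(-2)
        causal_mask = torch.tril(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device)
        )
        causal_out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
        bidir_out = F.scaled_dot_product_attention(q, k, v)  # no mask: bidirectional
        # gate == 0 -> pure causal (matches pre-training); as the gate grows,
        # bidirectional attention is phased in.
        g = self.gate.view(1, -1, 1, 1)
        return causal_out + g * (bidir_out - causal_out)

Because the gate starts at zero, fine-tuning begins exactly where pre-training left off, and information from future tokens is blended in only as the gate is updated.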
pip install -r requirements.txt
We use the CLEAN, CMQA, MLQA, and C3 datasets.
You can obtain them from their official websites, or download the datasets directly here, and place them under the current folder.
You can run our bash scripts to train and evaluate Dec2Enc:
bash scripts/run_clean.sh
-
In experiments with various decoder-only models from 0.5B to 9B parameters, Dec2Enc boosts understanding capabilities and better exploits multilingual knowledge, achieving a 5.2% to 22.4% increase in exact-match answer rate across seven languages compared with vanilla decoder-only models.
-
In our experiments, Dec2Enc also outperforms existing encoder-only models on four reading comprehension datasets.