Code for Transforming Decoder-Only Models into Encoder-Only Models with Improved Understanding Capabilities (Under Review).
-
We propose Dec2Enc, which transforms decoder-only models into encoder-only models by recovering bidirectional attention, thereby improving their understanding potential.
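As a rough illustration of what "recovering bidirectional attention" means, here is a minimal PyTorch sketch; the function name and mask handling are our own illustration, not the released code:

import torch

def bidirectional_attention_mask(padding_mask: torch.Tensor) -> torch.Tensor:
    # padding_mask: [batch, seq], 1 for real tokens, 0 for padding.
    # A decoder-only model would additionally apply a lower-triangular
    # (causal) constraint; here every token may attend to every token.
    batch, seq_len = padding_mask.shape
    full = padding_mask[:, None, None, :].expand(batch, 1, seq_len, seq_len)
    return full.bool()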
-
In particular, Dec2Enc uses a zero-initialization strategy: fine-tuning begins with the original causal attention mechanism and gradually learns bidirectional attention during training. This mitigates the training disruption caused by the mismatch between the attention mechanisms used in pre-training and fine-tuning.
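Conceptually, the zero initialization can be viewed as a learnable gate that starts at zero, so the model initially reproduces its pre-trained causal attention and only gradually opens up to bidirectional context. The sketch below is our own interpretation under that assumption; the class, parameter names, and exact blending rule are hypothetical, not the paper's implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedBidirectionalAttention(nn.Module):
    # Zero-initialized per-head gate that blends causal and bidirectional
    # attention. With the gate at 0, the output equals the original causal
    # attention of the pre-trained decoder; during fine-tuning the gate can
    # grow and admit bidirectional context. (Illustrative sketch only.)
    def __init__(self, num_heads: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(num_heads))

    def forward(self, q, k, v):
        # q, k, v: [batch, heads, seq, head_dim]
        seq_len = q.size(-2)
        causal_mask = torch.tril(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device)
        )
        causal_out = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)
        bidir_out = F.scaled_dot_product_attention(q, k, v)  # no mask: bidirectional
        # gate == 0 -> pure causal (matches pre-training); as the gate grows,
        # bidirectional attention is phased in.
        g = self.gate.view(1, -1, 1, 1)
        return causal_out + g * (bidir_out - causal_out)

Because the gate starts at zero, fine-tuning begins exactly where pre-training left off, and information from future tokens is blended in only as the gate is updated.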
pip install -r requirements.txt
We use the CLEAN, CMQA, MLQA, and C3 datasets.
You can obtain them from their official websites, or download the datasets directly here, and place them under the current folder.
You can run our bash scripts to train and evaluate Dec2Enc:
bash scripts/run_clean.sh
-
In experiments with various decoder-only models from 0.5B to 9B parameters, Dec2Enc boosts understanding capabilities and better exploits multilingual knowledge, achieving a 5.2% to 22.4% increase in exact-match answer rate across seven languages compared with vanilla decoder-only models.
-
In our experiments, Dec2Enc also outperforms existing encoder-only models on four reading comprehension datasets.