nju-websoft/Dec2Enc

Dec2Enc

Code for the paper "Transforming Decoder-Only Models into Encoder-Only Models with Improved Understanding Capabilities" (under review).

  • We propose Dec2Enc, which transforms decoder-only models into encoder-only models by recovering bidirectional attention, thereby improving their understanding potential.

  • In particular, Dec2Enc uses a zero-initialization strategy: fine-tuning begins with the original causal attention mechanism, and bidirectional attention is learned gradually during training. This mitigates the significant training disruption caused by the mismatch between the attention mechanisms used in pre-training and fine-tuning.
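The zero-initialization idea above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the repository's actual implementation: the module name, the scalar gate, and the linear blending of the causal mask are all hypothetical choices that merely reproduce the described behavior (start exactly causal, allow a gradual shift toward bidirectional attention as the gate is learned).

```python
import torch
import torch.nn as nn


class ZeroInitBidirectionalMask(nn.Module):
    """Illustrative additive attention mask with a zero-initialized gate.

    At initialization the gate is 0, so the mask equals the standard
    causal mask and the model behaves exactly as in pre-training.
    As the gate grows toward 1 during fine-tuning, the blocking term
    on future positions fades out, recovering bidirectional attention.
    """

    def __init__(self, neg: float = -1e4):
        super().__init__()
        self.neg = neg  # large negative value standing in for -inf
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: start causal

    def forward(self, seq_len: int) -> torch.Tensor:
        g = self.gate.clamp(0.0, 1.0)
        # causal additive mask: 0 on/below the diagonal, `neg` above it
        causal_block = torch.triu(
            torch.full((seq_len, seq_len), self.neg), diagonal=1
        )
        # g = 0 -> fully causal; g = 1 -> fully bidirectional
        return (1.0 - g) * causal_block


mask_module = ZeroInitBidirectionalMask()
mask = mask_module(4)  # add this to attention scores before softmax
```

In a real integration, the returned mask would be added to the attention scores of each transformer layer before the softmax, and the gate would be updated jointly with the rest of the model during fine-tuning.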

(Figure: model overview)

Pip Installation

pip install -r requirements.txt

Data Preparation

We use the CLEAN, CMQA, MLQA, and C3 datasets.

You can obtain them from their official websites, or download the datasets directly here, and place them under the current folder.

How to Run

You can execute our bash scripts to train and evaluate Dec2Enc:

bash scripts/run_clean.sh

Experimental Results

  • In experiments with decoder-only models ranging from 0.5B to 9B parameters, Dec2Enc improves understanding capabilities and better exploits multilingual knowledge, achieving a 5.2% to 22.4% increase in the percentage of exact-match answers across seven languages compared with vanilla decoder-only models.

  • Dec2Enc outperforms existing encoder-only models on four reading comprehension datasets in our experiments.

(Tables 1 and 2: experimental results)
