This repository contains the official PyTorch implementation of the paper:
Modularized Pre-training for End-to-end Task-oriented Dialogue.
Libo Qin, Xiao Xu, Lehan Wang, Yue Zhang, Wanxiang Che.
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:
@ARTICLE{qin-etal-2023-modularized,
author={Qin, Libo and Xu, Xiao and Wang, Lehan and Zhang, Yue and Che, Wanxiang},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
title={Modularized Pre-training for End-to-end Task-oriented Dialogue},
year={2023},
volume={},
number={},
pages={1-10},
doi={10.1109/TASLP.2023.3244503}
}
Install conda and then create a new environment with our configuration file:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.12.0-Linux-x86_64.sh
bash Miniconda3-py37_4.12.0-Linux-x86_64.sh
conda env create -f environment.yml
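After creating the environment, you can optionally verify that PyTorch imports correctly and that a GPU is visible. This is a minimal check, not part of the repository; the environment name `MPEToDs` is taken from the activation command used later in this README.

```python
# Optional sanity check (not part of the repository): run inside the MPEToDs
# environment (see "conda activate MPEToDs" below) to confirm that PyTorch is
# importable and whether a CUDA GPU is visible.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
```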
We use fitlog to track our experiments. Here is the documentation of fitlog (Chinese only).
fitlog init
# You can use the command below to view your experiment results. See https://fitlog.readthedocs.io/zh/latest/user/command_line.html for command-line instructions.
fitlog log <log-dir>
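For reference, the sketch below shows how fitlog's logging API is typically called from a training script. It is a hedged illustration only: the hyper-parameters, loss values, and metric names are placeholders, not the exact logging code used in `pretrain_GPT.py` or `fine_tune.py`.

```python
# Sketch of typical fitlog usage inside a training script (placeholder values;
# not the exact logging code in pretrain_GPT.py / fine_tune.py).
import os
import fitlog

os.makedirs("logs/GP", exist_ok=True)       # the directory passed to -fld
fitlog.set_log_dir("logs/GP")               # fitlog writes its records here
fitlog.add_hyper({"bsz": 32, "gl": 5e-05})  # record hyper-parameters

for step in range(1, 1001):
    loss = 1.0 / step                       # placeholder training loss
    if step % 200 == 0:                     # mirrors --logging_steps=200
        fitlog.add_loss(loss, name="train_loss", step=step)
        fitlog.add_metric({"dev": {"ppl": 20.0 / step}}, step=step)  # placeholder metric

fitlog.add_best_metric({"dev": {"ppl": 0.02}})  # record the best dev result
fitlog.finish()                                 # mark the run as finished
```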
All the data used in pre-training and fine-tuning are publicly available. Below are the links to the original data sources:
You can download all data from here and unzip them into the `data/` folder, which should have the following structure:
data
├── pre-train
├── fine-tune
└── original
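A quick, hypothetical way to confirm that layout after unzipping (this helper is not part of the repository):

```python
# Hypothetical sanity check (not part of the repository): verify that the
# expected sub-folders exist under data/ after unzipping the downloaded archives.
from pathlib import Path

for sub in ("pre-train", "fine-tune", "original"):
    folder = Path("data") / sub
    status = "ok" if folder.is_dir() else "MISSING"
    print(f"{folder}: {status}")
```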
We also provide the pre-processing scripts in `prepare_data/`:
- `preprocess_gp.py`: converts the data in the `data/original/` folder into the format for Generation Module Pre-training (`data/pre-train/` provides the processed data).
- `augmentation_xxx.py`: converts the data in the `data/fine-tune/` folder into the format for Knowledge-retriever Module Pre-training (`data/pre-train/` provides the processed data).
You can download all saved checkpoints from pre-train and fine-tune. Then, please unzip them into the `save/` folder, which should have the following structure:
save
├── fine-tune
│ ├── SMD_Best
│ ├── WOZ_Best
│ └── CAM_Best
└── pre-train
├── GP
└── KRP
In this paper, we use GPT-2 or DialoGPT-Medium to initialize our generation module.
If you want to re-pre-train the generation module, please download them from here and unzip them into the `pre-train/` folder, which should have the following structure:
pre-train
├── medium_ft.pkl
├── pytorch_model.bin
├── merges.txt
├── vocab.json
└── config.json
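For reference, these files can be loaded with the `transformers` library roughly as sketched below. This is an illustrative sketch only: it assumes `medium_ft.pkl` is the torch-saved DialoGPT-Medium state dict and that `config.json` matches the checkpoint being loaded; see `pretrain_GPT.py` for the loading code actually used.

```python
# Illustrative loading sketch (assumptions noted above; not the repository's exact code).
import torch
from transformers import GPT2Config, GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer("pre-train/vocab.json", "pre-train/merges.txt")
config = GPT2Config.from_json_file("pre-train/config.json")
model = GPT2LMHeadModel(config)

# Load either the GPT-2 weights (pytorch_model.bin) or the DialoGPT-Medium
# weights (medium_ft.pkl); both are assumed to be torch-saved state dicts.
state_dict = torch.load("pre-train/medium_ft.pkl", map_location="cpu")

# Older DialoGPT checkpoints name the output head differently from
# transformers, so remap that key if it is present.
if "lm_head.decoder.weight" in state_dict:
    state_dict["lm_head.weight"] = state_dict.pop("lm_head.decoder.weight")

model.load_state_dict(state_dict, strict=False)
model.eval()
```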
conda activate MPEToDs
# 1. Pre-train Generation Module
python pretrain_GPT.py -g -uf -fld=logs/GP -bsz=32 -accs=2 -gl=5e-05 --warmup_steps=2500 --total_steps=110000 --valid_steps=2000 --logging_steps=200
# 2. Pre-train Knowledge-retriever Module
python pretrain_KB.py -g -uf -fg -ds=smd -fld=logs/KRP_SMD -pgpt=save/pre-train/GP -bsz=32 -accs=2 -dr=0.1 -hdd=256 -lr=0.001 --warmup_steps=250 --total_steps=15000 --valid_steps=100 --logging_steps=10
python pretrain_KB.py -g -uf -fg -ds=cam -fld=logs/KRP_CAM -pgpt=save/pre-train/GP -bsz=32 -accs=1 -dr=0.2 -hdd=128 -lr=0.001 --warmup_steps=400 --total_steps=10000 --valid_steps=100 --logging_steps=8
python pretrain_KB.py -g -uf -fg -ds=woz -fld=logs/KRP_WOZ -pgpt=save/pre-train/GP -bsz=32 -accs=1 -dr=0.1 -hdd=128 -lr=0.001 --warmup_steps=1600 --total_steps=40000 --valid_steps=200 --logging_steps=20
# 3. Fine-tune
python fine_tune.py -g -uf -ft -ds=smd -fld=logs/fine_tune_SMD -pa=3 -pgpt=save/pre-train/GP -pkb=save/pre-train/KRP/SMD -bsz=16 -accs=1 -dr=0.1 -hdd=256 -lr=0.0007 -gl=4e-05 --warmup_steps=500 --total_steps=4000 --logging_steps=4
python fine_tune.py -g -uf -ft -ds=cam -fld=logs/fine_tune_CAM -pa=3 -pgpt=save/pre-train/GP -pkb=save/pre-train/KRP/CAM -bsz=16 -accs=1 -dr=0.1 -hdd=128 -lr=0.001 -gl=4e-05 --warmup_steps=60 --total_steps=1050 --logging_steps=4
python fine_tune.py -g -uf -ft -ds=woz -fld=logs/fine_tune_WOZ -pa=3 -pgpt=save/pre-train/GP -pkb=save/pre-train/KRP/WOZ -bsz=4 -accs=4 -dr=0.1 -hdd=128 -lr=0.001 -gl=7e-05 --warmup_steps=500 --total_steps=5332 --logging_steps=5
Config Notes:
- The actual batch size = `bsz` * `accs`, i.e., batch size times accumulation steps.
- We validate the model every `valid_steps` in pre-training and every epoch in fine-tuning.
- We fine-tune the model for 10 epochs in our experiments. You can change `total_steps` to control the number of epochs.
- `total_steps` and `warmup_steps` should change with `bsz` and `accs`. Take the fine-tuning of SMD as an example: if you change `bsz` from `16` to `32` and `accs` from `1` to `3`, you should change `total_steps` from `4000` to `4000/2/3` and `warmup_steps` from `500` to `500/2/3` (see the sketch below).
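The scaling rule in the last note can be made explicit with a small illustrative helper (not part of the repository): step counts scale inversely with the effective batch size `bsz * accs`.

```python
# Illustrative helper (not part of the repository): rescale step counts when
# the effective batch size (bsz * accs) changes.
def rescale_steps(steps: int, old_bsz: int, old_accs: int, new_bsz: int, new_accs: int) -> int:
    old_batch = old_bsz * old_accs
    new_batch = new_bsz * new_accs
    return round(steps * old_batch / new_batch)

# SMD fine-tuning example from the note above: bsz 16 -> 32 and accs 1 -> 3.
print(rescale_steps(4000, 16, 1, 32, 3))  # total_steps:  4000 / 2 / 3 ~= 667
print(rescale_steps(500, 16, 1, 32, 3))   # warmup_steps:  500 / 2 / 3 ~= 83
```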
We are highly grateful for the public code of the following papers, on which our code is partly based:
- DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation. Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- Global-to-local Memory Pointer Networks for Task-Oriented Dialogue. Chien-Sheng Wu, Richard Socher, Caiming Xiong. ICLR 2019. [Paper] [Open Review] [Code]
- Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog. Libo Qin, Xiao Xu, Wanxiang Che, Yue Zhang, Ting Liu.