This repository contains the code for our ACL 2022 paper: An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models.
Run bash scripts/setup.sh to install the libraries and dependencies.
The data for abstractive summarization from Toutanova et al. (2016) can be found in data-summ, which contains 6K short input texts with up to 5 summaries each. The Newsela data can be requested from here.
AR
> bash scripts/train.sh -i 1 -j 1 -m seq2seq -u data-summ
EDITOR (From Reference)
> bash scripts/train.sh -i 2 -j 1 -m nat -u data-summ
Editing Roll-in
> bash scripts/train.sh -i 3 -j 1 -m nat -u data-summ -r experiments/exp-2/checkpoints1/checkpoint_best.pt -a " --use-source 1 --noisy-expert --lr 0.0001 "
Editing CL
> bash scripts/train.sh -i 4 -j 1 -m nat -u data-summ -r experiments/exp-2/checkpoints1/checkpoint_best.pt -a " --use-source 1 --noisy-expert --pacing root --lr 0.0001 "
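The four runs above can be chained in one script. The following is a minimal sketch, not part of the repository: the `run` and `train_all` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and the check assumes that the run with `-i 2` writes the checkpoint at the path passed via `-r` in the later commands. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: run the four training configurations from this README in order.
# Assumption: the run with -i 2 produces the checkpoint referenced by -r below.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to actually train.
set -eu

CKPT="experiments/exp-2/checkpoints1/checkpoint_best.pt"

run() {
  # Print the command, then execute it unless in dry-run mode.
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    "$@"
  fi
}

train_all() {
  run bash scripts/train.sh -i 1 -j 1 -m seq2seq -u data-summ   # AR
  run bash scripts/train.sh -i 2 -j 1 -m nat -u data-summ       # EDITOR (From Reference)

  # The next two runs pass the EDITOR checkpoint via -r, so it must exist.
  if [ "${DRY_RUN:-1}" != "1" ] && [ ! -f "$CKPT" ]; then
    echo "missing checkpoint: $CKPT" >&2
    return 1
  fi

  # Editing Roll-in
  run bash scripts/train.sh -i 3 -j 1 -m nat -u data-summ -r "$CKPT" \
    -a " --use-source 1 --noisy-expert --lr 0.0001 "
  # Editing CL
  run bash scripts/train.sh -i 4 -j 1 -m nat -u data-summ -r "$CKPT" \
    -a " --use-source 1 --noisy-expert --pacing root --lr 0.0001 "
}

train_all
```

Saved under a name of your choice (for example run_all.sh, a hypothetical filename), `bash run_all.sh` lists the commands and `DRY_RUN=0 bash run_all.sh` runs them. Stopping on a missing checkpoint keeps the two editing runs from starting without the model they load.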
The skip-token-refine option ignores the grade tokens on the decoder side for the simplification task. It should be set to 2 to ignore the source and grade tokens.
Example evaluation configs for the summarization tasks can be found in scripts/evaluate_summ.sh.
We thank Eleftheria Briakou, Khanh Nguyen, Kianté Brantley, Weijia Xu, the members of the CLIP lab at UMD, and the anonymous ARR reviewers for their helpful and constructive comments.
If you make use of the code, models, or algorithm, please cite our paper:
@inproceedings{agrawal-carpuat-2022-imitation,
title = "An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models",
author = "Agrawal, Sweta and
Carpuat, Marine",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.520",
doi = "10.18653/v1/2022.acl-long.520",
pages = "7550--7563",
abstract = "We propose a framework for training non-autoregressive sequence-to-sequence models for editing tasks, where the original input sequence is iteratively edited to produce the output. We show that the imitation learning algorithms designed to train such models for machine translation introduces mismatches between training and inference that lead to undertraining and poor generalization in editing scenarios. We address this issue with two complementary strategies: 1) a roll-in policy that exposes the model to intermediate training sequences that it is more likely to encounter during inference, 2) a curriculum that presents easy-to-learn edit operations first, gradually increasing the difficulty of training samples as the model becomes competent. We show the efficacy of these strategies on two challenging English editing tasks: controllable text simplification and abstractive summarization. Our approach significantly improves output quality on both tasks and controls output complexity better on the simplification task.",
}