This document describes the rough procedure to train an SLTUnet model.
- Get the phoenix2014T dataset from here or using
  `wget https://www-i6.informatik.rwth-aachen.de/ftp/pub/rwth-phoenix/2016/phoenix-2014-T.v3.tar.gz`
- Get the MuST-C En-De dataset from FBK; note we used the data in v1.0

We applied tokenization and subword modeling to these datasets. See `preprocess_phoenix.sh` for reference.
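The exact preprocessing is defined by `preprocess_phoenix.sh`. Purely as an illustration of what a tokenization/subword step looks like, the sketch below trains and applies a joint subword model with SentencePiece; the choice of SentencePiece, the file names, and the vocabulary size are assumptions for this sketch, not necessarily what the script uses.

```python
# Illustrative only: a joint BPE subword model over the sign-language translations
# and the MuST-C text. File names and vocab_size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train.de,mustc.de",    # comma-separated training corpora (placeholders)
    model_prefix="subword",       # writes subword.model / subword.vocab
    vocab_size=8000,              # illustrative size, not the paper's setting
    model_type="bpe",
    character_coverage=1.0,
)

# Apply the trained model to one sentence.
sp = spm.SentencePieceProcessor(model_file="subword.model")
print(sp.encode("am tag regnet es im norden", out_type=str))
```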
We adopt the SMKD method to pretrain sign embeddings and further adapt it for sign language translation; `smkd` contains the adapted source code.

To pretrain SMKD embeddings:
- preprocess the dataset
  `python preprocess/dataset_preprocess.py --dataset phoenix2014 --dataset-root PHOENIX-2014-T-release-v3/PHOENIX-2014-T/`
- launch training
  `python main.py --work-dir exp/resnet34 --config baseline.yaml --device 0,1`
- checkpoint averaging (optional)
  Among all saved checkpoints, select the top-K (e.g. 5) checkpoints and put their absolute paths into a file named `checkpoint` under `exp/resnet34`, then run (a sketch of the averaging step follows this list):
  `python ckpt_avg.py --path exp/resnet34 --checkpoints 5 --output avg`
- extract sign features
  `python main.py --load-weights avg/average.pt --phase features --device 0 --num-feature-aug 10 --work-dir exp/resnet34 --config baseline.yaml`
  Then combine the different training features:
  `python sign_feature_cmb.py train*h5`
  At the end, you will have train/dev/test.h5 files as the sign feature inputs (a feature-inspection sketch follows this list).
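For reference, checkpoint averaging boils down to an element-wise mean of the saved weights. The sketch below is a minimal stand-in that assumes PyTorch checkpoints holding a plain state dict of tensors; `ckpt_avg.py` (and later `checkpoint_averaging.py`) in the repository are the authoritative implementations and may store checkpoints differently.

```python
# Minimal checkpoint averaging, assuming each file is a PyTorch state dict
# (name -> tensor). Real checkpoints may wrap the weights under another key.
import torch

def average_checkpoints(paths, output_path):
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    avg = {k: v / len(paths) for k, v in avg.items()}
    torch.save(avg, output_path)

# The paths would come from the `checkpoint` file listing the top-K checkpoints.
average_checkpoints(["best1.pt", "best2.pt", "best3.pt"], "avg/average.pt")
```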
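To sanity-check the resulting feature files, something like the following can be used. It assumes each `.h5` file maps a sample name to a `[num_frames, feature_dim]` array; the actual layout is whatever the feature-extraction and `sign_feature_cmb.py` scripts produce.

```python
# Quick inspection of the extracted sign features (layout assumed, see above).
import h5py

with h5py.File("train.h5", "r") as f:
    print(f"{len(f)} samples")
    for name in list(f.keys())[:3]:
        feats = f[name][:]            # load one feature matrix into memory
        print(name, feats.shape, feats.dtype)
```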
To train the SLTUnet model, see the given running script `train.sh` for reference.
- We saved the top-10 checkpoints based on dev set performance and averaged them before final evaluation:
  `python checkpoint_averaging.py --path path-to-best-ckpt-dir --checkpoints 10 --output avg --gpu 0`
- See the given running script `test.sh` for decoding.
- Regarding evaluation, please check out `eval/metrics.py` for details. For future evaluation and dataset construction, we suggest retaining punctuation and adopting detokenized BLEU (see the BLEU sketch after this list), e.g.
  `python eval/metrics.py -t slt -hyp model-output-file -ref gold-reference-file`
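For context, detokenized BLEU over untokenized, punctuation-preserving outputs can be computed with sacrebleu as in the sketch below; this only illustrates the idea, `eval/metrics.py` remains the reference implementation, and the sentences shown are made-up placeholders.

```python
# Detokenized BLEU with sacrebleu on plain (untokenized) sentences.
import sacrebleu

hyps = ["am tag regnet es im norden.", "morgen wird es sonnig."]   # model outputs
refs = ["am tag regnet es im norden.", "morgen scheint die sonne."]  # gold references

bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")
```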