Configuration
We configure an NMT training task with a YAML file. The configuration can be split into three parts: data configuration, model configuration, and training configuration.
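At the top level, a configuration file therefore contains three blocks, each of which is described in the sections below:

```yaml
data_configs:      # data configuration (see below)
model_configs:     # model configuration (see below)
training_configs:  # training configuration (see below)
```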
The overview of the data configuration looks like this:
```yaml
data_configs:
  lang_pair:
  train_data:
  valid_data:
  bleu_valid_reference:
  vocabularies:
  max_len:
  num_refs:
```
Option | Help |
---|---|
lang_pair | Language direction of the translation model, in the same format as sacrebleu's '--langpair' option (e.g. de-en). |
train_data | Source and target training data. |
valid_data | Source and target validation data. As these are used to compute the loss, both files should be tokenized. |
bleu_valid_reference | Reference data used to compute BLEU. |
vocabularies | See details below. |
max_len | Maximum sentence length. |
num_refs | Number of references. The default is 1. If the value is larger than 1 and bleu_valid_reference is 'xxx', the actual reference files should be xxx1, xxx2, ... |
* vocabularies: each vocabulary item is configured as follows:
Option | Help |
---|---|
type | Type of token. Can be "word" or "bpe". |
dict_path | Path to the dictionary file. |
max_n_words | Maximum number of words for which embeddings are generated. |
codes | Path to the BPE codes. Only works if the type is "bpe". If the type is "bpe" but this option is not given, all files are assumed to be already segmented into BPE tokens. |
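As a concrete illustration, a filled-in data configuration might look like the sketch below. All paths and numbers are placeholders, and the exact shapes (train_data/valid_data as source/target lists, vocabularies as a per-side list) are assumptions based on the descriptions above rather than a fixed schema:

```yaml
data_configs:
  lang_pair: de-en
  train_data:
    - /path/to/train.de          # source training data (tokenized)
    - /path/to/train.en          # target training data (tokenized)
  valid_data:
    - /path/to/dev.de
    - /path/to/dev.en
  bleu_valid_reference: /path/to/dev.ref.en
  vocabularies:
    - type: bpe                  # source-side vocabulary
      dict_path: /path/to/vocab.de
      codes: /path/to/codes.de   # omit if the data is already BPE-segmented
      max_n_words: 32000
    - type: word                 # target-side vocabulary
      dict_path: /path/to/vocab.en
      max_n_words: 30000
  max_len: 100
  num_refs: 1
```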
Currently our code supports an RNN-based model and the Transformer. Some options are shared across all model types:
```yaml
model_configs:
  model:
  d_word_vec:
  d_model:
  proj_share_weight:
  label_smoothing:
```
Option | Help |
---|---|
model | Can be "DL4MT" or "Transformer". |
d_word_vec | Dimension of the word embeddings. |
d_model | Dimension of the hidden states. |
proj_share_weight | Share the target-side embedding with the weights of the output layer. |
label_smoothing | Smoothing factor applied to the true label. Must be less than 1.0. Label smoothing is disabled if the value is not positive. |
For the DL4MT (RNN-based) model, the configuration looks like:

```yaml
model_configs:
  model: DL4MT
  d_word_vec:
  d_model:
  dropout:
  proj_share_weight:
  bridge_type:
  label_smoothing:
```
Option | Help |
---|---|
dropout | Dropout applied to the last output layer. |
bridge_type | Method used to produce the decoder's initial state (e.g. "zero"). |
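For instance, a filled-in DL4MT block might look like this (the numbers are illustrative placeholders, not recommended settings):

```yaml
model_configs:
  model: DL4MT
  d_word_vec: 512          # word embedding dimension
  d_model: 1024            # hidden state dimension
  dropout: 0.5             # dropout on the last output layer
  proj_share_weight: true  # tie target embedding and output layer weights
  bridge_type: zero        # initialize the decoder state with zeros
  label_smoothing: 0.1
```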
For the Transformer, the configuration looks like:

```yaml
model_configs:
  model: Transformer
  n_layers:
  n_head:
  d_word_vec:
  d_model:
  d_inner_hid:
  dropout:
  proj_share_weight:
  label_smoothing:
```
Option | Help |
---|---|
n_layers | Number of layers. |
n_head | Number of attention heads. |
d_inner_hid | Size of the hidden layer in the position-wise feed-forward sublayer. |
dropout | Dropout applied to all layers. |
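For instance, a filled-in Transformer block might look like this (the numbers are illustrative, loosely following common base settings, and are not required values):

```yaml
model_configs:
  model: Transformer
  n_layers: 6              # encoder/decoder layers
  n_head: 8                # attention heads
  d_word_vec: 512
  d_model: 512
  d_inner_hid: 2048        # position-wise feed-forward hidden size
  dropout: 0.1
  proj_share_weight: true
  label_smoothing: 0.1
```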
The overview of the training configuration looks like this:
```yaml
training_configs:
  seed:
  max_epochs:
  shuffle:
  use_bucket:
  batching_key:
  batch_size:
  update_cycle:
  valid_batch_size:
  bleu_valid_batch_size:
  bleu_valid_max_steps:
  bleu_valid_warmup:
  bleu_valid_alpha:
  bleu_valid_beam_size:
  bleu_valid_configs:
  disp_freq:
  save_freq:
  num_kept_checkpoints:
  loss_valid_freq:
  bleu_valid_freq:
  early_stop_patience:
```
Option | Help |
---|---|
seed | Random seed. |
max_epochs | Maximum number of training epochs. |
shuffle | Whether to shuffle the whole data set after every epoch. |
use_bucket | Whether to use bucketing, which tries to put sentences of similar lengths into the same batch. |
batching_key | The way to measure the size of a batch. Currently supports "tokens" and "samples". |
batch_size | The size of a batch, measured according to batching_key. |
update_cycle | Update parameters every N batches. The default is 1. |
valid_batch_size | Batch size when evaluating the loss on the dev set. Always measured in "samples". |
bleu_valid_batch_size | Batch size when evaluating BLEU on the dev set. Always measured in "samples". |
bleu_valid_warmup | Start evaluating BLEU on the dev set after N steps or epochs. |
bleu_valid_configs | Configurations for decoding and BLEU computation. See details below. |
disp_freq | Print information to TensorBoard every N steps. |
save_freq | Save a checkpoint every N steps. |
num_kept_checkpoints | Maximum number of checkpoints to keep. |
loss_valid_freq | Evaluate the loss on the dev set every N steps. |
bleu_valid_freq | Evaluate BLEU on the dev set every N steps. |
early_stop_patience | Stop training if the BLEU score on the dev set has not improved for N consecutive evaluations. |
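Putting this together, a filled-in training configuration might look like the sketch below (all numbers are placeholders, not recommended values; the nested bleu_valid_configs block is detailed in the next section):

```yaml
training_configs:
  seed: 1234
  max_epochs: 20
  shuffle: true
  use_bucket: true
  batching_key: tokens
  batch_size: 2048            # in tokens, since batching_key is "tokens"
  update_cycle: 1
  valid_batch_size: 20        # in samples
  bleu_valid_batch_size: 5    # in samples
  bleu_valid_warmup: 1000
  disp_freq: 100
  save_freq: 1000
  num_kept_checkpoints: 5
  loss_valid_freq: 1000
  bleu_valid_freq: 1000
  early_stop_patience: 20
  bleu_valid_configs:
    # see the bleu_valid_configs section below
```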
* bleu_valid_configs: decoding and BLEU evaluation on the dev set are configured as follows:
```yaml
bleu_valid_configs:
  max_steps:
  beam_size:
  alpha:
  sacrebleu_args:
  postprocess:
```
Option | Help |
---|---|
max_steps | Maximum number of decoding steps when decoding on the dev set. |
beam_size | Beam size when decoding on the dev set. |
alpha | Length penalty value when decoding on the dev set. |
sacrebleu_args | Same as the arguments of the sacrebleu command, e.g. '-tok none -lc'. |
postprocess | Whether to post-process the translations, including detokenization and re-casing. |
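A filled-in example of this block might be (values are illustrative):

```yaml
bleu_valid_configs:
  max_steps: 150
  beam_size: 5
  alpha: 0.6
  sacrebleu_args: "-tok none -lc"
  postprocess: false      # set to true to detokenize and re-case before scoring
```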