
Configuration


We configure an NMT training task with a YAML file. The configuration can be split into three parts: data configuration, model configuration, and training configuration.

Data Configuration

An overview of the data configuration:

data_configs:
  lang_pair:
  train_data:
  valid_data:
  bleu_valid_reference:
  vocabularies:
  max_len:
  num_refs:
| Option | Help |
| --- | --- |
| lang_pair | Language direction of the translation model, in the same format as the '--lang' option of sacrebleu (e.g. de-en). |
| train_data | Source and target training data. |
| valid_data | Source and target validation data. As this is used to compute loss, both files should be tokenized. |
| bleu_valid_reference | Reference data used to compute BLEU. |
| vocabularies | See details below. |
| max_len | Maximum sentence length. |
| num_refs | Number of references. The default is 1. If the value is larger than 1 and bleu_valid_reference is 'xxx', the actual reference files should be xxx1, xxx2, ... |

* vocabularies: each vocabulary item is configured as below. A full example of a data_configs block follows the table.

| Option | Help |
| --- | --- |
| type | Type of token. Can be "word" or "bpe". |
| dict_path | Path of the dictionary file. |
| max_n_words | Maximum number of words for which embeddings are generated. |
| codes | Path to the BPE model. Only works if the type is "bpe". If the type is "bpe" but this option is not given, all files are assumed to be already segmented into BPE tokens. |
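Putting these together, a complete data_configs block might look like the sketch below. The file paths and vocabulary sizes are illustrative placeholders, and the list layout for train_data, valid_data, vocabularies, and max_len (one entry per language side) is an assumption, not something prescribed on this page:

data_configs:
  lang_pair: "de-en"
  train_data:
    - "/path/to/train.de"          # source training file
    - "/path/to/train.en"          # target training file
  valid_data:
    - "/path/to/valid.de"
    - "/path/to/valid.en"
  bleu_valid_reference: "/path/to/valid.ref"
  vocabularies:
    - type: "bpe"                  # source-side vocabulary
      dict_path: "/path/to/vocab.de"
      codes: "/path/to/codes.de"   # omit if the data is already BPE-segmented
      max_n_words: 50000
    - type: "word"                 # target-side vocabulary
      dict_path: "/path/to/vocab.en"
      max_n_words: 50000
  max_len:
    - 100                          # source maximum length
    - 100                          # target maximum length
  num_refs: 1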

Model Configuration

Currently our code supports an RNN-based model and the Transformer. Some options are shared across all model types:

model_configs:
  model:
  d_word_vec:
  d_model: 
  proj_share_weight:
  label_smoothing:
| Option | Help |
| --- | --- |
| model | Can be "DL4MT" or "Transformer". |
| d_word_vec | Dimension of the word embeddings. |
| d_model | Dimension of the hidden state. |
| proj_share_weight | Share the target-side embedding with the weights of the output layer. |
| label_smoothing | Smoothing factor applied to the true label. Must be less than 1.0. Label smoothing is disabled if the value is not positive. |

RNN

model_configs:
  model: DL4MT
  d_word_vec:
  d_model: 
  dropout:
  proj_share_weight:
  bridge_type:
  label_smoothing:
| Option | Help |
| --- | --- |
| dropout | Dropout rate of the last output layer. |
| bridge_type | Method used to initialize the decoder state. Can be "zero". |
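For reference, a filled-in RNN configuration might look like the following sketch. The numeric values are illustrative assumptions, not recommendations from this page:

model_configs:
  model: DL4MT
  d_word_vec: 512          # word embedding dimension
  d_model: 1024            # hidden state dimension
  dropout: 0.5             # dropout on the last output layer
  proj_share_weight: false
  bridge_type: "zero"      # initialize the decoder state with zeros
  label_smoothing: 0.0     # non-positive value disables label smoothing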

Transformer

model_configs:
  model: Transformer
  n_layers: 
  n_head: 
  d_word_vec: 
  d_model: 
  d_inner_hid: 
  dropout: 
  proj_share_weight: 
  label_smoothing:
| Option | Help |
| --- | --- |
| n_layers | Number of layers. |
| n_head | Number of attention heads. |
| d_inner_hid | Size of the hidden layer in the position-wise feed-forward layer. |
| dropout | Dropout rate applied in all layers. |
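As an illustration, a Transformer configuration close to the commonly used "base" setting might look like the sketch below; the numbers are assumptions, not values prescribed by this page:

model_configs:
  model: Transformer
  n_layers: 6              # encoder/decoder layers
  n_head: 8                # attention heads
  d_word_vec: 512          # word embedding dimension
  d_model: 512             # hidden state dimension
  d_inner_hid: 2048        # position-wise feed-forward hidden size
  dropout: 0.1
  proj_share_weight: true
  label_smoothing: 0.1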

Training Configuration

An overview of the training configuration:

training_configs:
  seed: 
  max_epochs: 
  shuffle: 
  use_bucket:
  batching_key:
  batch_size:
  update_cycle:
  valid_batch_size:
  bleu_valid_batch_size:
  bleu_valid_max_steps:
  bleu_valid_warmup:
  bleu_valid_alpha:
  bleu_valid_beam_size:
  bleu_valid_configs:
  disp_freq:
  save_freq:
  num_kept_checkpoints:
  loss_valid_freq:
  bleu_valid_freq:
  early_stop_patience:
| Option | Help |
| --- | --- |
| seed | Random seed. |
| max_epochs | Maximum number of training epochs. |
| shuffle | Whether to shuffle the whole data set after every epoch. |
| use_bucket | Whether to use bucketing, which tries to put sentences with similar lengths into the same batch. |
| batching_key | How the size of a batch is measured. Currently supports "tokens" and "samples". |
| batch_size | Size of a batch, measured according to batching_key. |
| update_cycle | Update parameters every N batches. The default is 1. |
| valid_batch_size | Batch size when evaluating loss on the dev set. Always measured in "samples". |
| bleu_valid_batch_size | Batch size when evaluating BLEU on the dev set. Always measured in "samples". |
| bleu_valid_warmup | Start to evaluate BLEU on the dev set after N epochs or steps. If 0 < N < 50, N is interpreted as a number of epochs; otherwise, N is interpreted as update steps. |
| bleu_valid_configs | Configuration of decoding and BLEU computation. See below for details. |
| disp_freq | Print information to TensorBoard every N steps. |
| save_freq | Save a checkpoint every N steps. |
| num_kept_checkpoints | Maximum number of checkpoints to keep. |
| loss_valid_freq | Evaluate loss on the dev set every N steps. |
| bleu_valid_freq | Evaluate BLEU on the dev set every N steps. |
| early_stop_patience | Stop training if BLEU on the dev set does not improve for N consecutive evaluations. |
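A filled-in training_configs block might look like the sketch below. All numeric values are illustrative placeholders, and the decoding options are collected under the nested bleu_valid_configs block described in the next section:

training_configs:
  seed: 1234
  max_epochs: 20
  shuffle: true
  use_bucket: true
  batching_key: "tokens"
  batch_size: 4096            # measured in tokens because of batching_key
  update_cycle: 1
  valid_batch_size: 20        # always measured in samples
  bleu_valid_batch_size: 10   # always measured in samples
  bleu_valid_warmup: 1        # 0 < 1 < 50, so interpreted as 1 epoch
  bleu_valid_configs:         # see the next section for the nested options
  disp_freq: 100
  save_freq: 1000
  num_kept_checkpoints: 5
  loss_valid_freq: 1000
  bleu_valid_freq: 2000
  early_stop_patience: 10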

* bleu_valid_configs:

bleu_valid_configs:
  max_steps:
  beam_size:
  alpha:
  sacrebleu_args:
  postprocess:
| Option | Help |
| --- | --- |
| max_steps | Maximum decoding steps when decoding on the dev set. |
| beam_size | Beam size when decoding on the dev set. |
| alpha | Length penalty value when decoding on the dev set. |
| sacrebleu_args | Arguments passed to the sacrebleu command, e.g. '-tok none -lc'. |
| postprocess | Whether to post-process the translation, including detokenization and re-casing. |
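For example, a bleu_valid_configs block might look like the following; the specific numbers are illustrative placeholders:

bleu_valid_configs:
  max_steps: 150                    # stop decoding a sentence after 150 steps
  beam_size: 5
  alpha: 0.6                        # length penalty
  sacrebleu_args: "-tok none -lc"   # passed through to the sacrebleu command
  postprocess: false                # set to true to detokenize and re-case before scoring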