
Configuration


We configure an NMT training task with a YAML file. The configuration can be split into three parts: data configuration, model configuration, and training configuration.

Data Configuration

An overview of the data configuration:

data_configs:
  lang_pair:
  train_data:
  valid_data:
  bleu_valid_reference:
  vocabularies:
  max_len:
  num_refs:
| Option | Help |
| --- | --- |
| lang_pair | Language direction of the translation model, in the same format as the '--lang' option of sacrebleu (e.g. de-en). |
| train_data | Source and target training data. |
| valid_data | Source and target validation data. As this is used to compute loss, both files should be tokenized. |
| bleu_valid_reference | Reference data used to compute BLEU. |
| vocabularies | See details below. |
| max_len | Maximum sentence length. |
| num_refs | Number of references. The default is 1. If the value is larger than 1 and bleu_valid_reference is 'xxx', the actual reference files should be xxx1, xxx2, ... |

* vocabularies: each vocabulary item is configured as below. A full example of a data_configs block follows the table.

| Option | Help |
| --- | --- |
| type | Type of token. Can be "word" or "bpe". |
| dict_path | Path of the dictionary file. |
| max_n_words | Maximum number of words for which embeddings are generated. |
| codes | Path to the BPE model. Only works if the type is "bpe". If the type is "bpe" but this option is not given, all files are assumed to be already segmented into BPE tokens. |
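Putting these together, a complete data_configs block might look like the sketch below. The file paths and vocabulary sizes are illustrative placeholders, and the list layout for train_data, valid_data, vocabularies, and max_len (one entry per language side) is an assumption, not something prescribed on this page:

data_configs:
  lang_pair: "de-en"
  train_data:
    - "/path/to/train.de"          # source training file
    - "/path/to/train.en"          # target training file
  valid_data:
    - "/path/to/valid.de"
    - "/path/to/valid.en"
  bleu_valid_reference: "/path/to/valid.ref"
  vocabularies:
    - type: "bpe"                  # source-side vocabulary
      dict_path: "/path/to/vocab.de"
      codes: "/path/to/codes.de"   # omit if the data is already BPE-segmented
      max_n_words: 50000
    - type: "word"                 # target-side vocabulary
      dict_path: "/path/to/vocab.en"
      max_n_words: 50000
  max_len:
    - 100                          # source maximum length
    - 100                          # target maximum length
  num_refs: 1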

Model Configuration

Currently our code supports an RNN-based model and the Transformer. Some options are shared across all model types:

model_configs:
  model:
  d_word_vec:
  d_model: 
  proj_share_weight:
  label_smoothing:
| Option | Help |
| --- | --- |
| model | Can be "DL4MT" or "Transformer". |
| d_word_vec | Dimension of the word embeddings. |
| d_model | Dimension of the hidden state. |
| proj_share_weight | Share the target-side embedding with the weights of the output layer. |
| label_smoothing | Smoothing factor applied to the true label. Must be less than 1.0. Label smoothing is disabled if the value is not positive. |

RNN

model_configs:
  model: DL4MT
  d_word_vec:
  d_model: 
  dropout:
  proj_share_weight:
  bridge_type:
  label_smoothing:
| Option | Help |
| --- | --- |
| dropout | Dropout rate of the last output layer. |
| bridge_type | Method used to initialize the decoder state. Can be "zero". |
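For reference, a filled-in RNN configuration might look like the following sketch. The numeric values are illustrative assumptions, not recommendations from this page:

model_configs:
  model: DL4MT
  d_word_vec: 512          # word embedding dimension
  d_model: 1024            # hidden state dimension
  dropout: 0.5             # dropout on the last output layer
  proj_share_weight: false
  bridge_type: "zero"      # initialize the decoder state with zeros
  label_smoothing: 0.0     # non-positive value disables label smoothing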

Transformer

model_configs:
  model: Transformer
  n_layers: 
  n_head: 
  d_word_vec: 
  d_model: 
  d_inner_hid: 
  dropout: 
  proj_share_weight: 
  label_smoothing:
| Option | Help |
| --- | --- |
| n_layers | Number of layers. |
| n_head | Number of attention heads. |
| d_inner_hid | Size of the hidden layer in the position-wise feed-forward layer. |
| dropout | Dropout rate applied in all layers. |
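As an illustration, a Transformer configuration close to the commonly used "base" setting might look like the sketch below; the numbers are assumptions, not values prescribed by this page:

model_configs:
  model: Transformer
  n_layers: 6              # encoder/decoder layers
  n_head: 8                # attention heads
  d_word_vec: 512          # word embedding dimension
  d_model: 512             # hidden state dimension
  d_inner_hid: 2048        # position-wise feed-forward hidden size
  dropout: 0.1
  proj_share_weight: true
  label_smoothing: 0.1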

Training Configuration

An overview of the training configuration:

training_configs:
  seed: 
  max_epochs: 
  shuffle: 
  use_bucket:
  batching_key:
  batch_size:
  update_cycle:
  valid_batch_size:
  bleu_valid_batch_size:
  bleu_valid_max_steps:
  bleu_valid_warmup:
  bleu_valid_alpha:
  bleu_valid_beam_size:
  bleu_valid_configs:
  disp_freq:
  save_freq:
  num_kept_checkpoints:
  loss_valid_freq:
  bleu_valid_freq:
  early_stop_patience:
| Option | Help |
| --- | --- |
| seed | Random seed. |
| max_epochs | Maximum number of training epochs. |
| shuffle | Whether to shuffle the whole data set after every epoch. |
| use_bucket | Whether to use bucketing, which tries to put sentences with similar lengths into the same batch. |
| batching_key | How the size of a batch is measured. Currently supports "tokens" and "samples". |
| batch_size | Size of a batch, measured according to batching_key. |
| update_cycle | Update parameters every N batches. The default is 1. |
| valid_batch_size | Batch size when evaluating loss on the dev set. Always measured in "samples". |
| bleu_valid_batch_size | Batch size when evaluating BLEU on the dev set. Always measured in "samples". |
| bleu_valid_warmup | Start to evaluate BLEU on the dev set after N epochs or steps. If 0 < N < 50, N is interpreted as a number of epochs; otherwise, N is interpreted as update steps. |
| bleu_valid_configs | Configuration of decoding and BLEU computation. See below for details. |
| disp_freq | Print information to TensorBoard every N steps. |
| save_freq | Save a checkpoint every N steps. |
| num_kept_checkpoints | Maximum number of checkpoints to keep. |
| loss_valid_freq | Evaluate loss on the dev set every N steps. |
| bleu_valid_freq | Evaluate BLEU on the dev set every N steps. |
| early_stop_patience | Stop training if BLEU on the dev set does not improve for N consecutive evaluations. |
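A filled-in training_configs block might look like the sketch below. All numeric values are illustrative placeholders, and the decoding options are collected under the nested bleu_valid_configs block described in the next section:

training_configs:
  seed: 1234
  max_epochs: 20
  shuffle: true
  use_bucket: true
  batching_key: "tokens"
  batch_size: 4096            # measured in tokens because of batching_key
  update_cycle: 1
  valid_batch_size: 20        # always measured in samples
  bleu_valid_batch_size: 10   # always measured in samples
  bleu_valid_warmup: 1        # 0 < 1 < 50, so interpreted as 1 epoch
  bleu_valid_configs:         # see the next section for the nested options
  disp_freq: 100
  save_freq: 1000
  num_kept_checkpoints: 5
  loss_valid_freq: 1000
  bleu_valid_freq: 2000
  early_stop_patience: 10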

* bleu_valid_configs:

bleu_valid_configs:
  max_steps:
  beam_size:
  alpha:
  sacrebleu_args:
  postprocess:
| Option | Help |
| --- | --- |
| max_steps | Maximum decoding steps when decoding on the dev set. |
| beam_size | Beam size when decoding on the dev set. |
| alpha | Length penalty value when decoding on the dev set. |
| sacrebleu_args | Arguments passed to the sacrebleu command, e.g. '-tok none -lc'. |
| postprocess | Whether to post-process the translation, including detokenization and re-casing. |
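For example, a bleu_valid_configs block might look like the following; the specific numbers are illustrative placeholders:

bleu_valid_configs:
  max_steps: 150                    # stop decoding a sentence after 150 steps
  beam_size: 5
  alpha: 0.6                        # length penalty
  sacrebleu_args: "-tok none -lc"   # passed through to the sacrebleu command
  postprocess: false                # set to true to detokenize and re-case before scoring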