
Training pipeline #92

Open
nefastosaturo opened this issue Aug 6, 2020 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@nefastosaturo (Collaborator) commented on Aug 6, 2020

Edit: changed batch size to 128. Edit 2: never mind, it crashes.

I think it is better to define a training pipeline, as the official DeepSpeech releases do.

We don't have the same amount of audio hours and video cards as the DeepSpeech team, so let's start with the 0.6 release hyperparameters.

I was thinking of a couple of pipelines to apply either when training a model from scratch or when starting from a pretrained checkpoint (transfer learning). What do you think?

PIPELINE 1 (with the 0.6 hyperparameters from the French repo)

Step I (see the command sketch after this list):

  • generate the scorer with LM_ALPHA and LM_BETA = 0

  • EPOCHS=30
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.0001
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS (early stop after)=10
    MAX_TO_KEEP=3 (we can keep more checkpoints once we have more disk space)
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
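
To make Step I concrete, here is a minimal command-line sketch, assuming the upstream DeepSpeech (0.7/0.8-era) training scripts rather than this repo's own wrappers; every path, file name and checkpoint directory below is a hypothetical placeholder, not our actual layout.

```bash
# Step I sketch. Flag names come from the upstream DeepSpeech training package;
# alphabet.txt, lm.binary, clips/*.csv, kenlm_it.scorer, checkpoints/step1 are placeholders.

# 1. Build the scorer with alpha/beta forced to 0, so the first training run
#    is not biased by untuned language-model weights.
python data/lm/generate_package.py \
  --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab-500000.txt \
  --package kenlm_it.scorer \
  --default_alpha 0.0 \
  --default_beta 0.0

# 2. First training pass with the 0.6-style hyperparameters listed above.
python DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --scorer_path kenlm_it.scorer \
  --epochs 30 \
  --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.4 \
  --early_stop \
  --es_epochs 10 \
  --max_to_keep 3 \
  --checkpoint_dir checkpoints/step1 \
  --drop_source_layers 1          # only when fine-tuning a pretrained checkpoint
# add --automatic_mixed_precision instead when training from scratch
```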

Step II (see the sketch below):

  • use lm_optimizer to search for good ALPHA and BETA values
  • MAX_ALPHA=5 MAX_BETA=5 MAX_ITER=600
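
A sketch of this optimizer step, assuming the lm_optimizer.py script shipped with the DeepSpeech training code; the mapping of MAX_ALPHA / MAX_BETA / MAX_ITER onto --lm_alpha_max / --lm_beta_max / --n_trials is my assumption, and the paths are the same placeholders as in the Step I sketch.

```bash
# Step II sketch: search alpha/beta against the checkpoint and scorer from Step I.
python lm_optimizer.py \
  --test_files clips/dev.csv \
  --checkpoint_dir checkpoints/step1 \
  --scorer_path kenlm_it.scorer \
  --alphabet_config_path alphabet.txt \
  --n_hidden 2048 \
  --lm_alpha_max 5 \
  --lm_beta_max 5 \
  --n_trials 600
```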

Step III (see the sketch below):

  • EPOCHS=30
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.00001 (lower LR)
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS=10
    MAX_TO_KEEP=3
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
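
Step III then rebuilds the scorer with the alpha/beta found in Step II and continues from the Step I checkpoint at the lower learning rate. A rough sketch, with the same placeholder paths as above and placeholder alpha/beta values:

```bash
# Rebuild the scorer with the tuned weights from Step II (0.93 / 1.18 are placeholders).
python data/lm/generate_package.py \
  --alphabet alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
  --package kenlm_it_tuned.scorer \
  --default_alpha 0.93 --default_beta 1.18

# Continue training from the Step I checkpoint with a 10x lower learning rate.
python DeepSpeech.py \
  --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv \
  --scorer_path kenlm_it_tuned.scorer \
  --checkpoint_dir checkpoints/step1 \
  --epochs 30 --train_batch_size 64 --n_hidden 2048 \
  --learning_rate 0.00001 --dropout_rate 0.4 \
  --early_stop --es_epochs 10 --max_to_keep 3
```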

or:

PIPELINE 2

Step I (see the sketch after this list):

  • generate the scorer with LM_ALPHA and LM_BETA = 0

  • EPOCHS=100
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.0001
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS (early stop after)=25 (default value)
    MAX_TO_KEEP=3
    REDUCE_LR_ON_PLATEAU=1 (when learning gets stuck, the LR is reduced)
    PLATEAU_EPOCHS=10 (default; number of epochs to consider for reduce-LR-on-plateau, smaller than ES_EPOCHS)
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
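
Pipeline 2's first step differs from Pipeline 1's only in the schedule: more epochs, the default early-stop window, and learning-rate reduction on plateau instead of a separate low-LR pass. A sketch under the same assumptions (and with the same placeholder paths) as the Pipeline 1 commands:

```bash
# Pipeline 2, Step I: same data/scorer/model flags as the Pipeline 1 sketch,
# different schedule (100 epochs, early stop after 25, reduce LR on plateau).
python DeepSpeech.py \
  --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv \
  --scorer_path kenlm_it.scorer \
  --n_hidden 2048 --train_batch_size 64 \
  --learning_rate 0.0001 --dropout_rate 0.4 \
  --epochs 100 \
  --early_stop --es_epochs 25 \
  --reduce_lr_on_plateau --plateau_epochs 10 \
  --max_to_keep 3 \
  --checkpoint_dir checkpoints/pipeline2_step1
```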

Step II:

  • use lm_optimizer to search for good ALPHA and BETA values (same as Pipeline 1, Step II)
  • MAX_ALPHA=5 MAX_BETA=5 MAX_ITER=600
@Mte90 (Member) commented on Aug 11, 2020

In the end, did we do this or not?

@nefastosaturo (Collaborator, Author) commented

Right now the latest release has been trained with just the first step of the first pipeline. So we definitely need to run the lm_optimizer step to find the best ALPHA and BETA values, as described here: https://discourse.mozilla.org/t/custom-lm-causes-terrible-false-positive-rate/50166/34

In the end, though, the best pipeline will be the one that scores the lowest WER, so I think the only way to decide is to try both.

I just opened this issue to see if someone could give some hints.

@Mte90 added the enhancement and help wanted labels on Nov 9, 2020