
Training pipeline #92

Open
nefastosaturo opened this issue Aug 6, 2020 · 2 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@nefastosaturo (Collaborator) commented on Aug 6, 2020

Edit: changed batch size to 128. Edit 2: never mind, it crashes.

I think it is better to define a training pipeline, as the official DeepSpeech releases do.

We don't have the same amount of audio hours and video cards as the DeepSpeech team, so let's start with the 0.6 release hyperparameters.

I was thinking of a couple of pipelines to apply either when training a model from scratch or when starting from a pretrained checkpoint (transfer learning). What do you think?

PIPELINE 1 (with the 0.6 hyperparameters from the French repo)

Step I (see the command sketch after this list):

  • generate the scorer with LM_ALPHA and LM_BETA = 0

  • EPOCHS=30
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.0001
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS (early stop after)=10
    MAX_TO_KEEP=3 (we can keep more checkpoints once we have more disk space)
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
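
To make Step I concrete, here is a minimal command-line sketch, assuming the upstream DeepSpeech (0.7/0.8-era) training scripts rather than this repo's own wrappers; every path, file name and checkpoint directory below is a hypothetical placeholder, not our actual layout.

```bash
# Step I sketch. Flag names come from the upstream DeepSpeech training package;
# alphabet.txt, lm.binary, clips/*.csv, kenlm_it.scorer, checkpoints/step1 are placeholders.

# 1. Build the scorer with alpha/beta forced to 0, so the first training run
#    is not biased by untuned language-model weights.
python data/lm/generate_package.py \
  --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab-500000.txt \
  --package kenlm_it.scorer \
  --default_alpha 0.0 \
  --default_beta 0.0

# 2. First training pass with the 0.6-style hyperparameters listed above.
python DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --scorer_path kenlm_it.scorer \
  --epochs 30 \
  --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.4 \
  --early_stop \
  --es_epochs 10 \
  --max_to_keep 3 \
  --checkpoint_dir checkpoints/step1 \
  --drop_source_layers 1          # only when fine-tuning a pretrained checkpoint
# add --automatic_mixed_precision instead when training from scratch
```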

Step II (see the sketch below):

  • use lm_optimizer to search for good ALPHA and BETA values
  • MAX_ALPHA=5 MAX_BETA=5 MAX_ITER=600
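
A sketch of this optimizer step, assuming the lm_optimizer.py script shipped with the DeepSpeech training code; the mapping of MAX_ALPHA / MAX_BETA / MAX_ITER onto --lm_alpha_max / --lm_beta_max / --n_trials is my assumption, and the paths are the same placeholders as in the Step I sketch.

```bash
# Step II sketch: search alpha/beta against the checkpoint and scorer from Step I.
python lm_optimizer.py \
  --test_files clips/dev.csv \
  --checkpoint_dir checkpoints/step1 \
  --scorer_path kenlm_it.scorer \
  --alphabet_config_path alphabet.txt \
  --n_hidden 2048 \
  --lm_alpha_max 5 \
  --lm_beta_max 5 \
  --n_trials 600
```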

Step III (see the sketch below):

  • EPOCHS=30
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.00001 (lower LR)
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS=10
    MAX_TO_KEEP=3
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
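
Step III then rebuilds the scorer with the alpha/beta found in Step II and continues from the Step I checkpoint at the lower learning rate. A rough sketch, with the same placeholder paths as above and placeholder alpha/beta values:

```bash
# Rebuild the scorer with the tuned weights from Step II (0.93 / 1.18 are placeholders).
python data/lm/generate_package.py \
  --alphabet alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
  --package kenlm_it_tuned.scorer \
  --default_alpha 0.93 --default_beta 1.18

# Continue training from the Step I checkpoint with a 10x lower learning rate.
python DeepSpeech.py \
  --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv \
  --scorer_path kenlm_it_tuned.scorer \
  --checkpoint_dir checkpoints/step1 \
  --epochs 30 --train_batch_size 64 --n_hidden 2048 \
  --learning_rate 0.00001 --dropout_rate 0.4 \
  --early_stop --es_epochs 10 --max_to_keep 3
```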

or:

PIPELINE 2

Step I (see the sketch after this list):

  • generate the scorer with LM_ALPHA and LM_BETA = 0

  • EPOCHS=100
    BATCH_SIZE=64
    N_HIDDEN=2048
    LEARNING_RATE=0.0001
    DROPOUT=0.4
    EARLY_STOP
    ES_EPOCHS (early stop after)=25 (default value)
    MAX_TO_KEEP=3
    REDUCE_LR_ON_PLATEAU=1 (when learning gets stuck, the LR is reduced)
    PLATEAU_EPOCHS=10 (default; number of epochs to consider for reduce-LR-on-plateau, smaller than ES_EPOCHS)
    DROP_SOURCE_LAYERS=1 (if using transfer learning)
    USE_AUTOMATIC_MIXED_PRECISION (if training from scratch)
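
Pipeline 2's first step differs from Pipeline 1's only in the schedule: more epochs, the default early-stop window, and learning-rate reduction on plateau instead of a separate low-LR pass. A sketch under the same assumptions (and with the same placeholder paths) as the Pipeline 1 commands:

```bash
# Pipeline 2, Step I: same data/scorer/model flags as the Pipeline 1 sketch,
# different schedule (100 epochs, early stop after 25, reduce LR on plateau).
python DeepSpeech.py \
  --train_files clips/train.csv --dev_files clips/dev.csv --test_files clips/test.csv \
  --scorer_path kenlm_it.scorer \
  --n_hidden 2048 --train_batch_size 64 \
  --learning_rate 0.0001 --dropout_rate 0.4 \
  --epochs 100 \
  --early_stop --es_epochs 25 \
  --reduce_lr_on_plateau --plateau_epochs 10 \
  --max_to_keep 3 \
  --checkpoint_dir checkpoints/pipeline2_step1
```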

Step II:

  • use lm_optimizer to search for good ALPHA and BETA values (same as Pipeline 1, Step II)
  • MAX_ALPHA=5 MAX_BETA=5 MAX_ITER=600
@Mte90 (Member) commented on Aug 11, 2020

In the end, did we do this or not?

@nefastosaturo (Collaborator, Author) commented

Right now the latest release has been trained with just the first step of the first pipeline. So we definitely need to run the lm_optimizer step to find the best ALPHA and BETA values, as described here: https://discourse.mozilla.org/t/custom-lm-causes-terrible-false-positive-rate/50166/34

In the end, though, the best pipeline will be the one that scores the lowest WER, so I think the only way to decide is to try both.

I just opened this issue to see if someone could give some hints.

@Mte90 added the enhancement and help wanted labels on Nov 9, 2020