How to reuse Sentencepiece tokenizer from subword ASR training into TransformerLM training? #2746

Answered by VahidooX
muntasir2000 asked this question in Q&A
The tokenizer for the neural rescorer does not need to be the same as the one used for the ASR model. In fact, since some ASR models use small vocabulary sizes like 128, it is better to train a separate tokenizer for the Transformer LM with a larger vocabulary size, e.g. 4k, using the yttm (YouTokenToMe) tokenizer.

You can find more info on the Transformer LM here:
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/language_modeling.html

@AlexGrinch would you please take a look at this issue?

Answer selected by muntasir2000