Script for ASR inference on long files #2373
jbalam-nv commented on Jun 18, 2021
- speech_to_text_buffered_infer.py supports inference on long audio files by running inference on smaller chunks of audio and then merging the tokens for the final transcription.
- This version only supports EncDecCTCModelBPE with AudioMelSpectrogramProcessor as the preprocessor.
- Feature normalization is done on each buffer using "per_feature" normalization. Future versions will add other methods for experimentation.
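The chunk-and-merge idea in the list above can be sketched roughly as follows. This is a toy illustration only, not the code in speech_to_text_buffered_infer.py: `transcribe_buffered`, `toy_model`, `chunk_len`, and `context` are hypothetical names, and the stand-in model is assumed to emit exactly one token per input sample, which a real CTC model does not.

```python
from typing import Callable, List, Sequence


def transcribe_buffered(
    samples: Sequence[float],
    model: Callable[[List[float]], List[int]],
    chunk_len: int,
    context: int,
) -> List[int]:
    """Run `model` on overlapping buffers and keep only each buffer's central chunk.

    Toy assumption: `model` returns exactly one token per input sample.
    """
    merged: List[int] = []
    for start in range(0, len(samples), chunk_len):
        # Buffer = chunk plus `context` samples on each side, zero padded at the edges.
        buffer = [
            samples[i] if 0 <= i < len(samples) else 0.0
            for i in range(start - context, start + chunk_len + context)
        ]
        tokens = model(buffer)
        # Drop the tokens that were produced from the left and right context.
        merged.extend(tokens[context:context + chunk_len])
    # The last chunk may have been padded, so trim to the input length.
    return merged[: len(samples)]


def toy_model(buffer: List[float]) -> List[int]:
    """Stand-in for an ASR model: one integer 'token' per sample."""
    return [int(x) * 2 for x in buffer]


result = transcribe_buffered(list(range(10)), toy_model, chunk_len=4, context=2)
```

Each buffer gives the model some surrounding context, but only the tokens from the middle of the buffer are kept, so the merged output contains every position once.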
Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
This pull request introduces 2 alerts when merging 4351694 into e070e04 (view on LGTM.com).
This pull request introduces 2 alerts when merging e00359a into e5bde15 (view on LGTM.com).
Signed-off-by: jbalam <jbalam@nvidia.com>
Overall looks great, minor comments
# Create a preprocessor to convert audio samples into raw features,
# Normalization will be done per buffer in frame_bufferer
# Do not normalize whatever the model's preprocessor setting is
preprocessor_cfg.normalize = "None"
String None?
This is a hack to get through without any normalization. Passing None throws an error here:
elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:
Ah ok. no_norm sounds like a good option
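For context on what the per-buffer "per_feature" normalization discussed in this thread does, here is a minimal pure-Python sketch. It assumes features are laid out as one row per feature (for example, one mel bin) with one value per frame, uses the population standard deviation, and adds a small `eps` to avoid division by zero. The function name `normalize_per_feature` and those choices are illustrative assumptions, not the script's actual implementation.

```python
import math
from typing import List


def normalize_per_feature(features: List[List[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalize each feature row to zero mean and (roughly) unit variance over time."""
    normalized = []
    for row in features:
        mean = sum(row) / len(row)
        # Population variance over the frames in this buffer (an assumption).
        var = sum((x - mean) ** 2 for x in row) / len(row)
        std = math.sqrt(var) + eps
        normalized.append([(x - mean) / std for x in row])
    return normalized


buffer_features = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
normalized = normalize_per_feature(buffer_features)
```

Because the statistics come from the current buffer only, the model's own preprocessor normalization has to be switched off first, which is what the "None" setting above is working around.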
frame_overlap: duration of overlaps before and after current frame, seconds
offset: number of symbols to drop for smooth streaming
'''
self.ZERO_LEVEL_SPEC_DB_VAL = -16.635
Maybe some info can be added on how this was calculated?
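One plausible derivation, which this thread does not confirm: if the log-mel preprocessor adds a small guard value of 2**-24 before taking the natural log, then a zero-power (silent) input maps to ln(2**-24), which is about -16.6355. The guard value and the helper name `log_mel_of_silence` are assumptions used for illustration.

```python
import math

# Assumption: a guard value of 2**-24 is added to the power before the natural log.
LOG_ZERO_GUARD_VALUE = 2 ** -24


def log_mel_of_silence(power: float = 0.0, guard: float = LOG_ZERO_GUARD_VALUE) -> float:
    """Return the log-spectrogram value for a bin with the given power."""
    return math.log(power + guard)


zero_level = log_mel_of_silence()  # about -16.6355
```

Under that assumption the constant would be the value a padded, all-zero region takes in log-mel space, so padding with it looks like silence to the model.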
* First version of script for buffered inference
  Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
* Cleaned up commented code and added comments
  Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
* More clean up and simplified the call to transcribe
  Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
* Removed unused variables
  Signed-off-by: jbalam <jbalam@nvidia.com>
* Style fix
  Signed-off-by: jbalam <jbalam@nvidia.com>
* Added a comment for zero_level_spec_db constant
  Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
* style fix
  Signed-off-by: jbalam <jbalam@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>