
Script for ASR inference on long files #2373

Merged
merged 13 commits into from
Jul 14, 2021

jbalam-nv
Collaborator

  • speech_to_text_buffered_infer.py supports inference on long audio files by running inference on smaller chunks of audio and then merging the resulting tokens into the final transcription.
  • This version only supports EncDecCTCModelBPE with AudioMelSpectrogramProcessor as the preprocessor.
  • Feature normalization is done on each buffer using "per_feature" normalization; future versions will add other methods for experimentation.
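The bullets above describe the approach only at a high level. As a rough illustration, here is a minimal pure-Python sketch of the two steps, assuming a CTC model whose per-buffer outputs have already been reduced to greedy token ids and trimmed to each buffer's central (non-overlapping) region. The helper names `buffer_bounds` and `merge_ctc_tokens` are hypothetical and do not come from speech_to_text_buffered_infer.py; `frame_overlap` follows the frame_bufferer docstring quoted later in this thread (seconds of overlap before and after the current frame), and `frame_len` is an assumed frame duration in seconds.

```python
def buffer_bounds(num_samples, sample_rate, frame_len, frame_overlap):
    """Return (start, end) sample indices for each audio buffer.

    Hypothetical helper: each buffer holds one frame of `frame_len` seconds
    plus `frame_overlap` seconds of context before and after it, clipped to
    the edges of the signal.
    """
    frame = int(frame_len * sample_rate)
    overlap = int(frame_overlap * sample_rate)
    bounds = []
    for frame_start in range(0, num_samples, frame):
        start = max(0, frame_start - overlap)
        end = min(num_samples, frame_start + frame + overlap)
        bounds.append((start, end))
    return bounds


def merge_ctc_tokens(per_buffer_tokens, blank_id):
    """Concatenate per-buffer greedy CTC token ids and collapse them.

    Assumes each inner list is already trimmed to the part of its buffer
    that does not overlap the neighbours, so the lists are contiguous in time.
    Repeated ids are merged and blanks removed, as in CTC greedy decoding;
    `prev` carries across buffer boundaries so a token whose frames straddle
    a boundary is not emitted twice.
    """
    merged = []
    prev = None
    for tokens in per_buffer_tokens:
        for tok in tokens:
            if tok != prev and tok != blank_id:
                merged.append(tok)
            prev = tok
    return merged


if __name__ == "__main__":
    # 10 s of 16 kHz audio, 2 s frames with 0.5 s of context on each side.
    print(buffer_bounds(160_000, 16_000, 2.0, 0.5)[:2])  # [(0, 40000), (24000, 72000)]
    print(merge_ctc_tokens([[5, 5, 0, 7], [7, 9, 0, 9]], blank_id=0))  # [5, 7, 9, 9]
```

Carrying `prev` across buffers treats the trimmed regions as one continuous CTC output, which is only valid if the trimming leaves no gaps or double-counted frames between neighbouring buffers.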

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

lgtm-com bot commented Jun 18, 2021

This pull request introduces 2 alerts when merging 4351694 into e070e04 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import


lgtm-com bot commented Jul 8, 2021

This pull request introduces 2 alerts when merging e00359a into e5bde15 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

jbalam-nv closed this Jul 13, 2021
Signed-off-by: jbalam <jbalam@nvidia.com>
Collaborator

titu1994 left a comment


Overall looks great, minor comments

# Create a preprocessor to convert audio samples into raw features.
# Normalization will be done per buffer in frame_bufferer.
# Do not normalize, whatever the model's preprocessor setting is.
preprocessor_cfg.normalize = "None"
Collaborator


String None?

Collaborator Author


Hack to get through without any normalization. None throws an error on this line, as we are looking for a key in a dict:

elif "fixed_mean" in normalize_type and "fixed_std" in normalize_type:

We should probably clean up this normalize_batch function and add a "no_norm" option.

Collaborator


Ah ok. no_norm sounds like a good option
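To make the failure mode discussed above concrete, here is a simplified, hypothetical pure-Python sketch; it is not NeMo's normalize_batch. `legacy_membership_check` reproduces only the quoted `elif` condition: with the string "None", `"fixed_mean" in "None"` is a substring test that is simply False, so the branch is skipped, while Python's `None` raises TypeError because it does not support `in`. `normalize_batch` shows one possible shape for the proposed "no_norm" option next to "per_feature" and fixed mean/std normalization, on features laid out as a list of rows (one row per feature, values over time). The function names, the `eps` value, and the use of a population standard deviation are assumptions of this sketch.

```python
def legacy_membership_check(normalize_type):
    # Mirrors the quoted condition only. For a str this is a substring test,
    # for a dict it is a key test, and for None it raises TypeError.
    return "fixed_mean" in normalize_type and "fixed_std" in normalize_type


def normalize_batch(features, normalize_type, eps=1e-5):
    """Hypothetical normalizer; `features` is a list of per-feature rows over time."""
    if normalize_type == "no_norm":
        # Explicit opt-out instead of passing the string "None".
        return [list(row) for row in features]
    if normalize_type == "per_feature":
        out = []
        for row in features:
            mean = sum(row) / len(row)
            # Population std over time; eps guards against division by zero.
            std = (sum((x - mean) ** 2 for x in row) / len(row)) ** 0.5 + eps
            out.append([(x - mean) / std for x in row])
        return out
    if isinstance(normalize_type, dict) and "fixed_mean" in normalize_type and "fixed_std" in normalize_type:
        # One precomputed mean and std per feature row.
        means, stds = normalize_type["fixed_mean"], normalize_type["fixed_std"]
        return [[(x - m) / s for x in row] for row, m, s in zip(features, means, stds)]
    raise ValueError(f"unsupported normalize_type: {normalize_type!r}")
```

Raising on unknown values, rather than silently skipping normalization, is a design choice of this sketch: under it, the "None" workaround would become an explicit error, and callers would have to ask for "no_norm" by name.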

frame_overlap: duration of overlaps before and after current frame, seconds
offset: number of symbols to drop for smooth streaming
'''
self.ZERO_LEVEL_SPEC_DB_VAL = -16.635
Collaborator


Maybe some info can be added on how this was calculated?
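One plausible derivation, offered as an assumption rather than the author's stated method: if the preprocessor computes log-mel features as log(mel + guard) with a log zero guard value of 2**-24 (the default of NeMo's mel-spectrogram preprocessor) and dither is ignored, then an all-zero (silent) input yields log(2**-24) = -24 ln 2 ≈ -16.6355, which matches -16.635. The sketch below checks that arithmetic; `silent_feature_buffer` is a hypothetical helper showing how such a constant could pre-fill a feature buffer so that padding looks like silence.

```python
import math

# Assumption: log-mel features are computed as log(mel_energy + guard),
# with a zero guard of 2**-24 and no dither.
LOG_ZERO_GUARD_VALUE = 2 ** -24


def zero_level_log_mel(guard=LOG_ZERO_GUARD_VALUE):
    """Log-mel value of a frame whose mel energies are all zero."""
    return math.log(0.0 + guard)


def silent_feature_buffer(num_features, num_frames):
    """Hypothetical helper: a feature buffer pre-filled with the silence level."""
    level = zero_level_log_mel()
    return [[level] * num_frames for _ in range(num_features)]


if __name__ == "__main__":
    print(f"{zero_level_log_mel():.4f}")  # -16.6355
```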

titu1994 and others added 5 commits July 14, 2021 09:52
titu1994 merged commit 1d29828 into main Jul 14, 2021
fayejf pushed a commit that referenced this pull request Jul 16, 2021
* First version of script for buffered inference

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* Cleaned up commented code and added comments

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* More clean up and simplified the call to transcribe

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* Removed unused variables

Signed-off-by: jbalam <jbalam@nvidia.com>

* Style fix

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a comment for zero_level_spec_db constant

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* style fix

Signed-off-by: jbalam <jbalam@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
blisc deleted the streaming_asr branch January 11, 2022 16:39
Labels
None yet
Projects
None yet

2 participants