Stops working after long gap with no speech? #29

Answered by jongwook
gmaxwell asked this question in Q&A
This is one of the limitations of the current hacky approach to long-form transcription. The VAD output from the model is not very accurate, and the predicted no_speech_prob is often not a reliable predictor of voice activity.

I chose the default no_speech_threshold of 0.6 which worked okay for a few datasets that I tested with, but different combinations of --compression_ratio_threshold, --logprob_threshold, and --no_speech_threshold values might be needed depending on the audio.
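To make the interaction between those thresholds concrete, here is a rough sketch of how they could be combined per 30-second window. It is an illustration rather than the actual transcription code: `WindowResult`, `should_skip_window`, and `needs_retry` are hypothetical names, and the 2.4 and -1.0 defaults for the compression-ratio and log-probability thresholds are assumptions (only the 0.6 value is stated above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowResult:
    """Hypothetical per-window decoding statistics (illustrative names)."""
    no_speech_prob: float     # model's estimate that the window contains no speech
    avg_logprob: float        # mean log-probability of the decoded tokens
    compression_ratio: float  # text length / compressed length; high means repetitive


def should_skip_window(
    result: WindowResult,
    no_speech_threshold: Optional[float] = 0.6,
    logprob_threshold: Optional[float] = -1.0,  # assumed default
) -> bool:
    """Treat the window as silence only if the model says 'no speech'
    AND the decoded text is itself low-confidence."""
    if no_speech_threshold is None:
        return False
    skip = result.no_speech_prob > no_speech_threshold
    # A confident transcript overrides an unreliable no-speech estimate.
    if logprob_threshold is not None and result.avg_logprob > logprob_threshold:
        skip = False
    return skip


def needs_retry(
    result: WindowResult,
    compression_ratio_threshold: Optional[float] = 2.4,  # assumed default
    logprob_threshold: Optional[float] = -1.0,           # assumed default
) -> bool:
    """Flag a decode as failed if it looks repetitive or low-confidence."""
    if (compression_ratio_threshold is not None
            and result.compression_ratio > compression_ratio_threshold):
        return True
    if logprob_threshold is not None and result.avg_logprob < logprob_threshold:
        return True
    return False


if __name__ == "__main__":
    quiet = WindowResult(no_speech_prob=0.9, avg_logprob=-1.5, compression_ratio=1.2)
    print(should_skip_window(quiet), needs_retry(quiet))
```

Under these assumptions, a long silent gap is only skipped when both signals agree, so an inaccurate no_speech_prob on its own can leave the decoder emitting text for silence. That is why tuning the three values together for a given recording may help more than changing any one of them.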

There's also a hard-coded constant as @shirayu mentioned, which determines whether the text from the previous window gets fed as the prompt:

if

Replies: 11 comments 36 replies

Answer selected by jongwook