
assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})" #205

Open
JoLiu-ai opened this issue Aug 13, 2024 · 2 comments

Comments


JoLiu-ai commented Aug 13, 2024

I have run into this problem several times; what can I do to fix it? Thanks.
Perhaps we should implement a feature to temporarily save transcribed files, so the results can be double-checked and previous work isn't lost (see the sketch after the traceback below).


WARNING:whisper_timestamped:Inconsistent number of segments: whisper_segments (339) != timestamped_word_segments (340)
Traceback (most recent call last):
  File "/usr/local/bin/whisper_timestamped", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 3097, in cli
    result = transcribe_timestamped(
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 296, in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,
  File "/usr/local/lib/python3.10/dist-packages/whisper_timestamped/transcribe.py", line 920, in _transcribe_timestamped_efficient
    assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})"
AssertionError: Inconsistent number of segments: whisper_segments (339) != timestamped_word_segments (340)
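Until something like that exists, here is a rough sketch of the workaround I have in mind (assuming whisper_timestamped's Python API, whisper.load_model and whisper.transcribe; the model size and file names are just placeholders): transcribe files one at a time and write each result to JSON as soon as it is available, so a failure on a later file does not throw away finished work.

```python
# Workaround sketch (not part of whisper_timestamped): persist each result
# immediately, so an AssertionError on a later file keeps earlier work.
import json
from pathlib import Path

import whisper_timestamped as whisper

model = whisper.load_model("medium")      # whatever model you normally use
out_dir = Path("transcripts")
out_dir.mkdir(exist_ok=True)

audio_files = ["episode1.wav", "episode2.wav"]  # placeholder inputs
for path in audio_files:
    try:
        result = whisper.transcribe(model, path)
    except AssertionError as e:
        # The inconsistent-segment corner case lands here; skip and keep going.
        print(f"Skipping {path}: {e}")
        continue
    # Each finished transcription is written out right away.
    (out_dir / (Path(path).stem + ".json")).write_text(
        json.dumps(result, ensure_ascii=False, indent=2)
    )
```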

KillerX commented Aug 27, 2024

I just started seeing this. Did you by chance recently start using a different Whisper model?

Jeronymous (Member) commented Aug 27, 2024

There is an open discussion on this: #79 (reply in thread)

It seems to be a corner case that happens when the Whisper model predicts a transcript consisting only of special language tokens, repeated up to the maximum token length (e.g. <|0.00|><|de|><|de|><|de|><|de|><|de|>...).

I am just waiting for a quick way to reproduce this corner case, so that I can fix it safely.
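In the meantime, a possible (untested) mitigation sketch, assuming the vad and condition_on_previous_text options described in the README: catch the AssertionError and retry the file with settings that make such degenerate repeated-token outputs less likely. This is not a confirmed fix, just something to try on affected files.

```python
# Hedged mitigation sketch, not a confirmed fix for this issue.
# "audio_path" is a placeholder for an affected file.
import whisper_timestamped as whisper

model = whisper.load_model("medium")
audio_path = "problematic_audio.wav"

try:
    result = whisper.transcribe(model, audio_path)
except AssertionError:
    # Retry without conditioning on previous text and with VAD enabled,
    # which changes how segments are produced and may avoid the corner case.
    result = whisper.transcribe(
        model,
        audio_path,
        condition_on_previous_text=False,
        vad=True,
    )
```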
