
Improve token timestamps and language detection #114

Merged: 4 commits merged into main on Apr 12, 2024


@ZachNagengast (Contributor) commented on Apr 11, 2024

This PR addresses two issues:

  1. Word-level timestamps were slightly off, as reported in Incorrect timestamps (0.5sec off) #105.
  2. Language detection was hard to use together with prefill or prompt tokens, as reported by Diirge on Discord.

Word timestamps still do not use a median filter, but they line up quite well without one. With these changes, the remaining differences are mostly in word start times; most word end times line up exactly.

Here are some comparisons using the audio provided in #105 (top is WhisperKit, bottom is HEAD of the openai/whisper Python repo):

Example where WhisperKit has the better word start time:
[image]

Example where OpenAI has the better word start time:
[image]

We will continue to refine these over time. Thanks to @finnvoor for finding this and providing a great example to reproduce it.

@ZachNagengast linked an issue on Apr 12, 2024 that may be closed by this pull request
@ZachNagengast requested a review from atiorh on Apr 12, 2024, 01:30
@ZachNagengast merged commit d9cd774 into main on Apr 12, 2024
3 checks passed
Successfully merging this pull request may close this issue: Incorrect timestamps (0.5sec off)