Improve token timestamps and language detection #114

ZachNagengast · 2024-04-11T22:55:32Z

This addresses a couple of issues:

Word-level timestamps were slightly off, as reported in Incorrect timestamps (0.5sec off) #105.
Language detection was not easy to use in conjunction with prefill or prompt tokens, as noticed by Diirge on Discord.
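
To illustrate the second item, here is a minimal sketch of how a detected language can be combined with prompt tokens. It assumes Whisper-style decoding, where the language is picked from the logits at the position right after the start-of-transcript token, and prompt tokens sit before that token behind a start-of-previous marker. The function names and all token IDs are hypothetical and are not WhisperKit's API.

```python
import math


def detect_language(logits, language_token_ids):
    """Pick the most likely language from one decoder step's logits.

    logits: sequence of floats indexed by token ID (hypothetical values).
    language_token_ids: mapping of language code -> token ID (hypothetical IDs).
    Returns (best_language, probabilities over the candidate languages).
    """
    # Consider only the language tokens; every other token is ignored.
    lang_logits = {lang: logits[tok] for lang, tok in language_token_ids.items()}
    # Numerically stable softmax restricted to the language candidates.
    peak = max(lang_logits.values())
    exps = {lang: math.exp(v - peak) for lang, v in lang_logits.items()}
    total = sum(exps.values())
    probs = {lang: e / total for lang, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


def build_decoder_prefix(prompt_ids, sot_prev_id, sot_id, language_id, task_id):
    """Assemble the decoder prefix once the language is known.

    Prompt tokens (if any) go first, behind the start-of-previous marker,
    followed by start-of-transcript, the language token, and the task token.
    """
    prefix = []
    if prompt_ids:
        prefix.append(sot_prev_id)
        prefix.extend(prompt_ids)
    prefix.extend([sot_id, language_id, task_id])
    return prefix
```

The point of splitting the two steps is that detection only needs the start-of-transcript position, so the result can be slotted into the prefix regardless of whether a prompt is present.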

The word timestamps still do not use a median filter, but they line up quite well without it. With these changes, the main remaining differences are in when words start; most word endings line up exactly.
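
For reference, a median filter of the kind mentioned above can be sketched as follows. This is a generic 1-D sliding-window median with reflect padding, not the code from this PR or from openai/whisper. The function name `median_filter` and the default width of 7 are assumptions for illustration.

```python
from statistics import median


def median_filter(values, width=7):
    """Smooth a 1-D sequence with a sliding-window median.

    width must be a positive odd integer. Edges are handled with reflect
    padding (the sequence is mirrored without repeating the edge sample).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    values = list(values)
    # Nothing to smooth, or too short to reflect-pad: return a copy unchanged.
    if half == 0 or len(values) <= half:
        return values
    left = values[half:0:-1]             # mirrored head, excluding values[0]
    right = values[-2:-half - 2:-1]      # mirrored tail, excluding values[-1]
    padded = left + values + right
    # One window of `width` samples centred on each original position.
    return [median(padded[i:i + width]) for i in range(len(values))]
```

Applied along the time axis of per-token alignment weights, a filter like this suppresses isolated spikes, which is one way to keep a single noisy frame from shifting where a word appears to start.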

Here are some comparisons using the audio provided in #105 (top is ours, bottom is from HEAD of the openai/whisper Python repo):

WhisperKit better starting point:

OpenAI better starting point:

Will continue to refine these over time. Thanks @finnvoor for finding this and providing a great example to replicate.

ZachNagengast added 2 commits April 11, 2024 15:45

Improve token timestamps and detect language

03bacd7

Cleanup

58b3708

ZachNagengast mentioned this pull request Apr 11, 2024

Issue with languages other than English #98

Closed

ZachNagengast added 2 commits April 11, 2024 16:50

Minor cleanup

2984806

Merge branch 'main' into timestamps-and-lang-detection-improvements

760fcce

ZachNagengast linked an issue Apr 12, 2024 that may be closed by this pull request

Incorrect timestamps (0.5sec off) #105

Closed

ZachNagengast requested a review from atiorh April 12, 2024 01:30

ZachNagengast merged commit d9cd774 into main Apr 12, 2024
3 checks passed
