generate duplicated phrases #94

x180380 · 2023-05-19T05:07:38Z

Whisper-timestamped generates duplicated phrases for some audio, such as https://flex2.acast.com/s/pbs-newshour-segments/u/d3i6fh83elv35t.cloudfront.net/static/2023/05/newswrap-15.mp3
I use the small and medium models.

passerbya · 2023-05-19T11:01:04Z

I have also encountered the same issue.

blundercode · 2023-05-22T17:49:32Z

I have seen this happen outside of whisper-timestamped, with other Whisper implementations as well. I am curious: is it caused by hallucination, or by not using VAD?

pinballelectronica · 2023-05-24T23:54:38Z

Also seeing this, mostly during quiet parts, if that helps at all. Otherwise the transcription is spot on, even with the hardest content.

misutoneko · 2023-05-25T09:56:50Z

For this particular sample, --accurate will get rid of the duplicates.
The problem is, there is no single set of parameters that works best for everything.
Sometimes I've even had to switch to a smaller model to get the timings right.

Jeronymous · 2023-05-25T12:41:10Z

Yes, exactly @misutoneko
No free lunch...

x180380 · 2023-05-26T06:39:16Z

When using the small or tiny model, there are fewer duplicated phrases. WhisperX also has this issue.

Jeronymous · 2023-06-27T21:41:00Z

Some people reported that using a lower value for compression_ratio_threshold than the default (2.4) improves this issue,
typically --compression_ratio_threshold 1
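
To illustrate why this threshold relates to repeated phrases, here is a minimal sketch of the compression-ratio heuristic as I understand it from Whisper: the decoded text is compressed with zlib, and a segment whose ratio exceeds the threshold is treated as a failed decoding (which can trigger a retry at a higher temperature when temperature fallback is enabled). The helper `looks_repetitive` and the sample strings are hypothetical and only for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Hypothetical helper: True when the ratio exceeds the given threshold."""
    return compression_ratio(text) > threshold


# A looped phrase compresses very well, so its ratio is high.
looped = "Thank you for watching. " * 20
# Ordinary speech has little redundancy, so its ratio stays low.
normal = "The Senate passed the spending bill late on Thursday after weeks of tense negotiation."

print(f"looped: {compression_ratio(looped):.2f}")
print(f"normal: {compression_ratio(normal):.2f}")
```

A lower threshold flags more segments as failed, which would be consistent with the reports above, at the cost of more re-decoding.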

mattdl-radix · 2023-11-21T10:20:43Z

Had the same problem, with >10 repetitions for several .mp3 files.
Solution that worked for me was adding --compression_ratio_threshold 1 --accurate
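
For anyone using the Python API rather than the command line, a rough equivalent might look like the sketch below. It assumes, based on my reading of the whisper-timestamped README, that --accurate corresponds to beam_size=5, best_of=5 and a temperature fallback schedule in steps of 0.2; please check this against the version you have installed. The helper names `accurate_options` and `transcribe_accurately` are hypothetical.

```python
def accurate_options(compression_ratio_threshold: float = 1.0) -> dict:
    """Keyword arguments assumed to mirror --accurate plus a custom threshold."""
    return {
        "beam_size": 5,
        "best_of": 5,
        # Fallback schedule: retry at higher temperatures when a decoding fails.
        "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        "compression_ratio_threshold": compression_ratio_threshold,
    }


def transcribe_accurately(audio_path: str, model_name: str = "small") -> dict:
    """Hypothetical wrapper; requires whisper-timestamped to be installed."""
    # Imported lazily so the options helper can be used without the package.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio, **accurate_options())
```

As noted earlier in the thread, no single set of parameters works for every file, so the threshold is left as an argument rather than hard-coded.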

Jeronymous mentioned this issue May 22, 2023

Update transcribe.py #95

Closed

Jeronymous added the bug label Nov 15, 2023

Jeronymous mentioned this issue Feb 26, 2024

Repetitive Phrase Looping #171

Open
