Incorrect timestamps when using VAD with large model only #192

Open
freddyertl opened this issue Apr 24, 2024 · 1 comment

Comments

@freddyertl

I ran into a problem when using VAD (both silero and auditok) with the large model. My application splits the transcription into parts based on the pauses between words. In the following sample, the large model with VAD enabled (but not the smaller models!) produces incorrect timestamps for the word "A":

--model large --language en --accurate --vad auditok

[01:37.040 --> 01:37.240] Could
[01:37.240 --> 01:37.360] you
[01:37.360 --> 01:37.540] please
[01:37.540 --> 01:37.800] hold
[01:37.800 --> 01:37.920] up
[01:37.920 --> 01:38.060] your
[01:38.060 --> 01:38.280] ID
[01:38.280 --> 01:38.500] to
[01:38.500 --> 01:38.660] the
[01:38.660 --> 01:38.860] webcam?
*** [01:39.120 --> 01:39.700] A >>>>>>>>>>>>>> Pause between "A" and "little" is wrong
[01:45.050 --> 01:45.250] little
[01:45.250 --> 01:45.430] bit
[01:45.430 --> 01:45.690] closer,
[01:45.770 --> 01:46.070] please.

--model large --language en --accurate --vad silero:v3.1

[01:37.030 --> 01:37.230] Could
[01:37.230 --> 01:37.350] you
[01:37.350 --> 01:37.550] please
[01:37.550 --> 01:37.830] hold
[01:37.830 --> 01:37.930] up
[01:37.930 --> 01:38.070] your
[01:38.070 --> 01:38.290] ID
[01:38.290 --> 01:38.510] to
[01:38.510 --> 01:38.650] the
[01:38.650 --> 01:38.810] webcam?
*** [01:39.130 --> 01:39.790] A >>>>>>>>>>>>>> Pause between "A" and "little" is wrong
[01:45.050 --> 01:45.230] little
[01:45.230 --> 01:45.430] bit
[01:45.430 --> 01:45.690] closer,
[01:45.770 --> 01:46.050] please.

--model large --language en --accurate --vad False

[01:36.860 --> 01:37.180] Could
[01:37.180 --> 01:37.340] you
[01:37.340 --> 01:37.600] please
[01:37.600 --> 01:37.820] hold
[01:37.820 --> 01:37.940] up
[01:37.940 --> 01:38.060] your
[01:38.060 --> 01:38.280] ID
[01:38.280 --> 01:38.520] to
[01:38.520 --> 01:38.660] the
[01:38.660 --> 01:38.920] webcam?
*** [01:44.240 --> 01:45.020] A >>>>>>>>>>>>>> This is okay
[01:45.020 --> 01:45.260] little
[01:45.260 --> 01:45.420] bit
[01:45.420 --> 01:45.680] closer,
[01:45.820 --> 01:46.020] please.

--model medium --language en --accurate --vad auditok

[01:37.180 --> 01:37.360] Could
[01:37.360 --> 01:37.520] you
[01:37.520 --> 01:37.780] please
[01:37.780 --> 01:37.940] hold
[01:37.940 --> 01:38.080] up
[01:38.080 --> 01:38.260] your
[01:38.260 --> 01:38.500] ID
[01:38.500 --> 01:38.680] to
[01:38.680 --> 01:38.800] the
[01:38.800 --> 01:39.260] webcam?
*** [01:44.890 --> 01:45.270] A >>>>>>>>>>>>>> This is okay
[01:45.270 --> 01:45.410] little
[01:45.410 --> 01:45.610] bit
[01:45.610 --> 01:45.850] closer,
[01:46.070 --> 01:46.650] please.
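
To show why the shifted "A" matters for pause-based splitting, here is a minimal sketch that parses word-level lines like the ones above and reports the gaps between consecutive words. It is not the code from my application: the helper names `parse_words` and `find_pauses` and the 1-second `min_gap` threshold are assumptions made for illustration only.

```python
import re

# Matches lines such as "[01:38.660 --> 01:38.860] webcam?"
LINE_RE = re.compile(r"\[(\d+):(\d+\.\d+) --> (\d+):(\d+\.\d+)\]\s+(\S+)")


def parse_words(text):
    """Return (start_seconds, end_seconds, word) tuples from word-level output."""
    words = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            start = int(m.group(1)) * 60 + float(m.group(2))
            end = int(m.group(3)) * 60 + float(m.group(4))
            words.append((start, end, m.group(5)))
    return words


def find_pauses(words, min_gap=1.0):
    """Return (previous_word, next_word, gap_seconds) for gaps >= min_gap.

    min_gap is a hypothetical threshold, not a value taken from any tool.
    """
    pauses = []
    for (_, prev_end, prev_word), (next_start, _, next_word) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap >= min_gap:
            pauses.append((prev_word, next_word, round(gap, 3)))
    return pauses


# Excerpts copied from the large-model runs above (auditok vs. no VAD).
with_vad = """
[01:38.660 --> 01:38.860] webcam?
*** [01:39.120 --> 01:39.700] A
[01:45.050 --> 01:45.250] little
"""
without_vad = """
[01:38.660 --> 01:38.920] webcam?
*** [01:44.240 --> 01:45.020] A
[01:45.020 --> 01:45.260] little
"""

vad_pauses = find_pauses(parse_words(with_vad))
no_vad_pauses = find_pauses(parse_words(without_vad))
print("with VAD:   ", vad_pauses)
print("without VAD:", no_vad_pauses)
```

With these excerpts, the long pause is detected between "A" and "little" when VAD is enabled, but between "webcam?" and "A" when it is disabled, so the "A" ends up attached to the wrong part.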

Please find the attached sample audio in a zip archive to reproduce this.

Thanks in advance
Freddy

sample.zip

@freddyertl changed the title from "Incorrect timestamps based with --vad with large model only" to "Incorrect timestamps when using VAD with large model only" on Apr 24, 2024
@LaurinmyReha

This comment was marked as abuse.
