Problems running the stream example - [Start speaking] frozen #747
Comments
Try to increase the
I'm getting the same result, unfortunately, even if I increase the step size to 2000, 4000 or 8000.
@catdumitru did you fix that? I'm having the same issue right now.
Hi, same issue in a Linux environment. I already verified that the speech recognition via
Had the same issue, I opened libsdl-org/SDL#9706.
I'm having problems running the stream example on a Mac. No transcript is displayed in the console; instead, the output is frozen in the "[Start speaking]" state. Below is the output for "make stream":
sysctl: unknown oid 'hw.optional.arm64'
I whisper.cpp build info:
I UNAME_S: Darwin
I UNAME_P: i386
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
make: `stream' is up to date.
./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000 -c 0
init: found 2 capture devices:
init: - Capture device #0: 'Built-in Microphone'
init: - Capture device #1: 'Microsoft Teams Audio'
init: attempt to open capture device 0 : 'Built-in Microphone' ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from './models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 218.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 8 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1
[Start speaking]
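
For anyone experimenting with --step and --length, the numbers in the "main: processing ..." line can be reproduced with a little arithmetic. The sketch below is not taken from the stream source; the helper names ms_to_samples and lines_per_window are hypothetical, the 16000 Hz rate is read off the log above, and the n_new_line formula is only inferred from the logged values (step 500, length 5000, n_new_line = 9).

```cpp
#include <algorithm>

// Assumption: capture runs at 16000 Hz, as reported by "init: - sample rate: 16000".
constexpr int kSampleRate = 16000;

// Hypothetical helper: convert a duration in milliseconds to a sample count.
int ms_to_samples(int ms) {
    return ms * kSampleRate / 1000;
}

// Hypothetical helper: number of steps before a new line is started.
// Inferred from the log (length 5000 ms, step 500 ms gives 9); the real
// example may compute this differently.
int lines_per_window(int length_ms, int step_ms) {
    return std::max(1, length_ms / step_ms - 1);
}
```

With --step 500 this gives the 8000 samples shown in the log, and --keep 0.2 sec corresponds to 3200 samples. Raising the step to 2000, 4000 or 8000 ms only changes how many samples are collected per iteration (32000, 64000 or 128000), so it would not be expected to help if no audio is arriving from the capture device in the first place.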