Bug: Speculative Decoding "Segmentation fault (core dumped)" #10176
Comments
Looks like a crash in the DRY sampler when it is cloned due to
|
Running this with AddressSanitizer also detects a buffer overflow after generating tokens for a while (unrelated to the DRY issue):
cc @ggerganov |
Interestingly, I seem to have run into a different issue with the --sampling-seq modifier when using speculative decoding with Qwen 2.5; Llama 3.1 seems to work just fine: |
Looks like an issue with |
It seems to occur only when using Qwen2.5-0.5B as the draft model; 1.5B and larger work as expected. |
I can reproduce it now. I think this happens because the model is so small that the tensor does not have enough rows to go around, so some devices end up with 0 rows, and the event is never created for them. It can be reproduced with |
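To illustrate the explanation above, here is a minimal sketch of how splitting a small tensor by rows across several devices can leave one device with nothing to do. This is not the actual llama.cpp code: the function names `split_rows` and `devices_needing_events`, the even split, the rounding granularity of 64, and the row counts are all assumptions chosen for illustration.

```python
def split_rows(n_rows, n_devices, granularity):
    """Return a (row_low, row_high) range for each device.

    Hypothetical scheme: boundaries are rounded down to a multiple of
    `granularity`, and the last device takes whatever remains.
    """
    ranges = []
    for dev in range(n_devices):
        low = (n_rows * dev // n_devices) // granularity * granularity
        if dev == n_devices - 1:
            high = n_rows
        else:
            high = (n_rows * (dev + 1) // n_devices) // granularity * granularity
        ranges.append((low, high))
    return ranges


def devices_needing_events(ranges):
    # Only devices that own at least one row do any work, so only they
    # should have a synchronization event created and waited on.
    return [dev for dev, (low, high) in enumerate(ranges) if high > low]


# A tiny tensor split over three devices: device 0 receives an empty range.
small = split_rows(n_rows=128, n_devices=3, granularity=64)
print(small)
print(devices_needing_events(small))

# A larger tensor gives every device a non-empty range.
large = split_rows(n_rows=4096, n_devices=3, granularity=64)
print(devices_needing_events(large))
```

Under these assumptions, code that later waits on an event for every device, rather than only for the devices returned by `devices_needing_events`, would touch an event that was never created, which is consistent with a crash that appears only with a very small draft model.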
I see, that does make sense |
Thank you for the heads up, I will try to get this fixed ASAP. |
@slaren Do you have a repro? I'm running a few tests here with |
I can reproduce it reliably with this command line:
|
What happened?
Hey all, I wanted to report a segmentation fault issue with llama-speculative. I have never once gotten this executable to work; I don't believe it is my command, as I have tried copy-pasting the speculative example commands as well.
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
Device 0: Tesla P40, compute capability 6.1, VMM: yes
Device 1: Tesla P40, compute capability 6.1, VMM: yes
Device 2: Tesla P40, compute capability 6.1, VMM: yes
version: 4031 (d5a409e)
built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response
Relevant log output