Bug: "speculative" example is crashing? #10174
Labels
bug-unconfirmed
medium severity
Used to report medium severity bugs in llama.cpp (e.g. malfunctioning features, but still usable)
What happened?
I tried to give speculative decoding a shot, but no matter what I did, I got back at most one token and then nothing.
Looking for the simplest, most reliable case, I picked the same model as the draft model, but it still didn't work. Then I rolled back to an older release from August, and that didn't work either. Digging back even further, I found a May release that did work, printing more than one token!
So I assume something is currently broken?
My command:
[llama-]speculative.exe --model llama2-13b-tiefighter.Q6_K.gguf --model-draft llama2-13b-tiefighter.Q6_K.gguf --prompt "You are" --temp 0 -n 4
(I'm pretty sure the exact model file does not matter; I see the same behavior with any model.)
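For reference, this is roughly how I compared releases: the same command, run against each downloaded release binary (the directory layout below is just a sketch of my setup, not the official archive structure):

```shell
# Sketch: run the identical repro command against each release build and
# compare how many tokens are printed before the process exits.
# The ./llama-<build>/ paths are hypothetical and depend on where the
# release archives were extracted.
for build in b2995 b3497 b4027; do
  echo "=== $build ==="
  "./llama-$build/llama-speculative.exe" \
    --model llama2-13b-tiefighter.Q6_K.gguf \
    --model-draft llama2-13b-tiefighter.Q6_K.gguf \
    --prompt "You are" --temp 0 -n 4
done
```

With b2995 I get multiple tokens; with b3497 and b4027 the output stops after at most one token.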
Here are logs from b2995, May 25:
llama-b2995-bin-win-avx2-x64
I think that worked correctly?
But here are logs from b3497, August 1:
llama-b3497-bin-win-avx2-x64
It looked like the program crashed (the output stopped abruptly), and the mouse pointer briefly flashed a "wait" cursor.
Anyway, here are logs from the current version, b4027, November 4, with the verbose flag added:
llama-b4027-bin-win-avx2-x64
It also crashes!
Am I doing something wrong? I've read all relevant pages mentioned in
llama.cpp/examples/speculative/README.md
Lines 7 to 9 in d5a409e
Is the llama-speculative example obsolete? Maybe I'm missing some special option that is needed? But since everything suggests that the program is really crashing, I've decided to post this issue here in case there is a bug that needs to be fixed.
Name and Version
llama-cli.exe --version
version: 4027 (ea02c75)
built with MSVC 19.29.30156.0 for x64
What operating system are you seeing the problem on?
Windows
Relevant log output
No response