[Bug]: Speculative decoding generates gibberish when receiving parallel requests with different seeds #9441
Closed
Labels
bug
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I was running some experiments with speculative decoding and found strange behavior: when the engine processes requests in parallel, some of the requests produce gibberish output.
Here's the script to reproduce the bug:
In a nutshell:
A possible output