FEAT: support deepseek-r1-distill-qwen #2781
Conversation
Both DeepSeek-R1-Distill-Qwen-14B-GGUF and deepseek-r1-distill-qwen-14b-awq fail to load.
I'm encountering a weird problem with deepseek-r1-distill-qwen 32b awq. I loaded the model with the vLLM backend. With each request, the model seems to stop generating after outputting 1000+ tokens. There are no warnings or errors from inference or vLLM.
What's the stop reason?
There isn't an apparent stop reason other than "finished request xxx".
I think I found the problem. It seems Xinference is not passing max_tokens through to vLLM's inference parameters, so the default, 1024, is used by vLLM.
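Until that's fixed, a client-side workaround is to set max_tokens explicitly on every request so the backend doesn't fall back to its default cap. A minimal sketch, assuming the server exposes an OpenAI-compatible endpoint; the base_url, api_key, and model UID below are hypothetical placeholders:

```python
# Workaround sketch: pass max_tokens explicitly per request so the
# backend doesn't fall back to its default generation cap.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # assumed OpenAI-compatible endpoint
    api_key="not-needed",                 # placeholder; local servers often ignore the key
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen",     # replace with your model UID
    messages=[{"role": "user", "content": "Explain the chain rule."}],
    max_tokens=4096,                      # explicit cap instead of relying on the default
)
print(response.choices[0].message.content)
```

If generation still stops around 1024 tokens even with the explicit override, that would confirm the parameter is being dropped before it reaches vLLM.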