[Bug] AWQ scalar type error #3780
Comments
Hi, let me try to reproduce it.
I managed to run this model by adding the argument --dtype half, with the following command:
python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half
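If you launch the server from a Python script (for example, to compare runs with different flags), you can build the same command as an argv list, which avoids shell-quoting issues. This is a minimal sketch that assumes sglang is installed. The helper names build_launch_cmd and launch are hypothetical and not part of SGLang. The flags are exactly the ones in the command above.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence


def build_launch_cmd(model: str, tp: int, dtype: str = "half",
                     extra_flags: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: assemble the sglang.launch_server command as an argv list."""
    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model", model,
        "--tp", str(tp),
        "--trust-remote-code",
        "--dtype", dtype,  # "half" = fp16, the argument that let the server start
    ]
    cmd.extend(extra_flags)  # any further server flags, appended verbatim
    return cmd


def launch(cmd: List[str]) -> subprocess.Popen:
    """Hypothetical helper: start the server in the background (needs sglang and enough GPUs for --tp)."""
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    cmd = build_launch_cmd("cognitivecomputations/DeepSeek-R1-AWQ", tp=8)
    # Only print the command here; call launch(cmd) on a machine that can actually serve the model.
    print(shlex.join(cmd))
```

Keeping the command as a list means each flag is a separate argument, so toggling one flag between runs is a one-line change.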
The server does start this way, but I'm not sure whether the problem is caused by the precision of the different dtypes. Running R1-AWQ in fp16 tends to produce gibberish, which I didn't see with the 1.58-bit quantization I deployed earlier with Ollama. The R1-AWQ output is shown in the picture.
I have the same problem. How did you solve it?
I figured it out: AWQ does not work with MLA yet. With this command, the model runs and generates the expected output:
python3 -m sglang.launch_server --model cognitivecomputations/DeepSeek-R1-AWQ --tp 8 --trust-remote-code --dtype half --disable-mla
My input looks like this:
messages=[
    {"role": "user", "content": "List 3 countries and their capitals."},
],
temper
This is my output:
"<think>\nOkay, let's see. The user is asking for a list of three countries and their capitals. Hmm, I need to make sure I pick countries that are well-known so the answer is useful. Maybe start with some obvious ones. United States, Canada, Mexico? Wait, but maybe some people might"
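You can send the same messages to the running server from Python to check whether the output is coherent. This is a minimal sketch that uses only the standard library. It assumes the server listens on SGLang's default port 30000 and exposes the OpenAI-compatible /v1/chat/completions endpoint. The helper names and the max_tokens value are hypothetical. The temperature value is also a placeholder, because the setting in the comment above is cut off.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: default SGLang port and OpenAI-compatible chat route.
BASE_URL = "http://localhost:30000/v1/chat/completions"


def build_chat_payload(messages: List[Dict[str, str]], temperature: float,
                       max_tokens: int = 64) -> Dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style chat request body."""
    return {
        "model": "cognitivecomputations/DeepSeek-R1-AWQ",
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,  # illustrative limit, not taken from the thread
    }


def send_chat(payload: Dict[str, Any], url: str = BASE_URL) -> str:
    """Hypothetical helper: POST the payload and return the first choice's text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


messages = [
    {"role": "user", "content": "List 3 countries and their capitals."},
]
# Placeholder temperature: the original value was truncated in the comment.
payload = build_chat_payload(messages, temperature=0.6)
```

With the server running, print(send_chat(payload)) should return a readable reasoning trace like the one quoted above. Garbled text suggests the fp16/MLA problem discussed in this thread.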
Checklist
Describe the bug
When I run DeepSeek-R1-AWQ, I hit the same scalar type error as in PR #3450. @hnyls2002
Reproduction
I used the command recommended in the instructions, but it produces the same error as mentioned above.
Environment