Fix deepseek awq v3 #3450
Conversation
hnyls2002
commented
Feb 10, 2025
edited by merrymercy
After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ? |
I am giving it a try... |
We should also introduce a Triton fused MoE kernel like moe_wna16. |
Yes, this PR is exactly for this |
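For readers unfamiliar with the moe_wna16 idea mentioned above, here is a minimal, illustrative Python sketch of the work such a fused MoE path performs per expert: weights are stored as packed 4-bit integers with per-group scales and zero points, and are dequantized on the fly before the expert GEMM. The shapes, packing order, and group_size below are simplified assumptions for illustration, not the exact AWQ or moe_wna16 layout, and this is not code from this PR.

```python
import torch

def dequant_int4(qweight, scales, zeros, group_size=128):
    # Assumed layout: qweight is (K // 8, N) int32 with eight 4-bit values
    # packed per int32 along K; scales and zeros are (K // group_size, N) floats.
    shifts = torch.arange(0, 32, 4, device=qweight.device)
    # Unpack nibbles -> (K, N)
    w = ((qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF)
    w = w.reshape(-1, qweight.shape[-1])
    groups = torch.arange(w.shape[0], device=w.device) // group_size
    return (w.float() - zeros[groups]) * scales[groups]

def moe_expert_forward(x, qweight, scales, zeros):
    # A real fused kernel dequantizes and multiplies in a single pass per
    # expert; the two steps are split here only for readability.
    return x @ dequant_int4(qweight, scales, zeros)
```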
I still have a problem; I am running the model cognitivecomputations/DeepSeek-V3-AWQ. |
What is your launch command? |
So, does this PR still use the AWQ Marlin kernel? |
I replaced the config.json with the AWQ version. |
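As a hedged sketch, one way to sanity-check that the swapped config.json actually carries an AWQ quantization config is shown below. The field names (quant_method, bits, group_size) are typical of AWQ checkpoints but are assumptions here, not taken from this PR.

```python
import json

# Load the checkpoint's config and inspect its quantization section.
with open("config.json") as f:
    cfg = json.load(f)

qcfg = cfg.get("quantization_config", {})
assert qcfg.get("quant_method") == "awq", f"unexpected quantization_config: {qcfg}"
print("bits:", qcfg.get("bits"), "group_size:", qcfg.get("group_size"))
```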
R1 and MLA are not supported for now, due to some unknown accuracy issues. You can use V3-AWQ with this command: python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote-code --disable-mla |
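Once a server launched with the command above is up, a quick smoke test against the OpenAI-compatible endpoint might look like the sketch below. The port (30000) reflects sglang's default and the endpoint path assumes the OpenAI-compatible API; adjust both to your launch flags, as they are assumptions rather than part of this PR.

```python
import requests

# Send a single chat completion request to the locally running server.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "cognitivecomputations/DeepSeek-V3-AWQ",
        "messages": [{"role": "user", "content": "Say hello."}],
        "max_tokens": 32,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```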
I succeeded in deploying the model on 8*A800 by building a Docker image from the fix-dpsk-v3-awq branch. |
Could you share some benchmark results? |
How about the benchmark results? @chenchunhui97 |
This fix is a bit tricky; I'll merge it first to unblock AWQ usage. Refactoring is on the way.
My launch script is for 8*A800 80G. This model has been successfully deployed with vLLM with a smaller context length, but it seems vLLM does not optimize MLA well for now.
Error:
@chenchunhui97 @zhyncs Any suggestions? |