
Fix deepseek awq v3 #3450

Merged: 19 commits merged into main from fix-dpsk-v3-awq on Feb 12, 2025
Conversation

@hnyls2002 (Collaborator) commented Feb 10, 2025

python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote --disable-mla

@hnyls2002 hnyls2002 marked this pull request as draft February 10, 2025 04:43
@halexan commented Feb 10, 2025

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

@chenchunhui97

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

I am giving it a try...

@Xu-Chen (Contributor) commented Feb 10, 2025

We should also introduce a Triton fused MoE kernel like moe_wna16.
The AWQ Marlin kernel may only get around 10 token/s on 8*A100.
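For readers unfamiliar with moe_wna16, here is a purely illustrative PyTorch reference of what such a W4A16 fused-MoE kernel computes (my sketch, with an assumed, simplified int4 packing layout, not the actual kernel or its memory format); the Triton kernel fuses the per-expert dequantization and GEMM into a single launch instead of looping in Python:

import torch

def moe_wna16_reference(x, topk_ids, topk_weights, qweight, scales, zeros, group_size=128):
    """Reference math only. Assumed (hypothetical) shapes:
    x: [tokens, hidden], topk_ids/topk_weights: [tokens, topk],
    qweight: [experts, hidden // 8, out] int32 (8 sequential int4 values per word),
    scales/zeros: [experts, hidden // group_size, out]."""
    out = torch.zeros(x.shape[0], qweight.shape[-1], dtype=x.dtype, device=x.device)
    shifts = torch.arange(0, 32, 4, device=x.device)
    for e in range(qweight.shape[0]):
        token_idx, k_idx = (topk_ids == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        # Unpack 8 int4 values per int32 word, then dequantize group-wise.
        w_int = (qweight[e].unsqueeze(1) >> shifts.view(1, -1, 1)) & 0xF
        w_int = w_int.reshape(-1, qweight.shape[-1]).to(x.dtype)      # [hidden, out]
        s = scales[e].repeat_interleave(group_size, dim=0)
        z = zeros[e].repeat_interleave(group_size, dim=0)
        w = (w_int - z) * s
        # Weight each token's contribution by its router probability for this expert.
        out[token_idx] += topk_weights[token_idx, k_idx].unsqueeze(1) * (x[token_idx] @ w)
    return out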

@hnyls2002 (Collaborator, Author)

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

Yes, this PR is exactly for that.

@hnyls2002 hnyls2002 marked this pull request as ready for review February 10, 2025 11:47
@hnyls2002 hnyls2002 changed the title Fix deepseek awq v3 [DO NOT MERGE] Fix deepseek awq v3 Feb 10, 2025
@hnyls2002 hnyls2002 changed the title [DO NOT MERGE] Fix deepseek awq v3 Fix deepseek awq v3 Feb 10, 2025
@pachinko

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

Yes, this PR is exactly for that.

I still have a problem. I am running cognitivecomputations/DeepSeek-V3-AWQ and get:

[2025-02-11 14:42:20 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/WORK/sglang/python/sglang/srt/managers/scheduler.py", line 1816, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/scheduler.py", line 240, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_executor/model_runner.py", line 186, in __init__
    self.load_model()
  File "/WORK/sglang/python/sglang/srt/model_executor/model_runner.py", line 307, in load_model
    self.model = get_model(
                 ^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/WORK/sglang/python/sglang/srt/model_loader/loader.py", line 362, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/WORK/sglang/python/sglang/srt/models/deepseek_v2.py", line 924, in load_weights
    param = params_dict[name]
            ~~~~~~~~~~~^^^^^^
KeyError: 'model.layers.6.mlp.experts.w2_weight'

[2025-02-11 14:42:20] Received sigquit from a child proces. It usually means the child failed.
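For context, a minimal sketch of the kind of expert-weight name remapping that load_weights performs (my hypothetical illustration, not the actual deepseek_v2.py code): each per-expert checkpoint tensor is renamed onto a fused MoE parameter, and the KeyError above fires at the params_dict lookup when the remapped name (experts.w2_weight) does not match what the quantized FusedMoE layer actually registered (an AWQ checkpoint carries qweight/scales/qzeros rather than a plain weight):

from typing import Dict, List, Tuple

import torch

def expert_params_mapping(num_experts: int) -> List[Tuple[str, str]]:
    # (per-expert checkpoint sub-name, fused parameter sub-name) pairs.
    mapping = []
    for eid in range(num_experts):
        mapping.append((f"experts.{eid}.gate_proj.weight", "experts.w13_weight"))
        mapping.append((f"experts.{eid}.up_proj.weight", "experts.w13_weight"))
        mapping.append((f"experts.{eid}.down_proj.weight", "experts.w2_weight"))
    return mapping

def resolve_expert_param(name: str,
                         params_dict: Dict[str, torch.nn.Parameter],
                         mapping: List[Tuple[str, str]]):
    """Return the fused parameter a checkpoint tensor should be loaded into."""
    for ckpt_sub, fused_sub in mapping:
        if ckpt_sub not in name:
            continue
        fused_name = name.replace(ckpt_sub, fused_sub)
        # This lookup is where a KeyError like 'model.layers.6.mlp.experts.w2_weight'
        # comes from when the layer registered w2_qweight / w2_scales instead.
        return params_dict[fused_name]
    return None

The real loader then hands the tensor to a shard-aware weight_loader together with the expert id rather than copying it directly; this PR presumably adjusts the mapping (or the registered parameter names) so quantized expert checkpoints resolve correctly.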

@halexan commented Feb 11, 2025

@pachinko

What is your launch command?

@pachinko

@halexan

python3 -m sglang.launch_server \
    --model-path /home/model/DeepSeek-R1 \
    --tp 8 \
    --dist-init-addr 10.10.0.1:6000 \
    --nnodes 1 \
    --node-rank 0 \
    --trust-remote-code \
    --disable-radix-cache  \
    --disable-outlines-disk-cache \
    --host 0.0.0.0 \
    --port 40000

@halexan commented Feb 11, 2025

We should also introduce a Triton fused MoE kernel like moe_wna16. The AWQ Marlin kernel may only get around 10 token/s on 8*A100.

So, does this PR still use the AWQ Marlin kernel?

@pachinko

@halexan

python3 -m sglang.launch_server \
    --model-path /home/model/DeepSeek-R1 \
    --tp 8 \
    --dist-init-addr 10.10.0.1:6000 \
    --nnodes 1 \
    --node-rank 0 \
    --trust-remote-code \
    --disable-radix-cache  \
    --disable-outlines-disk-cache \
    --host 0.0.0.0 \
    --port 40000

I replaced the config.json with the AWQ version.

@hnyls2002 (Collaborator, Author) commented Feb 11, 2025

@halexan

python3 -m sglang.launch_server \
    --model-path /home/model/DeepSeek-R1 \
    --tp 8 \
    --dist-init-addr 10.10.0.1:6000 \
    --nnodes 1 \
    --node-rank 0 \
    --trust-remote-code \
    --disable-radix-cache  \
    --disable-outlines-disk-cache \
    --host 0.0.0.0 \
    --port 40000

I replaced the config.json with the AWQ version.

R1 and MLA are not supported for now, due to some unknown accuracy issues. You can use V3-AWQ with this command:

 python -m sglang.launch_server --model-path cognitivecomputations/DeepSeek-V3-AWQ --tp-size 8 --trust-remote --disable-mla

@chenchunhui97

After this PR is merged, can sglang run cognitivecomputations/DeepSeek-V3-AWQ?

I succeeded in deploying the model on 8*A800 by building a Docker image from the fix-dpsk-v3-awq branch.

@Xu-Chen (Contributor) commented Feb 12, 2025

Could you share some benchmarks?

@Zachary-ai-engineer

We tested V3-AWQ on the latest code and found that metrics such as TPOT (time per output token) were relatively poor. How should we solve this problem?
[screenshot of benchmark results]

@halexan commented Feb 12, 2025

@chenchunhui97 How about the benchmarks?

@zhyncs (Member) left a comment

This fix is a bit tricky; I'll merge it first to unblock AWQ usage. Refactoring is on its way.

@zhyncs zhyncs merged commit 8616357 into main Feb 12, 2025
21 checks passed
@zhyncs zhyncs deleted the fix-dpsk-v3-awq branch February 12, 2025 14:09
chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request Feb 15, 2025
@luweizheng

My launch script on 8*A800 80G is below. This model has been successfully deployed with vLLM with a smaller context length, but it seems vLLM does not optimize MLA well at the moment.

python3 -m sglang.launch_server --model-path /path/to/DeepSeek-R1-awq/DeepSeek-R1-awq --tp 8 --host 0.0.0.0 --port 11434 --trust-remote-code

Error:

File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
           ^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/model_loader/loader.py", line 362, in load_model
    model.load_weights(self._get_all_weights(model_config, model))
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/sglang/srt/models/deepseek_v2.py", line 962, in load_weights
    w = ops.awq_dequantize(
        ^^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/vllm/_custom_ops.py", line 222, in awq_dequantize
    return torch.ops._C.awq_dequantize(qweight, scales, zeros, split_k_iters,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs/fast/u20247643/envs/sglang/lib/python3.12/site-packages/torch/_ops.py", line 1116, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected scalar type Half but found BFloat16

@chenchunhui97 @zhyncs Any suggestions?
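A possible workaround sketch (an assumption on my side, not a confirmed fix): the vLLM AWQ dequantization kernel is compiled for float16, so a checkpoint whose scales are stored in bfloat16 hits exactly this error; casting the AWQ scales to half before dequantization (or loading the model in float16 overall, for example through a --dtype half style option if your build exposes one) avoids the mismatch:

import torch
from vllm import _custom_ops as ops  # same vLLM build as in the traceback above

def awq_dequantize_fp16(qweight: torch.Tensor, scales: torch.Tensor,
                        qzeros: torch.Tensor) -> torch.Tensor:
    # The kernel expects Half; a bfloat16 checkpoint trips
    # "expected scalar type Half but found BFloat16".
    if scales.dtype == torch.bfloat16:
        scales = scales.to(torch.float16)
    # Zero values for split_k_iters / thx / thy mirror vLLM's default usage.
    return ops.awq_dequantize(qweight, scales, qzeros, 0, 0, 0)

Note the dequantized output is then float16, so downstream modules would also need to run in half precision rather than bfloat16.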

@zjp-shadow zjp-shadow mentioned this pull request Feb 22, 2025