When evaluating DeepSeek on two L20 GPUs, both input and output throughput are very low:
Processed prompts:  3% | 1/30 [04:04<1:58:10, 244.50s/it, est. speed input: 0.62 toks/s, output: 15.22 toks/s]
Processed prompts:  7% | 2/30 [06:01<1:19:03, 169.40s/it, est. speed input: 0.82 toks/s, output: 26.10 toks/s]
Processed prompts: 10% | 3/30 [06:37<48:47, 108.44s/it, est. speed input: 0.92 toks/s, output: 39.49 toks/s]
Processed prompts: 13% | 4/30 [06:50<30:39, 70.75s/it, est. speed input: 1.61 toks/s, output: 53.94 toks/s]
Processed prompts: 17% | 5/30 [08:12<31:15, 75.01s/it, est. speed input: 1.57 toks/s, output: 60.74 toks/s]
Processed prompts: 20% | 6/30 [08:57<25:53, 64.74s/it, est. speed input: 1.88 toks/s, output: 71.51 toks/s]
Processed prompts: 23% | 7/30 [09:18<19:16, 50.27s/it, est. speed input: 1.93 toks/s, output: 84.66 toks/s]
Processed prompts: 27% | 8/30 [09:59<17:20, 47.31s/it, est. speed input: 1.97 toks/s, output: 92.62 toks/s]
Processed prompts: 30% | 9/30 [14:18<39:44, 113.56s/it, est. speed input: 1.53 toks/s, output: 77.57 toks/s]

GPU memory usage is as follows:
| 0 NVIDIA L20 On | 00000000:2A:00.0 Off | 0 |
| N/A 79C P0 285W / 350W | 42571MiB / 46068MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L20 On | 00000000:AB:00.0 Off | 0 |
| N/A 81C P0 264W / 350W | 42569MiB / 46068MiB | 96% Default |
| | | N/A |
Model and inference section of the config file:
models += [
    # You can comment out the models you don't want to evaluate.
    # All models use sampling mode.
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwq',
        path='/workspace/qwq',
        model_kwargs=dict(
            max_model_len=32768,     # maximum sequence length
            tensor_parallel_size=2,  # tp=2 across both L20s
            max_num_seqs=128,        # maximum batched sequences
        ),
        generation_kwargs=dict(
            do_sample=True,
            temperature=0.6,
            top_p=0.95,
            max_tokens=32768,
        ),
        max_seq_len=32768,
        max_out_len=32768,
        batch_size=64,
        run_cfg=dict(num_gpus=2),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    ),
]
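One way to narrow this down is to run vLLM directly with the same settings, outside the evaluation harness, and measure aggregate throughput yourself. Below is a minimal sketch (the model path, prompt text, and `max_tokens` are placeholders; it assumes vLLM's offline `LLM`/`SamplingParams` API and requires the two GPUs, so `run_benchmark()` is not called at import time):

```python
from time import perf_counter

def throughput(num_tokens: int, elapsed_s: float) -> float:
    """Aggregate tokens per second over a whole batch."""
    return num_tokens / elapsed_s

def run_benchmark():
    # Requires vllm installed and 2 GPUs visible; deliberately not run at import.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="/workspace/qwq",      # same path as the config above
        tensor_parallel_size=2,      # same tp=2 as the report
        max_model_len=32768,
        max_num_seqs=128,
    )
    # Shorter max_tokens than the eval, just to get a quick reading.
    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
    prompts = ["Explain tensor parallelism in one paragraph."] * 30

    t0 = perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = perf_counter() - t0

    out_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"aggregate output throughput: {throughput(out_tokens, elapsed):.1f} toks/s")
```

If this standalone run is also slow, the bottleneck is in vLLM/hardware (e.g. PCIe bandwidth between the two L20s under tensor parallelism); if it is fast, the issue is likely in how the harness batches requests.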
What could be causing this?