When evaluating DeepSeek on two L20 GPUs, both input and output throughput are very low:
Processed prompts:  3% | 1/30 [04:04<1:58:10, 244.50s/it, est. speed input: 0.62 toks/s, output: 15.22 toks/s]
Processed prompts:  7% | 2/30 [06:01<1:19:03, 169.40s/it, est. speed input: 0.82 toks/s, output: 26.10 toks/s]
Processed prompts: 10% | 3/30 [06:37<48:47, 108.44s/it, est. speed input: 0.92 toks/s, output: 39.49 toks/s]
Processed prompts: 13% | 4/30 [06:50<30:39, 70.75s/it, est. speed input: 1.61 toks/s, output: 53.94 toks/s]
Processed prompts: 17% | 5/30 [08:12<31:15, 75.01s/it, est. speed input: 1.57 toks/s, output: 60.74 toks/s]
Processed prompts: 20% | 6/30 [08:57<25:53, 64.74s/it, est. speed input: 1.88 toks/s, output: 71.51 toks/s]
Processed prompts: 23% | 7/30 [09:18<19:16, 50.27s/it, est. speed input: 1.93 toks/s, output: 84.66 toks/s]
Processed prompts: 27% | 8/30 [09:59<17:20, 47.31s/it, est. speed input: 1.97 toks/s, output: 92.62 toks/s]
Processed prompts: 30% | 9/30 [14:18<39:44, 113.56s/it, est. speed input: 1.53 toks/s, output: 77.57 toks/s]

GPU memory usage is as follows:
| 0 NVIDIA L20 On | 00000000:2A:00.0 Off | 0 |
| N/A 79C P0 285W / 350W | 42571MiB / 46068MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L20 On | 00000000:AB:00.0 Off | 0 |
| N/A 81C P0 264W / 350W | 42569MiB / 46068MiB | 96% Default |
| | | N/A |
Model and inference section of the config file:
models += [
    # You can comment out the models you don't want to evaluate.
    # All models use sampling mode.
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwq',
        path='/workspace/qwq',
        model_kwargs=dict(
            max_model_len=32768,     # maximum sequence length
            tensor_parallel_size=2,  # tp=2 across both L20s
            max_num_seqs=128,        # maximum batched sequences
        ),
        generation_kwargs=dict(
            do_sample=True,
            temperature=0.6,
            top_p=0.95,
            max_tokens=32768,
        ),
        max_seq_len=32768,
        max_out_len=32768,
        batch_size=64,
        run_cfg=dict(num_gpus=2),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    ),
]
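One way to narrow this down is to run vLLM directly with the same settings, outside the evaluation harness, and measure aggregate throughput yourself. Below is a minimal sketch (the model path, prompt text, and `max_tokens` are placeholders; it assumes vLLM's offline `LLM`/`SamplingParams` API and requires the two GPUs, so `run_benchmark()` is not called at import time):

```python
from time import perf_counter

def throughput(num_tokens: int, elapsed_s: float) -> float:
    """Aggregate tokens per second over a whole batch."""
    return num_tokens / elapsed_s

def run_benchmark():
    # Requires vllm installed and 2 GPUs visible; deliberately not run at import.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="/workspace/qwq",      # same path as the config above
        tensor_parallel_size=2,      # same tp=2 as the report
        max_model_len=32768,
        max_num_seqs=128,
    )
    # Shorter max_tokens than the eval, just to get a quick reading.
    params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
    prompts = ["Explain tensor parallelism in one paragraph."] * 30

    t0 = perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = perf_counter() - t0

    out_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"aggregate output throughput: {throughput(out_tokens, elapsed):.1f} toks/s")
```

If this standalone run is also slow, the bottleneck is in vLLM/hardware (e.g. PCIe bandwidth between the two L20s under tensor parallelism); if it is fast, the issue is likely in how the harness batches requests.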
What could be causing this?