After deploying the model with sh scripts/run_assistant_server.sh, will it be much slower than vLLM? #506
Comments
No, it is just as fast; under the hood the script simply calls vLLM.
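For reference, a minimal sketch of the kind of vLLM launch such a wrapper script typically performs; the exact invocation is an assumption, so check scripts/run_assistant_server.sh for the real one:

```bash
# Assumed sketch: vLLM's OpenAI-compatible API server, which the wrapper
# script is said to call under the hood. Serving through the script should
# therefore be roughly as fast as running vLLM directly.
python -m vllm.entrypoints.openai.api_server \
  --served-model-name Qwen2-7B-Instruct \
  --model path/to/weights
```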
For the command sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights, how do I change the model path? When I run it, the model path it reads automatically jumps to the ModelScope download location instead of my local path. My model is stored in my own directory, so I get the error: requests.exceptions.HTTPError: The request model: /workspace/model/llm/Qwen/Qwen2-7B-Instruct/ does not exist!
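To make the report concrete, the failing invocation presumably looks like the sketch below, with the local directory from the error message substituted for path/to/weights (reconstructed from the report above, so treat the path as illustrative):

```bash
# Reconstructed from the report above; the local path is illustrative.
# Despite --model pointing at a local directory, the script reportedly
# resolves the model against the ModelScope download location instead.
sh scripts/run_assistant_server.sh \
  --served-model-name Qwen2-7B-Instruct \
  --model /workspace/model/llm/Qwen/Qwen2-7B-Instruct/
```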
@zzhangpurdue vi /opt/conda/lib/python3.10/site-packages/vllm/config.py
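One way to see where the installed vLLM redirects model paths to ModelScope (assuming the install location mentioned above) is to search that file:

```bash
# Look for the ModelScope-related path handling in the installed vLLM 0.3.0.
# The exact code differs between vLLM versions; this only shows where to look.
grep -in "modelscope" /opt/conda/lib/python3.10/site-packages/vllm/config.py
```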
When we tried this before, we did test with the ModelScope download path and did not consider non-ModelScope paths. We'll look into how to change this.
I just tried moving the model out of the ModelScope download path and still could not reproduce the problem. Could you tell me your vLLM version?
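In case it helps, two standard ways to check which vLLM version is installed:

```bash
# Either command prints the installed vLLM version.
pip show vllm
python -c "import vllm; print(vllm.__version__)"
```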
GPU environment image (python3.10): ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1. In this official image, vLLM is 0.3.0.
@zzhangpurdue
When I run the script here, setting export VLLM_USE_MODELSCOPE=false by default should resolve this problem.
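A minimal sketch of that workaround, reusing the local path from the earlier error message (the path itself is illustrative):

```bash
# Disable ModelScope path resolution so vLLM loads the weights from the
# local directory passed via --model (env var name from the reply above).
export VLLM_USE_MODELSCOPE=false
sh scripts/run_assistant_server.sh \
  --served-model-name Qwen2-7B-Instruct \
  --model /workspace/model/llm/Qwen/Qwen2-7B-Instruct/
```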
sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights
Is inference with this slower than vLLM?