After deploying the model with sh scripts/run_assistant_server.sh, will it be much slower than vLLM? #506
Comments
No, it is just as fast; under the hood the script simply calls vLLM.
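For reference, a minimal sketch of the kind of vLLM launch such a wrapper script typically performs; the exact invocation is an assumption, so check scripts/run_assistant_server.sh for the real one:

```bash
# Assumed sketch: vLLM's OpenAI-compatible API server, which the wrapper
# script is said to call under the hood. Serving through the script should
# therefore be roughly as fast as running vLLM directly.
python -m vllm.entrypoints.openai.api_server \
  --served-model-name Qwen2-7B-Instruct \
  --model path/to/weights
```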
For the command sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights, how do I change the model path? When I run it, the model path it reads automatically jumps to the ModelScope download location instead of my local path. My model is stored in my own directory, so I get the error: requests.exceptions.HTTPError: The request model: /workspace/model/llm/Qwen/Qwen2-7B-Instruct/ does not exist!
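To make the report concrete, the failing invocation presumably looks like the sketch below, with the local directory from the error message substituted for path/to/weights (reconstructed from the report above, so treat the path as illustrative):

```bash
# Reconstructed from the report above; the local path is illustrative.
# Despite --model pointing at a local directory, the script reportedly
# resolves the model against the ModelScope download location instead.
sh scripts/run_assistant_server.sh \
  --served-model-name Qwen2-7B-Instruct \
  --model /workspace/model/llm/Qwen/Qwen2-7B-Instruct/
```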
@zzhangpurdue vi /opt/conda/lib/python3.10/site-packages/vllm/config.py
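One way to see where the installed vLLM redirects model paths to ModelScope (assuming the install location mentioned above) is to search that file:

```bash
# Look for the ModelScope-related path handling in the installed vLLM 0.3.0.
# The exact code differs between vLLM versions; this only shows where to look.
grep -in "modelscope" /opt/conda/lib/python3.10/site-packages/vllm/config.py
```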
When we tried this before, we did test with the ModelScope download path and did not consider non-ModelScope paths. We'll look into how to change this.
I just tried moving the model out of the ModelScope download path and still could not reproduce the problem. Could you tell me your vLLM version?
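In case it helps, two standard ways to check which vLLM version is installed:

```bash
# Either command prints the installed vLLM version.
pip show vllm
python -c "import vllm; print(vllm.__version__)"
```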
GPU environment image (python3.10): ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1. In this official image, vLLM is 0.3.0.
@zzhangpurdue
When I run the script here, setting export VLLM_USE_MODELSCOPE=false by default should resolve this problem.
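A minimal sketch of that workaround, reusing the local path from the earlier error message (the path itself is illustrative):

```bash
# Disable ModelScope path resolution so vLLM loads the weights from the
# local directory passed via --model (env var name from the reply above).
export VLLM_USE_MODELSCOPE=false
sh scripts/run_assistant_server.sh \
  --served-model-name Qwen2-7B-Instruct \
  --model /workspace/model/llm/Qwen/Qwen2-7B-Instruct/
```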
sh scripts/run_assistant_server.sh --served-model-name Qwen2-7B-Instruct --model path/to/weights
Is inference with this slower than vLLM?