[Bug] Can not run vLLM with tensor parallel #1354
Comments
Hi, the issue appears to be due to vLLM's inability to run the Mixtral model internally, rather than an issue with OpenCompass. I suggest creating a minimal reproducible script that excludes OpenCompass components: write a simple Python file that runs this model with vLLM directly and see if it can be loaded successfully.
Thank you for your reply. I created a minimal reproducible script:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=8,
    download_dir="/home/data/huggingface",
    gpu_memory_utilization=0.9,
)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

and ran it from the command line.
After trying more models, it appears that this issue is related to tensor parallelism, which shows up when adjusting the configuration file.
Thank you for reporting the issue. To resolve this, try modifying the tensor parallel parameter in the configuration file.
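As a minimal sketch of what that adjustment might look like, assuming the model is defined through OpenCompass's `VLLM` wrapper and that `tensor_parallel_size` is forwarded via `model_kwargs` (the exact file name, field names, and values of the shipped `vllm_mixtral_8x7b_v0_1` config may differ):

```python
# Hypothetical OpenCompass model config sketch; field layout follows the
# usual OpenCompass convention but is not copied from the actual repo config.
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='mixtral-8x7b-v0.1-vllm',
        path='mistralai/Mixtral-8x7B-v0.1',
        # tensor_parallel_size should match the number of visible GPUs,
        # e.g. 2 for CUDA_VISIBLE_DEVICES=4,5.
        model_kwargs=dict(tensor_parallel_size=2),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,
        run_cfg=dict(num_gpus=2, num_procs=1),
    )
]
```

The key point is that the tensor parallel degree in the config should match the number of GPUs actually exposed to the run.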
After modifying the configuration file, the issue remains. Is there a specific environment version that can successfully run with tensor parallel? Are there any vLLM, torch, or OpenCompass version requirements?
After some searching, this appears to be caused by a behavior change in vLLM since vllm-0.5.1, as mentioned here: vllm-project/vllm#5669 (comment).
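As a hedged illustration only (this is an assumption, not something confirmed by the linked comment): one behavior that changed around vLLM 0.5.x involves how tensor parallel worker processes are launched, and in such cases a common workaround is to pin the worker start method and the distributed executor backend explicitly. Both knobs below exist in recent vLLM releases, but whether they address this particular issue is an assumption.

```python
# Hypothetical workaround sketch, assuming the regression is related to the
# default distributed executor / worker start behavior in vLLM >= 0.5.1.
import os

# Spawn worker processes instead of forking them, which avoids re-initializing
# CUDA inside a forked child process.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from vllm import LLM

llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    tensor_parallel_size=2,
    # Select the executor backend explicitly instead of relying on the
    # default ("ray" requires `pip install ray`).
    distributed_executor_backend="ray",
)
```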
Prerequisites
Problem type
I am evaluating with officially supported tasks/models/datasets.
Environment
Reproducing the problem - code/configuration example
Just the built-in `run.py` file.
Reproducing the problem - command or script
```bash
CUDA_VISIBLE_DEVICES=4,5 python run.py --models vllm_mixtral_8x7b_v0_1 --datasets mmlu_gen -m infer --max-num-workers 1 --debug
```
Reproducing the problem - error message
Other information
No response