
[Installation]: Nvidia runtime issue? On new VLLM 0.7.0 #12505

Closed
1 task done
Playerrrrr opened this issue Jan 28, 2025 · 17 comments
Labels
installation Installation problems

Comments

@Playerrrrr

Your current environment

The output of `python collect_env.py`

docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 --ipc=host \
  -e VLLM_ENABLE_PREFIX_CACHING=true \
  --name qwen2.5_20250128 \
  vllm/vllm-openai:v0.7.0 \
  --model Qwen/Qwen2.5-72B-Instruct \
  --tensor-parallel-size=4 \
  --gpu-memory-utilization=0.90 \
  --enforce-eager \
  --rope-scaling '{"type": "yarn","factor": 4,"original_max_position_embeddings": 32768}'
error:
/usr/bin/ld: cannot find -lcuda: No such file or directory

How you are installing vllm

docker
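As a first diagnostic (a sketch, not from the report above), one can check whether the NVIDIA container runtime actually mounts the driver's libcuda into this image; if the command prints nothing, the linker inside the container has no way to resolve -lcuda:

```shell
# List the libcuda entries visible to the loader inside the vLLM container.
docker run --rm --runtime nvidia --gpus all --entrypoint /bin/sh \
  vllm/vllm-openai:v0.7.0 -c 'ldconfig -p | grep libcuda'
```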

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@Playerrrrr Playerrrrr added the installation Installation problems label Jan 28, 2025
@DarkLight1337 DarkLight1337 changed the title [Installation]: Nvidia runtime issue? On new VLLM 7.0 [Installation]: Nvidia runtime issue? On new VLLM 0.7.0 Jan 28, 2025
@jamesbraza

jamesbraza commented Jan 28, 2025

I hit this as well during vllm serve --tensor-parallel-size 2 today with vllm==0.7.0:

INFO 01-28 13:54:38 weight_utils.py:251] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=1241396) INFO 01-28 13:54:38 weight_utils.py:251] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:00<00:00,  1.48it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.14it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:01<00:00,  1.18it/s]

INFO 01-28 13:54:40 model_runner.py:1115] Loading model weights took 7.1441 GB
(VllmWorkerProcess pid=1241396) INFO 01-28 13:54:41 model_runner.py:1115] Loading model weights took 7.1441 GB
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: cannot find -lcuda: No such file or directory
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status

@mgoin
Member

mgoin commented Jan 29, 2025

@tlrmchlsmth would you have an idea? This seems related to #12424

@robertgshaw2-redhat
Collaborator

@russellb this looks similar to what you were helping dan with

@tlrmchlsmth
Collaborator

> @tlrmchlsmth would you have an idea? This seems related to #12424

Yep, does seem suspicious. Not sure what's going wrong though

@dhuangnm
Contributor

I hit a similar issue on my build instance (Ubuntu 20.04). Here is what I did to work around the error:

  1. Find where libcuda.so is installed on the instance. For example, on my machine with CUDA 12.4 installed, it is located under:
    /usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libcuda.so

  2. Symlink /usr/lib64/libcuda.so to the libcuda.so found above:
    sudo ln -s /usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libcuda.so /usr/lib64/libcuda.so

The ld command only searches for libraries under certain locations; since libcuda.so was not in any of them, the link failed. After creating the symlink, vllm builds and runs successfully.
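The two steps above can be sketched as follows; the paths are examples from this comment, and the source of the symlink should be whatever the find command prints on your machine:

```shell
# 1. Locate libcuda.so (driver library or CUDA stub) on the host.
find /usr/local/cuda* /usr/lib* -name 'libcuda.so*' 2>/dev/null

# 2. Symlink it into a directory ld searches by default (source path from step 1).
sudo ln -s /usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libcuda.so /usr/lib64/libcuda.so
```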

@tlrmchlsmth
Collaborator

I've put up #12552 to revert #12424.

For those having issues with vLLM 0.7.0, the easiest solution is to add the directory containing libcuda.so to your LD_LIBRARY_PATH environment variable.
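A minimal sketch of that workaround; the stub directory below is an assumption taken from the CUDA 12.4 example earlier in the thread, and the model and parallelism flags are illustrative:

```shell
# Assumed stub location; substitute wherever libcuda.so lives on your system.
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs:$LD_LIBRARY_PATH
vllm serve Qwen/Qwen2.5-72B-Instruct --tensor-parallel-size 4
```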

@dhuangnm
Contributor

I tried setting LD_LIBRARY_PATH initially, but it didn't work for me for some reason: the ld command still complained that it could not find -lcuda, and I had to use the symlink.
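As background (a generic sketch, not specific to vLLM): GNU ld resolves -l&lt;name&gt; at link time from -L paths and its built-in search directories, while LD_LIBRARY_PATH is consulted by the dynamic loader at run time, which may explain why exporting it alone does not always silence this linker error. A toy demonstration with a hypothetical libfoo:

```shell
# Toy demo: link-time (-L) vs run-time (LD_LIBRARY_PATH) library search.
mkdir -p /tmp/libdemo && cd /tmp/libdemo
printf 'int answer(void) { return 42; }\n' > foo.c
gcc -shared -fPIC foo.c -o libfoo.so
printf 'int answer(void); int main(void) { return answer() == 42 ? 0 : 1; }\n' > main.c

# GNU ld does not consult LD_LIBRARY_PATH when resolving -lfoo, so this still fails:
LD_LIBRARY_PATH=/tmp/libdemo gcc main.c -lfoo -o app || echo "link failed as expected"

# -L (or a symlink into a default search directory, as above) fixes the link:
gcc main.c -L/tmp/libdemo -lfoo -o app && LD_LIBRARY_PATH=/tmp/libdemo ./app
```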

@gargnipungarg

+1

@mgoin
Member

mgoin commented Jan 30, 2025

This should have been fixed with #12552 so please wait for the next release to include the revert

@stefanobranco

I assume this only happens on 0.7.0 for everyone here, since the reverted change is relatively recent? I'm asking because I've been having this issue ever since 0.6.5, which suggests a different root cause (or a different issue altogether), as also mentioned in #11643.

@gargnipungarg

+1
The issue has been happening since 0.6.5. I also built the latest main changes; that didn't work for me either.

@Playerrrrr
Author

+1
@mgoin @dhuangnm

  1. Tried setting LD_LIBRARY_PATH -> didn't work
  2. Tried the softlink -> also didn't work

@Playerrrrr
Author

Thanks, it started working normally again in v0.7.1.
@mgoin

@gargnipungarg

Did it work for anyone else?
Not for me, even with 0.7.1.

@OswaldoBornemann

+1

@Playerrrrr
Author

It works normally again for me since 0.7.1


9 participants