RuntimeError: CUDA error: invalid device ordinal (only 1 GPU in my system, how to resolve) #55

Jayku88 · 2023-09-12T04:17:17Z

[09/12 09:35:46 main-logger]: use SyncBN
/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "train.py", line 902, in <module>
    main()
  File "train.py", line 90, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/vrlabhlbs/SphereFormer/train.py", line 156, in main_worker
    torch.cuda.set_device(gpu)
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

Bob-Maxwell · 2024-10-04T14:31:50Z

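I ran into the same problem. You just need to modify the train_gpu parameter in the .yaml so that it only lists GPUs that actually exist on your machine (with a single GPU, that would presumably be device 0 only), and all will be fine.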
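
For context, the traceback above shows worker process 1 failing in torch.cuda.set_device(gpu), which is consistent with a GPU index being requested that a single-GPU machine does not have (only ordinal 0 is valid there). Below is a minimal sketch for checking a requested GPU list against the devices PyTorch can see before launching. Note that usable_gpu_ids and detect_device_count are hypothetical helper names, not part of SphereFormer, and the requested list is only a stand-in for whatever train_gpu holds in your .yaml.

```python
from typing import List


def usable_gpu_ids(requested: List[int], device_count: int) -> List[int]:
    """Keep only the requested ordinals that exist on this machine."""
    return [g for g in requested if 0 <= g < device_count]


def detect_device_count() -> int:
    """Return the number of visible CUDA devices, or 0 if torch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    # Hypothetical value; substitute the train_gpu list from your own config.
    requested = [0, 1, 2, 3]
    count = detect_device_count()
    valid = usable_gpu_ids(requested, count)
    print(f"visible GPUs: {count}, usable ids: {valid}")
    # The number of processes handed to mp.spawn should not exceed len(valid).
```

If the usable list is shorter than the requested one, trimming train_gpu in the .yaml to match is the change described in the reply above.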
