RuntimeError: CUDA error: invalid device ordinal (only 1 GPU in my system; how do I resolve it?) #55

Open
Jayku88 opened this issue Sep 12, 2023 · 1 comment

Comments

@Jayku88

Jayku88 commented Sep 12, 2023

[09/12 09:35:46 main-logger]: use SyncBN
/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "train.py", line 902, in <module>
    main()
  File "train.py", line 90, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/vrlabhlbs/SphereFormer/train.py", line 156, in main_worker
    torch.cuda.set_device(gpu)
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
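The traceback shows the failure path: mp.spawn starts args.ngpus_per_node workers, _wrap calls fn(i, *args) with the worker index i, and main_worker passes that index to torch.cuda.set_device(gpu). With a single GPU only ordinal 0 exists, so worker 1 fails. A minimal pre-flight check could catch this before spawning. This is a sketch only: the helper names failing_ordinals and check_before_spawn are hypothetical and not part of SphereFormer, and it assumes each worker uses its spawn index as its device ordinal, as the traceback suggests.

```python
from typing import List, Optional


def available_gpu_count() -> int:
    """Number of CUDA devices this process can see (0 if PyTorch is not installed)."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def failing_ordinals(nprocs: int, available: int) -> List[int]:
    """Worker indices that would hit 'invalid device ordinal'.

    mp.spawn(fn, nprocs=n) runs fn(i, ...) for i in range(n). Assuming each
    worker calls torch.cuda.set_device(i), every i >= available is not a real
    device on this machine.
    """
    return list(range(available, nprocs))


def check_before_spawn(nprocs: int, available: Optional[int] = None) -> None:
    """Raise a readable error in the parent process instead of a CUDA error in a worker."""
    if available is None:
        available = available_gpu_count()
    bad = failing_ordinals(nprocs, available)
    if bad:
        raise ValueError(
            f"{nprocs} worker(s) requested but only {available} GPU(s) visible; "
            f"worker index(es) {bad} would fail in torch.cuda.set_device"
        )


# Example: two workers on a one-GPU machine; worker 1 is the one that fails.
print(failing_ordinals(nprocs=2, available=1))  # [1]
check_before_spawn(nprocs=1, available=1)  # one worker, one GPU: no error
```

Calling check_before_spawn(args.ngpus_per_node) just before mp.spawn would turn the crash into a clear message in the parent process.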

@Bob-Maxwell

[Screenshot from 2024-10-04 22-22-11]
I ran into the same problem. You just need to modify the train_gpu parameter in the .yaml config so that it only lists the GPUs your machine actually has; after that, everything works.
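The reason this fix works, assuming (as the comment implies) that the number of spawned workers, args.ngpus_per_node, is derived from the length of the train_gpu list, is that the list must not name more GPUs than the machine has. The sketch below uses a plain dict as a stand-in for the parsed .yaml config. The helper fit_train_gpu and the example value [0, 1, 2, 3] are hypothetical, and the exact layout of SphereFormer's config files is not shown in this thread.

```python
from typing import Any, Dict


def fit_train_gpu(cfg: Dict[str, Any], available: int) -> Dict[str, Any]:
    """Return a copy of cfg whose train_gpu list only names existing GPUs.

    Hypothetical helper: cfg stands in for the parsed .yaml config and
    train_gpu for the key mentioned in the comment above.
    """
    if available < 1:
        raise RuntimeError("No CUDA device visible; GPU training cannot start")
    # Keep only device ids that exist on this machine.
    kept = [g for g in cfg["train_gpu"] if 0 <= g < available]
    if not kept:
        kept = [0]  # fall back to the first (and possibly only) GPU
    fixed = dict(cfg)  # shallow copy so the caller's config is not modified
    fixed["train_gpu"] = kept
    return fixed


cfg = {"train_gpu": [0, 1, 2, 3]}  # hypothetical multi-GPU setting
fixed = fit_train_gpu(cfg, available=1)
# Assumption: the training script uses len(train_gpu) as nprocs for mp.spawn.
ngpus_per_node = len(fixed["train_gpu"])
print(fixed["train_gpu"], ngpus_per_node)  # [0] 1
```

Editing the .yaml by hand to a single entry has the same effect; the helper only makes the constraint explicit.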
