choosehappy added the bug and triage labels on Jan 21, 2025
What happened + What you expected to happen
Requesting a fractional GPU appears to cause RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES to be ignored when using TorchTrainer, as demonstrated below:
I’m using Ray 2.24, and this works as expected:
With output:
However, adding a fractional GPU resource like this:
now produces this output:
We're still trying to work elegantly around the lack of GPU spreading, as discussed in #48012. Letting us manage the GPUs ourselves would be an easy, acceptable solution!
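For anyone trying to confirm the behaviour, here is a minimal sketch of a helper that reports what a worker process actually sees. It only reads environment variables and makes no Ray calls. The name `describe_gpu_visibility` is hypothetical, and treating any non-empty value of RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES as "enabled" is an assumption, not something confirmed in this issue.

```python
import os
from typing import Mapping, Optional


def describe_gpu_visibility(env: Optional[Mapping[str, str]] = None) -> dict:
    """Summarize the GPU-related environment variables visible to this process.

    `env` defaults to os.environ; passing a plain dict makes the helper easy
    to exercise without a GPU or a Ray cluster.
    """
    env = os.environ if env is None else env

    noset = env.get("RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES")
    raw = env.get("CUDA_VISIBLE_DEVICES")

    # None means the variable is unset (all GPUs visible to CUDA);
    # an empty list means it is set but names no devices.
    devices = None if raw is None else [d.strip() for d in raw.split(",") if d.strip()]

    return {
        # Assumption: any non-empty value counts as "do not touch CUDA_VISIBLE_DEVICES".
        "noset_requested": bool(noset),
        "visible_devices": devices,
    }


if __name__ == "__main__":
    # Inside a Ray Train worker function, the same call could be printed per worker.
    print(describe_gpu_visibility())
```

Calling this at the top of the training function, once with a whole GPU per worker and once with a fractional one, should make the difference visible: if `noset_requested` is true but `visible_devices` has been narrowed to a single ID in the fractional case only, that matches the behaviour described above.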
Versions / Dependencies
ray==2.40.0
Python 3.10.12
Docker container: nvcr.io/nvidia/pytorch:24.08-py3
Reproduction script
As provided above
Issue Severity
High: It blocks me from completing my task.