GPU isolation options #45

Open
andy108369 opened this issue Jan 8, 2024 · 0 comments
Comments

@andy108369

We want to make sure one cannot request more AMD GPUs than allocated by setting certain environment variables (e.g. HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES).
I am not sure whether this is an issue as of today; we cannot verify it since we don't have a box with more than one AMD GPU at the present time.
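To illustrate the concern, here is a hypothetical pod spec (the pod name, image, and resource name `amd.com/gpu` are assumptions based on the AMD device plugin's conventions): the container is granted one GPU via resource limits but sets ROCR_VISIBLE_DEVICES in an attempt to see additional devices. The question is whether such an env var can widen access beyond the allocation.

```yaml
# Hypothetical test pod: requests 1 AMD GPU, but sets an env var
# that could, if not isolated, expose additional devices.
apiVersion: v1
kind: Pod
metadata:
  name: rocm-isolation-test
spec:
  containers:
  - name: app
    image: rocm/rocm-terminal          # assumption: any ROCm-capable image
    env:
    - name: ROCR_VISIBLE_DEVICES       # the variable whose effect we want to rule out
      value: "0,1"
    resources:
      limits:
        amd.com/gpu: 1                 # only one GPU actually allocated
```

Running `rocm-smi` (or similar) inside such a pod on a multi-GPU host would show whether the env var leaks extra devices.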

For context: it is possible to expose access to all NVIDIA GPUs on the host by setting the NVIDIA_VISIBLE_DEVICES=all env variable on a Pod. Luckily, we were able to work around this by setting --set deviceListStrategy=volume-mounts for the nvdp/nvidia-device-plugin Helm chart, along with these settings in the /etc/nvidia-container-runtime/config.toml file:

```toml
accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```
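For reference, the workaround above can be applied roughly as follows (the release name, repo alias, and namespace are assumptions; adjust for your cluster):

```shell
# Sketch of the NVIDIA-side mitigation described above.
# 1. Redeploy the device plugin so it advertises devices via volume mounts
#    instead of the NVIDIA_VISIBLE_DEVICES env var:
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set deviceListStrategy=volume-mounts

# 2. On each GPU node, edit /etc/nvidia-container-runtime/config.toml to set:
#      accept-nvidia-visible-devices-as-volume-mounts = true
#      accept-nvidia-visible-devices-envvar-when-unprivileged = false
#    then restart the container runtime (e.g. containerd) for it to take effect.
```

With this in place, an unprivileged pod setting NVIDIA_VISIBLE_DEVICES=all no longer gains access to unallocated GPUs; the open question is whether an equivalent mechanism exists (or is needed) for the AMD env vars above.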