Permissions for the /dev/{kfd,dri/renderXXXX} devices in containers #39

Open
elukey opened this issue Apr 18, 2023 · 1 comment

elukey commented Apr 18, 2023

Hi folks!

I am trying out the AMD device plugin on my system, deployed as a systemd unit on Debian 11 (so not as a DaemonSet, but directly on the K8s node). Everything works fine and I can see two devices in my test container:

  • /dev/kfd
  • /dev/dri/renderD128

I am trying to run the container as an unprivileged user, such as nobody, but I am struggling to assign the proper permissions to the above devices. In the container I see something like the following (tested via nsenter):

root@alexnet-tf-gpu-pod:/# ls -l /dev/kfd 
crw-rw---- 1 root 106 242, 0 Apr 18 15:58 /dev/kfd

root@alexnet-tf-gpu-pod:/# ls -l /dev/dri/renderD128 
crw-rw---- 1 root 106 226, 128 Apr 18 15:58 /dev/dri/renderD128

GID 106 is the render group on the underlying "bare metal" K8s worker OS, and it is carried over unchanged into the test container. As a result, I don't have a clear way to add nobody to render (or a similar group) in the Docker image. Is there a best practice that you can suggest?

Thanks in advance!

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Apr 20, 2023
On k8s nodes we need to be able to bypass the restriction
on GPU related devices (/dev/kfd, /dev/dri/renderXXXX) set
for root:render, see
ROCm/k8s-device-plugin#39

We no longer need to vary the kfd access policies, so it seems
good to turn the option into something more flexible for
a broader range of use cases.

Bug: T333009
Change-Id: Idab004a1a725b1223d4ee36d2d0d900c329140f9

sdwilsh commented Apr 11, 2024

In the pod's securityContext you can set supplementalGroups, the additional groups that the pod's processes run with. I found that this enabled me to use the hardware.

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.29/#podsecuritycontext-v1-core
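For illustration, here is a minimal sketch of what such a pod spec might look like for the setup described in this issue. It assumes the render GID on the node is 106 (as in the ls -l output above), that nobody has UID/GID 65534 (the Debian convention, not confirmed in this thread), and that the GPU is requested through the amd.com/gpu resource; the container name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: alexnet-tf-gpu-pod
spec:
  securityContext:
    runAsUser: 65534         # assumption: UID of "nobody" in the image
    runAsGroup: 65534        # assumption: primary GID of "nobody" (nogroup)
    supplementalGroups:
      - 106                  # host GID that owns /dev/kfd and /dev/dri/renderD128
  containers:
    - name: alexnet-tf-gpu                        # hypothetical container name
      image: example.org/alexnet-tf-gpu:latest    # hypothetical image
      resources:
        limits:
          amd.com/gpu: 1     # assumption: resource name exposed by the device plugin
```

Since supplementalGroups takes numeric GIDs, the render group should not need to exist in the image's /etc/group. The value does have to match the GID on each node, so it may differ between hosts.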
