fix: glibc search paths for nvidia
Set `glibc/lib` as the first `rpath` entry for `nvidia-container-cli`.
Also install the nvidia libraries to `/usr/local/glibc/lib` so that they
are kept separate from any musl libraries.
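
As a rough illustration of why the separation matters, the sketch below lists library file names that exist in both directories, which is the kind of collision described for `libtirpc`. The function name `find_shadowed_libs` is hypothetical and not part of this repository; the default paths are the two directories named in this commit.

```bash
# Hypothetical helper: print library file names present in both directories.
# Defaults are the two directories named in this commit.
find_shadowed_libs() {
  local first="${1:-/usr/local/glibc/lib}" second="${2:-/usr/local/lib}"
  local f name
  for f in "$first"/*.so*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    name="${f##*/}"           # strip the directory part
    if [ -e "$second/$name" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Any name it prints is a candidate for being resolved from the wrong directory, depending on the search order in effect.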

`nvidia-container-cli` explicitly sets an `RPATH` of `$ORIGIN/../$LIB` here:
https://gitlab.com/nvidia/container-toolkit/libnvidia-container/-/blob/v1.14.6/Makefile?ref_type=tags#L183
This means `/usr/local/lib` is searched first. Since `zfs` and nvidia
each ship their own `libtirpc`, `nvidia-container-cli` first tries to
use the `libtirpc` shipped with `zfs` at `/usr/local/lib` instead of
the one at `/usr/local/glibc/lib`. Fix this by adding an extra `RPATH`
entry, `$ORIGIN/../glibc/$LIB`, ahead of the existing one, so that
libraries in `/usr/local/glibc/lib` take precedence.

```bash
❯ scanelf -r _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
 TYPE   RPATH FILE
ET_DYN $ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB _out/rootfs/rootfs/usr/local/bin/nvidia-container-cli
```
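
To make the resulting search order concrete, here is a minimal sketch that expands the `RPATH` string above into directories. It assumes `$LIB` expands to `lib` on this target (on other systems it may be `lib64` or a multiarch path). `expand_rpath` is a hypothetical helper, and the `sed` step only collapses one `dir/..` pair per entry; the dynamic loader does the real expansion itself.

```bash
# Hypothetical helper: expand $ORIGIN and $LIB in an RPATH string and print
# the resulting directories in search order, one per line.
# Assumption: $LIB expands to "lib" unless a third argument says otherwise.
expand_rpath() {
  local rpath="$1" origin="$2" lib="${3:-lib}"
  local -a entries
  local entry
  IFS=':' read -ra entries <<< "$rpath"
  for entry in "${entries[@]}"; do
    entry="${entry//\$ORIGIN/$origin}"
    entry="${entry//\$LIB/$lib}"
    # Collapse a single "dir/.." pair textually, without touching the filesystem.
    printf '%s\n' "$entry" | sed -E 's#/[^/]+/\.\./#/#'
  done
}

# $ORIGIN is the directory containing the binary, here /usr/local/bin.
expand_rpath '$ORIGIN/../glibc/$LIB:$ORIGIN/../$LIB' /usr/local/bin
```

Under those assumptions the first directory printed is `/usr/local/glibc/lib` and the second is `/usr/local/lib`, which matches the preference described above.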

Properly fixes: #380

Fixes from #401 and #410 were not complete.

Manually tested by spinning up a NVIDIA worker in AWS.

Signed-off-by: Noel Georgi <git@frezbo.dev>
frezbo committed Jun 24, 2024
1 parent 3197e22 commit 5334e89
Showing 3 changed files with 11 additions and 13 deletions.
5 changes: 0 additions & 5 deletions nvidia-gpu/nvidia-container-toolkit/glibc/ld.so.conf
@@ -1,6 +1 @@
# libc default configuration
/usr/local/lib

/usr/local/glibc/lib
/usr/lib
/lib
@@ -51,7 +51,7 @@ steps:
cd libnvidia-container
# LDLIBS=-L/usr/local/glibc/lib is set so that libnvidia-container-cli libs which are hardcoded as -llibname and not using pkg-config
- CPPFLAGS="-I/usr/local/glibc/include/tirpc" LDLIBS="-L/usr/local/glibc/lib -ltirpc -lelf -lseccomp" make
+ CPPFLAGS="-I/usr/local/glibc/include/tirpc" LDLIBS="-L/usr/local/glibc/lib -ltirpc -lelf -lseccomp" LDFLAGS='-Wl,--rpath=\$$ORIGIN/../glibc/\$$LIB' make
install:
- |
mkdir -p /rootfs
17 changes: 10 additions & 7 deletions nvidia-gpu/nvidia-container-toolkit/nvidia-pkgs/pkg.yaml
@@ -32,16 +32,12 @@ steps:
bash nvidia.run --extract-only
install:
- |
mkdir -p /rootfs/usr/local \
/rootfs/usr/local/lib/containers/nvidia-persistenced \
/rootfs/usr/local/etc/containers \
/rootfs/usr/etc/udev/rules.d
cd NVIDIA-Linux-*
./nvidia-installer --silent \
--opengl-prefix=/rootfs/usr/local \
--utility-prefix=/rootfs/usr/local \
--utility-libdir=glibc/lib \
--documentation-prefix=/rootfs/usr/local \
--no-rpms \
--no-kernel-modules \
@@ -61,13 +57,20 @@ steps:
cp -r /usr/share/egl/* /rootfs/usr/share/egl
cp -r /etc/vulkan/* /rootfs/etc/vulkan
# mv over files from /usr/local/lib -> /usr/local/glibc/lib
mv /rootfs/usr/local/lib/* /rootfs/usr/local/glibc/lib/
# copy xorg files
mkdir -p /rootfs/usr/local/lib/nvidia/xorg
find /usr/lib/xorg/modules -type f -exec cp {} /rootfs/usr/local/lib/nvidia/xorg \;
mkdir -p /rootfs/usr/local/glibc/lib/nvidia/xorg
find /usr/lib/xorg/modules -type f -exec cp {} /rootfs/usr/local/glibc/lib/nvidia/xorg \;
# run ldconfig to update the cache
/rootfs/usr/local/glibc/sbin/ldconfig -r /rootfs
mkdir -p /rootfs/usr/local/lib/containers/nvidia-persistenced \
/rootfs/usr/local/etc/containers \
/rootfs/usr/etc/udev/rules.d
# copy udev rule
cp /pkg/files/15-nvidia-device.rules /rootfs/usr/etc/udev/rules.d
finalize: