imports for GPU packages from compat #119

Merged

jperez999
Collaborator

This PR lets a user run our GPU Docker container without GPUs and still run all tests successfully. Previously, a few places in the code wrapped imports of GPU-specific packages in try/except blocks. This PR replaces those try/except blocks, because in an environment where the package is installed but no GPU is available you end up with a ccuda.pyx init failure. The packages (e.g. cudf, dask_cudf, rmm) do exist, but when they try to query information about the GPUs they fail and throw an error like:

collecting ... Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Exception ignored in: 'cuda._lib.ccudart.utils.cudaPythonGlobal.lazyInitGlobal'
Traceback (most recent call last):
  File "cuda/_cuda/ccuda.pyx", line 3671, in cuda._cuda.ccuda._cuInit
  File "cuda/_cuda/ccuda.pyx", line 435, in cuda._cuda.ccuda.cuPythonInit
RuntimeError: Failed to dlopen libcuda.so
Fatal Python error: Segmentation fault

This PR leverages the compat file and makes it the single point of import for the main packages (cudf, cupy). It adds a guard around those imports so that the packages are imported only if GPUs are detected. So if you find yourself in a scenario where the package is installed but no GPUs are detected, you can now still safely use the core package. As a result, we can run our containers with or without the --gpus=all flag in Docker. This was a customer ask, and it also helps developers who want to test a CPU-only environment on a machine that has GPUs.
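For illustration, here is a minimal sketch of what a guarded, single point of import could look like. It is not the code in this PR: the helper names `detect_gpu_count` and `import_if_gpu` are hypothetical, and the GPU check is an assumption that probes the CUDA driver library directly through `ctypes` (Linux library name `libcuda.so.1`), so a missing driver turns into a count of 0 instead of a crash inside the GPU package.

```python
import ctypes
import importlib


def detect_gpu_count():
    """Return the number of visible CUDA devices, or 0 if the driver is unusable.

    Hypothetical helper: assumes a Linux host where the driver library is
    named libcuda.so.1. Any failure is treated as "no GPUs".
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        # Driver library missing, e.g. the container was started without --gpus=all.
        return 0
    if libcuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return 0
    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return 0
    return count.value


def import_if_gpu(module_name, gpu_count):
    """Import module_name only when GPUs were detected; otherwise return None."""
    if gpu_count <= 0:
        # Never touch the GPU package on a CPU-only host, even if it is installed.
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Module-level names that the rest of the code base would import from here.
GPU_COUNT = detect_gpu_count()
HAS_GPU = GPU_COUNT > 0
cudf = import_if_gpu("cudf", GPU_COUNT)
cupy = import_if_gpu("cupy", GPU_COUNT)
```

Callers would then import `cudf`, `cupy`, and `HAS_GPU` from this one module and branch on `cudf is None` or `HAS_GPU`, rather than importing the GPU packages themselves.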

Similar to NVIDIA-Merlin/core#261 and NVIDIA-Merlin/NVTabular#1791.

jperez999 added the enhancement, ci, and chore labels on Mar 29, 2023
jperez999 added this to the Merlin 23.03 milestone on Mar 29, 2023
jperez999 self-assigned this on Mar 29, 2023
karlhigley modified the milestones: Merlin 23.03, Merlin 23.04 on Apr 4, 2023
karlhigley merged commit 0bfd68f into NVIDIA-Merlin:main on Apr 4, 2023