
Make Magma optional for cuda builds? #275

Closed
zklaus opened this issue Oct 15, 2024 · 15 comments · Fixed by #298
Labels
question Further information is requested

Comments


zklaus commented Oct 15, 2024

A conda-forge environment with nothing but pytorch for cuda in it currently weighs in at 7.2 GB, which is rather on the heavy side.

Looking at potential for slimming things down, libmagma at ~2GB looks like a good candidate.

The pytorch docs seem to suggest that libmagma is used as an alternative to cusolver, which is included anyway at a much more modest 150 MB.

In the past, magma was significantly faster than cusolver, as demonstrated by [1]. However, a recent 2024 paper by the magma authors [2] shows that cusolver has made progress and is now faster for some of the most important problems.
Magma still offers significant performance benefits for certain workloads, but given that pytorch has the ability to switch between the available libraries, we could make magma an optional dependency, i.e. merely include it in run_constrained and leave it up to the user to choose space or performance optimization based on their use-case.

Is this feasible or am I missing something about the use of magma in pytorch?

Do you think this is desirable?
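For reference, pytorch exposes a runtime knob for the backend choice mentioned above (`torch.backends.cuda.preferred_linalg_library`). A minimal sketch, guarded so it also runs in environments without a pytorch build installed:

```python
# Sketch: selecting pytorch's GPU linear-algebra backend at runtime.
# The knob accepts "default", "cusolver", or "magma"; "default" lets
# pytorch choose heuristically per operation.
try:
    import torch

    torch.backends.cuda.preferred_linalg_library("cusolver")
    print(torch.backends.cuda.preferred_linalg_library())
except ImportError:
    # Keep the sketch runnable where no pytorch build is available.
    print("torch not installed; skipping backend selection demo")
```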

[1] S. Abdelfattah, A. Haidar, S. Tomov and J. Dongarra, "Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs," in IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 12, pp. 2700-2712, 1 Dec. 2018, doi: 10.1109/TPDS.2018.2842785. keywords: {Graphics processing units;Energy efficiency;Task analysis;Multicore processing;Dense linear solvers;GPU computing;energy efficiency},

[2] Abdelfattah A, Beams N, Carson R, et al. MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures. The International Journal of High Performance Computing Applications. 2024;38(5):468-490. doi:10.1177/10943420241261960

zklaus added the question label on Oct 15, 2024

isuruf commented Oct 15, 2024

we could make magma an optional dependency, i.e. merely include it in run_constrained and leave it up to the user to choose space or performance optimization based on their use-case.

If you build with magma, you need it at runtime even if you don't use magma.
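To illustrate the objection: `run_constrained` does not install a package, it only pins its version if the user installs it separately. A hypothetical meta.yaml fragment for the proposal (package names and the pin are illustrative, not the actual recipe):

```yaml
# Hypothetical recipe fragment illustrating the proposal above.
# run_constrained does NOT pull in libmagma; it only constrains the
# version if the user adds it themselves. A binary actually linked
# against libmagma.so would therefore fail to load without it.
requirements:
  run:
    - libcusolver            # always present; the default backend
  run_constrained:
    - libmagma >=2.8         # illustrative pin; opt-in, not installed by default
```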


mgorny commented Nov 28, 2024

Ok, I have some good news. The use of Magma is entirely limited to libtorch_cuda_linalg.so, which is ~640k, and replacing the Magma-enabled library with one built without Magma seems to work just fine.

I think we can split it into a separate subpackage, and build Magma and non-Magma variants to choose from. On the minus side, we probably can't avoid building most of libtorch twice — though it could be possible to use ccache to minimize the cost of doing that.


isuruf commented Nov 28, 2024

The use of Magma is entirely limited to libtorch_cuda_linalg.so which is ~640k

Isn't it used by libtorch_cuda.so ?


mgorny commented Nov 28, 2024

No, it's dynamically loaded:

// In that case load library is dynamically loaded when first linalg call is made
// This helps reduce size of GPU memory context if linear algebra functions are not used

https://github.com/pytorch/pytorch/blob/a8d6afb511a69687bbb2b7e88a3cf67917e1697e/aten/src/ATen/native/cuda/LinearAlgebraStubs.cpp
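The pattern in LinearAlgebraStubs.cpp amounts to a lazy-loading proxy. A rough stdlib Python analogy (a sketch, not pytorch code; `math` stands in for the heavy backend):

```python
import importlib


class LazyBackend:
    """Defer importing a heavy backend until the first call, mirroring how
    libtorch_cuda.so only dlopen()s libtorch_cuda_linalg.so when a linalg
    routine is first invoked (keeping the GPU memory context small
    when linear algebra is never used)."""

    def __init__(self, module_name):
        self._name = module_name
        self._mod = None  # nothing loaded yet

    def __getattr__(self, attr):
        # Only reached for attributes not found on the instance itself.
        if self._mod is None:
            self._mod = importlib.import_module(self._name)  # load on first use
        return getattr(self._mod, attr)


# 'math' stands in for the real linalg backend in this sketch.
linalg = LazyBackend("math")
assert linalg._mod is None     # backend not loaded yet
result = linalg.sqrt(9.0)      # first access triggers the import
assert result == 3.0
```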


isuruf commented Dec 20, 2024

With the libmagma fix, the size of libmagma is significantly reduced now.

From @rgommers

$ mamba create -n pytorch-gpu-19dec2024 pytorch-gpu
$ cd ~/mambaforge/envs/pytorch-gpu-19dec2024/
$ du -hsx * | sort -rh | head -3
4,1G    lib
1,6G    targets
69M     include

$ du -hsx lib/* | sort -rh | head -10
1,4G    lib/libtorch_cuda.so
432M    lib/libcudnn_engines_precompiled.so.9.3.0
296M    lib/python3.13
293M    lib/libmagma.so
276M    lib/libtorch_cpu.so
252M    lib/libnccl.so.2.23.4
233M    lib/libcudnn_adv.so.9.3.0
104M    lib/libcudnn_ops.so.9.3.0
91M     lib/libmagma_sparse.so
68M     lib/libmkl_core.so.2

$ du -hsx targets/x86_64-linux/lib/* | sort -rh | head -10
469M    targets/x86_64-linux/lib/libcublasLt.so.12.6.4.1
280M    targets/x86_64-linux/lib/libcusparse.so.12.5.4.2
266M    targets/x86_64-linux/lib/libcufft.so.11.3.0.4
146M    targets/x86_64-linux/lib/libcusolver.so.11.7.1.2
104M    targets/x86_64-linux/lib/libcublas.so.12.6.4.1
92M     targets/x86_64-linux/lib/libcurand.so.10.3.7.77
86M     targets/x86_64-linux/lib/libcusolverMg.so.11.7.1.2
56M     targets/x86_64-linux/lib/libnvrtc.so.12.6.85
49M     targets/x86_64-linux/lib/libnvJitLink.so.12.6.85
5,1M    targets/x86_64-linux/lib/libnvrtc-builtins.so.12.6.85

With 385 MB for libmagma, does it still make sense to make magma optional with an opt-out? If that size is tolerable, I'd rather we not make magma optional.

@hmaarrfk

How do we reconcile 385 MB compared to the 600 MB you stated before?

conda-forge/libmagma-feedstock#22 (comment)

@rgommers

Re the size difference of 385 MB vs 600 MB: do you perhaps have CUDA 11 locally, @isuruf? I posted the exact package hash I measured at #298 (comment), so this is hopefully not too hard to clarify - and it's probably worth doing after all the effort spent on this issue.

With 385MB for libmagma, does it still make sense that we make magma optional with a opt-out? If that size is tolerable, I'd rather we don't make magma optional.

Assuming 385 MB is correct, I tend to agree that the added recipe complexity, the (presumably) longer build times, and an extra user-facing variable are likely no longer worth it. And not merging it is perfectly fine imho - this whole exercise has already been valuable either way.


isuruf commented Dec 21, 2024

It should be 600MB. We missed a few SMs in the last build. See conda-forge/libmagma-feedstock#23

@rgommers

Ah, that is unfortunate. It probably doesn't change the conclusion, though. My gut feeling is that the 2 GB was clearly worth it, the 385 MB not, and somewhere in the middle is a gray zone - with 600 MB on the lower end of that gray zone.

It can always be revisited later anyway, while adding the option now and removing it later is harder (it's a user-facing knob, so removing it would break backwards compatibility). For now I suggest we be happy with the achieved gains and not merge nomagma.


mgorny commented Dec 22, 2024

I agree. It's better not to expose the option to users than to have it temporarily and then remove it. If we want to change this in the future, the work has been done; it will remain in the commit history and we can reuse it.


isuruf commented Dec 23, 2024

isuruf@209b1cd should also give a small reduction in binary sizes. (I haven't measured it yet.)


isuruf commented Dec 23, 2024

We should also ignore the libmagma_sparse run-export from magma, since torch doesn't use it.


mgorny commented Dec 23, 2024

We should also ignore the libmagma_sparse run-export from magma, since torch doesn't use it.

I'll prepare that locally but I don't want to restart the CI again.
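For reference, the usual recipe mechanism for dropping an unwanted run-export is `ignore_run_exports`. A hypothetical fragment (a sketch, not the actual feedstock change):

```yaml
# Hypothetical meta.yaml fragment: keep libmagma's run-export but drop
# libmagma_sparse's, since torch links only against libmagma.
build:
  ignore_run_exports:
    - libmagma_sparse
```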
