
XPU and MPS take 3 #276

Open · wants to merge 30 commits into base: main
Conversation

@jwallwork23 (Contributor) commented Feb 6, 2025:

Closes #127.
Builds upon #125 and #209.
(Contains changes from #268 so that will need to be merged first.)

This PR adds support for XPU and MPS. Unfortunately, it ended up requiring an overhaul of the pt2ts scripts, too.

Notable changes:

  • Switching from ENABLE_CUDA to the more general and extensible GPU_DEVICE=<NONE/CUDA/XPU/MPS>.
  • Pre-processor directives for handling different GPU types.
  • Support for XPU under GPU device code 12.
  • Support for MPS under GPU device code 13.
  • Use of argparse, rather than sys.argv, for reading command line arguments in the Python scripts.
  • Updates to docs.
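The preprocessor-directive approach in the bullets above can be sketched as follows. This is a minimal illustration, not FTorch's actual source: the device codes (NONE=0, CUDA=1, XPU=12, MPS=13) match the guard values quoted later in this thread, but in a real build `GPU_DEVICE` would be injected by CMake rather than defaulted in the source.

```c
#include <stdio.h>
#include <string.h>

/* Fallback definitions of the device codes discussed in this PR.
 * In FTorch these would normally come from the build system. */
#ifndef GPU_DEVICE_NONE
#define GPU_DEVICE_NONE 0
#endif
#ifndef GPU_DEVICE_CUDA
#define GPU_DEVICE_CUDA 1
#endif
#ifndef GPU_DEVICE_XPU
#define GPU_DEVICE_XPU 12
#endif
#ifndef GPU_DEVICE_MPS
#define GPU_DEVICE_MPS 13
#endif

/* Pretend CMake configured a CUDA build, e.g. via -DGPU_DEVICE=1. */
#ifndef GPU_DEVICE
#define GPU_DEVICE GPU_DEVICE_CUDA
#endif

/* Report which device type this translation unit was compiled for. */
const char *configured_device(void) {
#if GPU_DEVICE == GPU_DEVICE_CUDA
  return "CUDA";
#elif GPU_DEVICE == GPU_DEVICE_XPU
  return "XPU";
#elif GPU_DEVICE == GPU_DEVICE_MPS
  return "MPS";
#else
  return "NONE";
#endif
}

int main(void) {
  printf("Configured for %s\n", configured_device());
  return 0;
}
```

Because the selection happens at compile time, only the code path for the configured device is built, which is why a single `GPU_DEVICE=<NONE/CUDA/XPU/MPS>` option can replace the old boolean `ENABLE_CUDA`.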
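The argparse change can be sketched as follows. This is a hypothetical pt2ts-style interface for illustration only; the actual flag names and defaults in FTorch's scripts may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments with argparse, rather than indexing sys.argv."""
    parser = argparse.ArgumentParser(
        description="Convert a PyTorch model to TorchScript."
    )
    # Hypothetical flag name; the real script's options may be named differently.
    parser.add_argument(
        "--device_type",
        choices=["cpu", "cuda", "xpu", "mps"],
        default="cpu",
        help="Device type to target when saving the TorchScript model",
    )
    return parser.parse_args(argv)


args = parse_args(["--device_type", "xpu"])
print(args.device_type)
```

Compared with positional `sys.argv` indexing, this gives free `--help` output, validation of the allowed device types, and sensible defaults.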

Checklist

  • Test on 2 Nvidia GPUs
  • Test on 2 XPUs
  • Test on 1 MPS device

@jwallwork23 added the enhancement (New feature or request) and gpu (Related to building and running on GPU) labels on Feb 6, 2025
@jwallwork23 self-assigned this on Feb 6, 2025
```shell
. ftorch_venv/bin/activate  # Uses .clang-tidy config file if present
fortitude check src/
. ftorch_venv/bin/activate
fortitude check --ignore=E001,T041 src/ftorch.F90
```
@jwallwork23 (Contributor Author) commented:

To get the Fortran linting check to pass, I had to ignore errors of the form reported in https://github.com/Cambridge-ICCS/FTorch/actions/runs/13240946802/job/36956002877, as well as ones of the form

```
src/ftorch.F90:153:57: T041 'tensor_shape' has assumed size
    |
151 |       type(c_ptr), value, intent(in)    :: data
152 |       integer(c_int), value, intent(in) :: ndims
153 |       integer(c_int64_t), intent(in)    :: tensor_shape(*)
    |                                                         ^ T041
154 |       integer(c_int64_t), intent(in)    :: strides(*)
155 |       integer(c_int), value, intent(in) :: dtype
```
because we established that the cases where we use assumed size in ftorch.F90 are required.

@jwallwork23 marked this pull request as ready for review on February 10, 2025 at 16:21
@jwallwork23
Contributor Author

Offline testing for CUDA version of MultiGPU example with 2 devices passed on Ampere. In the queue for XPU testing on PVC.

@jwallwork23 requested a review from ma595 on February 10, 2025 at 16:22
This was referenced Feb 11, 2025
@jatkinson1000 (Member) left a comment

Thanks @jwallwork23 I have only done a quick pass of this so far and will need to schedule time for a closer look, but I suspect you know my first comment - can you update the docs at utils/README etc. to reflect the new args and usage of pt2ts?

I would also like to see it documented somewhere how the device enums are managed from CMake. Perhaps under the developer docs. As, whilst a very nifty solution, it's slightly abstract if you are not the one who came up with it 😉

@jatkinson1000
Member

Also looks like you may want a rebase after #268

@jwallwork23
Contributor Author

> Thanks @jwallwork23 I have only done a quick pass of this so far and will need to schedule time for a closer look, but I suspect you know my first comment - can you update the docs at utils/README etc. to reflect the new args and usage of pt2ts?

Oh, I'd forgotten about that README actually. Addressed in 50bd953.

> I would also like to see it documented somewhere how the device enums are managed from CMake. Perhaps under the developer docs. As, whilst a very nifty solution, it's slightly abstract if you are not the one who came up with it 😉

Ah yes... done in 838f3ae.

> Also looks like you may want a rebase after #268

Done!

@jwallwork23
Contributor Author

jwallwork23 commented Feb 12, 2025

I don't understand why the clang-tidy errors didn't crop up previously...?

For one of them, I can fix with:

```diff
+#include <stdbool.h>
```

For the device-related one, I could fix with:

```diff
+// NOTE: These need to be defined here for clang-tidy to pass
+#ifndef GPU_DEVICE_NONE
+#define GPU_DEVICE_NONE 0
+#endif
+#ifndef GPU_DEVICE_CUDA
+#define GPU_DEVICE_CUDA 1
+#endif
+#ifndef GPU_DEVICE_XPU
+#define GPU_DEVICE_XPU 12
+#endif
+#ifndef GPU_DEVICE_MPS
+#define GPU_DEVICE_MPS 13
+#endif
```

(although I would prefer to avoid duplicating these codes)

But the other torch/script one - is clang-tidy meant to be compiler-aware or something?

@jatkinson1000
Member

jatkinson1000 commented Feb 13, 2025

I have just run this on a Mac using MPS and made the relevant tweaks to get it to work.
@jwallwork23 if you can check the diff in c863cc7 to see if you are happy that would be great. Hopefully everything should still work OK on XPU/CUDA though I had to adjust how num_devices was set.

I'll look to schedule time to review the rest of this.

@jwallwork23 (Contributor Author) left a comment

A few very minor suggestions but otherwise those additions look good - thanks for testing on MPS @jatkinson1000. Still waiting for my XPU job to run.

@jwallwork23
Contributor Author

Good news - test passed on 2 XPU devices on Dawn! 🥳

@jwallwork23
Contributor Author

Re-tested on Dawn with latest version of branch - all good

```diff
@@ -165,7 +165,7 @@ To build and install the library:
 | [`CMAKE_INSTALL_PREFIX`](https://cmake.org/cmake/help/latest/variable/CMAKE_INSTALL_PREFIX.html) | `</path/to/install/lib/at/>` | Location at which the library files should be installed. By default this is `/usr/local` |
 | [`CMAKE_BUILD_TYPE`](https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html) | `Release` / `Debug` | Specifies build type. The default is `Debug`, use `Release` for production code|
 | `CMAKE_BUILD_TESTS` | `TRUE` / `FALSE` | Specifies whether to compile FTorch's [test suite](https://cambridge-iccs.github.io/FTorch/page/testing.html) as part of the build. |
-| `ENABLE_CUDA` | `TRUE` / `FALSE` | Specifies whether to check for and enable CUDA<sup>3</sup> |
+| `GPU_DEVICE` | `NONE` / `CUDA` / `XPU` / `MPS` | Specifies the target GPU architecture (if any) <sup>3</sup> |
```
A Contributor commented on this diff:
It's probably worth at least mentioning Intel/Mac GPUs in the GPU Support section of the README too?

@jwallwork23 (Contributor Author) replied:
Ah, good point! Addressed in 0897b20.

Successfully merging this pull request may close these issues: XPU and MPS support.

4 participants