Add get_dtype and get_device_type methods for torch_tensor #251

Merged: jwallwork23 merged 12 commits into main from 248_get-dtype-devicetype on Jan 30, 2025

Conversation

jwallwork23 (Contributor):

Closes #248.

This PR adds functions for getting the data type and device type of a tensor, including unit tests. It also improves the consistency of the existing functions so that they are named torch_tensor_get_X but are mapped to methods of the torch_tensor class as just get_X. This makes it clearer what we are getting the rank/shape of when we call them as functions, and reduces unnecessary verbosity when calling them as methods.

Using the new utility for getting the device type, the operator overloads involving scalars are corrected so that the CPU is no longer assumed.
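
The commit list at the bottom of this page mentions implementing get_device_type on the C++ side alongside a get_ftorch_device helper (quoted in the review thread below). A minimal sketch of what such a getter might look like, assuming a torch_tensor_t handle that wraps a torch::Tensor pointer; the typedefs here are illustrative stand-ins, not the merged ctorch.h definitions:

#include <torch/torch.h>

// Illustrative stand-ins for FTorch's ctorch.h typedefs.
typedef void *torch_tensor_t;
typedef enum { torch_kCPU, torch_kCUDA } torch_device_t;

// The helper quoted later in this review thread.
const torch_device_t get_ftorch_device(torch::DeviceType device_type);

// Hypothetical C-side getter: unwrap the tensor handle and translate
// libtorch's device type into FTorch's enum.
extern "C" torch_device_t torch_tensor_get_device_type(const torch_tensor_t tensor) {
  auto *t = reinterpret_cast<torch::Tensor *>(tensor);
  return get_ftorch_device(t->device().type());
}

With a getter along these lines, the Fortran scalar overloads can query an operand's device at runtime instead of hard-coding the CPU.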

jwallwork23 added the enhancement (New feature or request) and testing (Related to FTorch testing) labels on Jan 24, 2025
jwallwork23 self-assigned this on Jan 24, 2025
jwallwork23 marked this pull request as ready for review on January 24, 2025 16:55
jatkinson1000 (Member) left a comment:

This all looks good code-wise, @jwallwork23.
You are right that it was a tricky review!!

My overarching thought is that perhaps we need an example for basic tensor manipulation.
I think that would be a separate task from this PR, however. Thoughts?

Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

Comment on lines +100 to +112
const torch_device_t get_ftorch_device(torch::DeviceType device_type) {
switch (device_type) {
case torch::kCPU:
return torch_kCPU;
case torch::kCUDA:
return torch_kCUDA;
default:
std::cerr << "[ERROR]: device type " << device_type << " not implemented in FTorch"
<< std::endl;
exit(EXIT_FAILURE);
}
}

jatkinson1000 (Member):

Note: This will need extending in #209
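
The commit list below also mentions introducing an analogous get_ftorch_dtype helper for data types. A sketch of what that mapping might look like, mirroring the switch above; the torch_data_t enum and its values are assumptions for illustration, not taken from the diff:

#include <cstdlib>
#include <iostream>
#include <torch/torch.h>

// Illustrative stand-in for FTorch's dtype enum.
typedef enum { torch_kInt32, torch_kFloat32, torch_kFloat64 } torch_data_t;

// Hypothetical dtype analogue of get_ftorch_device: translate a libtorch
// scalar type into FTorch's enum, failing loudly on unsupported types.
const torch_data_t get_ftorch_dtype(torch::ScalarType dtype) {
  switch (dtype) {
  case torch::kInt32:
    return torch_kInt32;
  case torch::kFloat32:
    return torch_kFloat32;
  case torch::kFloat64:
    return torch_kFloat64;
  default:
    std::cerr << "[ERROR]: data type " << dtype << " not implemented in FTorch"
              << std::endl;
    exit(EXIT_FAILURE);
  }
}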

jatkinson1000 (Member):

Do you think it would be worth adding a test for the other functions running on a CUDA device (get_rank, etc.)?

On thinking about this: if they do not map over from CPU to other devices, then that is likely an issue with the backends rather than our code, so such tests would only serve as a warning that there were problems with the underlying dependencies.

jwallwork23 (Contributor, Author):

Such tests would duplicate those in the CPU tests, and then we'd need to do the same for XPU, etc. I think we can lean on the underlying implementation for this, but I can add tests if preferred.

jatkinson1000 (Member):

Nah, I think I agree, and am keen to avoid the verbosity until proven otherwise.
Just wondered what your thoughts were. :)

jwallwork23 (Contributor, Author):

> You are right that it was a tricky review!!

Apologies!

> My overarching thought is that perhaps we need an example for basic tensor manipulation. I think that would be a separate task from this PR, however. Thoughts?

That's a good idea. I'm increasingly thinking we need to rethink the ordering of the examples. (See #258 (comment).) Such an example should be near the start.

> Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

Will check, thanks for pointing out this case.

jatkinson1000 (Member):

Cool, opened #261
Happy for this to be merged after you check the "different device" query.

jwallwork23 (Contributor, Author) commented on Jan 30, 2025:

> Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

@jatkinson1000 hm, the error isn't so helpful (in fact, there isn't even one). Making the modifications in the last commit on 248_get-dtype-devicetype_GPU-test (which attempts to assign a tensor on a CUDA device to a tensor on the CPU), I get the following output:

4: Test command: /home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU/multigpu_infer_fortran "/home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU/saved_multigpu_model_cuda.pt"
4: Working Directory: /home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU
4: Test timeout computed to be: 1500
4: input on rank 0: [  0.0,  1.0,  2.0,  3.0,  4.0]
4: output on rank 0: [*****,  0.0,  0.0,  0.0,*****]
4:  MultiGPU example ran successfully
4/4 Test #4: multigpu_infer_fortran ...........   Passed    8.12 sec

That is, it doesn't raise an error at all. So I guess we should build in errors for the case where operator overloads are applied to tensors on different devices.
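
For reference, a minimal sketch of the kind of guard that could be added on the C++ side, assuming access to the two underlying torch::Tensor operands (the function name is illustrative; the follow-up issue opened below tracks the actual fix):

#include <cstdlib>
#include <iostream>
#include <torch/torch.h>

// Hypothetical device-consistency check for operator overloads: fail
// loudly instead of silently producing garbage when the operands live
// on different devices.
void check_same_device(const torch::Tensor &lhs, const torch::Tensor &rhs) {
  if (lhs.device() != rhs.device()) {
    std::cerr << "[ERROR]: tensor operands are on different devices ("
              << lhs.device() << " vs " << rhs.device() << ")" << std::endl;
    exit(EXIT_FAILURE);
  }
}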

jwallwork23 (Contributor, Author):

Opened #269. Will merge and follow up there.

jwallwork23 merged commit 08d6da2 into main on Jan 30, 2025
5 checks passed
jwallwork23 deleted the 248_get-dtype-devicetype branch on January 30, 2025 13:57
jwallwork23 added a commit that referenced this pull request on Feb 6, 2025:
* Add dtype and device_type attrs for torch_tensor; implement getters
* Rename get_<rank/shape> as torch_tensor_get_<rank/shape> for consistency
* Make torch_tensor_get_device_index a class method
* Add unit test for torch_tensor_get_device_type on CPU
* Add unit test for torch_tensor_get_device_type on CUDA device
* Add unit test for torch_tensor_get_dtype
* Make use of getters for device type and index
* Alias methods to be less verbose
* Implement get_device_type on C++ side; introduce get_ftorch_device
* Implement get_dtype on C++ side; introduce get_ftorch_dtype
* Drop dtype/device type attributes