Add get_dtype and get_device_type methods for torch_tensor #251

Merged: jwallwork23 merged 12 commits into main from 248_get-dtype-devicetype on Jan 30, 2025

Conversation

jwallwork23 (Contributor):

Closes #248.

This PR adds functions for getting the data type and device type of a tensor, including unit tests. It also improves the consistency of the existing functions so that they are named torch_tensor_get_X but are mapped to methods of the torch_tensor class as just get_X. This makes it clearer what we are getting the rank/shape of when we call them as functions, and reduces unnecessary verbosity when calling them as methods.

Using the new utility for getting the device type, the operator overloads involving scalars are corrected so that the CPU is no longer assumed.
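
The commit list at the bottom of this page mentions implementing get_device_type on the C++ side alongside a get_ftorch_device helper (quoted in the review thread below). A minimal sketch of what such a getter might look like, assuming a torch_tensor_t handle that wraps a torch::Tensor pointer; the typedefs here are illustrative stand-ins, not the merged ctorch.h definitions:

#include <torch/torch.h>

// Illustrative stand-ins for FTorch's ctorch.h typedefs.
typedef void *torch_tensor_t;
typedef enum { torch_kCPU, torch_kCUDA } torch_device_t;

// The helper quoted later in this review thread.
const torch_device_t get_ftorch_device(torch::DeviceType device_type);

// Hypothetical C-side getter: unwrap the tensor handle and translate
// libtorch's device type into FTorch's enum.
extern "C" torch_device_t torch_tensor_get_device_type(const torch_tensor_t tensor) {
  auto *t = reinterpret_cast<torch::Tensor *>(tensor);
  return get_ftorch_device(t->device().type());
}

With a getter along these lines, the Fortran scalar overloads can query an operand's device at runtime instead of hard-coding the CPU.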

jwallwork23 added the enhancement (New feature or request) and testing (Related to FTorch testing) labels on Jan 24, 2025
jwallwork23 self-assigned this on Jan 24, 2025
jwallwork23 marked this pull request as ready for review on January 24, 2025 16:55
jatkinson1000 (Member) left a comment:

This all looks good code-wise, @jwallwork23.
You are right that it was a tricky review!!

My overarching thought is that perhaps we need an example for basic tensor manipulation.
I think that would be a separate task from this PR, however. Thoughts?

Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

Comment on lines +100 to +112
const torch_device_t get_ftorch_device(torch::DeviceType device_type) {
switch (device_type) {
case torch::kCPU:
return torch_kCPU;
case torch::kCUDA:
return torch_kCUDA;
default:
std::cerr << "[ERROR]: device type " << device_type << " not implemented in FTorch"
<< std::endl;
exit(EXIT_FAILURE);
}
}

jatkinson1000 (Member):

Note: This will need extending in #209
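
The commit list below also mentions introducing an analogous get_ftorch_dtype helper for data types. A sketch of what that mapping might look like, mirroring the switch above; the torch_data_t enum and its values are assumptions for illustration, not taken from the diff:

#include <cstdlib>
#include <iostream>
#include <torch/torch.h>

// Illustrative stand-in for FTorch's dtype enum.
typedef enum { torch_kInt32, torch_kFloat32, torch_kFloat64 } torch_data_t;

// Hypothetical dtype analogue of get_ftorch_device: translate a libtorch
// scalar type into FTorch's enum, failing loudly on unsupported types.
const torch_data_t get_ftorch_dtype(torch::ScalarType dtype) {
  switch (dtype) {
  case torch::kInt32:
    return torch_kInt32;
  case torch::kFloat32:
    return torch_kFloat32;
  case torch::kFloat64:
    return torch_kFloat64;
  default:
    std::cerr << "[ERROR]: data type " << dtype << " not implemented in FTorch"
              << std::endl;
    exit(EXIT_FAILURE);
  }
}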

jatkinson1000 (Member):

Do you think it would be worth adding a test for the other functions running on a CUDA device (get_rank, etc.)?

On thinking about this: if they do not map over from CPU to other devices, then that is likely an issue with the backends rather than our code, so such tests would only serve as a warning that there were problems with the underlying dependencies.

jwallwork23 (Contributor, Author):

Such tests would duplicate those in the CPU tests, and then we'd need to do the same for XPU, etc. I think we can lean on the underlying implementation for this, but I can add tests if preferred.

jatkinson1000 (Member):

Nah, I think I agree, and am keen to avoid the verbosity until proven otherwise.
Just wondered what your thoughts were. :)

jwallwork23 (Contributor, Author):

> You are right that it was a tricky review!!

Apologies!

> My overarching thought is that perhaps we need an example for basic tensor manipulation. I think that would be a separate task from this PR, however. Thoughts?

That's a good idea. I'm increasingly thinking we need to rethink the ordering of the examples. (See #258 (comment).) Such an example should be near the start.

> Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

Will check, thanks for pointing out this case.

jatkinson1000 (Member):

Cool, opened #261
Happy for this to be merged after you check the "different device" query.

jwallwork23 (Contributor, Author) commented on Jan 30, 2025:

> Now that you explicitly set the device when creating the tensor in some overloads, I have a question: what happens if we call this with tensors that are on different devices? I presume it fails with a meaningful error message from the C++, but does it provide a useful traceback to where the error originated in the Fortran? I recall that libtorch sometimes gives an error report but no code location, making it hard to work out where your Fortran is going wrong.

@jatkinson1000 hm, the error isn't so helpful (in fact, there isn't even one). Making the modifications in the last commit on 248_get-dtype-devicetype_GPU-test (which attempts to assign a tensor on a CUDA device to a tensor on the CPU), I get the following output:

4: Test command: /home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU/multigpu_infer_fortran "/home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU/saved_multigpu_model_cuda.pt"
4: Working Directory: /home/joewa/software/FTorch/src/build/test/examples/3_MultiGPU
4: Test timeout computed to be: 1500
4: input on rank 0: [  0.0,  1.0,  2.0,  3.0,  4.0]
4: output on rank 0: [*****,  0.0,  0.0,  0.0,*****]
4:  MultiGPU example ran successfully
4/4 Test #4: multigpu_infer_fortran ...........   Passed    8.12 sec

That is, it doesn't raise an error at all. So I guess we should build in errors for the case where operator overloads are applied to tensors on different devices.
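
For reference, a minimal sketch of the kind of guard that could be added on the C++ side, assuming access to the two underlying torch::Tensor operands (the function name is illustrative; the follow-up issue opened below tracks the actual fix):

#include <cstdlib>
#include <iostream>
#include <torch/torch.h>

// Hypothetical device-consistency check for operator overloads: fail
// loudly instead of silently producing garbage when the operands live
// on different devices.
void check_same_device(const torch::Tensor &lhs, const torch::Tensor &rhs) {
  if (lhs.device() != rhs.device()) {
    std::cerr << "[ERROR]: tensor operands are on different devices ("
              << lhs.device() << " vs " << rhs.device() << ")" << std::endl;
    exit(EXIT_FAILURE);
  }
}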

jwallwork23 (Contributor, Author):

Opened #269. Will merge and follow up there.

jwallwork23 merged commit 08d6da2 into main on Jan 30, 2025
5 checks passed
jwallwork23 deleted the 248_get-dtype-devicetype branch on January 30, 2025 13:57
jwallwork23 added a commit that referenced this pull request on Feb 6, 2025:
* Add dtype and device_type attrs for torch_tensor; implement getters
* Rename get_<rank/shape> as torch_tensor_get_<rank/shape> for consistency
* Make torch_tensor_get_device_index a class method
* Add unit test for torch_tensor_get_device_type on CPU
* Add unit test for torch_tensor_get_device_type on CUDA device
* Add unit test for torch_tensor_get_dtype
* Make use of getters for device type and index
* Alias methods to be less verbose
* Implement get_device_type on C++ side; introduce get_ftorch_device
* Implement get_dtype on C++ side; introduce get_ftorch_dtype
* Drop dtype/device type attributes