Add XPU support (duplicate #125) #209

Closed
wants to merge 26 commits into from

Conversation

ma595 (Member) commented Dec 20, 2024

Adds XPU support to examples and associated instructions in the documentation.


Cpp-Linter Report ⚠️

Some files did not pass the configured checks!

clang-format (v12.0.0) reports: 1 file(s) not formatted
  • src/ctorch.cpp


ma595 (Member, Author) commented Dec 20, 2024

Build script for CSD3 (@ma595 needs to check this works end to end).

# Load the default Dawn environment, compilers and Python on CSD3
module purge
module load default-dawn
module load intel-oneapi-compilers/2025.0.3/gcc/sb5vj5us
module load gcc/14.2.0/vaetnoca
module load python/3.11.9/gcc/7xr7o47s

# Create a virtual environment and install the XPU build of PyTorch
python3 -m venv ./venv3-pvc
source venv3-pvc/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/xpu

# Fetch FTorch and create a build directory
git clone git@github.com:Cambridge-ICCS/FTorch.git
cd FTorch/src; mkdir build; cd build

# Point CMake at the Torch installation inside the virtual environment
export TORCH=$(python -c "import torch; print(torch.__path__[0])")
export CMAKE_PREFIX_PATH=$TORCH

# Configure with the default Fortran compiler
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/src/build/install

# Reconfigure with tests enabled and ifx as the Fortran compiler
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/src/build/install -DCMAKE_BUILD_TESTS=TRUE -DCMAKE_Fortran_COMPILER=$(which ifx)

# Build and install
cmake --build . --target install
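
The configure step above can also be assembled programmatically, which makes it easier to see which options differ between the two `cmake` invocations. The following is only a sketch: `find_package_dir` and `cmake_configure_args` are hypothetical helper names, the install prefix is a placeholder, and it passes the Torch location as `-DCMAKE_PREFIX_PATH` rather than through the environment variable used in the script.

```python
import importlib.util
import shlex
import shutil


def find_package_dir(name="torch"):
    """Return the install directory of a Python package without importing it.

    For an installed package this is the same directory that
    `python -c "import torch; print(torch.__path__[0])"` prints.
    Returns None if the package cannot be found.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def cmake_configure_args(prefix_path, install_prefix,
                         fortran_compiler=None, build_tests=False):
    """Build the argument list for the CMake configure step (hypothetical helper)."""
    args = [
        "cmake", "..",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_PREFIX_PATH={prefix_path}",
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
    ]
    if build_tests:
        args.append("-DCMAKE_BUILD_TESTS=TRUE")
    if fortran_compiler:
        args.append(f"-DCMAKE_Fortran_COMPILER={fortran_compiler}")
    return args


if __name__ == "__main__":
    torch_dir = find_package_dir("torch")
    if torch_dir is None:
        print("torch is not installed in this environment")
    else:
        # "/path/to/install" is a placeholder, not the path used in this PR.
        args = cmake_configure_args(
            torch_dir,
            "/path/to/install",
            fortran_compiler=shutil.which("ifx"),  # omitted if ifx is not on PATH
            build_tests=True,
        )
        print(shlex.join(args))
```

Printing the command rather than running it keeps the sketch side-effect free; the list could be handed to `subprocess.run` from inside the build directory once the output looks right.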

@ma595 changed the title from "Add PVC support (duplicate #125)" to "Add XPU support (duplicate #125)" on Dec 20, 2024
ma595 (Member, Author) commented Dec 20, 2024

Running the 2_ResNet18 example (built with gfortran) fails with an XPU allocator assertion followed by a segmentation fault:

./resnet_infer_fortran

[ERROR]: 0 <= device && static_cast<size_t>(device) < device_allocators.size() INTERNAL ASSERT FAILED at "/pytorch/c10/xpu/XPUCachingAllocator.cpp":555, please report a bug to PyTorch. Allocator not initialized for device 0: did you call init?

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x14626d1e6688 in ???
#1  0x146251733d68 in ???
#2  0x1463501f95af in ???
#3  0x14633846fca4 in ???
#4  0x1463385010bc in ???
#5  0x14633835ca87 in ???
#6  0x146350e79501 in ???
#7  0x1463501fc1f6 in ???
#8  0x146350e65fc6 in ???
Segmentation fault

@ma595 self-assigned this on Dec 21, 2024
ma595 (Member, Author) commented Jan 24, 2025

>>> import torch
>>> torch.xpu.is_available()
True
>>> torch.xpu.is_initialized()
False
>>> torch.xpu.init()
>>> torch.xpu.is_initialized()
True
>>> a = torch.tensor([1,2,3])
>>> a.to('xpu')
tensor([1, 2, 3], device='xpu:0')
>>> torch.xpu.is_initialized()
True

Failing example:

>>> import torch
>>> torch.jit.load("examples/2_ResNet18/saved_resnet18_model_cpu.pt")
[WARNING] Failed to create Level Zero tracer: 2013265921
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/rds/project/rds-5mCMIDBOkPU/rse/ftorch/FTorch/venv3-pvc/lib/python3.11/site-packages/torch/jit/_serialization.py", line 163, in load
    cpp_module = torch._C.import_ir_module(cu, os.fspath(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: 0 <= device && static_cast<size_t>(device) < device_allocators.size() INTERNAL ASSERT FAILED at "/pytorch/c10/xpu/XPUCachingAllocator.cpp":555, please report a bug to PyTorch. Allocator not initialized for device 0: did you call init?

Solution: call torch.xpu.init() before torch.jit.load().

Successful example:

>>> import torch
>>> torch.xpu.init()
>>> torch.jit.load("examples/2_ResNet18/saved_resnet18_model_cpu.pt")
lots of model output here.
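
The workaround above could be wrapped in a small helper so that every model load goes through the same initialisation check. This is a minimal sketch rather than anything in FTorch: `load_torchscript_with_xpu_init` is a hypothetical name, and the optional `torch_module` parameter is an assumption added only so the logic can be exercised without a real XPU installation. It relies solely on the `torch.xpu` calls shown in the session above and on `torch.jit.load`.

```python
def load_torchscript_with_xpu_init(path, torch_module=None):
    """Load a TorchScript file, initialising the XPU runtime first if needed.

    `torch_module` defaults to the real `torch` package; passing a stand-in
    object is an assumption of this sketch to allow testing without PyTorch.
    """
    if torch_module is None:
        import torch as torch_module  # imported lazily, only when actually needed

    # Builds of PyTorch without XPU support may not expose `torch.xpu` at all.
    xpu = getattr(torch_module, "xpu", None)

    # Mirror the manual workaround: call init() only when an XPU is present
    # and the runtime has not been initialised yet.
    if xpu is not None and xpu.is_available() and not xpu.is_initialized():
        xpu.init()

    return torch_module.jit.load(path)
```

With PyTorch installed, calling `load_torchscript_with_xpu_init("examples/2_ResNet18/saved_resnet18_model_cpu.pt")` would follow the same sequence as the successful example. Whether the equivalent initialisation should instead live on the C++ side of FTorch is a separate question that this sketch does not address.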

jwallwork23 (Contributor)

6034cb7 should've been "DO NOT MERGE".

jwallwork23 (Contributor)

Closing as superseded by #276.

Labels
gpu Related to building and running on GPU
5 participants