
[rocm6.5_internal_testing] fix sparse tests, enable fp16/bf16 for testing #2108


Merged


dnikolaev-amd commented May 9, 2025

!!! WIP: one test is still missing from this PR. Reproduce it with:
  python test/test_sparse_csr.py -v -k test_sparse_addmm_cuda_float16

PR to:

  • enable fp16/bf16 sparse testing
  • enable complex data types for sparse matmul on ROCm
  • fix sparse addmm/baddbmm on ROCm
  • fix sparse hipification for ROCm
  • enable data types for testing with PYTORCHTEST_WITH_ROCM=0:
      • fp16 on ROCm 6.5+
      • bf16 on ROCm 7.0+
  • fix/enable sparse tests on ROCm (~40 tests total):
      • test_sparse_csr.py::TestSparseCSRCUDA::test_bmm_cuda_*
      • test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_*
      • test_sparse_csr.py::TestSparseCSRCUDA::test_mm_cuda_float64
      • test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_all_sparse_csr_SparseCS*
      • test_sparse_csr.py::TestSparseCSRCUDA::test_addmm_sizes_all_sparse_csr_*
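The version gates listed above (fp16 from ROCm 6.5, bf16 from ROCm 7.0) can be sketched as a small pure-Python helper. This is a hypothetical illustration, not the code in this PR: the helper names are made up, and the integer layout MAJOR*10000 + MINOR*100 + PATCH is an assumption inferred from the -DROCM_VERSION=60400 flag (ROCm 6.4.0) in the CI build logs in this thread.

```python
from typing import List, Tuple

# Thresholds stated in this PR: fp16 sparse tests on ROCm 6.5+, bf16 on ROCm 7.0+.
FP16_MIN_ROCM: Tuple[int, int] = (6, 5)
BF16_MIN_ROCM: Tuple[int, int] = (7, 0)


def decode_rocm_version(encoded: int) -> Tuple[int, int, int]:
    """Split an integer such as 60400 into (major, minor, patch).

    Assumption: MAJOR*10000 + MINOR*100 + PATCH, as suggested by the
    -DROCM_VERSION=60400 flag seen in the CI log (ROCm 6.4.0).
    """
    major, rest = divmod(encoded, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def half_precision_test_dtypes(encoded_rocm_version: int) -> List[str]:
    """Hypothetical helper: reduced-precision dtypes to add to sparse tests."""
    major, minor, _ = decode_rocm_version(encoded_rocm_version)
    version = (major, minor)  # tuples compare lexicographically
    dtypes: List[str] = []
    if version >= FP16_MIN_ROCM:
        dtypes.append("float16")
    if version >= BF16_MIN_ROCM:
        dtypes.append("bfloat16")
    return dtypes


if __name__ == "__main__":
    # The CI build in this thread was compiled against ROCm 6.4.0.
    print(decode_rocm_version(60400), half_precision_test_dtypes(60400))
```

Under this sketch, the ROCm 6.4.0 toolchain used by the failing CI builds would not enable either half-precision dtype; comparing (major, minor) tuples keeps the gate independent of the patch level.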

rocm-repo-management-api bot commented May 9, 2025

Jenkins build for ae79568b2eb7d1529549ce29a20a5c94282f8b88 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error while building PyTorch:

[5439/7991] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/os.cc.o
[5440/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/kernels/quantize.cpp.o
[5441/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/kernels/prelu.cpp.o
[5442/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/utils/CMakeFiles/dnnl_graph_utils.dir/utils.cpp.o
[5443/7991] Building CXX object third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o
FAILED: third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_VERSION=60400 -DTORCH_ENABLE_LLVM -DTORCH_HIP_VERSION=604 -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/opt/rocm-6.4.0/include -I/var/lib/jenkins/pytorch/cmake/../third_party/benchmark/include -I/opt/llvm/include -I/var/lib/jenkins/pytorch/third_party/onnx -I/var/lib/jenkins/pytorch/build/third_party/onnx -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/include -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/src -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/third_party/dynolog -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric -I/extras/CUPTI/include -I/include -I/opt/rocm/include/roctracer -I/opt/rocm/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.12/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -O3 -DNDEBUG -DNDEBUG -std=c++17 -fPIC -DMKL_HAS_SBGEMM -D__HIP_PLATFORM_AMD__=1 -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DHAS_ROCTRACER 
-D__HIP_PLATFORM_AMD__ -DKINETO_NAMESPACE=libkineto -DFMT_HEADER_ONLY -DENABLE_IPC_FABRIC -std=c++17 -MD -MT third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o -MF third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o.d -o third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o -c /var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.cpp
In file included from /var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.cpp:9:
/var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.h:227:15: error: field ‘kernelName’ has incomplete type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’}
  227 |   std::string kernelName;
      |               ^~~~~~~~~~

pruthvistony (Collaborator) commented:

FP16 - to be enabled >= ROCm 6.5
BF16 - to be enabled >= ROCm 7.0

rocm-repo-management-api bot commented May 13, 2025

Jenkins build for ae79568b2eb7d1529549ce29a20a5c94282f8b88 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error while building PyTorch:

[5440/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/kernels/reduction.cpp.o
[5441/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/kernels/prelu.cpp.o
[5442/7991] Building CXX object third_party/ideep/mkl-dnn/src/graph/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/kernels/shuffle.cpp.o
[5443/7991] Building CXX object third_party/fmt/CMakeFiles/fmt.dir/src/os.cc.o
[5444/7991] Building CXX object third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o
FAILED: third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o 
/opt/cache/bin/sccache /opt/cache/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DROCM_VERSION=60400 -DTORCH_ENABLE_LLVM -DTORCH_HIP_VERSION=604 -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/opt/rocm-6.4.0/include -I/var/lib/jenkins/pytorch/cmake/../third_party/benchmark/include -I/opt/llvm/include -I/var/lib/jenkins/pytorch/third_party/onnx -I/var/lib/jenkins/pytorch/build/third_party/onnx -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/include -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/src -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/third_party/dynolog -I/var/lib/jenkins/pytorch/third_party/fmt/include -I/var/lib/jenkins/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric -I/extras/CUPTI/include -I/include -I/opt/rocm/include/roctracer -I/opt/rocm/include -isystem /var/lib/jenkins/pytorch/build/third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/gloo -isystem /var/lib/jenkins/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/googletest/googletest/include -isystem /var/lib/jenkins/pytorch/third_party/protobuf/src -isystem /opt/conda/envs/py_3.12/include -isystem /var/lib/jenkins/pytorch/third_party/XNNPACK/include -isystem /var/lib/jenkins/pytorch/third_party/ittapi/include -isystem /var/lib/jenkins/pytorch/cmake/../third_party/eigen -isystem /var/lib/jenkins/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /var/lib/jenkins/pytorch/third_party/ideep/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -O3 -DNDEBUG -DNDEBUG -std=c++17 -fPIC -DMKL_HAS_SBGEMM -D__HIP_PLATFORM_AMD__=1 -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DHAS_ROCTRACER 
-D__HIP_PLATFORM_AMD__ -DKINETO_NAMESPACE=libkineto -DFMT_HEADER_ONLY -DENABLE_IPC_FABRIC -std=c++17 -MD -MT third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o -MF third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o.d -o third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/RocLogger.cpp.o -c /var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.cpp
In file included from /var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.cpp:9:
/var/lib/jenkins/pytorch/third_party/kineto/libkineto/src/RocLogger.h:227:15: error: field ‘kernelName’ has incomplete type ‘std::string’ {aka ‘std::__cxx11::basic_string<char>’}
  227 |   std::string kernelName;
      |               ^~~~~~~~~~

rocm-repo-management-api bot commented May 14, 2025

Jenkins build for ae79568b2eb7d1529549ce29a20a5c94282f8b88 commit finished as ABORTED
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented May 14, 2025

Jenkins build for ae79568b2eb7d1529549ce29a20a5c94282f8b88 commit finished as ABORTED
Links: Blue Ocean view / Build artifacts

dnikolaev-amd force-pushed the dnikolaev/enable_sparse_tests_rocm6.5 branch 2 times, most recently from a9d6586 to 6a8150c (May 15, 2025 20:42)
dnikolaev-amd force-pushed the dnikolaev/enable_sparse_tests_rocm6.5 branch from 6a8150c to bfa47d1 (May 15, 2025 20:44)
pruthvistony marked this pull request as ready for review on June 4, 2025 17:34
pruthvistony merged commit 5b34421 into rocm6.5_internal_testing on Jun 4, 2025
pruthvistony deleted the dnikolaev/enable_sparse_tests_rocm6.5 branch on June 4, 2025 17:34
pruthvistony restored the dnikolaev/enable_sparse_tests_rocm6.5 branch on June 4, 2025 17:35
pruthvistony added a commit that referenced this pull request Jun 4, 2025