Description
Describe the bug
I wrote an application that compares SYCL and OpenMP offloading
performance using the oneAPI products on DevCloud, where everything
works as expected with OpenCL. I am failing to
run the same code on an x86 system with several CUDA devices attached to it.
I observe the following: when using -fsycl and -fopenmp-targets on the same translation unit,
unnamed lambda kernels no longer work. In addition, a number of warnings show up and
I suddenly need to specify the SYCL include paths explicitly.
To Reproduce
I build the sycl branch of llvm per the instructions, with CUDA 11.4 and GCC 10:
CUDA_LIB_PATH=/apps/cuda/cuda-11.4/lib64/stubs CC=gcc-10 CXX=g++-10 python llvm/buildbot/configure.py --cuda --cmake-opt="-DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4" -o syclcudagcc
python llvm/buildbot/compile.py -o syclcudagcc
I update my environment:
export PATH=BASEDIR/syclcudagcc/bin:$PATH
export LD_LIBRARY_PATH=BASEDIR/syclcudagcc/lib:$LD_LIBRARY_PATH
I then use the newly built clang to build OpenMP as follows:
cd buildopenmp
cmake -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="61;70;75;80;86" -DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4 -DLIBOMPTARGET_ENABLE_DEBUG=YES -DCMAKE_C_COMPILER=BASEDIR/llvm/build/install/bin/clang -DCMAKE_CXX_COMPILER=BASEDIR/llvm/build/install/bin/clang++ -DCMAKE_INSTALL_PREFIX=$PWD/local -GNinja ../llvm/openmp
ninja
I update my environment again:
export LD_LIBRARY_PATH=BASEDIR/buildopenmp/lib:$LD_LIBRARY_PATH
This is my test program, test.cxx, reduced as much as possible. There is just one
kernel, a single task that adds the first two elements of the array and writes the sum
to the third element.
#include <iostream>
#include <CL/sycl.hpp>
using namespace sycl;
static queue Q(default_selector{});
int main()
{
  std::cout << " Using SYCL device: " << Q.get_device().get_info<sycl::info::device::name>() << std::endl;
  int *data = malloc_shared<int>(3, Q);
  data[0] = 1;
  data[1] = 2;
  data[2] = 0;
  Q.submit([&](handler &cgh)
  {
#ifdef NONAME
    cgh.single_task( [=](){data[2] = data[0] + data[1];} );
#else
    cgh.single_task<class simple_sum>( [=](){data[2] = data[0] + data[1];} );
#endif
  }).wait();
  std::cout << "Result: " << data[2] << "\n";
  return 0;
}
I use -DNONAME to toggle between the named kernel and the anonymous lambda version (which is the problematic one).
I will first list all the things that work as expected:
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda -DNONAME
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda -DNONAME -fopenmp -LBASEDIR/buildopenmp/lib -lomp
All four commands compile fine. I do get a warning, but I don't think it is serious:
warning: linking module '/home2/gcgt96/sycl_workspace/syclcudagcc/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home2/gcgt96/sycl_workspace/syclcudagcc/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'test.cxx' is 'nvptx64-unknown-cuda'
When I additionally switch on OpenMP offloading, things get messy. (I removed the OpenMP offloading code from test.cxx because
the reproducer does not need it.)
I can get the version with the named kernel to compile and run, but I need to add include flags so that the SYCL headers are found, and I get a lot of warnings (a selection is shown below the compile command):
clang++ -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include
[warnings:]
/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/stream.hpp:743:21: warning: 'sycl_special_class' attribute ignored [-Wignored-attributes]
class __SYCL_EXPORT __SYCL_SPECIAL_CLASS stream {
/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/detail/cg_types.hpp:226:23: warning: 'interop_handler' is deprecated: interop_handler class is deprecated, use interop_handle instead with host-task [-Wdeprecated-declarations]
void call(cl::sycl::interop_handler &h) { MFunc(h); }
/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/accessor.hpp:2268:7: warning: 'sycl_special_class' attribute ignored [-Wignored-attributes]
class __SYCL_SPECIAL_CLASS accessor<DataT, Dimensions, AccessMode,
Despite the warnings, this compiles and the program runs correctly.
When I use kernels with anonymous lambdas, however, I get an error:
clang++ -DNONAME -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include
[error:]
/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/kernel.hpp:42:3: error: static_assert failed due to requirement '!std::is_same<sycl::detail::auto_name, sycl::detail::auto_name>::value' "No kernel name provided without -fsycl-unnamed-lambda enabled!"
I tried adding the obvious -fsycl-unnamed-lambda, but to no avail; the flag is seemingly ignored:
clang++ -fsycl-unnamed-lambda -DNONAME -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include
[same error:]
/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/kernel.hpp:42:3: error: static_assert failed due to requirement '!std::is_same<sycl::detail::auto_name, sycl::detail::auto_name>::value' "No kernel name provided without -fsycl-unnamed-lambda enabled!"
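To help place the error: the assertion text suggests that the header picks the kernel name through a trait that accepts the placeholder type auto_name only when unnamed-lambda support is visible at preprocessing time. Below is a minimal stand-alone sketch of that pattern, not the actual DPC++ header. The namespace sketch, the macro SKETCH_UNNAMED_LAMBDA, and the helper uses_lambda_type_as_name are hypothetical names, and the idea that the real header is gated by a compiler-defined macro is an assumption, not something confirmed from the source above.

```cpp
#include <type_traits>

// Hypothetical stand-in for a macro the compiler would define when
// unnamed-lambda support is enabled. Remove this line to reproduce a
// static_assert failure similar to the one reported above.
#define SKETCH_UNNAMED_LAMBDA 1

namespace sketch {

// Placeholder type meaning "the caller did not provide a kernel name".
struct auto_name {};

// Example of an explicit kernel name (forward declaration is enough).
class simple_sum;

// Primary template: use the explicit name and reject the placeholder.
template <typename Name, typename Type> struct get_kernel_name_t {
  static_assert(!std::is_same<Name, auto_name>::value,
                "No kernel name provided without unnamed-lambda support!");
  using name = Name;
};

#ifdef SKETCH_UNNAMED_LAMBDA
// Only visible when the macro is defined: fall back to the type of the
// kernel function object itself as the kernel name.
template <typename Type> struct get_kernel_name_t<auto_name, Type> {
  using name = Type;
};
#endif

// Returns true when the kernel name resolved to the lambda's own type,
// false when an explicit name was used.
template <typename KernelName = auto_name, typename KernelType>
bool uses_lambda_type_as_name(const KernelType &) {
  using name = typename get_kernel_name_t<KernelName, KernelType>::name;
  return std::is_same<name, KernelType>::value;
}

} // namespace sketch
```

In this sketch, calling sketch::uses_lambda_type_as_name(lambda) without a name compiles only while the macro is defined, whereas sketch::uses_lambda_type_as_name<sketch::simple_sum>(lambda) compiles either way. If the real header works along these lines, the error would be consistent with the SYCL headers being parsed by a compilation pass that does not see the unnamed-lambda setting, even though -fsycl-unnamed-lambda is on the command line.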
Environment:
- OS: Linux, Ubuntu 20.04.4 LTS
- Target device and vendor: NVIDIA TITAN X, TITAN Xp, GeForce RTX 2080 Ti, TITAN RTX
- DPC++ version: clang version 15.0.0 (https://github.com/intel/llvm.git e200720)
- Dependencies version: N/A