SYCL and OpenMP + CUDA: -fopenmp-targets=nvptx64-cuda breaks unnamed lambda kernels #5804

Open
@iamholger


Describe the bug
I wrote an application that compares SYCL and OpenMP offloading
performance. With the oneAPI products on devcloud, everything
works fine with OpenCL, but I cannot get the same code to run
on an x86 system with several CUDA devices attached to it.

I observe the following: when -fsycl and -fopenmp-targets are used on the same translation unit,
unnamed lambda kernels no longer compile. In addition, a number of warnings show up and
I suddenly have to specify the SYCL include paths explicitly.

To Reproduce
I build the sycl branch of llvm (-b sycl) per the instructions, with CUDA 11.4 and GCC 10:

CUDA_LIB_PATH=/apps/cuda/cuda-11.4/lib64/stubs CC=gcc-10 CXX=g++-10 python llvm/buildbot/configure.py --cuda --cmake-opt="-DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4" -o syclcudagcc
python llvm/buildbot/compile.py -o syclcudagcc

I update my environment:

export PATH=BASEDIR/syclcudagcc/bin:$PATH
export LD_LIBRARY_PATH=BASEDIR/syclcudagcc/lib:$LD_LIBRARY_PATH

I then use the newly built clang to build OpenMP as follows:

cd buildopenmp
cmake  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES="61;70;75;80;86" -DCUDA_TOOLKIT_ROOT_DIR=/apps/cuda/cuda-11.4  -DLIBOMPTARGET_ENABLE_DEBUG=YES -DCMAKE_C_COMPILER=BASEDIR/llvm/build/install/bin/clang -DCMAKE_CXX_COMPILER=BASEDIR/llvm/build/install/bin/clang++ -DCMAKE_INSTALL_PREFIX=$PWD/local  -GNinja ../llvm/openmp
ninja

I update my environment again:

export LD_LIBRARY_PATH=BASEDIR/buildopenmp/lib:$LD_LIBRARY_PATH

This is my test program, test.cxx, reduced as much as possible. There is just one
kernel, a single task that adds the first two elements of the array and writes the sum
to the third element.

#include <iostream>

#include <CL/sycl.hpp>
using namespace sycl;
static queue Q(default_selector{});

int main()
{
    std::cout << "  Using SYCL device: " << Q.get_device().get_info<sycl::info::device::name>() << std::endl;
    int *data = malloc_shared<int>(3, Q); 
    data[0] = 1;
    data[1] = 2;
    data[2] = 0;

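    Q.submit([&](handler &cgh)
    {   
#ifdef NONAME
       // Unnamed lambda kernel: this is the variant that fails once -fopenmp-targets is added.
       cgh.single_task(                    [=](){data[2] = data[0] + data[1];}  );
#else
       // Named kernel: compiles and runs in every configuration listed below.
       cgh.single_task<class simple_sum>(  [=](){data[2] = data[0] + data[1];}  );
#endif
    }).wait();

    std::cout << "Result: " << data[2] << "\n";
    sycl::free(data, Q);  // release the USM allocation
    return 0;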
}

I use -DNONAME to toggle between the named kernel and the anonymous lambda version (which is the problematic one).
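
For reference, here is a host-only sketch of what the single task computes, so the expected output of a working build is unambiguous. It uses plain C++ with no SYCL; the names simple_sum_host and expected_result are made up for this illustration and do not appear in test.cxx.

```cpp
#include <cassert>

// Host-only stand-in for the body of the single_task kernel in test.cxx.
// (Hypothetical helper, not part of the reproducer.)
inline void simple_sum_host(int *data) { data[2] = data[0] + data[1]; }

// Mirrors the initialisation in main(): {1, 2, 0} goes in, the sum comes out.
inline int expected_result() {
  int data[3] = {1, 2, 0};
  simple_sum_host(data);
  assert(data[0] == 1 && data[1] == 2);  // the inputs are left untouched
  return data[2];
}
```

With these inputs, both the named and the unnamed variant should print "Result: 3" when they build and run correctly.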

I will first list all the things that work as expected:

clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda -DNONAME
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda                      -fopenmp -LBASEDIR/buildopenmp/lib -lomp
clang++ test.cxx -fsycl -fsycl-targets=nvptx64-cuda -DNONAME -fopenmp -LBASEDIR/buildopenmp/lib -lomp

All of these compile fine. I do get a warning, but I don't think it is serious:

warning: linking module '/home2/gcgt96/sycl_workspace/syclcudagcc/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home2/gcgt96/sycl_workspace/syclcudagcc/lib/clang/15.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'test.cxx' is 'nvptx64-unknown-cuda'

When I additionally switch on OpenMP offloading, things get messy. (I removed the OpenMP offloading code from test.cxx because
the reproducer does not need it.)
I can get the version with the named kernel to compile and run, but I need to add more flags so that the SYCL headers are found, and I get a lot of warnings (a selection is shown below the compile command):

clang++ -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include

[warnings:]

/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/stream.hpp:743:21: warning: 'sycl_special_class' attribute ignored [-Wignored-attributes]
class __SYCL_EXPORT __SYCL_SPECIAL_CLASS stream {

/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/detail/cg_types.hpp:226:23: warning: 'interop_handler' is deprecated: interop_handler class is deprecated, use interop_handle instead with host-task [-Wdeprecated-declarations]
  void call(cl::sycl::interop_handler &h) { MFunc(h); }

/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/accessor.hpp:2268:7: warning: 'sycl_special_class' attribute ignored [-Wignored-attributes]
class __SYCL_SPECIAL_CLASS accessor<DataT, Dimensions, AccessMode,

Despite the warnings, this compiles and the program runs correctly.

When I use kernels with anonymous lambdas, however, I get an error:

clang++ -DNONAME -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include

[error:]

/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/kernel.hpp:42:3: error: static_assert failed due to requirement '!std::is_same<sycl::detail::auto_name, sycl::detail::auto_name>::value' "No kernel name provided without -fsycl-unnamed-lambda enabled!"

I tried the obvious -fsycl-unnamed-lambda, but to no avail; the flag is seemingly ignored:

clang++ -fsycl-unnamed-lambda -DNONAME -std=c++17 test.cxx -fsycl -fsycl-targets=nvptx64-cuda -fopenmp -LBASEDIR/buildopenmp/lib -lomp -fopenmp-targets=nvptx64-cuda -Xopenmp-target -march=sm_75 --libomptarget-nvptx-bc-path=BASEDIR/buildopenmp/lib -I/BASEDIR/syclcudagcc/include/sycl -I/BASEDIR/syclcudagcc/include

[same error:]

/home2/gcgt96/sycl_workspace/syclcudagcc/include/sycl/CL/sycl/kernel.hpp:42:3: error: static_assert failed due to requirement '!std::is_same<sycl::detail::auto_name, sycl::detail::auto_name>::value' "No kernel name provided without -fsycl-unnamed-lambda enabled!"
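
To make the assertion easier to read, here is a small standalone model of how a header could resolve kernel names. It is a sketch under assumptions, not the contents of kernel.hpp: the namespace model, the kernel_name trait, the simple_sum tag, and the MODEL_UNNAMED_LAMBDA switch are all invented for illustration. Only the auto_name idea and the assertion text are taken from the error message above.

```cpp
#include <type_traits>

namespace model {

// Stand-in for sycl::detail::auto_name: the tag used when no kernel name is given.
struct auto_name {};

// Hypothetical kernel name tag, analogous to "class simple_sum" in test.cxx.
struct simple_sum {};

// Hypothetical switch standing in for "unnamed lambda support is enabled in
// this compilation pass". Set it to 0 to model the failing configuration.
#ifndef MODEL_UNNAMED_LAMBDA
#define MODEL_UNNAMED_LAMBDA 1
#endif

// Primary template: an explicit name is used as-is; auto_name is rejected.
template <typename Name, typename KernelType>
struct kernel_name {
  static_assert(!std::is_same<Name, auto_name>::value,
                "No kernel name provided without -fsycl-unnamed-lambda enabled!");
  using type = Name;
};

#if MODEL_UNNAMED_LAMBDA
// Only visible when the switch is on: fall back to the kernel's own type
// (for a lambda, its unique closure type) as the kernel name.
template <typename KernelType>
struct kernel_name<auto_name, KernelType> {
  using type = KernelType;
};
#endif

template <typename Name, typename KernelType>
using kernel_name_t = typename kernel_name<Name, KernelType>::type;

}  // namespace model
```

In this model, the assertion fires with exactly the is_same<auto_name, auto_name> requirement shown above whenever the fallback specialization is not visible. If the real header works along similar lines, the error would be consistent with at least one compilation pass processing the SYCL headers without unnamed-lambda support switched on, whatever is passed on the command line. That is a guess about the cause, not a confirmed diagnosis.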

Labels: bug (Something isn't working), compiler (Compiler related issue), cuda (CUDA back-end)
