Skip to content

[SYCL] Enable stack printing on crashes in post commit #7934

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 12 commits into from
Feb 9, 2023

Conversation

romanovvlad
Copy link
Contributor

@romanovvlad romanovvlad commented Jan 5, 2023

The initial attempt to enable LLVM stack printing on crashes(SEGFAULT) when running SYCL tests.
The motivation is to simplify analysis of sporadic fails in CI.

The patch:

  1. If DSYCL_ENABLE_STACK_PRINTING cmake var is defined:
    a. Links the libsycl.so with LLVMSupport which provides needed functionality
    b. Calls llvm::sys::PrintStackTraceOnErrorSignal, which registers signal handlers,
    in a couple of places in SYCL RT
    c. Adds llvm-sybmolizer to the dependencies of sycl-toolchain
  2. Modifies post-commit workflow to:
    a. build linux job (linux_default only) in RelWithDebInfo mode
    b. pass the cmake var to the configure.py
  3. Sets LLVM_SYMBOLIZER_PATH in the lit.cfg.py for SYCL to the llvm-sybmolizer binary from the build dir

@romanovvlad romanovvlad requested review from a team as code owners January 5, 2023 15:57
@romanovvlad
Copy link
Contributor Author

romanovvlad commented Jan 5, 2023

The initial attempt to enable LLVM stack printing on crashes(SEGFAULT) when running SYCL tests.
The patch:

  1. If DSYCL_ENABLE_STACK_PRINTING cmake var is defined:
    a. Links the libsycl.so with libLLVMSupport.so which provides needed functionality
    b. Calls llvm::sys::PrintStackTraceOnErrorSignal, which registers signal handlers,
    in a couple of places in SYCL RT
    c. Adds llvm-sybmolizer to the dependencies of sycl-toolchain
  2. Modifies post-commit workflow to:
    a. build linux job (linux_default only) in RelWithDebInfo mode
    b. pass the cmake var to the configure.py
  3. Sets LLVM_SYMBOLIZER_PATH in the lit.cfg.py for SYCL to the llvm-sybmolizer binary from the build dir

@romanovvlad romanovvlad temporarily deployed to aws January 5, 2023 16:33 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Jan 5, 2023

  1. Links the libsycl.so with libLLVMSupport.so which provides needed functionality

Shouldn't we link with libLLVMSupport.a?

  • c. Adds llvm-sybmolizer to the dependencies of sycl-toolchain

I suppose it means that it's deployed as part of the DPC++ compiler to users. Do you think it's necessary? Maybe it's better to have a separate package with "debug tools". What do you think?

@romanovvlad
Copy link
Contributor Author

  1. Links the libsycl.so with libLLVMSupport.so which provides needed functionality

Shouldn't we link with libLLVMSupport.a?

We will link with either .a or .so depending on the LLVM build type.
Let me check if we have .a LLVMSupport when building LLVM in shared libraries mode.

  • c. Adds llvm-symbolizer to the dependencies of sycl-toolchain

I suppose it means that it's deployed as part of the DPC++ compiler to users. Do you think it's necessary? Maybe it's better to have a separate package with "debug tools". What do you think?

Currently it's not enabled by default, and there are no plans to do so. The patch adds this dependency for one post-commit task only.
We could add a separate target(or just build llvm-symbolizer), but it would require more changes in workflow scripts.

@bader
Copy link
Contributor

bader commented Jan 6, 2023

  1. Links the libsycl.so with libLLVMSupport.so which provides needed functionality

Shouldn't we link with libLLVMSupport.a?

We will link with either .a or .so depending on the LLVM build type. Let me check if we have .a LLVMSupport when building LLVM in shared libraries mode.

I'm okay if this library is linked statically for DPC++ default build configuration, which is distributed to users. I'd like to avoid shipping LLVM libraries in addition to DPC++ runtime library.

@romanovvlad romanovvlad temporarily deployed to aws January 7, 2023 12:24 — with GitHub Actions Inactive
@romanovvlad
Copy link
Contributor Author

romanovvlad commented Jan 9, 2023

  1. Links the libsycl.so with libLLVMSupport.so which provides needed functionality

Shouldn't we link with libLLVMSupport.a?

We will link with either .a or .so depending on the LLVM build type. Let me check if we have .a LLVMSupport when building LLVM in shared libraries mode.

I'm okay if this library is linked statically for DPC++ default build configuration, which is distributed to users. I'd like to avoid shipping LLVM libraries in addition to DPC++ runtime library.

I believe we cannot enable stack printing by default in the build configuration which is distributed to users. The behavior might be unexpected and undesirable. This patch enables stack printing in one job of the post-commit workflow, so, the possible additional .so dependency should not be a problem.

@romanovvlad romanovvlad force-pushed the private/vromanov/StackPrints branch from 29298f0 to 17cc118 Compare January 9, 2023 12:08
@romanovvlad romanovvlad temporarily deployed to aws January 9, 2023 13:32 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 9, 2023 15:55 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 10, 2023 11:28 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 10, 2023 13:28 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 11, 2023 00:48 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 12, 2023 05:04 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Jan 13, 2023

@romanovvlad, please, merge with the head of the sycl branch to pull the fix for CI scripts. There was a bug, which hang the validation on NVIDIA GPU.

@romanovvlad romanovvlad temporarily deployed to aws January 13, 2023 12:32 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 13, 2023 13:11 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 25, 2023 15:09 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws January 25, 2023 15:40 — with GitHub Actions Inactive
Copy link
Contributor

@bader bader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.github/workflows/sycl_post_commit.yml changes look good to me.

@romanovvlad, is SYCL-CTS failure expected?

@bader
Copy link
Contributor

bader commented Jan 27, 2023

Failed Tests (1):
SYCL :: dword_atomic_smoke.cpp

This failure is known.

@romanovvlad
Copy link
Contributor Author

.github/workflows/sycl_post_commit.yml changes look good to me.

@romanovvlad, is SYCL-CTS failure expected?

No, but it shouldn't be related to the patch. From the logs:

2023-01-25T15:14:43.6537724Z terminate called after throwing an instance of 'sycl::_V1::runtime_error'
2023-01-25T15:14:43.6538045Z   what():  No device of requested type available. -1 (PI_ERROR_DEVICE_NOT_FOUND)
2023-01-25T15:14:43.7159749Z /__w/_temp/e0d232d2-d0e7-4510-93f0-28c2ed557d55.sh: line 30:  6680 Aborted                 (core dumped) ./build/bin/test_all --list-devices
2023-01-25T15:14:43.7189087Z ##[error]Process completed with exit code 134.

@bader
Copy link
Contributor

bader commented Jan 27, 2023

@romanovvlad, please, file a bug report if you think this failure is not related to the changes in your PR.

@bader
Copy link
Contributor

bader commented Feb 1, 2023

@romanovvlad, I suggest we sync with the tip of the sycl branch to get up-to-date CI results. I hope this will fix SYCL-CTS job.

@romanovvlad romanovvlad temporarily deployed to aws February 3, 2023 11:49 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws February 3, 2023 17:54 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws February 6, 2023 12:00 — with GitHub Actions Inactive
@romanovvlad romanovvlad temporarily deployed to aws February 6, 2023 12:32 — with GitHub Actions Inactive
@bader bader merged commit b8d955c into intel:sycl Feb 9, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants