Segfault after #3821 #3856

Open
wujingyue opened this issue Feb 8, 2025 · 4 comments
@wujingyue
Collaborator

wujingyue commented Feb 8, 2025

I merged #3821 too quickly. The CI indeed showed the same error.

To reproduce:

$ _bn && DEBUG_SERDE=debug pytest tests/python/test_python_frontend.py -s
Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2a420588)
==== backtrace (tid:1495132) ====
 0  /usr/local/ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x79a4e79ae614]
 1  /usr/local/ucx/lib/libucs.so.0(+0x3680c) [0x79a4e79ae80c]
 2  /usr/local/ucx/lib/libucs.so.0(+0x36a48) [0x79a4e79aea48]
 3  [0x2a420588]
=================================
Fatal Python error: Segmentation fault

Current thread 0x000079a4e9ab5300 (most recent call first):
  File "/opt/pytorch/nvfuser/nvfuser/__init__.py", line 73 in segment
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 268 in check_cpp_translation
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 477 in exec_nvfuser
  File "/opt/pytorch/nvfuser/tests/python/utils.py", line 410 in inner_fn
  File "/opt/pytorch/nvfuser/tests/python/test_python_frontend.py", line 2956 in test_issue1273
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3099 in wrapper
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3206 in _run_custom
  File "/usr/local/lib/python3.12/dist-packages/torch/testing/_internal/common_utils.py", line 3234 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/unittest.py", line 321 in runtest
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 172 in pytest_runtest_call
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 240 in <lambda>
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 340 in from_call
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 239 in call_and_report
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 134 in runtestprotocol
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 364 in pytest_runtestloop
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 339 in _main
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 285 in wrap_session
  File "/usr/local/lib/python3.12/dist-packages/_pytest/main.py", line 332 in pytest_cmdline_main
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103 in _multicall
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513 in __call__
  File "/usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py", line 174 in main
  File "/usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py", line 197 in console_main
  File "/usr/local/bin/pytest", line 8 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, jaxlib.cpu_feature_guard, psutil._psutil_linux, psutil._psutil_posix (total: 27)
[1]    1495132 segmentation fault (core dumped)  DEBUG_SERDE=debug pytest tests/python/test_python_frontend.py -s
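The Python traceback above points at `test_issue1273`, so rerunning only that test may shorten the reproduction loop. What follows is a minimal sketch, not part of the original report. It assumes nvFuser is already built (the `_bn` step in the command above appears to be a local shell alias and is not reproduced here) and that the script is launched from the repository root. The helper names `build_repro` and `run_repro` are hypothetical.

```python
import os
import subprocess
import sys


def build_repro(test_name="test_issue1273"):
    """Return (command, environment) for rerunning a single test.

    Hypothetical helper: mirrors the command from the report, but adds
    pytest's -k filter so only the matching test runs.
    """
    env = dict(os.environ)
    env["DEBUG_SERDE"] = "debug"  # same setting as in the original command
    cmd = [
        sys.executable, "-m", "pytest",
        "tests/python/test_python_frontend.py",
        "-s",              # do not capture output, as in the report
        "-k", test_name,   # select only tests whose name matches
    ]
    return cmd, env


def run_repro(test_name="test_issue1273"):
    """Run the filtered test in a child process and return its exit code."""
    cmd, env = build_repro(test_name)
    result = subprocess.run(cmd, env=env)
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode
```

Calling `run_repro()` from the repository root launches pytest in a child process, so a crash in the test does not take down the calling script. If the segfault does not reproduce with the filter applied, that would suggest the failure depends on state left behind by an earlier test in the file.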
@wujingyue changed the title from "Segfault after https://github.com/NVIDIA/Fuser/pull/3821" to "Segfault after #3821" on Feb 8, 2025

@cowanmeg
Collaborator

Will look into this!

@wujingyue
Collaborator Author

Thank you!

@cowanmeg
Collaborator

Hmm... this bug is a bit strange, since the segfault occurs in UCX. Just confirming: these python_frontend tests only test single-device behavior, right?

@wujingyue
Collaborator Author

> these python_frontend tests only test single device behavior, right?

That's right!
