Segfault in futex_wait() on riscv64gc-unknown-linux-gnu with rustc 1.64.0 #102866

Closed
tommythorn opened this issue Oct 10, 2022 · 6 comments

Comments

@tommythorn

tommythorn commented Oct 10, 2022

Note, this does not appear to be related to issue #102155 but I can't be sure.

I'm many levels deep in the original problem, but traced it down to autocfg not building (filed here: cuviper/autocfg#51); the repro is trivial.

The backtrace, however, makes me suspect this is a deeper issue:

Thread 2 "cargo" received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x3ff7e0ffc0 (LWP 1094)]
syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=0, arg4=0, arg5=0, arg6=-1, arg7=274609471544) at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
27	../sysdeps/unix/sysv/linux/riscv/syscall.c: No such file or directory.
(gdb) bt
#0  syscall (syscall_number=98, arg1=<optimized out>, arg2=137, arg3=0, arg4=0, arg5=0, arg6=-1, arg7=274609471544) at ../sysdeps/unix/sysv/linux/riscv/syscall.c:27
#1  0x0000002aab379664 in std::sys::unix::futex::futex_wait () at library/std/src/sys/unix/futex.rs:62
#2  0x0000002aab37c50e in std::sys::unix::locks::futex_condvar::Condvar::wait_optional_timeout () at library/std/src/sys/unix/locks/futex_condvar.rs:51
#3  std::sys::unix::locks::futex_condvar::Condvar::wait () at library/std/src/sys/unix/locks/futex_condvar.rs:35
#4  0x0000002aab3504d4 in <jobserver::HelperState>::for_each_request::<jobserver::imp::spawn_helper::{closure#1}::{closure#0}> ()
#5  0x0000002aab350a60 in std::sys_common::backtrace::__rust_begin_short_backtrace::<jobserver::imp::spawn_helper::{closure#1}, ()> ()
#6  0x0000002aab350cb2 in _RINvNvNtCseOBki07ryB6_3std9panicking3try7do_callINtNtNtCsidPuqEqzKzv_4core5panic11unwind_safe16AssertUnwindSafeNCNCINvMNtB6_6threadNtB1T_7Builder16spawn_unchecked_NCNvNtCsGjmX1GWYch_9jobserver3imp12spawn_helpers_0uEs_00EuEB2H_.llvm.3138756864971081497 ()
#7  0x0000002aab350d4e in __rust_try.llvm.3138756864971081497 ()
#8  0x0000002aab351954 in <<std::thread::Builder>::spawn_unchecked_<jobserver::imp::spawn_helper::{closure#1}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} ()
#9  0x0000002aab37bdc0 in alloc::boxed::{impl#44}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1935
#10 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1935
#11 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
#12 0x0000003ff7e7d450 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#13 0x0000003ff7ecaef2 in __thread_start () at ../sysdeps/unix/sysv/linux/riscv/clone.S:85

Unfortunately I don't have the expertise to debug this.
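
The arguments in frame #0 can be read against the Linux futex(2) interface. The sketch below is not from the original report. It assumes the generic Linux syscall table used by riscv64 (where number 98 is `futex`) and the constants from `<linux/futex.h>`; the helper name `describe_futex_op` is hypothetical. Under those assumptions, `arg2=137` decodes to a process-private `FUTEX_WAIT_BITSET`, and `arg6=-1` would be the match-any bitset, which is what a thread parked on a condition variable looks like rather than a thread that is crashing.

```rust
// Futex operation constants, assumed to match <linux/futex.h>.
const FUTEX_WAIT: i64 = 0;
const FUTEX_WAKE: i64 = 1;
const FUTEX_WAIT_BITSET: i64 = 9;
const FUTEX_WAKE_BITSET: i64 = 10;
const FUTEX_PRIVATE_FLAG: i64 = 128;
const FUTEX_CLOCK_REALTIME: i64 = 256;
// Mask that strips the option flags and leaves only the command.
const FUTEX_CMD_MASK: i64 = !(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME);

/// Hypothetical helper: render a raw futex `op` value as symbolic names.
fn describe_futex_op(op: i64) -> String {
    let cmd = match op & FUTEX_CMD_MASK {
        FUTEX_WAIT => "FUTEX_WAIT".to_string(),
        FUTEX_WAKE => "FUTEX_WAKE".to_string(),
        FUTEX_WAIT_BITSET => "FUTEX_WAIT_BITSET".to_string(),
        FUTEX_WAKE_BITSET => "FUTEX_WAKE_BITSET".to_string(),
        other => format!("unknown command {other}"),
    };
    let mut parts = vec![cmd];
    if op & FUTEX_PRIVATE_FLAG != 0 {
        parts.push("FUTEX_PRIVATE_FLAG".to_string());
    }
    if op & FUTEX_CLOCK_REALTIME != 0 {
        parts.push("FUTEX_CLOCK_REALTIME".to_string());
    }
    parts.join(" | ")
}

fn main() {
    // arg2=137 is taken from frame #0 of the backtrace above.
    println!("op 137 = {}", describe_futex_op(137));
}
```

Only the four commands listed are decoded; any other command value is reported as unknown instead of being guessed.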

@tommythorn
Author

The dmesg info may be useful:

[ 8935.594852] rustc[1472]: unhandled signal 11 code 0x1 at 0x0000000000000008 in librustc_driver-ac972a4e10c98556.so[3fab1a3000+6fb9000]
[ 8935.594959] CPU: 1 PID: 1472 Comm: rustc Not tainted 5.17.0-1006-starfive #7-Ubuntu
[ 8935.594974] Hardware name: StarFive VisionFive V1 (DT)
[ 8935.594982] epc : 0000003faee8d1aa ra : 0000003faf093baa sp : 0000003faae67fc0
[ 8935.594991]  gp : 0000002adc331800 tp : 0000003faae87480 t0 : 0000000000002000
[ 8935.594999]  t1 : 0000000000000002 t2 : 0000003faae6b2b8 s0 : 0000003faae69a38
[ 8935.595008]  s1 : 0000000000000000 a0 : ffffffffffffe000 a1 : 0000003faae69a90
[ 8935.595016]  a2 : 00000000000012d0 a3 : 0000000000000000 a4 : 0000000000000000
[ 8935.595024]  a5 : 0000000000000000 a6 : 0000003faae6c0cb a7 : 0000000000000001
[ 8935.595032]  s2 : 0000003faae69a90 s3 : 0000000000001a70 s4 : 0000003faae6aefb
[ 8935.595041]  s5 : ffffffffffffe590 s6 : 0000000000000000 s7 : 0000003faae6aca3
[ 8935.595049]  s8 : 0000003faae6ac8b s9 : 0000003faae6ac73 s10: 0000003faae6ac5b
[ 8935.595057]  s11: 0000000000001000 t3 : 0000003faae6c0ab t4 : 0000003faae6b2f0
[ 8935.595065]  t5 : 0000003faae6b380 t6 : 0000003faae6b440
[ 8935.595072] status: 0000000200004020 badaddr: 0000000000000008 cause: 000000000000000d
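
The last line of the dump is the most informative one. As a rough sketch that is not part of the original report, the code below decodes the `cause` value using the exception codes from the RISC-V privileged specification, and the `code` value using the Linux `SEGV_*` codes for signal 11 (SIGSEGV). The function names are hypothetical. Under these assumptions, `cause: 0xd` is a load page fault and `code 0x1` is `SEGV_MAPERR`. Together with `badaddr: 0x8`, that would be consistent with a load at offset 8 from a null pointer, although the dump alone does not prove it.

```rust
/// Hypothetical helper: name a RISC-V synchronous exception code
/// (the `cause` field in the kernel dump, with the interrupt bit clear).
fn riscv_exception_name(cause: u64) -> &'static str {
    match cause {
        0 => "instruction address misaligned",
        1 => "instruction access fault",
        2 => "illegal instruction",
        3 => "breakpoint",
        4 => "load address misaligned",
        5 => "load access fault",
        6 => "store/AMO address misaligned",
        7 => "store/AMO access fault",
        8 => "environment call from U-mode",
        9 => "environment call from S-mode",
        12 => "instruction page fault",
        13 => "load page fault",
        15 => "store/AMO page fault",
        _ => "reserved or unknown",
    }
}

/// Hypothetical helper: name the two classic Linux si_code values for SIGSEGV.
fn segv_code_name(code: i32) -> &'static str {
    match code {
        1 => "SEGV_MAPERR (address not mapped)",
        2 => "SEGV_ACCERR (invalid permissions for mapped object)",
        _ => "other",
    }
}

fn main() {
    // Values copied from the dmesg output above.
    let cause: u64 = 0xd;
    let badaddr: u64 = 0x8;
    println!(
        "signal 11, {}: {} at {:#x}",
        segv_code_name(1),
        riscv_exception_name(cause),
        badaddr
    );
}
```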

@tommythorn
Author

Aha, it's a regression. Doesn't happen with 1.58.0. I'll track it down to the exact version.

@saethlin
Member

You might want to use https://github.com/rust-lang/cargo-bisect-rustc

@saethlin
Member

Oh, that's not the right backtrace. I think gdb just halts on any signal, and that's a SIGUSR1, not a SIGSEGV. The program is fine at that point. Perhaps this is helpful: https://peeterjoot.wordpress.com/2010/07/07/avoiding-gdb-signal-noise/

(I usually debug from core dumps, which is one way around this)

@tommythorn
Author

Thanks Saethlin, yes, I should have known better than that. This looks more likely:

Thread 2 "rustc" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3ff05efd20 (LWP 2816)]
0x0000003ff45f61aa in <rustc_middle::arena::Arena>::alloc_from_iter::<rustc_middle::dep_graph::dep_node::DepKindStruct, rustc_arena::IsNotCopy, [rustc_middle::dep_graph::dep_node::DepKindStruct; 282]> () from /home/tommy/.rustup.riscv64-linux/toolchains/1.64.0-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
(gdb) bt
#0  0x0000003ff45f61aa in <rustc_middle::arena::Arena>::alloc_from_iter::<rustc_middle::dep_graph::dep_node::DepKindStruct, rustc_arena::IsNotCopy, [rustc_middle::dep_graph::dep_node::DepKindStruct; 282]> ()
   from /home/tommy/.rustup.riscv64-linux/toolchains/1.64.0-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
#1  0x0000003ff47fcbaa in rustc_query_impl::query_callbacks () from /home/tommy/.rustup.riscv64-linux/toolchains/1.64.0-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
#2  0x0000003ff125300a in <core::cell::once::OnceCell<_>>::get_or_try_init::outlined_call::<<core::cell::once::OnceCell<rustc_middle::ty::context::GlobalCtxt>>::get_or_init<rustc_interface::passes::create_global_ctxt::{closure#1}::{closure#0}>::{closure#0}, rustc_middle::ty::context::GlobalCtxt, !> ()
   from /home/tommy/.rustup.riscv64-linux/toolchains/1.64.0-riscv64gc-unknown-linux-gnu/bin/../lib/librustc_driver-ac972a4e10c98556.so
#3  0x0000003ff1663d14 in <core::cell::once::OnceCell<rustc_middle::ty::context::GlobalCtxt>>::get_or_init::<rustc_interface::passes::create_global_ctxt::{closure#1}::{closure#0}> ()
...

Note that the fault was introduced after 1.63.0 and doesn't reproduce on Rust 1.66.0-nightly (81f3919 2022-10-09), so we can probably close this (I wanted to capture the bug right away in case I didn't get time to dig deeper). I'll leave it open in case somebody wants me to run another experiment.

@tommythorn
Author

Ok, clearly a dup of #102155
