Segfault during tests on rust 1.27.x on macOS (SIGSEGV: invalid memory reference) #52390

Closed
ejpcmac opened this issue Jul 14, 2018 · 19 comments
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ejpcmac

ejpcmac commented Jul 14, 2018

Problem description

Running the tests for a program I’ve written leads to a segfault when compiled by Rust 1.27.0 or 1.27.1:

$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.09s
     Running target/debug/deps/diceware-c7f6b180dd3b52c6

running 3 tests
error: process didn't exit successfully: `/***/diceware/target/debug/deps/diceware-c7f6b180dd3b52c6` (signal: 11, SIGSEGV: invalid memory reference)

The tests work well on Rust 1.26.2, 1.28-beta.10 and today’s nightly. Only the 1.27.x releases seem impacted by this bug.

Meta

$ rustc --version --verbose
rustc 1.27.1 (5f2b325f6 2018-07-07)
binary: rustc
commit-hash: 5f2b325f64ed6caa7179f3e04913db437656ec7e
commit-date: 2018-07-07
host: x86_64-apple-darwin
release: 1.27.1
LLVM version: 6.0

Note that the debug and release binaries seem to work properly. Only the test runner seems to be impacted.

@Mark-Simulacrum
Member

I can't reproduce on macOS or Ubuntu; what version of macOS do you have? Can you get a backtrace on the segfault? Is the segfault persistent, or is it transient?

@Mark-Simulacrum Mark-Simulacrum added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. C-bug Category: This is a bug. labels Jul 14, 2018
@ejpcmac
Author

ejpcmac commented Jul 14, 2018

I am on OS X 10.11.6. Running RUST_BACKTRACE=1 cargo test I get nothing more than the trace I’ve put in my first message. The segfault is persistent, at least on 100% of the 10-20 runs I’ve done.

@Mark-Simulacrum
Member

You'll need to run with something like lldb or gdb to get the segfault or look for a core dump (I believe those are generated on macOS, though you may need to do something to get them).

I am testing on 10.13.5 so this is possibly related to #51828, though that claims to only be a problem on 10.10.

@kennytm
Member

kennytm commented Jul 14, 2018

Could you run target/debug/deps/diceware-c7f6b180dd3b52c6 in lldb and get the stack trace from there?

@Mark-Simulacrum
Member

I do get repeated assertion failures in makes_a_passphrase_with_special_char, but even those don't happen on every run.

@ejpcmac
Author

ejpcmac commented Jul 14, 2018

Running in lldb I get:

(lldb) run
Process 46736 launched: './diceware-c7f6b180dd3b52c6' (x86_64)

running 3 tests
Process 46736 stopped
* thread #2: tid = 0x959310, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237

Edit note: running multiple times gets to the very same failure.

@ejpcmac
Author

ejpcmac commented Jul 14, 2018

> I do get repeated assertion failures in makes_a_passphrase_with_special_char, but even those don't happen on every run.

@Mark-Simulacrum Regarding this, I think it is because the test itself is lacking a check. If the special char is a digit and inserted in a digit, it can match a dictionary word and the test fails. I have not yet figured out how to write a better test to catch this. The segfault, however, seems to occur very early in the process, and in returns_an_error_if_number_of_words_is_zero.

@Mark-Simulacrum
Copy link
Member

Can you run where or maybe backtrace in lldb to get the full trace?

@kennytm
Member

kennytm commented Jul 15, 2018

ptr.rs:237 is inside swap_nonoverlapping_bytes(). I'm pretty sure it is the same TLS issue, though I don't know why it affects 10.11.

@ejpcmac
Author

ejpcmac commented Jul 15, 2018

@Mark-Simulacrum Sorry for the delay, it was already late here in Europe.

Here is a new run with its backtrace:

(lldb) run
Process 48199 launched: './diceware-c7f6b180dd3b52c6' (x86_64)

running 3 tests
Process 48199 stopped
* thread #2: tid = 0x95c6e0, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237
(lldb) bt
* thread #2: tid = 0x95c6e0, 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237, name = 'tests::returns_an_error_if_number_of_words_is_zero', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001000cc9d2 diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66 at ptr.rs:237
    frame #1: 0x000000010003dec3 diceware-c7f6b180dd3b52c6`std::sys_common::backtrace::__rust_begin_short_backtrace::hdb024ac408a215ad (.llvm.10125672080075898413) + 563 at mod.rs:650
    frame #2: 0x0000000100054f78 diceware-c7f6b180dd3b52c6`std::panicking::try::do_call::hf7d4215061b5619b (.llvm.18293538534938544574) + 40 at mod.rs:409
    frame #3: 0x00000001000d865f diceware-c7f6b180dd3b52c6`__rust_maybe_catch_panic + 31 at lib.rs:105
    frame #4: 0x000000010003f7b5 diceware-c7f6b180dd3b52c6`_$LT$F$u20$as$u20$alloc..boxed..FnBox$LT$A$GT$$GT$::call_box::hbf42c8a8f5d699cc + 165 at panicking.rs:289
    frame #5: 0x00000001000c9018 diceware-c7f6b180dd3b52c6`std::sys_common::thread::start_thread::hf39c8bd91f08cd93 + 136 at boxed.rs:648
    frame #6: 0x00000001000b95d9 diceware-c7f6b180dd3b52c6`std::sys::unix::thread::Thread::new::thread_start::h27c7af0fa85baf64 + 9 at thread.rs:90
    frame #7: 0x00007fff93c5299d libsystem_pthread.dylib`_pthread_body + 131
    frame #8: 0x00007fff93c5291a libsystem_pthread.dylib`_pthread_start + 168
    frame #9: 0x00007fff93c50351 libsystem_pthread.dylib`thread_start + 13

@pnkfelix
Member

Tagging as T-compiler under the assumption that this is in our wheelhouse, at least until we see evidence to the contrary.

@pnkfelix pnkfelix added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority and removed I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Jul 19, 2018
@ejpcmac
Author

ejpcmac commented Jul 19, 2018

If I can help in some way please let me know.

@kennytm
Member

kennytm commented Jul 19, 2018

@ejpcmac In the debugger, when it has crashed, could you execute disas and reg read to show the disassembly and register dump?

@ejpcmac
Author

ejpcmac commented Jul 19, 2018

@kennytm For sure. I’ll do it as soon as I’m home.

@ejpcmac
Author

ejpcmac commented Jul 19, 2018

@kennytm Here I am:

(lldb) disas
diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a:
    0x1000cc990 <+0>:  pushq  %rbp
    0x1000cc991 <+1>:  movq   %rsp, %rbp
    0x1000cc994 <+4>:  subq   $0x10, %rsp
    0x1000cc998 <+8>:  leaq   0xebc09(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc99f <+15>: callq  *(%rdi)
    0x1000cc9a1 <+17>: cmpq   $0x1, (%rax)
    0x1000cc9a5 <+21>: jne    0x1000cc9b6               ; <+38>
    0x1000cc9a7 <+23>: leaq   0xebbfa(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9ae <+30>: callq  *(%rdi)
    0x1000cc9b0 <+32>: movq   0x8(%rax), %rcx
    0x1000cc9b4 <+36>: jmp    0x1000cc9d7               ; <+71>
    0x1000cc9b6 <+38>: movl   $0x1, %eax
    0x1000cc9bb <+43>: movd   %rax, %xmm0
    0x1000cc9c0 <+48>: movdqa %xmm0, -0x10(%rbp)
    0x1000cc9c5 <+53>: leaq   0xebbdc(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9cc <+60>: callq  *(%rdi)
    0x1000cc9ce <+62>: movaps -0x10(%rbp), %xmm0
->  0x1000cc9d2 <+66>: movaps %xmm0, (%rax)
    0x1000cc9d5 <+69>: xorl   %ecx, %ecx
    0x1000cc9d7 <+71>: leaq   0xebbca(%rip), %rdi       ; std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
    0x1000cc9de <+78>: callq  *(%rdi)
    0x1000cc9e0 <+80>: movq   %rcx, 0x8(%rax)
    0x1000cc9e4 <+84>: testq  %rcx, %rcx
    0x1000cc9e7 <+87>: setne  %al
    0x1000cc9ea <+90>: addq   $0x10, %rsp
    0x1000cc9ee <+94>: popq   %rbp
    0x1000cc9ef <+95>: retq   

(lldb) reg read
General Purpose Registers:
       rax = 0x0000000100500358
       rbx = 0x000000010180d008
       rcx = 0x0000000000000000
       rdx = 0x0000000000000000
       rdi = 0x00000001001b85a8  diceware-c7f6b180dd3b52c6`std::panicking::update_panic_count::PANIC_COUNT::__getit::__KEY::hdcfcbe1f636ae13f
       rsi = 0x0000000000000103
       rbp = 0x00007000006068e0
       rsp = 0x00007000006068d0
        r8 = 0x0000000101217358
        r9 = 0x0000000000a45e09
       r10 = 0x0000000101217360
       r11 = 0xffffffff00000000
       r12 = 0x0000000000000001
       r13 = 0x0000000000001003
       r14 = 0x0000000000000000
       r15 = 0x0000000101217380
       rip = 0x00000001000cc9d2  diceware-c7f6b180dd3b52c6`std::panicking::panicking::h01f4a398b1d2259a + 66
    rflags = 0x0000000000010202
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

@kennytm
Member

kennytm commented Jul 19, 2018

Thanks @ejpcmac! As the debugger error shows, the segfault is indeed caused by unaligned TLS access (movaps %xmm0, (%rax) but %rax is not 16-byte-aligned).
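The alignment claim can be checked by hand against the register dump above. The following is only a small sketch of that arithmetic: the helper name is_aligned is made up for illustration, and the address is the rax value copied from the reg read output. movaps with a memory operand requires a 16-byte-aligned address, and this one is only 8-byte-aligned.

```rust
/// Hypothetical helper: true when `addr` is a multiple of `align`.
/// `align` must be a power of two.
fn is_aligned(addr: u64, align: u64) -> bool {
    assert!(align.is_power_of_two());
    addr & (align - 1) == 0
}

fn main() {
    // rax as reported by `reg read` at the faulting `movaps %xmm0, (%rax)`.
    let rax: u64 = 0x0000_0001_0050_0358;

    // The low four bits are 0x8, so the address is 8 bytes past a 16-byte boundary.
    println!("rax % 16 = {}", rax % 16);
    println!("16-byte aligned: {}", is_aligned(rax, 16));
    println!("8-byte aligned:  {}", is_aligned(rax, 8));
}
```

An 8-byte-aligned address is fine for a plain movq, which is why the same slot can be read and written without a fault elsewhere in the function; only the 16-byte movaps store traps.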

As mentioned in the issue, there's no bug in 1.28-beta. We may work around this in 1.27 by forcing a link_section in the thread_local! macro (not tested)...

diff --git a/src/libstd/thread/local.rs b/src/libstd/thread/local.rs
index a170abb262..79293acc2c 100644
--- a/src/libstd/thread/local.rs
+++ b/src/libstd/thread/local.rs
@@ -177,6 +177,7 @@ macro_rules! __thread_local_inner {
                     $crate::thread::__StaticLocalKeyInner::new();
 
                 #[thread_local]
+                #[cfg_attr(target_os = "macos", link_section = "__DATA,__thread_data")]
                 #[cfg(all(target_thread_local, not(target_arch = "wasm32")))]
                 static __KEY: $crate::thread::__FastLocalKeyInner<$t> =
                     $crate::thread::__FastLocalKeyInner::new();

... but this won't affect any other #[thread_local] variables that may exist (and it also bloats the executable size, since the key can't be placed in the BSS), and I'm not sure this is worth a 1.27.3.
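For anyone who wants to probe their own toolchain, here is a rough diagnostic sketch that uses only the stable thread_local! macro. It assumes a made-up 16-byte-aligned type (Aligned16) and a made-up key (SLOT); neither comes from libstd. It reports whether a thread-local's address in a freshly spawned thread is a multiple of 16. It has not been tried on an affected 1.27.x/macOS setup, and an optimizer is entitled to assume the reference is aligned, so a result of 0 there would not prove the absence of the problem.

```rust
use std::thread;

/// Hypothetical payload that asks for 16-byte alignment (not a libstd type).
#[repr(align(16))]
struct Aligned16([u8; 16]);

thread_local! {
    // Hypothetical thread-local key, used only for this check.
    static SLOT: Aligned16 = Aligned16([0; 16]);
}

/// Returns the address of the calling thread's SLOT modulo 16.
/// A non-zero result means the slot is not 16-byte-aligned.
fn slot_misalignment() -> usize {
    SLOT.with(|slot| (slot as *const Aligned16 as usize) % 16)
}

fn main() {
    // The crash in this issue happened on a test thread, so check from a spawned thread.
    let remainder = thread::spawn(slot_misalignment)
        .join()
        .expect("worker thread panicked");
    println!("SLOT address % 16 in spawned thread = {}", remainder);
}
```

On an unaffected toolchain the remainder should be 0, since Rust requires references to be aligned for their type.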

@Mark-Simulacrum
Member

I don't think it's worth 1.27.3. We already decided that we're not going to backport the original TLS patch -- and 1.28 will be out in two weeks anyway.

@pnkfelix
Member

Visiting for triage. I concur with @Mark-Simulacrum's assessment. The only question is whether to close this bug today or after 1.28 is released.

@nikomatsakis
Contributor

Visiting in compiler team meeting. Inclination is to close now in favor of using 1.28 (released on Aug 2)

5 participants