Allow GDB to unwind HardFault callstacks #131

adamgreen · 2018-10-23T00:45:44Z

When I currently request GDB to dump a hard fault stack, I see
something like this:

(gdb) bt
#0  UserHardFault_ (ef=0x10001fb8) at /depots/cortex-m-rt/src/lib.rs:537
#1  0x08003fe6 in HardFault ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

GDB can't unwind past HardFault since the current implementation of
this function overwrites the Link Register (LR) value. This change
pushes LR and R0 (to maintain 8-byte stack alignment) to the stack
before transferring execution to UserHardFault().

After this change, I see a callstack like this from GDB:

(gdb) bt
#0  UserHardFault_ (ef=0x10001fb0) at /depots/cortex-m-rt/src/lib.rs:537
#1  0x08003fe8 in HardFault ()
#2  <signal handler called>
#3  0x08002820 in core::ptr::read_volatile (src=0x48001800) at libcore/ptr.rs:472
#4  0x080001a2 in main () at src/07-registers/src/main.rs:14

Notes:

This code uses 8 more stack bytes.
Increases the size of the HardFault handler by 2 narrow instructions
or 4 bytes. This could be decreased to 2 bytes by removing the pop
since UserHardFault() doesn't currently return but it just looks too
odd for me to do as an initial attempt.

When I currently request GDB to dump a hard fault stack, I see something like this: (gdb) bt #0 UserHardFault_ (ef=0x10001fb8) at /depots/cortex-m-rt/src/lib.rs:537 #1 0x08003fe6 in HardFault () Backtrace stopped: previous frame identical to this frame (corrupt stack?) GDB can't unwind past HardFault since the current implementation of this function overwrites the Link Register (LR) value. This change pushes LR and R0 (to maintain 8-byte stack alignment) to the stack before transferring execution to UserHardFault(). After this change, I see a callstack like this from GDB: (gdb) bt #0 UserHardFault_ (ef=0x10001fb0) at /depots/cortex-m-rt/src/lib.rs:537 #1 0x08003fe8 in HardFault () #2 <signal handler called> #3 0x08002820 in core::ptr::read_volatile (src=0x48001800) at libcore/ptr.rs:472 #4 0x080001a2 in main () at src/07-registers/src/main.rs:14 Notes: * This code uses 8 more stack bytes. * Increases the size of the HardFault handler by 2 narrow instructions or 4 bytes. This could be decreased to 2 bytes by removing the pop since UserHardFault() doesn't currently return but it just looks too odd for me to do as an initial attempt.

thejpster · 2018-10-23T02:19:48Z

Oooh, having spent today doing a Rust workshop and regularly having to work out why I was in the fault handler, I approve of this.

therealprof

LGTM, too.

therealprof · 2018-10-23T08:08:21Z

bors r+

131: Allow GDB to unwind HardFault callstacks r=therealprof a=adamgreen When I currently request GDB to dump a hard fault stack, I see something like this: ``` (gdb) bt #0 UserHardFault_ (ef=0x10001fb8) at /depots/cortex-m-rt/src/lib.rs:537 #1 0x08003fe6 in HardFault () Backtrace stopped: previous frame identical to this frame (corrupt stack?) ``` GDB can't unwind past HardFault since the current implementation of this function overwrites the Link Register (LR) value. This change pushes LR and R0 (to maintain 8-byte stack alignment) to the stack before transferring execution to UserHardFault(). After this change, I see a callstack like this from GDB: ``` (gdb) bt #0 UserHardFault_ (ef=0x10001fb0) at /depots/cortex-m-rt/src/lib.rs:537 #1 0x08003fe8 in HardFault () #2 <signal handler called> #3 0x08002820 in core::ptr::read_volatile (src=0x48001800) at libcore/ptr.rs:472 #4 0x080001a2 in main () at src/07-registers/src/main.rs:14 ``` Notes: * This code uses 8 more stack bytes. * Increases the size of the HardFault handler by 2 narrow instructions or 4 bytes. This could be decreased to 2 bytes by removing the pop since UserHardFault() doesn't currently return but it just looks too odd for me to do as an initial attempt. Co-authored-by: Adam Green <adamgreen@users.noreply.github.com>

bors · 2018-10-23T08:18:49Z

Build succeeded

continuous-integration/travis-ci/push

adamgreig · 2018-10-23T09:36:42Z

This is great but I really don't think we need the pop at the end there -- UserHardFault really won't ever return.

thejpster · 2018-10-23T14:52:42Z

When is the next release scheduled? I'd love to be able to tell my Rust embedded training workshop when they can get backtraces in fault handlers.

adamgreig · 2018-10-23T14:59:47Z

I'd like to do one soon to get #[interrupt] out there too. It might be worth revisiting some of the pending issues and PRs first.

adamgreen · 2018-10-23T17:30:58Z

Thanks for reviewing and merging! Much appreciated!

Some of my notes on debugging faults on Cortex-M devices with GDB:

Even without this commit, it should be possible to place a breakpoint on HardFault to halt the CPU before LR is overwritten with the call to UserHardFault. The change in this commit is useful when you don't have a breakpoint set and you break into the UserHardFault loop and want to see what got you there. The Discovery book that I was reading through at the time when I hit this places the breakpoint on UserHardFault so that is why I saw it.
This commit should just work for Cortex-M0, Cortex-M3, and other Cortex-M devices without a FPU. Once you have a FPU, GDB itself doesn't know how to unwind exception stacks containing the volatile FPU registers. It also has issues with unrolling stacks when PSP is used by the faulting code. I have been patching GDB on my machine for a few years now to fix this issue. I recently sent a 64-bit Linux version of this patched GDB to the Smoothie CNC firmware guys. The diff I applied to the recent GDB sources to create this binary is also up on their GitHub repo.
GDB has had this broken Cortex-M exception unwinding code for almost a decade now. I would love to get a variant of my fixes rolled into the official GDB sources but there a few things stopping me:
- I have no experience with submitting patches to GDB. Does anyone here have experience at upstreaming fixes to GDB that could help me through the process?
- People have tried to submit patches for this in the past (mine is actually based on one of those past attempts) but they don't make it through. People don't like the use of the user_reg_map_name_to_regnum() function and it ends up turning into a conversation about how the required register set for Cortex-M devices should include MSP and PSP but don't currently. It's not an invalid argument but should really be addressed by someone more senior in the GDB community than the people like me who are just trying to fix this broken exception unwind functionality. It would be best if GDB just took a patch like this and then refactor it later once they make this architectural update (if they ever actually do).

adamgreig · 2018-10-23T17:37:19Z

@adamgreen I haven't had a chance to test yet but a couple of thoughts..

If we replaced bl UserHardFault with b UserHardFault, avoiding overwriting LR, would that also resolve the issue without needing to push onto the stack?
I still think we should remove the pop afterwards since it is dead code
Does the FPU problem happen if you're using the hard EABI where FP arguments are passed on the FPU registers instead?

adamgreen · 2018-10-23T17:50:20Z

@adamgreig

I figured that bl UserHardFault was used instead of b UserHardFault since the latter doesn't have as large of a offset capability on ARMv6-M.
I can remove the the pop but it isn't that safe. While UserHardFault doesn't currently return, I don't see anything that would stop the linker from linking in a version which does return since it is using unmangled C function naming. Maybe there is something else enforcing this?
The FPU problem doesn't have anything to do with the calling convention. It is whether the volatile FPU registers are stacked upon exception entry or not. Usually they are so that ISRs can safely make use of floating point instructions.

japaric · 2018-10-23T18:00:15Z

@adamgreen Thanks for fixing this! I was about to file an issue about this today :-).

I figured that bl UserHardFault was used instead of b UserHardFault since the latter doesn't have as large of a offset capability on ARMv6-M.

Indeed that's the case. We had linker failures with b UserHardFault on ARMv6-M because the linker is free to put UserHardFault far away from HardFault (IDK if there's a way to put them close; perhaps using KEEP would be fine in this situation).

While UserHardFault doesn't currently return, I don't see anything that would stop the linker from linking in a version which does return since it is using unmangled C function naming.

The #[exception] attribute type checks that UserHardFault is a diverging function (fn() -> !) so it can't return.

Directly using no_mangle, export_name or link_section is very unsafe (though the compiler doesn't warn about it) so one should use the safe attributes instead of directly using those three.

If you directly use no_mangle making the function non-diverging is the least of your problems. You can also write:

#[no_mangle]
fn UserHardFault(i_like_boxes: Box<i32>, and_vectors: Vec<i32>) -> ! { .. }

and generate invalid references, types, etc.

adamgreen · 2018-10-23T18:08:54Z

Cool! I will submit a PR today to remove the extraneous pop.

Thanks again all for the feedback.

132: Remove extraneous POP from my prev commit r=therealprof a=adamgreen Based on the great feedback to PR #131, I have removed the POP instruction that I added in my previous commit since UserHardFault never returns. Co-authored-by: Adam Green <adamgreen@users.noreply.github.com>

therealprof · 2018-10-24T08:21:38Z

@thejpster https://github.com/rust-embedded/cortex-m-rt/blob/master/CHANGELOG.md#v065---2018-10-23

adamgreen requested a review from a team as a code owner October 23, 2018 00:45

thejpster approved these changes Oct 23, 2018

View reviewed changes

therealprof approved these changes Oct 23, 2018

View reviewed changes

bors bot merged commit f501d5a into rust-embedded:master Oct 23, 2018

adamgreen deleted the cleanupHardFaultStackFrame branch October 23, 2018 17:31

adamgreen mentioned this pull request Oct 23, 2018

Remove extraneous POP from my prev commit #132

Merged

datdenkikniet mentioned this pull request Feb 13, 2023

cortex-m-rt: ensure the stack is 8-byte aligned. rust-embedded/cortex-m#463

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Allow GDB to unwind HardFault callstacks #131

Allow GDB to unwind HardFault callstacks #131

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

thejpster commented Oct 23, 2018

Uh oh!

therealprof left a comment

Uh oh!

therealprof commented Oct 23, 2018

Uh oh!

bors bot commented Oct 23, 2018

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

thejpster commented Oct 23, 2018

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

japaric commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

therealprof commented Oct 24, 2018

Uh oh!

Uh oh!

Allow GDB to unwind HardFault callstacks #131

Allow GDB to unwind HardFault callstacks #131

Uh oh!

Conversation

adamgreen commented Oct 23, 2018

Uh oh!

thejpster commented Oct 23, 2018

Uh oh!

therealprof left a comment

Choose a reason for hiding this comment

Uh oh!

therealprof commented Oct 23, 2018

Uh oh!

bors bot commented Oct 23, 2018

Build succeeded

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

thejpster commented Oct 23, 2018

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

adamgreig commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

japaric commented Oct 23, 2018

Uh oh!

adamgreen commented Oct 23, 2018

Uh oh!

therealprof commented Oct 24, 2018

Uh oh!

Uh oh!