Stack painting with `--measure-stack` is slow #258
Comments
## Context

the measurement consists of two steps:

note that the search has to start at the "end" of the stack. in the case of the ARM ISA that would be the lowest address.

## Solution

here's how to make those two steps (hopefully) faster:

these two operations can be prototyped outside

these two alternative approaches should be timed before being integrated into

## More context

more details on loading and executing the program on the target:

### How to write the subroutine?

the

### Where to load the subroutine?

after that, the question is where to load the subroutine: I would suggest loading it to RAM because that's easier than writing to Flash, and that way there's no risk it'll collide with the program we want to run on the target.

### How to run the subroutine?

to run the subroutine it should suffice to set the program counter (PC) register to the start of it and resume the target.
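As a rough illustration of what such a subroutine might do, here is a minimal sketch in plain Rust. The function name `paint_words`, its signature, and the word-wise fill are assumptions made for this example and are not taken from the issue; on a real target this logic would be a small piece of machine code copied to RAM, with the range and pattern passed in registers.

```rust
/// Paint every 32-bit word in `[start, end)` with `pattern`.
/// Hypothetical helper for illustration only.
///
/// # Safety
/// `start..end` must be a valid, writable, 4-byte-aligned memory range
/// with `start <= end`.
unsafe fn paint_words(start: *mut u32, end: *mut u32, pattern: u32) {
    let mut cursor = start;
    while cursor < end {
        unsafe {
            // Volatile store so the compiler cannot drop or merge the writes.
            cursor.write_volatile(pattern);
            // Advance by one word (4 bytes).
            cursor = cursor.add(1);
        }
    }
}
```

On the host, the same function can be exercised against an ordinary array, which is one way to prototype the loop before loading anything onto a target.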
302: Make stack painting fast again! 🇪🇺 r=Urhengulas a=Urhengulas

This PR implements the first of the improvements outlined in #258. Fixes #258.

## But what is "stack painting" anyways?

The idea is to write a specific byte pattern to (part of) the stack before the program is executed. After the program has finished, either because it is done with its task or because there was an error, we read out the previously painted area and check how much of it is still intact. If the pattern is still the same, we can be rather certain that the program didn't write to this part of the stack. This information helps either to detect a stack overflow or just to measure how much of the stack was used.

So far, both reading and writing of the memory were done via the probe. While this works, it is also rather slow, because the host and probe communicate via USB, which takes time. The new approach is to write a subroutine to the MCU, which paints the memory from within.

## Measurements

The following table shows how much time the old and new approaches take for memory sizes from 8 to 256 KiB.

![data](https://user-images.githubusercontent.com/37087391/154973187-c17e66f7-cb22-4e56-8dff-a9798ab3a39a.png)

The results are pretty impressive. The new approach is about 170 times faster!

## Further work

- A similar approach can also be applied to reading out the stack after the program has finished.
- Additionally, the stack canary can be simplified quite a lot. So far we are not painting the whole stack unless the user asks for it, because this _was_ slow. Because it is fast now, we can always paint all of it, which simplifies the code and removes the need for the `--measure-stack` flag.

Co-authored-by: Johann Hemmann <johann.hemmann@code.berlin>
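The paint-then-read-back idea can be simulated on the host with an ordinary byte buffer. The sketch below is only an illustration under stated assumptions: the pattern value `0xAA` and the names `paint` and `used_bytes` are made up for this example and are not the project's actual code. Index 0 stands for the lowest address, and the stack is assumed to grow downwards towards it.

```rust
/// Byte pattern used to paint the region (value chosen for illustration).
const PAINT: u8 = 0xAA;

/// Fill the whole simulated stack region with the paint pattern.
fn paint(stack: &mut [u8]) {
    stack.fill(PAINT);
}

/// Return how many bytes of the region no longer count as painted.
/// The scan starts at index 0 (the lowest address); the first byte that
/// differs from the pattern marks the deepest point the stack reached.
fn used_bytes(stack: &[u8]) -> usize {
    match stack.iter().position(|&b| b != PAINT) {
        Some(first_touched) => stack.len() - first_touched,
        None => 0, // pattern fully intact: nothing was written
    }
}
```

A program that happens to write the paint value itself would be undercounted, which is why the result is only "rather certain". If the very first byte of the region is already overwritten, the whole region was used and an overflow is likely.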
Reopening because only part of it is fixed so far.
With `--measure-stack`, added in #254, we paint the whole area the stack could occupy with a bit pattern, and then read it back to determine the program's stack usage. This can write and read hundreds of KBs of RAM, which takes several seconds, so it would be great to speed this up.

One idea for speeding this up was to essentially run `memset` on the MCU, but probe-rs does not seem to expose an API for this (if this is even possible at all, with the vendor-provided on-device algorithms).