-
-
Notifications
You must be signed in to change notification settings - Fork 75
Add support for measuring the program's stack usage #254
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I left some inline comments with nits around a data structure and the number representation.
Could you please open a few follow-up perf investigation issue? one for 'binary search read' and one for 'speed up canary write by having MCU do the memset'
Other than that, feel free to send this to bors 🚀
} | ||
let address = *stack_range.start(); | ||
let canary = vec![CANARY_VALUE; size]; | ||
core.write_8(address, &canary)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this ahh, nvm I see that the write_8
be inside the if measure_stack
guard?size
value depends on measure_stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't need to be done in this PR but it would be good add e.g. trace
level logs that report the time it takes to write the canary, read the canary and write the program to flash. Just so that we have some measurements in place for when we look into optimizing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the other thing that I noticed was that number of bytes reported was an odd number. now I see that CANARY_VALUE
is a byte. I was kind of expecting a word since the stack pointer is usually moved 4 or 8 bytes at a time. I don't think a 1-byte canary value is a problem right now given that we are scanning the stack linearly but if we switch to binary search it may result in false positives so the sampling done in the binary search should be several bytes (maybe 24 bytes at a time or so)
if let Some(pos) = canary.iter().position(|b| *b != CANARY_VALUE) { | ||
let touched_address = self.address + pos as u32; | ||
log::debug!("canary was touched at {:#010X}", touched_address); | ||
let min_stack_usage = match canary.iter().position(|b| *b != CANARY_VALUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't need to be addressed in this PR but this could be sped up (+) by doing a binary search in the measure_stack
case. Though the above read_8
will eat most of runtime so, instead of reading the whole self.size
at once, this part should sample the stack at sizeof(CANARY_VALUE)
chunks at a time and use that to perform a binary search.
(+) before doing any hypothetical perf improvement let's measure how long this takes vs how long the write takes. it may be that this is negligible compared to the write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the binary search optimization is going to work correctly, since the program won't touch every word in the used area of the stack, so the search might run into a CANARY_VALUE
and incorrectly skip used areas of the stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree a 1-byte look-up is error prone hence binary search should read a bigger chunk at a time, see #254 (comment) . Or maybe instead of considering a single partition point to decide if one half is used or not we could use 2 or more partition points ("add more depth to the binary partition tree"). Hard to explain without a diagram; will think more about it and file an issue if what I'm thinking of even makes sense.
bors r+ |
Build succeeded: |
This cleans up the stack canary code a bit, and adds a
--measure-stack
command that will paint the entire available stack region with a known bit pattern, and then report how much of the available stack space was used by the program.Closes #188