Micro-optimize the __morestack fast path #3565
Comments
When all does [...] There has also been a bunch of discussion about possibly ditching segmented stacks?
It's added to every single function, and LLVM does accounting of stack space and growth for us through our [...]
Visiting for triage, email from 2013-09-09. Right now split-stacks are turned off since they are not supported in the newrt. But I imagine most or all of the suggestions above could be applicable in the next implementation, unless we switch to an entirely new strategy (like using guard pages, as suggested by thestinger).
In today's meeting we have decided to jettison segmented stacks.
We only use [...]
This is very performance critical code used for growing the stack, and it currently wastes a lot of instructions on the non-allocating fast path. There are a number of distinct optimizations we can identify.
Here's what happens after calling into `__morestack`, on the fast path:

* Save registers, because `upcall_new_stack` clobbers them
* Move arguments from the `__morestack` custom calling convention registers to the C calling convention registers used by `upcall_new_stack`
* Call `upcall_new_stack`, through the indirection of the dynamic linker
* Call `get_sp_limit`, an entire assembly function consisting of `movq %fs:112, %rax`
* Compare `sp_limit` to 0 and don't branch to the `rust_get_current_task` slow path. This branch always makes the same decision during a `__morestack` call.
* Compute the `task` pointer from the stack limit
* Check that `task->stk->next` is a big enough stack segment to use
* Call `reuse_valgrind_stack` to give valgrind hints
* Call `record_stack_limit` to execute another single instruction
* Return to `__morestack`
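The segment-reuse check at the center of the fast path can be sketched in C roughly as follows. This is a simplified illustration, not the runtime's actual code: the `stk_seg` and `task` layouts, the `try_reuse_next_segment` helper, and the thread-local `sp_limit` variable (standing in for the `%fs:112` slot) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the runtime's structures. */
typedef struct stk_seg {
    struct stk_seg *prev;
    struct stk_seg *next;
    uintptr_t       end;   /* one past the highest usable address */
    size_t          size;  /* usable bytes in this segment */
} stk_seg;

typedef struct task {
    stk_seg *stk;          /* segment currently in use */
} task;

/* Assumption: emulates the per-thread stack-limit slot that the real
 * code reads with `movq %fs:112, %rax`. */
static _Thread_local uintptr_t sp_limit;

/* Fast path: if the next segment already exists and is large enough,
 * switch to it without allocating. Returns the segment now in use, or
 * NULL when the caller must take the allocating slow path. */
static stk_seg *try_reuse_next_segment(task *t, size_t needed)
{
    assert(t != NULL && t->stk != NULL);

    stk_seg *next = t->stk->next;
    if (next == NULL || next->size < needed)
        return NULL;                    /* slow path: allocate a segment */

    t->stk = next;
    sp_limit = next->end - next->size;  /* record the new stack limit */
    return next;
}
```

In this sketch the only work on the fast path is one pointer load, one comparison, and two stores, which is the amount of work the steps above would ideally reduce to.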
And returning from the segment:
* Call `upcall_del_stack` through the dynamic linker
* Call `get_sp_limit`, an entire function consisting of `movq %fs:112, %rax`
* Compare `sp_limit` to 0, etc.
* Call `record_stack_limit`
Potential optimizations:
* Inline `get_sp_limit`, `record_stack_limit` (Inline get_sp_limit, set_sp_limit, get_sp runtime functions #2521)
* Statically link `upcall_new_stack` and `upcall_del_stack`, hitting new dynamically linked upcalls for the slow path
* Write a version of `rust_get_current_task` that doesn't have a fallback path for the case when the task pointer can't be retrieved from the stack segment. Use it from `upcall_new_stack`/`upcall_del_stack`.
* Verify that `upcall_new_stack` doesn't use xmm registers and remove the xmm saves and restores in `__morestack` (Stop saving floating point registers in __morestack #2043)
* Inline `upcall_del_stack` into `__morestack`
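As a rough illustration of the first suggestion, the single-instruction accessors could be expressed as `static inline` functions so that callers pay for a load or store rather than a full call. This is only a sketch under stated assumptions: the thread-local `sp_limit_slot` stands in for the `%fs:112` slot, and `needs_more_stack` is a hypothetical caller that is not part of the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a thread-local variable stands in for the TCB slot that
 * the real runtime reads with `movq %fs:112, %rax`. */
static _Thread_local uintptr_t sp_limit_slot;

/* Inlined accessor: a single load at the call site. */
static inline uintptr_t get_sp_limit(void)
{
    return sp_limit_slot;
}

/* Inlined setter: a single store at the call site. */
static inline void record_stack_limit(uintptr_t limit)
{
    sp_limit_slot = limit;
}

/* Hypothetical caller: returns 1 if a frame of `frame_size` bytes
 * starting at stack pointer `sp` would cross the recorded limit
 * (stacks grow downward), 0 otherwise. */
static inline int needs_more_stack(uintptr_t sp, uintptr_t frame_size)
{
    assert(sp >= frame_size);
    return sp - frame_size < get_sp_limit();
}
```

Because both accessors are visible to the compiler at every call site, the limit check can be compiled without any call through the dynamic linker.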