-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
JIT: Expand runtime lookups in a late phase #81635
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsCloses #35551 and is a preparation for #81432 The main advantage of the late expansion is that we will be able to do CSE/Hoisting for such helpers, e.g.: static void Test<T>()
{
for (int i = 0; i < 100; i++)
Callee<T>();
} Current codegen: ; Assembly listing for method Prog:Test[System.__Canon]()
57 push rdi
56 push rsi
53 push rbx
4883EC30 sub rsp, 48
48894C2428 mov qword ptr [rsp+28H], rcx
488BF1 mov rsi, rcx
33FF xor edi, edi
488B5E10 mov rbx, qword ptr [rsi+10H]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop start
G_M000_IG03:
488B4B10 mov rcx, qword ptr [rbx+10H]
4885C9 test rcx, rcx
7402 je SHORT G_M000_IG05
EB15 jmp SHORT G_M000_IG06
G_M000_IG05:
488BCE mov rcx, rsi
48BAB80C584EFC7F0000 mov rdx, 0x7FFC4E580CB8
E87E36A95F call CORINFO_HELP_RUNTIMEHANDLE_METHOD
488BC8 mov rcx, rax
G_M000_IG06:
FF159D932800 call [Prog:Callee[System.__Canon]():bool]
FFC7 inc edi
83FF64 cmp edi, 100
7CD3 jl SHORT G_M000_IG03
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop end
4883C430 add rsp, 48
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
; Total bytes of code 74 We were not able to hoist the runtime lookup stuff out of loop while this PR fixes that.
|
This comment was marked as outdated.
This comment was marked as outdated.
Right, they are not pure. |
They might not be "pure" for a very strict sense of pure, but like the comment says, can they not be optimized as such from the JIT's perspective? I.e. they are idempotent. |
Well, there are (were?) issues like #40298. I vaguely recall some other problem but haven't pinned it down yet. |
Hmm. Maybe the other thing was just making sure we don't mark some of those indirs in the early expansions as invariant. |
The dictionary lookup as a whole is pure. The helper call to populate the dictionary slot lazily is not pure (ie it has observable side-effects). |
Fun fact about this PR - non-inlined runtime lookups lead to test timeouts. Initially I suspected some freezes/crashes but it turned out the code is just slow 😐 E.g. |
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Anything else? New diffs - TP is even better than previously for Tier0, thanks @jakobbotsch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few questions....
BasicBlockFlags originalFlags = block->bbFlags; | ||
BasicBlock* prevBb = block; | ||
|
||
if (stmt == block->firstStmt()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is checking for the case where gtSplitTree
didn't do anything?
Perhaps deserves a comment to that effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment in a follow up if you don't mind to avoid re-running CI for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really about gtSplitTree
? Isn't it here because we don't have a fgSplitBlockBefore
?
gtSplitTree
can make changes even without introducing new statements -- the return value needs to be used for that kind of check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's here just because I need basically fgSplitBlockBefore
and only have fgSplitBlockAfter
API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so when it's not the first statement in the current block I do fgSplitBlockAfter (stmt->PrevStmt())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I should look at what gtSplitTree
does, but I'm still confused exactly what this method is supposed to be doing. Maybe an example would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose I should look at what
gtSplitTree
does, but I'm still confused exactly what this method is supposed to be doing. Maybe an example would help?
The general idea that we want to split a block (e.g. BB0
) into two (say, BBa
and BBb
) at a specific node (e.g. callX
) and make sure that all side-effects are moved to BBa
and the actual callX
is now in BBb
.
We have a phase where we insert GC safe points after specific call
nodes - that one didn't have to care about any execution ordering since we just wanted to make sure GC is polled (it doesn't even matter whether to emit the poll before or after the calls). The runtime lookup case is a lot more complicated since we needed to insert verbose tree in front of the call and re-use its arguments, respect all kinds of complex COMMAs,
|
||
assert((lastIndOfTree != nullptr) && (pRuntimeLookup->indirections > 0)); | ||
impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("bubbling QMark0")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any benefit to deferring expansion of the rest of this in a manner similar to the case above? Seems like this is also creating a hoistable/cseable complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't touch this path because it wasn't clear for me when it's hit. From what I see it's not used by R2R and NAOT. Perhaps some dynamic context
In a follow I'll see if it's worth supporting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testForFixup
was used by fragile NGen. It is not used anymore. I think it is fine to delete it - both on the JIT/EE interface and the supporting code in the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testForFixup
was used by fragile NGen. It is not used anymore. I think it is fine to delete it - both on the JIT/EE interface and the supporting code in the JIT.
Thanks, will delete in #83430
Closes #35551 and is a preparation for #81432
The main advantage of the late expansion is that we will be able to do CSE/Hoisting for such helpers, e.g.:
Current codegen:
We were not able to hoist the runtime lookup stuff out of loop while this PR fixes that.
New codegen:
The call is correctly hoisted
This is also required to do proper fix for @jkotas's suggestion in #81432 (comment)
Quite nice diffs