JIT: Optimize struct parameter register accesses in the backend #110819
This PR adds an optimization in lowering that uses the new parameter-register-to-local mappings added in dotnet#110795. The optimization detects IR that would result in stack spills and loads, and instead replaces those accesses with scalar locals that can stay in registers. Physical promotion benefits especially, since it creates exactly the kind of IR this optimization kicks in for. The physical promotion heuristics are updated to account for the backend now being able to do this optimization, which makes physical promotion more likely to promote struct parameters.
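The core matching step can be pictured with a small model. The following is a minimal sketch under simplifying assumptions: every type and function here (`ParamSegment`, `FieldRead`, `RegLocalMapping`, `matchSegment`, `mapReads`) is hypothetical and not taken from the JIT, and treating only exact offset/size matches as candidates is an assumption. It shows reads of a register-passed struct parameter being redirected to fresh scalar locals, one per incoming register.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: one register-passed piece of a struct parameter,
// covering bytes [offset, offset + size) of the struct.
struct ParamSegment {
    int reg;          // incoming register number (illustrative)
    unsigned offset;  // byte offset within the struct
    unsigned size;    // byte size of the segment
};

// Hypothetical model of a field read such as LCL_FLD V00 [+8].
struct FieldRead {
    unsigned offset;
    unsigned size;
};

// Hypothetical register -> new scalar local mapping.
struct RegLocalMapping {
    int reg;
    unsigned lclNum;
};

// Assumption: only a read that lines up exactly with one segment is a
// candidate; anything else keeps using the parameter's stack home.
std::optional<int> matchSegment(const std::vector<ParamSegment>& segs, const FieldRead& read) {
    for (const ParamSegment& seg : segs) {
        if (seg.offset == read.offset && seg.size == read.size) {
            return seg.reg;
        }
    }
    return std::nullopt;
}

// Allocate one fresh scalar local (numbered from nextLcl) per register that
// is read, reusing the existing mapping when the same register is read again.
std::vector<RegLocalMapping> mapReads(const std::vector<ParamSegment>& segs,
                                      const std::vector<FieldRead>& reads,
                                      unsigned nextLcl) {
    std::vector<RegLocalMapping> mappings;
    for (const FieldRead& read : reads) {
        std::optional<int> reg = matchSegment(segs, read);
        if (!reg) {
            continue;  // no exact match: leave it as a stack access
        }
        bool alreadyMapped = false;
        for (const RegLocalMapping& m : mappings) {
            alreadyMapped |= (m.reg == *reg);
        }
        if (!alreadyMapped) {
            mappings.push_back({*reg, nextLcl++});
        }
    }
    return mappings;
}
```

For example, with segments `reg 0: [0, 8)` and `reg 1: [8, 16)`, reads at `[+8]` and `[+0]` each get a new local, while a misaligned 4-byte read at `[+4]` stays on the stack in this model.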
Does this fix #89374?
No -- currently I'm just looking at ensuring that physical promotion is able to handle the cases that old promotion can handle. This change should only result in a small number of diffs since old promotion handles the vast majority of structs passed in registers today, but it gets us closer to removing old promotion entirely.
GC refs in structs have to be zeroed anyway, so we might as well spill them to the original parameter as well as to the new mapping. This means we can avoid the manually inserted spill in the init block, and we retain the property that parameters are fully defined by the prolog.
If they are DNER (do-not-enregister), they definitely will not stay in a register.
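A sketch of how the two review points above might combine into a per-segment prolog decision, assuming "they" in the DNER reply refers to the locals that would receive a mapped register. `SegmentInfo`, `HomeDecision`, and `decideHome` are hypothetical names; the real logic is not shown in this conversation.

```cpp
#include <cassert>

// Hypothetical per-segment facts used to choose where an incoming register
// value is homed in the prolog.
struct SegmentInfo {
    bool containsGcRef;  // the segment holds an object reference
    bool targetIsDner;   // the local that would receive it is do-not-enregister
};

struct HomeDecision {
    bool mapToLocal;        // define the new scalar local from the register
    bool storeToParamHome;  // also store the register to the parameter's stack home
};

HomeDecision decideHome(const SegmentInfo& seg) {
    if (seg.targetIsDner) {
        // A DNER local lives on the stack anyway, so mapping buys nothing;
        // fall back to the ordinary spill to the parameter's home.
        return {false, true};
    }
    if (seg.containsGcRef) {
        // The slot would have to be zeroed anyway, so store the real value
        // there as well; the parameter stays fully defined by the prolog.
        return {true, true};
    }
    return {true, false};
}
```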
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
cc @dotnet/jit-contrib PTAL @kunalspathak

Diffs, diffs with old promotion disabled. Mostly improvements, both in size and perfscore. Some regressions for a number of different reasons:

```csharp
void Foo(Memory<T> mem)
{
    Memory<T> local = mem;
    Bar(local);
}
```

some platforms will pass

I expect to improve this in the future by improving the support for
```diff
@@ -10412,7 +10412,7 @@ JITDBGAPI void __cdecl dVN(ValueNum vn)
     cVN(JitTls::GetCompiler(), vn);
 }
 
-JITDBGAPI void __cdecl dRegMask(regMaskTP mask)
+JITDBGAPI void __cdecl dRegMask(const regMaskTP& mask)
```
This function was not working for me on ARM64 because `regMaskTP` is a struct, and struct arguments are not supported by the Visual Studio debugger's expression evaluator.
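A minimal illustration of the signature change, using a hypothetical two-word `RegMask` struct and a hypothetical `formatRegMask` helper in place of `regMaskTP` and `dRegMask`: passing the struct by `const&` means a caller only has to supply an address, which is presumably why the by-reference signature works in the debugger where the by-value one did not.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a register mask that is a struct rather than a
// single integer; the real regMaskTP layout is not reproduced here.
struct RegMask {
    std::uint64_t low;   // registers 0..63
    std::uint64_t high;  // registers 64..127
};

// Taking the mask by const reference means a caller (such as a debugger's
// expression evaluator) only passes an address, never a struct by value.
std::string formatRegMask(const RegMask& mask) {
    std::string out;
    for (int i = 0; i < 128; i++) {
        std::uint64_t word = (i < 64) ? mask.low : mask.high;
        if ((word >> (i % 64)) & 1) {
            if (!out.empty()) {
                out += ' ';
            }
            out += "r" + std::to_string(i);  // illustrative register names
        }
    }
    return out;
}
```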
```
@@ -5751,6 +5772,10 @@ void CodeGen::genFnProlog()
#else
    genEnregisterOSRArgsAndLocals();
#endif
    // OSR functions take no parameters in registers. Ensure no mappings
    // are present.
    // assert((compiler->m_paramRegLocalMappings == nullptr) || compiler->m_paramRegLocalMappings->Empty());
```
I don't recall why I commented this out; I'll uncomment it and make sure it doesn't trigger.
Actually let me do that in a follow-up. The problem is that LSRA is expecting these mappings to exist even for OSR functions, since it gets used to pick an initial preferred register. That doesn't really make sense for OSR functions, but changing that will come with diffs, so I don't think it should be done here.
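To illustrate the dependency described above, here is a hedged sketch of parameter mappings being used to seed an initial register preference. `ParamRegLocalMapping` and `initialPreference` are hypothetical; under this model, in an OSR method (where no parameters arrive in registers) an empty or absent mapping list would simply yield no preference.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical parameter-register-to-local mapping entry.
struct ParamRegLocalMapping {
    int reg;          // incoming parameter register
    unsigned lclNum;  // local that receives it
};

// A local that is defined from an incoming register initially prefers that
// register; any other local gets no preference from this source.
std::optional<int> initialPreference(const std::vector<ParamRegLocalMapping>* mappings,
                                     unsigned lclNum) {
    if (mappings == nullptr) {
        return std::nullopt;  // no mappings recorded at all
    }
    for (const ParamRegLocalMapping& m : *mappings) {
        if (m.lclNum == lclNum) {
            return m.reg;
        }
    }
    return std::nullopt;
}
```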
```cpp
if (comp->opts.OptimizationEnabled())
{
    MapParameterRegisterLocals();
}
```
There's a problem with changing IR after lowering has run, because some nodes may have made containment decisions based on seeing a `LCL_FLD` of a particular size. I think the problem I was seeing was for compares on xarch.
I moved it to happen before lowering for now, but it may need to move back to after lowering in the future (I think we'll need that to fix #112138).
```cpp
// is frequently created by physical promotion.
for (GenTree* node : LIR::AsRange(comp->fgFirstBB))
{
    hasRegisterKill |= node->IsCall();
```
There can be any other stores, but the checks below should filter out the ones that aren't relevant.
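A sketch of the single forward scan over the first block that the excerpt above begins, assuming a simplified node representation (`Node`, `Kind`, `ScanResult`, and `scanFirstBlock` are hypothetical). It only records facts, namely whether any call (a register kill) appears and which locals are stored to; filtering out the irrelevant stores is left to later checks, as the reply says.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

enum class Kind { Call, StoreLocal, Other };

// Hypothetical, flattened LIR node.
struct Node {
    Kind kind;
    unsigned lclNum;  // meaningful for StoreLocal only
};

struct ScanResult {
    bool hasRegisterKill = false;              // a call clobbers argument registers
    std::unordered_set<unsigned> storedToLocals;
};

// Walk the block once, accumulating facts without deciding anything yet.
ScanResult scanFirstBlock(const std::vector<Node>& block) {
    ScanResult r;
    for (const Node& node : block) {
        r.hasRegisterKill |= (node.kind == Kind::Call);
        if (node.kind == Kind::StoreLocal) {
            r.storedToLocals.insert(node.lclNum);
        }
    }
    return r;
}
```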
```cpp
if (storedToLocals.Lookup(fld->GetLclNum()))
{
    // LCL_FLD does not necessarily take the value of the parameter
```
Should they eventually be removed from `storedToLocals` before the call to `TryReuseLocalForParameterAccess`?
Hmm, perhaps such locals are never looked up in `storedToLocals` inside `TryReuseLocalForParameterAccess()`.
In this case, when we see something like `STORE_LCL_VAR V42 (LCL_FLD V00 [+8])`, we want to validate that `V00` still has the value of the parameter, which is what this check does. If we optimize `V42` to be mapped directly from the parameter register, then we also need nothing to have overwritten its value when we reach this point (which is what the check in `TryReuseLocalForParameterAccess` verifies).
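The validation described above can be modeled as an order-sensitive check over the block's stores: a `STORE_LCL_VAR dst (LCL_FLD param [+off])` is only a candidate if `param` has not been stored to earlier. This is a minimal sketch with hypothetical names (`Store`, `kNoSource`, `candidatesReadingParam`), not the JIT's implementation, and it ignores the separate check on the destination local.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Marks a store whose source is not a field of any local.
constexpr unsigned kNoSource = ~0u;

// Hypothetical flattened form of STORE_LCL_VAR dst (LCL_FLD src [+offset]).
struct Store {
    unsigned dstLcl;
    unsigned srcLcl;  // kNoSource for any other value
    unsigned offset;  // kept for readability; not needed by this check
};

// Destination locals that read a field of paramLcl while the parameter
// still holds its incoming value, i.e. before any store to it.
std::vector<unsigned> candidatesReadingParam(const std::vector<Store>& stores, unsigned paramLcl) {
    std::unordered_set<unsigned> storedToLocals;
    std::vector<unsigned> candidates;
    for (const Store& s : stores) {
        if (s.srcLcl == paramLcl && storedToLocals.count(paramLcl) == 0) {
            candidates.push_back(s.dstLcl);
        }
        storedToLocals.insert(s.dstLcl);  // record the store after checking
    }
    return candidates;
}
```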
src/coreclr/jit/lsrabuild.cpp

```cpp
LclVarDsc* argDsc = compiler->lvaGetDesc(mappedLclNum);
if (argDsc->lvTracked && !compiler->compJmpOpUsed && (argDsc->lvRefCnt() == 0) &&
    !compiler->opts.compDbgCode)
JITDUMP("Arg V%02u in reg %s\n", mapping != nullptr ? mapping->LclNum : lclNum,
```
Can we also include the status of `isLive` at the end?
Done
LGTM. Thanks!