
JIT: Optimize struct parameter register accesses in the backend #110819

Merged

40 commits merged into dotnet:main from the physical-promotion-parameters-abi branch on Feb 11, 2025

Conversation

jakobbotsch
Member

@jakobbotsch jakobbotsch commented Dec 18, 2024

This PR adds an optimization in lowering to utilize the new parameter
register to local mappings added in #110795. The optimization detects IR
that would otherwise result in stack spills/loads and replaces it with
scalar locals that can stay in registers.

Physical promotion benefits especially from this, as it creates exactly
the kind of IR that the optimization kicks in for. The heuristics of
physical promotion are updated to account for the fact that the backend
can now perform this optimization, making physical promotion more likely
to promote struct parameters.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2024
@MichalPetryka
Contributor

Does this fix #89374?

@jakobbotsch
Member Author

Does this fix #89374?

No -- currently I'm just looking at ensuring that physical promotion can handle the cases that old promotion handles. This change should only result in a small number of diffs, since old promotion handles the vast majority of structs passed in registers today, but it gets us closer to removing old promotion entirely.
The representation should be flexible enough to allow us to improve cases like #89374 in the future though.

GC refs in structs have to be zeroed anyway, so we might as well spill
to the original parameter in addition to the new mapping. This means we
can avoid the manually inserted spill in the init block, and we retain
the property that parameters are fully defined by the prolog.
@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress


Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress


Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch jakobbotsch force-pushed the physical-promotion-parameters-abi branch from 5a6e433 to 95d2b04 Compare February 9, 2025 21:52
@jakobbotsch jakobbotsch marked this pull request as ready for review February 10, 2025 13:03
@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @kunalspathak

Diffs, diffs with old promotion disabled.

Mostly improvements, both in size and perfscore. Some regressions for a number of different reasons:

  • Physically promoting part of a struct parameter can mean that copy prop no longer kicks in, which can result in new dependent promotion. For example, for code like
void Foo(Memory<T> mem)
{
  Memory<T> local = mem;
  Bar(local);
}

some platforms will pass mem in two registers, while it has three fields. On those platforms we would not previously promote mem, but we would promote local with old promotion. If the JIT copy propagates mem into the argument to Bar, then local can stay independently promoted. If it does not, then local ends up dependently promoted. That can sometimes cause regressions, since physically promoting mem can block this copy propagation.

I expect to improve this in the future by improving the support for FIELD_LIST arguments in calls to support combining fields via bitwise operations.

  • Sometimes physical promotion does not introduce the LCL_FLD uses of the parameter's field early enough for them to be optimized by the backend. For example, it introduces the LCL_FLD after a call in the first block that later gets expanded to control flow. I will improve this in a follow-up.
  • On arm32, this change often means that LSRA picks LR instead of an argument register for various operations, but LR has a larger encoding than many other registers. This seems to be the primary reason for the code size regressions (we still see a perfscore improvement). I also added a change to PUTARG_SPLIT building to improve things a little bit.
  • Keeping things in registers sometimes means we need more callee saves, which comes with extra prolog/epilog cost. We might be able to improve this in a follow-up, by avoiding the optimization for single-use LCL_FLDs if we've seen a call that will kill the register.
  • We partially promote struct parameters somewhat often because we do not yet have the necessary support to unpack fields from registers via bitwise operations. Typically this can result in unnecessary stores into the struct local in the prolog/epilog.

@@ -10412,7 +10412,7 @@ JITDBGAPI void __cdecl dVN(ValueNum vn)
cVN(JitTls::GetCompiler(), vn);
}

-JITDBGAPI void __cdecl dRegMask(regMaskTP mask)
+JITDBGAPI void __cdecl dRegMask(const regMaskTP& mask)
Member Author

This function was not working for me on ARM64 because regMaskTP is a struct, and struct args are not supported in VS's debugger eval.

@@ -5751,6 +5772,10 @@ void CodeGen::genFnProlog()
#else
genEnregisterOSRArgsAndLocals();
#endif
// OSR functions take no parameters in registers. Ensure no mappings
// are present.
// assert((compiler->m_paramRegLocalMappings == nullptr) || compiler->m_paramRegLocalMappings->Empty());
Member Author

I don't recall why I commented this out; I'll uncomment it and make sure it doesn't trigger.

Member Author

Actually let me do that in a follow-up. The problem is that LSRA is expecting these mappings to exist even for OSR functions, since it gets used to pick an initial preferred register. That doesn't really make sense for OSR functions, but changing that will come with diffs, so I don't think it should be done here.

Comment on lines +7904 to +7908
if (comp->opts.OptimizationEnabled())
{
MapParameterRegisterLocals();
}

Member Author

There's a problem with changing IR after lowering has run because some nodes may have made containment decisions based on seeing a LCL_FLD of a particular size. I think the problem I was seeing was for compares on xarch.
I moved it to happen before lowering for now, but this may need to be moved back to happen after in the future (I think to fix #112138 we'll need that).

// is frequently created by physical promotion.
for (GenTree* node : LIR::AsRange(comp->fgFirstBB))
{
hasRegisterKill |= node->IsCall();
Member

hasRegisterKill

Are there stores outside of struct parameter copying that we should not be visiting in the first block? Or are the stores always guaranteed to come from struct params?

Member Author

There can be arbitrary other stores, but the checks below should filter out the ones that aren't relevant.


if (storedToLocals.Lookup(fld->GetLclNum()))
{
// LCL_FLD does not necessarily take the value of the parameter
Member

should they be removed from storedToLocals eventually before the call to TryReuseLocalForParameterAccess?

Member

Hmm, perhaps such locals will never be looked up in storedToLocals inside TryReuseLocalForParameterAccess().

Member Author

In this case, when we see something like

STORE_LCL_VAR V42 (LCL_FLD V00 [+8])

we are interested in validating that

  1. V00 still has the value of the parameter, which this check is checking for
  2. If we optimize V42 to be mapped directly from the parameter register, then nothing has overwritten its value when we reach this point (what the check in TryReuseLocalForParameterAccess is checking for)

LclVarDsc* argDsc = compiler->lvaGetDesc(mappedLclNum);
if (argDsc->lvTracked && !compiler->compJmpOpUsed && (argDsc->lvRefCnt() == 0) &&
!compiler->opts.compDbgCode)
JITDUMP("Arg V%02u in reg %s\n", mapping != nullptr ? mapping->LclNum : lclNum,
Member

Can we also include the status of isLive at the end?

Member Author

Done

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks!

@jakobbotsch jakobbotsch merged commit 8a22b87 into dotnet:main Feb 11, 2025
111 of 114 checks passed
@jakobbotsch jakobbotsch deleted the physical-promotion-parameters-abi branch February 11, 2025 09:32
grendello added a commit to grendello/runtime that referenced this pull request Feb 12, 2025
* main:
  [Android] Run CoreCLR functional tests on Android (dotnet#112283)
  [LoongArch64] Fix some assertion failures for Debug ILC building Debug NativeAOT testcases. (dotnet#112229)
  Fix suspicious code fragments (dotnet#112384)
  `__ComObject` doesn't support dynamic interface map (dotnet#112375)
  Native DLLs: only load imported DLLs from System32 (dotnet#112359)
  [main] Update dependencies from dotnet/roslyn (dotnet#112314)
  Update SVE instructions that writes to GC regs (dotnet#112389)
  Bring up android+coreclr windows build.  (dotnet#112256)
  Never use heap for return buffers (dotnet#112060)
  Wait to complete the test before releasing the agile reference. (dotnet#112387)
  Prevent returning disposed HTTP/1.1 connections to the pool (dotnet#112383)
  Fingerprint dotnet.js if writing import map to html is enabled (dotnet#112407)
  Remove duplicate definition of CORECLR_HOSTING_API_LINKAGE (dotnet#112096)
  Update the exception message to reflect current behavior. (dotnet#112355)
  Use enum for frametype not v table (dotnet#112166)
  Enable AltJits build for LoongArch64 and RiscV64 (dotnet#110282)
  Guard members of MonoType union & fix related bugs (dotnet#111645)
  Add optional hooks for debugging OpenSSL memory allocations (dotnet#111539)
  JIT: Optimize struct parameter register accesses in the backend (dotnet#110819)
  NativeAOT: Cover more opcodes in type preinitializer (dotnet#112073)