-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Remove JitGenericHandleCache #106843
Remove JitGenericHandleCache #106843
Conversation
davidwrighton
commented
Aug 22, 2024
•
edited
Loading
edited
- It was only used for overflow scenarios from the generic dictionary (which don't happen), virtual resolution scenarios for creating delegates, and a few other rare R2R scenarios
- Replace the virtual resolution scenarios with a cache of the affected data in managed code, and move the helpers to managed
- Use the GenericCache type which was previously only used on NativeAOT for Generic Virtual method resolution paths.
- Just remove the pointless checks from within the various normal generic lookup paths, and since they no longer do anything interesting in their hot paths except erect a helper method frame and call a worker routine, move those jit helpers to managed as well.
Tagging subscribers to this area: @mangod9 |
private static unsafe IntPtr VirtualFunctionPointerSlowpath(object obj, IntPtr classHandle, IntPtr methodHandle) | ||
{ | ||
IntPtr result = JIT_ResolveVirtualFunctionPointer(ObjectHandleOnStack.Create(ref obj), classHandle, methodHandle); | ||
lock(s_virtualFunctionPointerCache) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regular locks can call back into user code through SynchronizationManager. It would be better for this to use Lock with useTrivialWaits=true
.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
6bd1b55
to
7de3762
Compare
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
- It was only used for overflow scenarios from the generic dictionary (which don't happen), virtual resolution scenarios for creating delegates, and a few other rare R2R scenarios - Replace the virtual resolution scenarios with a cache of the affected data in managed code, and move the helpers to managed - Just remove the pointless checks from within the various normal generic lookup paths Swap to using GenericCache - Make FlushCurrentCache public on GenericCache, as we need to clear this when we clean up a LoaderAllocator - Tweak algorithm for computing out of bound slots on the DictionaryLayout Update crsttypes
f774050
to
a752b1b
Compare
@EgorBot -intel -arm64 using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
namespace VirtualDispatchChecks
{
class _1
{
public virtual string Method() => "A";
}
class _2 : _1
{
public override string Method() => "B";
}
class _3 : _1
{
public override string Method() => "C";
}
class _4 : _1
{
public override string Method() => "D";
}
class _5 : _1
{
public override string Method() => "E";
}
class _6 : _1
{
public override string Method() => "F";
}
class _7 : _1
{
public override string Method() => "F";
}
class _8 : _1
{
public override string Method() => "F";
}
class _9 : _1
{
public override string Method() => "F";
}
class _10 : _1
{
public override string Method() => "F";
}
public class ResolveVirtualRepeatedly
{
static _1 _1 = new _1();
static _1 _2 = new _2();
static _1 _3 = new _3();
static _1 _4 = new _4();
static _1 _5 = new _5();
static _1 _6 = new _6();
static _1 _7 = new _7();
static _1 _8 = new _8();
static _1 _9 = new _9();
static _1 _10 = new _10();
[MethodImpl(MethodImplOptions.NoInlining)]
static _1 GetA(int index)
{
switch (index % 10)
{
case 0:
return _10;
case 1: return _1;
case 2: return _2;
case 3: return _3;
case 4: return _4;
case 5: return _5;
case 6: return _6;
case 7: return _7;
case 8: return _8;
case 9: return _9;
}
return null!;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Func<string> GetAction(int index)
{
return GetA(index).Method;
}
[Benchmark]
public void Get1Virtual()
{
GetAction(0);
}
[Benchmark]
public void Get10Virtual()
{
GetAction(0);
GetAction(1);
GetAction(2);
GetAction(3);
GetAction(4);
GetAction(5);
GetAction(6);
GetAction(7);
GetAction(8);
GetAction(9);
}
}
} |
Benchmark results on Intel
|
Benchmark results on Arm64
|
PS: I have fixed the patch applying so commit squashing is no longer needed. Also, you can use @EgorBot -arm64 -perf using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
namespace VirtualDispatchChecks
{
class _1
{
public virtual string Method() => "A";
}
class _2 : _1
{
public override string Method() => "B";
}
class _3 : _1
{
public override string Method() => "C";
}
class _4 : _1
{
public override string Method() => "D";
}
class _5 : _1
{
public override string Method() => "E";
}
class _6 : _1
{
public override string Method() => "F";
}
class _7 : _1
{
public override string Method() => "F";
}
class _8 : _1
{
public override string Method() => "F";
}
class _9 : _1
{
public override string Method() => "F";
}
class _10 : _1
{
public override string Method() => "F";
}
public class ResolveVirtualRepeatedly
{
static _1 _1 = new _1();
static _1 _2 = new _2();
static _1 _3 = new _3();
static _1 _4 = new _4();
static _1 _5 = new _5();
static _1 _6 = new _6();
static _1 _7 = new _7();
static _1 _8 = new _8();
static _1 _9 = new _9();
static _1 _10 = new _10();
[MethodImpl(MethodImplOptions.NoInlining)]
static _1 GetA(int index)
{
switch (index % 10)
{
case 0:
return _10;
case 1: return _1;
case 2: return _2;
case 3: return _3;
case 4: return _4;
case 5: return _5;
case 6: return _6;
case 7: return _7;
case 8: return _8;
case 9: return _9;
}
return null!;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Func<string> GetAction(int index)
{
return GetA(index).Method;
}
[Benchmark]
public void Get10Virtual()
{
GetAction(0);
GetAction(1);
GetAction(2);
GetAction(3);
GetAction(4);
GetAction(5);
GetAction(6);
GetAction(7);
GetAction(8);
GetAction(9);
}
}
} |
Benchmark results on Arm64
Flame graphs: Main vs PR 🔥 For clean |
An analysis of these results indicates that
|
This comment was marked as resolved.
This comment was marked as resolved.
@@ -34,6 +34,8 @@ internal sealed partial class LoaderAllocatorScout | |||
if (m_nativeLoaderAllocator == IntPtr.Zero) | |||
return; | |||
|
|||
VirtualDispatchHelpers.ClearCache(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be a subtle race condition:
- the loader allocator is still in use when
VirtualDispatchHelpers.ClearCache
run - the thread is rescheduled right
VirtualDispatchHelpers.ClearCache
- the loader allocator becomes unreferenced
- Destroy succeeds
I think we will end up with stale entries in the hashtable.
I am wondering whether this is the reason for the casting cache flush to be done while the runtime is suspended: https://github.com/dotnet/runtime/pull/106843/files#diff-060f68aac95f40dd380ee5a82dd047222232b25646f908c1ea394438e18a8709R604
…parameters for function pointer resolution - This will hopefully catch up performance with the previous implementation, and indicate whether such a change is worthwhile Move cache clearing to native code to address concern from Jan around LoaderAllocator cleanup
@EgorBot -intel -arm64 using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
namespace VirtualDispatchChecks
{
class _1
{
public virtual string Method() => "A";
}
class _2 : _1
{
public override string Method() => "B";
}
class _3 : _1
{
public override string Method() => "C";
}
class _4 : _1
{
public override string Method() => "D";
}
class _5 : _1
{
public override string Method() => "E";
}
class _6 : _1
{
public override string Method() => "F";
}
class _7 : _1
{
public override string Method() => "F";
}
class _8 : _1
{
public override string Method() => "F";
}
class _9 : _1
{
public override string Method() => "F";
}
class _10 : _1
{
public override string Method() => "F";
}
public class ResolveVirtualRepeatedly
{
static _1 _1 = new _1();
static _1 _2 = new _2();
static _1 _3 = new _3();
static _1 _4 = new _4();
static _1 _5 = new _5();
static _1 _6 = new _6();
static _1 _7 = new _7();
static _1 _8 = new _8();
static _1 _9 = new _9();
static _1 _10 = new _10();
[MethodImpl(MethodImplOptions.NoInlining)]
static _1 GetA(int index)
{
switch (index % 10)
{
case 0:
return _10;
case 1: return _1;
case 2: return _2;
case 3: return _3;
case 4: return _4;
case 5: return _5;
case 6: return _6;
case 7: return _7;
case 8: return _8;
case 9: return _9;
}
return null!;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static Func<string> GetAction(int index)
{
return GetA(index).Method;
}
[Benchmark]
public void Get1Virtual()
{
GetAction(0);
}
[Benchmark]
public void Get10Virtual()
{
GetAction(0);
GetAction(1);
GetAction(2);
GetAction(3);
GetAction(4);
GetAction(5);
GetAction(6);
GetAction(7);
GetAction(8);
GetAction(9);
}
}
} |
…ion for creating delegates to virtual methods. If we need to bring it back, its all isolated to this one commit
Make the ordering of parameters correct for JIT_GenericHandleWorker
…time into remove_jitgenericcache
I am not able to reproduce this regression on Dev Box Amd. The PR is ~7% faster compared to baseline on that machine. |
|
||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
[DebuggerHidden] | ||
private static unsafe IntPtr VirtualFunctionPointerSlowpath(object obj, IntPtr classHandle, IntPtr methodHandle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static unsafe IntPtr VirtualFunctionPointerSlowpath(object obj, IntPtr classHandle, IntPtr methodHandle) | |
private static unsafe IntPtr VirtualFunctionPointerSlow(object obj, IntPtr classHandle, IntPtr methodHandle) |
Nit: It is the common naming convention.
(Alternatively, path
should be Path
.)
src/coreclr/vm/qcallentrypoints.cpp
Outdated
@@ -476,6 +477,8 @@ static const Entry s_QCall[] = | |||
DllImportEntry(EHEnumNext) | |||
DllImportEntry(AppendExceptionStackFrame) | |||
#endif // FEATURE_EH_FUNCLETS | |||
DllImportEntry(JIT_ResolveVirtualFunctionPointer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would drop the JIT_
prefix. It is not used with any JIT helper that was converted to managed.
public int HashCode; | ||
public MethodTable* MethodTable; | ||
public IntPtr ClassHandleTargetMethod; | ||
public IntPtr MethodHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public int HashCode; | |
public MethodTable* MethodTable; | |
public IntPtr ClassHandleTargetMethod; | |
public IntPtr MethodHandle; | |
public int _hashCode; | |
public MethodTable* _methodTable; | |
public IntPtr _classHandleTargetMethod; | |
public IntPtr _methodHandle; |
Use the BCL naming convention? Also, the BCL convention is to have fields first in the type.
private static unsafe IntPtr VirtualFunctionPointerSlowpath(object obj, IntPtr classHandle, IntPtr methodHandle) | ||
{ | ||
IntPtr result = JIT_ResolveVirtualFunctionPointer(ObjectHandleOnStack.Create(ref obj), classHandle, methodHandle); | ||
s_virtualFunctionPointerCache.TrySet(new VirtualResolutionData(RuntimeHelpers.GetMethodTable(obj), classHandle, methodHandle), result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a bit suspect that we are passing obj
into the resolve worker, but then only use RuntimeHelpers.GetMethodTable(obj)
as the cache key. Do we have correct behavior for dynamic castable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NativeAOT follows this rule around not caching results (for interface dispatch, but appears to have the same problem for virtual function resolution) CoreCLR does not appear to me to have logic in its dispatch behavior which avoids caching. Possibly we intended to build it, but I'm pretty sure we didn't. I can build a test case to validate my expectation if you'd like. The performance of the existing IDynamicInterfaceCastable implementation would likely be highly impacted if we change the rules here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of this is a useable model, although it's not well defined in behavior. The result is that if any type uses IDynamicInterfaceCastable
... the implementor of IDynamicInterfaceCastable.GetInterfaceImplementaiton
really should provide an idempotent implementation for any case where it will successfully return an implementation, or they may have surprising behavior. However, we DO have the correct cache avoidance at cast time. This means that users of types which use IDynamicInterfaceCastable
experience the expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NativeAOT follows this rule around not caching results (for interface dispatch
We may want to fix NativeAOT to match CoreCLR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've opened #107999 to track this issue
src/coreclr/vm/jithelpers.cpp
Outdated
{ | ||
// We can't use GetCurrentStaticAddress, as that may throw, since it will attempt to | ||
// allocate memory for statics if that hasn't happened yet. But, since we force the | ||
// statics memory to be allocated before initializing g_VirtualCache or g_hVirtualCache2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// statics memory to be allocated before initializing g_VirtualCache or g_hVirtualCache2 | |
// statics memory to be allocated before initializing g_VirtualCache |
src/coreclr/vm/jithelpers.cpp
Outdated
_ASSERTE(!staticTH.IsCanonicalSubtype()); | ||
} | ||
} | ||
|
||
pStaticMD->CheckRestore(); | ||
|
||
// MDIL: If IL specifies callvirt/ldvirtftn it remains a "virtual" instruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this comment & code be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're supposed to support this sort of adjustment as part of ReadyToRun (since we support converting a non-virtual method into a virtual method); however, I just checked, and we don't actually have testing for the case where we construct a delegate from the target method, and the behavior at runtime is incorrect in this scenario. (We do have tests for actual dispatch conversion into a virtual method.) I intend to update the comment here, but leave the logic in place, and file a bug about the R2R behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sigh. Nevermind my previous analysis.
- Converting from a non-virtual to a virtual in ReadyToRun is not a safe change from the point of view of ReadyToRun. See https://learn.microsoft.com/en-us/dotnet/core/compatibility/library-change-rules for the rules that we actively support, so an affirmative test in C# should not be written.
- However, we still need to ensure that code doesn't fail in surprising ways with ReadyToRun, so we still need this logic if a customer converts from Virtual to Non-Virtual, which is ALSO a breaking change, but we can hit it if a customer fails to follow all of the rules in the above document. Keeping the existing behavior is done so that failure modes of ReadyToRun generated code are still broadly similar to the failure modes of non ReadyToRun code.
src/coreclr/vm/jithelpers.cpp
Outdated
// Factored out most of the body of JIT_GenericHandle so it could be called easily from the CER reliability code to pre-populate the | ||
// cache. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Factored out most of the body of JIT_GenericHandle so it could be called easily from the CER reliability code to pre-populate the | |
// cache. |
src/coreclr/vm/jithelpers.cpp
Outdated
HashDatum res; | ||
if (g_pJitGenericHandleCache->GetValueSpeculative(&key, &res)) | ||
return (CORINFO_GENERIC_HANDLE)(DictionaryEntry)res; | ||
Volatile<FieldDesc*> g_VirtualCache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Volatile<FieldDesc*> g_VirtualCache; | |
Volatile<FieldDesc*> g_pVirtualFunctionPointerCache; |
} | ||
public int HashCode; | ||
public MethodTable* MethodTable; | ||
public IntPtr ClassHandleTargetMethod; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public IntPtr ClassHandleTargetMethod; | |
public IntPtr ClassHandle; |
JIT_ResolveVirtualFunctionPointer
and most other places use classHandle
. It would be nice to use the same name throughout.
If you would like to add Target
to the name to disambiguate it from MethodTable, I would add it to both method handle and class handle args to make it clear that they go together, like _targetMethodHandle
and _targetMethodClassHandle
.
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public bool Equals(VirtualResolutionData other) => | ||
HashCode == other.HashCode && | ||
MethodTable == other.MethodTable && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashCode should filter out nearly all mismatches, so the rest of the method can assume that the equality check is most likely going to return true. It may help on some architectures to turn the rest of the method to just one conditional branch, something like:
public bool Equals(VirtualResolutionData other) =>
HashCode == other.HashCode &&
(((nint)MethodTable - (nint)other.MethodTable) |
(ClassHandleTargetMethod - other.ClassHandleTargetMethod) |
(MethodHandle - other.MethodHandle)) == 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
- It was only used for overflow scenarios from the generic dictionary (which don't happen), virtual resolution scenarios for creating delegates, and a few other rare R2R scenarios - Replace the virtual resolution scenarios with a cache of the affected data in managed code, and move the helpers to managed - Use the GenericCache type which was previously only used on NativeAOT for Generic Virtual method resolution paths. - Just remove the pointless checks from within the various normal generic lookup paths, and since they no longer do anything interesting in their hot paths except erect a helper method frame and call a worker routine, move those jit helpers to managed as well.