[mono] Check for Vector{64,128,256}<T> element type validity before emitting unchecked intrinsic IR for `AsByte` etc. (#53707)

Conversation
@@ -560,8 +559,12 @@ emit_sri_vector (MonoCompile *cfg, MonoMethod *cmethod, MonoMethodSignature *fsi
case SN_AsSingle:
case SN_AsUInt16:
case SN_AsUInt32:
case SN_AsUInt64: {
Are the relevant HWIntrinsic tests covering this still disabled for Mono?
I would've expected one or more of the tests under https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/General/NotSupported to have caught this. https://github.com/dotnet/runtime/blob/main/src/tests/JIT/HardwareIntrinsics/General/NotSupported/Vector128BooleanAsGeneric_UInt64.cs is the one covering the specific example from the related issue.
They've been enabled since #44930. These tests pass because we currently implement `operator ==` for `Type` via an intrinsic. This, combined with the fact that we currently link Mono with LLVM built with assertions disabled, means that this:
define dso_local monocc i128 @"System.Runtime.Intrinsics.Vector128:WithLower<bool> (System.Runtime.Intrinsics.Vector128`1<bool>,System.Runtime.Intrinsics.Vector64`1<bool>)"([2 x i64] %arg_vector, [1 x i64] %arg_value) #0 gc "coreclr" {
BB0:
%0 = alloca %"System.Runtime.Intrinsics.Vector64`1<bool>", align 8
%1 = alloca %"System.Runtime.Intrinsics.Vector128`1<bool>", align 8
%2 = alloca %"System.Runtime.Intrinsics.Vector128`1<bool>", align 8
%3 = alloca %"System.Runtime.Intrinsics.Vector128`1<bool>", align 8
%4 = alloca <2 x i64>, align 16
%5 = alloca %"System.Runtime.Intrinsics.Vector64`1<bool>", align 8
%6 = alloca %"System.Runtime.Intrinsics.Vector128`1<bool>", align 8
%7 = alloca %"System.Runtime.Intrinsics.Vector128`1<bool>", align 8
%8 = bitcast %"System.Runtime.Intrinsics.Vector128`1<bool>"* %1 to [2 x i64]*
store [2 x i64] %arg_vector, [2 x i64]* %8
%9 = bitcast %"System.Runtime.Intrinsics.Vector64`1<bool>"* %0 to [1 x i64]*
store [1 x i64] %arg_value, [1 x i64]* %9
br label %BB3
BB3: ; preds = %BB0
br label %BB2
BB2: ; preds = %BB3
br i1 false, label %BB4, label %BB5
BB4: ; preds = %BB2
%10 = load i128 ([2 x i64], [1 x i64])*, i128 ([2 x i64], [1 x i64])** @"[tramp_4428] System.Runtime.Intrinsics.Vector128:<WithLower>g__SoftwareFallback|73_0<bool> (System.Runtime.Intrinsics.Vector128`1<bool>,System.Runtime.Intrinsics.Vector64`1<bool>)"
%11 = bitcast %"System.Runtime.Intrinsics.Vector128`1<bool>"* %1 to [2 x i64]*
%12 = load [2 x i64], [2 x i64]* %11
%13 = bitcast %"System.Runtime.Intrinsics.Vector64`1<bool>"* %0 to [1 x i64]*
%14 = load [1 x i64], [1 x i64]* %13
%15 = notail call monocc i128 %10([2 x i64] %12, [1 x i64] %14), !managed_name !0
%16 = bitcast %"System.Runtime.Intrinsics.Vector128`1<bool>"* %2 to i128*
store i128 %15, i128* %16
br label %BB1
BB5: ; preds = %BB2
%17 = bitcast [2 x i64] %arg_vector to <2 x i64>
%18 = bitcast [1 x i64] %arg_value to <1 x i64>
%extract = extractelement <1 x i64> %18, i32 0
%xinsert = insertelement <2 x i64> %17, i64 %extract, i32 0
store <2 x i64> %xinsert, <2 x i64>* %4
br label %BB9
BB1: ; preds = %BB16, %BB4
%19 = phi %"System.Runtime.Intrinsics.Vector128`1<bool>"* [ %3, %BB16 ], [ %2, %BB4 ]
%20 = bitcast %"System.Runtime.Intrinsics.Vector128`1<bool>"* %19 to i128*
%21 = load i128, i128* %20
ret i128 %21
BB9: ; preds = %BB5
br label %BB14
BB14: ; preds = %BB9
br i1 false, label %BB16, label %BB17
BB16: ; preds = %BB25, %BB24, %BB23, %BB22, %BB21, %BB20, %BB19, %BB18, %BB17, %BB14
%22 = ptrtoint <2 x i64>* %4 to i64
%23 = add i64 %22, 0
%24 = inttoptr i64 %23 to i8*
%25 = bitcast %"System.Runtime.Intrinsics.Vector128`1<bool>"* %3 to i8*
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %25, i8* %24, i32 16, i1 false)
br label %BB1
BB17: ; preds = %BB14
br i1 false, label %BB16, label %BB18
BB18: ; preds = %BB17
br i1 false, label %BB16, label %BB19
BB19: ; preds = %BB18
br i1 false, label %BB16, label %BB20
BB20: ; preds = %BB19
br i1 false, label %BB16, label %BB21
BB21: ; preds = %BB20
br i1 false, label %BB16, label %BB22
BB22: ; preds = %BB21
br i1 false, label %BB16, label %BB23
BB23: ; preds = %BB22
br i1 false, label %BB16, label %BB24
BB24: ; preds = %BB23
br i1 false, label %BB16, label %BB25
BB25: ; preds = %BB24
%26 = call i1 @llvm.expect.i1(i1 false, i1 true)
br i1 %26, label %BB16, label %BB26
BB26: ; preds = %BB25
%27 = load void (i32)*, void (i32)** @"[tramp_4429] System.ThrowHelper:ThrowNotSupportedException (System.ExceptionResource)"
notail call monocc void %27(i32 64), !managed_name !1
unreachable
}
is optimized into this:
define dso_local monocc i128 @"System.Runtime.Intrinsics.Vector128:WithLower<bool> (System.Runtime.Intrinsics.Vector128`1<bool>,System.Runtime.Intrinsics.Vector64`1<bool>)"([2 x i64] %arg_vector, [1 x i64] %arg_value) #0 gc "coreclr" {
BB26:
%0 = load void (i32)*, void (i32)** @"[tramp_4429] System.ThrowHelper:ThrowNotSupportedException (System.ExceptionResource)", align 8
%1 = load i64, i64* inttoptr (i64 281473709282408 to i64*)
%2 = icmp eq i64 %1, 0
br i1 %2, label %gc.safepoint_poll.exit, label %gc.safepoint_poll.poll.i, !prof !1
gc.safepoint_poll.poll.i: ; preds = %BB26
%3 = load void ()*, void ()** @mono_threads_state_poll
call void %3() #2
br label %gc.safepoint_poll.exit
gc.safepoint_poll.exit: ; preds = %BB26, %gc.safepoint_poll.poll.i
notail call monocc void %0(i32 64), !managed_name !2
unreachable
}
Because we don't run tests with LLVM assertions enabled, we won't fail-fast when emitting an invalid bitcast.

#52791 changes this; we don't inline this managed implementation of `operator ==`, and the invalid bitcasts aren't elided. So the test later fails when attempting to do isel for a bogus bitcast.
> These tests pass because we currently implement `operator ==` for `Type` via an intrinsic.

Sorry, I'm a bit confused here as to why this means the tests "pass". The tests in question run code similar to the following:
bool succeeded = false;
try
{
Vector128<ulong> result = default(Vector128<bool>).As<bool, ulong>();
}
catch (NotSupportedException)
{
succeeded = true;
}
if (!succeeded)
{
TestLibrary.TestFramework.LogInformation($"Vector128BooleanAsGeneric_UInt64: RunNotSupportedScenario failed to throw NotSupportedException.");
TestLibrary.TestFramework.LogInformation(string.Empty);
throw new Exception("One or more scenarios did not complete as expected.");
}
Given this, I'd expect `NotSupportedException` to be thrown for any invalid `T` being used with `As`. However, as seen with #53132 and with what this PR is fixing, unsupported `T` were being allowed and invalid IR generated.

That is, I'm not connecting the dots here on why these tests "pass" but `public static Vector128<T> WithLower<T>(this Vector128<T> vector, Vector64<T> value)` was getting invalid LLVM IR.
Here's the definition of `WithLower`:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static Vector128<T> WithLower<T>(this Vector128<T> vector, Vector64<T> value)
where T : struct
{
if (AdvSimd.IsSupported)
{
return AdvSimd.InsertScalar(vector.AsUInt64(), 0, value.AsUInt64()).As<ulong, T>();
}
return SoftwareFallback(vector, value);
static Vector128<T> SoftwareFallback(Vector128<T> vector, Vector64<T> value)
{
ThrowHelper.ThrowForUnsupportedIntrinsicsVectorBaseType<T>();
Vector128<T> result = vector;
Unsafe.As<Vector128<T>, Vector64<T>>(ref result) = value;
return result;
}
}
Today, on arm64, for function parameters, we map `Vector128<T>` to `[2 x i64]` and `Vector64<T>` to `[1 x i64]`, regardless of the validity of the element type.
On function entry, when `T` is supported, we construct an LLVM vector of the right shape via a sequence of `load`s and `insertelement`s, and then we associate the resulting LLVM vector value with the mini IR register that represents that function parameter. The rest of our intrinsics-related code assumes that vectors are normal LLVM values.
When `T` is unsupported, we manipulate it using our generic "value type" convention, which represents instances of value types via an indirection and uses `load`s and `store`s to manipulate different fields. In this case the mini IR registers associated with each function parameter are mapped directly to these array-typed LLVM function parameters (these are promptly `store`d in `alloca`d memory for later "value type" use). My previous PR added a hack to stub in an LLVM undef value for .NET vector types with unsupported element types.
On arm64, `AdvSimd.IsSupported` is implemented via an intrinsic that binds a constant `1` to a mini IR register.
`Vector128:AsUInt64`, before this change, was implemented as an intrinsic that emits an unchecked mini IR `OP_XCAST`, even for unsupported `T`. This is translated to an LLVM bitcast, under the assumption that the source mini IR register in the originating `OP_XCAST` is associated with a valid LLVM vector value. This is the source of
%17 = bitcast [2 x i64] %arg_vector to <2 x i64>
%18 = bitcast [1 x i64] %arg_value to <1 x i64>
in the unoptimized LLVM IR in my previous comment.
`As` is later inlined. We (contrary to my incorrect PR description in my now-closed PR) don't intrinsify `As`. So we then translate two inlined invocations of `ThrowForUnsupportedIntrinsicsVectorBaseType`. Because we happen to be JIT-compiling this System.Private.CoreLib.dll code (rather than AOT compiling it; the only things AOTed right now in the llvmaot test lanes are the individual runtime test assemblies), the exact addresses of all associated `Type` instances are "JIT-statically" known in each invocation of `ThrowForUnsupportedIntrinsicsVectorBaseType`. The intrinsified inequality comparisons between JIT-time constants are all true and are "optimized" via mini IR-specific passes that run before translation to LLVM IR. This is the source of
BB17: ; preds = %BB14
br i1 false, label %BB16, label %BB18
BB18: ; preds = %BB17
br i1 false, label %BB16, label %BB19
BB19: ; preds = %BB18
br i1 false, label %BB16, label %BB20
BB20: ; preds = %BB19
br i1 false, label %BB16, label %BB21
BB21: ; preds = %BB20
br i1 false, label %BB16, label %BB22
BB22: ; preds = %BB21
br i1 false, label %BB16, label %BB23
BB23: ; preds = %BB22
br i1 false, label %BB16, label %BB24
BB24: ; preds = %BB23
br i1 false, label %BB16, label %BB25
BB25: ; preds = %BB24
%26 = call i1 @llvm.expect.i1(i1 false, i1 true)
br i1 %26, label %BB16, label %BB26
BB26: ; preds = %BB25
%27 = load void (i32)*, void (i32)** @"[tramp_4429] System.ThrowHelper:ThrowNotSupportedException (System.ExceptionResource)"
notail call monocc void %27(i32 64), !managed_name !1
unreachable
in the unoptimized LLVM IR in my previous comment.
The passes that we've enabled for LLVM as a JIT then apparently elide all side-effect-free instructions (including the invalid `AsUInt64` bitcasts), even if they're located in basic blocks that dominate the basic block that contains the invocation of `System.ThrowHelper:ThrowNotSupportedException`. So we've been generating invalid IR for a while, but because our implementation of `Type`'s `operator ==` is an intrinsic (and because we're JIT compiling functions from System.Private.CoreLib.dll, which I should have mentioned in my previous comment), this invalid IR has been getting optimized away. This happened to make these tests "pass".

If we were running these tests with an assertion-enabled LLVM, this would have been caught much earlier.
> Today, on arm64, for function parameters, we map Vector128 to [2 x i64] and Vector64 to [1 x i64], regardless of the validity of the element type.
This seems potentially problematic for ABI handling, since vector types get special handling. In particular `Vector128<T>`, for a valid `T`, is contracted to map to the `__m128` ABI type on x86/x64 (there are corresponding ABI types for `Vector256`, and similar types for Vector64/Vector128 on ARM32/ARM64). This can also be impactful for understanding of things like HFA vs HVA, as `struct S { float x, y, z, w; }` and `__m128` are different types on some platforms, as are `2 x i64` vs `4 x float`.
Is there any particular reason this isn't just specially handling valid types as an `n x ...` element vector and handling unsupported types as a regular struct definition, so that things "work" and match the type and ABI handling that RyuJIT has?
I have no idea why our calling convention bounces `Vector{64,128,256}<T>` through GPRs or through memory. This is something I'd like to improve.

EDIT: It's probably most accurate to say that the work to make the CC computation code SIMD-aware just hasn't happened yet, so all the existing code piggybacks off the existing support for "value types".
The passing/receiving of SIMD types as SIMD is just not implemented right now; there is no particular reason for it, AFAIK.
Thanks, I was mostly trying to figure out why things were handled as SIMD at all for unrecognized types. In RyuJIT, we specially recognize a few `T`, and everything else just falls back to being treated as any plain old struct type.