[mono][jit] Adding Vector128.ConvertXX as intrinsic on arm64. #85163

jandupej · 2023-04-21T14:08:26Z

This adds the Vector128 conversions that preserve element width as intrinsics on arm64. Each one emits a single instruction.

The macros for ucvtf, scvtf, fcvtns and fcvtnu are converted to the general variant.

Contributes to #80566

jandupej · 2023-04-24T13:01:38Z

It seems that arm64 saturates on f->i conversions, which is inconsistent with how our scalar conversions work; this is the cause of the CI errors. E.g. a vector conversion of float.MinValue to int and back yields -3.4028235E+38 -> -2147483648 -> -2.1474836E+09, while in scalar code the results are -3.4028235E+38 -> 0 -> 0. Also, strangely, 3.4028235E+38 yields -1 when converting to int in scalar code. In-range conversions look all right. @vargaz @tannergooding :

Do we have a defined behavior for converting out-of-range float to int? A brief look through ECMA turned up only the rounding mode.
Do we want to replicate the scalar overflow behavior in vector code? This would yield much longer and slower code for conversions.

vargaz · 2023-04-24T13:03:41Z

This worked with llvm, so we should generate the same opcodes that llvm does.

jandupej · 2023-04-24T14:06:11Z

> This worked with llvm, so we should generate the same opcodes that llvm does.

I think I see what's going on. Looking at the disassembly:

private static Vector128<float> W(Vector128<float> y)
{
    Vector128<int> z = Vector128.ConvertToInt32(y);
    return Vector128.ConvertToSingle(z); 
}

generates

...
0000000000000014        fcvtzs.4s       v0, v0
0000000000000018        scvtf.4s        v0, v0
...

While the scalar case

private static float Z(float y) => (float)(int)y;

emits

...
000000000000001c        fcvtzs  x0, s0
0000000000000020        sxtw    x0, w0
0000000000000024        scvtf   s0, w0
...

Note the operand x0 of fcvtzs in the scalar case. The instruction converts the float to an int64; the following sxtw then takes the lower 32 bits and sign-extends them to 64 bits. In the overflow cases this differs from converting directly to a 32-bit int. I would expect a direct conversion from float to int to do what the vector code here does: convert directly to int32. I did not check what LLVM does, but since the test passes there, its scalar and vector implementations appear to be consistent. A quick check indicates that CoreCLR also behaves like the proposed vector implementation.
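To make the difference concrete, here is a minimal C# sketch that models the two instruction sequences in software instead of relying on any particular hardware. It assumes, as described above, that fcvtzs saturates to the width of its destination register and maps NaN to 0. The helper names (SaturateToInt64, SaturateToInt32, ScalarPathAsDescribed, VectorPath) are hypothetical and are not part of Mono or the BCL.

```csharp
using System;

// Software model of the two sequences discussed above.
// All helper names are hypothetical; none of this is Mono or BCL code.

// Saturate to int64, then keep only the low 32 bits: prints 0 and -1.
Console.WriteLine(ScalarPathAsDescribed(float.MinValue));
Console.WriteLine(ScalarPathAsDescribed(float.MaxValue));

// Saturate directly to int32: prints -2147483648 and 2147483647.
Console.WriteLine(VectorPath(float.MinValue));
Console.WriteLine(VectorPath(float.MaxValue));

// In-range inputs agree on both paths (truncation toward zero): prints -7 twice.
Console.WriteLine(ScalarPathAsDescribed(-7.9f));
Console.WriteLine(VectorPath(-7.9f));

// Models "fcvtzs x0, s0": float -> int64, saturating, NaN -> 0 (assumption stated in the lead-in).
static long SaturateToInt64(float f)
{
    const float TwoPow63 = 9223372036854775808f; // 2^63, exactly representable as a float
    if (float.IsNaN(f)) return 0;
    if (f >= TwoPow63) return long.MaxValue;
    if (f <= -TwoPow63) return long.MinValue;
    return (long)f; // in range, so the cast is well defined
}

// Models one lane of "fcvtzs.4s v0, v0": float -> int32, saturating, NaN -> 0.
static int SaturateToInt32(float f)
{
    const float TwoPow31 = 2147483648f; // 2^31, exactly representable as a float
    if (float.IsNaN(f)) return 0;
    if (f >= TwoPow31) return int.MaxValue;
    if (f <= -TwoPow31) return int.MinValue;
    return (int)f; // in range, so the cast is well defined
}

// Scalar sequence: convert to int64, then take the low 32 bits (what sxtw sign-extends).
static int ScalarPathAsDescribed(float f) => unchecked((int)SaturateToInt64(f));

// Vector sequence: convert straight to int32.
static int VectorPath(float f) => SaturateToInt32(f);
```

The low 32 bits of long.MinValue (0x8000000000000000) are all zero and those of long.MaxValue (0x7FFFFFFFFFFFFFFF) are all one, which reproduces the 0 and -1 seen in the scalar results.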

tannergooding · 2023-04-24T15:28:31Z

> It seems that arm64 saturates on f->i conversions, which is inconsistent with how our scalar conversions work; this is the cause of the CI errors. E.g. a vector conversion of float.MinValue to int and back yields -3.4028235E+38 -> -2147483648 -> -2.1474836E+09, while in scalar code the results are -3.4028235E+38 -> 0 -> 0. Also, strangely, 3.4028235E+38 yields -1 when converting to int in scalar code. In-range conversions look all right. @vargaz @tannergooding :

The TL;DR is that we want to saturate on overflow here.

The longer explanation is that you have to be careful when testing the conversions as there are 3 potentially different behaviors you can see:

C# constant folding
JIT constant folding
Platform specific behavior

In general, overflow caused by conversion of floating-point to integral values is undefined behavior. C# currently constant folds to 0. The JIT generally defers to the C compiler implementation, except on xarch, where it has a "quirk" that was meant to work around a very old MSVC bug (which no longer exists) and where it got the behavior wrong for double->uint64_t.
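One way to keep the three cases apart when writing a test is sketched below. It is only an illustration: the helper names (Opaque, ConvertAtRuntime) are hypothetical, whether the JIT actually folds the second case depends on the runtime and its optimization settings, and no out-of-range result is asserted because, as noted above, it is not defined.

```csharp
using System;
using System.Runtime.CompilerServices;

// 1. Constant expression: evaluated by the C# compiler, so no conversion
//    instruction runs at run time.
int foldedByCompiler = unchecked((int)float.MaxValue);

// 2. Non-constant local with a known value: the conversion is expected to
//    reach the JIT, which may fold it when optimizing.
float known = float.MaxValue;
int maybeFoldedByJit = (int)known;

// 3. Value passed through non-inlined calls: intended to force a real
//    conversion instruction, so the platform behavior is observed.
int platformResult = ConvertAtRuntime(Opaque(float.MaxValue));

// The three results may differ from each other and between platforms.
Console.WriteLine($"C# constant folding: {foldedByCompiler}");
Console.WriteLine($"possible JIT folding: {maybeFoldedByJit}");
Console.WriteLine($"runtime conversion:   {platformResult}");

// In-range values are well defined everywhere: truncation toward zero gives 1.
Console.WriteLine(ConvertAtRuntime(Opaque(1.9f)));

// Hypothetical helper: hides the value from constant propagation.
[MethodImpl(MethodImplOptions.NoInlining)]
static float Opaque(float f) => f;

// Hypothetical helper: performs the conversion on a value unknown at JIT time.
[MethodImpl(MethodImplOptions.NoInlining)]
static int ConvertAtRuntime(float f) => (int)f;
```

A test that compares a literal against a runtime conversion can therefore fail without either path being wrong; comparing only results that come from the same path avoids that.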

The general desire, long term, is for us to normalize our behavior to be more consistent. Newer platforms are moving towards "saturation" as the correct approach for this, and we previously reviewed and approved the "break" for .NET to make the same transition; it just hasn't happened yet and may require coordination with Roslyn to end up consistent everywhere.

This change will allow most platforms (Wasm, Arm64, etc) to be much more efficient and emit a "single instruction". It will also allow us to match the many specs that do require or implement saturating behavior. It will slightly pessimize x64, but we approved some platform specific casting methods for where perf really does matter, so those will be available to use still.

tannergooding · 2023-04-24T15:30:15Z

#61761 was the initial attempt to normalize the behavior, but it was blocked by some needed Mono work and not picked back up, as other work became higher priority.

jandupej · 2023-04-26T14:03:31Z

All CI errors are now explained. Merging.

[mono][jit] Adding Vector128.ConvertXX as intrinsic on arm64.

81ab5d8

jandupej added the area-Codegen-JIT-mono label Apr 21, 2023

jandupej added this to the 8.0.0 milestone Apr 21, 2023

jandupej requested a review from fanyang-mono April 21, 2023 14:08

jandupej self-assigned this Apr 21, 2023

jandupej requested review from vargaz, lambdageek and SamMonoRT as code owners April 21, 2023 14:08

This was referenced Apr 21, 2023

[wasm] interpreter timeouts when WebSocket closes unexpectedly #84101

Closed

[wasm] DebuggerTests.EvaluateOnCallFrameTests timeouts on CI #85168

Closed

vargaz approved these changes Apr 21, 2023

Changed rounding model on f->i conversion.

e70660a

jandupej mentioned this pull request Apr 25, 2023

[mono][jit] Fix float to int32 casting overflow behavior #85316

Closed

jandupej added 2 commits April 25, 2023 15:08

Disabled f32->i32 casting test.

c882ea9

Disabled all of failing JIT tests.

ba466c1

build-analysis bot mentioned this pull request Apr 25, 2023

Various WASM timeouts on CI #85304

Closed

jandupej mentioned this pull request Apr 26, 2023

[mono][wasm] Unable to find RazorClassLibrary.dll to be lazy loaded later. #85395

Closed

jandupej merged commit a9e7717 into dotnet:main Apr 26, 2023

jandupej deleted the arm64-simd-cvt branch April 26, 2023 14:03

kotlarmilos mentioned this pull request May 4, 2023

[Perf] Linux/arm64: 16 Improvements on 4/26/2023 2:24:59 PM dotnet/perf-autofiling-issues#17314

Closed

ghost locked as resolved and limited conversation to collaborators May 26, 2023
