Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[mono][jit] Adding Vector128.ConvertXX as intrinsic on arm64. #85163

Merged
merged 4 commits into from
Apr 26, 2023

Conversation

jandupej
Copy link
Member

This adds Vector128 conversions that maintain element width as intrinsic on arm64. These emit a single instruction.

Macros for ucvtf,scvtf, fcvtns, fcvtnu are converted to the general variant.

Contributes to #80566

@jandupej
Copy link
Member Author

It seems that arm64 saturates when doing f->i conversions, but this is inconsistent with how our scalar conversions work, this is the cause of the CI errors. E.g. a vector conversion of (float)Int32.MaxValue to int and back yields -3.4028235E+38 -> -2147483648 -> -2.1474836E+09, while in scalar code the results are -3.4028235E+38 -> 0 -> 0. Also strangely, 3.4028235E+38 yields -1 then converting to int in scalar code. In-range conversions look all right. @vargaz @tannergooding :

  • Do we have a defined behavior for converting out-of-range float to int? Brief look over ECMA turned up only the rounding mode.
  • Do we want to replicate the scalar overflow behavior in vector code? This would yield much longer and slower code for conversions.

@vargaz
Copy link
Contributor

vargaz commented Apr 24, 2023

This worked with llvm, so we should generate the same opcodes that llvm does.

@jandupej
Copy link
Member Author

This worked with llvm, so we should generate the same opcodes that llvm does.

I think I see what's going on. Looking at the disassembly:

private static Vector128<float> W(Vector128<float> y)
{
    Vector128<int> z = Vector128.ConvertToInt32(y);
    return Vector128.ConvertToSingle(z); 
}

generates

...
0000000000000014        fcvtzs.4s       v0, v0
0000000000000018        scvtf.4s        v0, v0
...

While the scalar case

private static float Z(float y) => (float)(int)y;

emits

...
000000000000001c        fcvtzs  x0, s0
0000000000000020        sxtw    x0, w0
0000000000000024        scvtf   s0, w0
...

Note the operand x0 in the scalar case at fcvtz. This converts a float into int64, then takes the lower 32 bits and sign-extends them to 64 bits. This is distinct from directly converting to a 32-bit int in the overflow cases. I would expect a direct conversion from float to int do what the vector code here does - convert directly to int32. I did not check what LLVM does, but since the test passes there, it seems that its scalar and vector implementations are consistent. A quick check indicates that CoreCLR behaves as the proposed vector implementation also.

@tannergooding
Copy link
Member

It seems that arm64 saturates when doing f->i conversions, but this is inconsistent with how our scalar conversions work, this is the cause of the CI errors. E.g. a vector conversion of (float)Int32.MaxValue to int and back yields -3.4028235E+38 -> -2147483648 -> -2.1474836E+09, while in scalar code the results are -3.4028235E+38 -> 0 -> 0. Also strangely, 3.4028235E+38 yields -1 then converting to int in scalar code. In-range conversions look all right. @vargaz @tannergooding :

The TL;DR is that we want to saturate on overflow here.


The longer explanation is that you have to be careful when testing the conversions as there are 3 potentially different behaviors you can see:

  1. C# constant folding
  2. JIT constant folding
  3. Platform specific behavior

In general, overflow caused by conversion of floating-point to integral values is undefined behavior. C# currently constant folds to 0. The JIT generally defers to the C compiler implementation, except for on xarch where it has a "quirk" that was meant to work around a very old MSVC bug (which no longer exists) and where it go the behavior wrong for double->uint64_t.

The general desire, long term, is for us to normalize our behavior to be more consistent. Newer platforms are moving towards "saturation" as the correct approach for this and we previously reviewed and approved the "break" for .NET to make the same transition, it just hasn't happened yet and may require coordination with Roslyn to end up consistent everywhere.

This change will allow most platforms (Wasm, Arm64, etc) to be much more efficient and emit a "single instruction". It will also allow us to match the many specs that do require or implement saturating behavior. It will slightly pessimize x64, but we approved some platform specific casting methods for where perf really does matter, so those will be available to use still.

@tannergooding
Copy link
Member

#61761 was the initial attempt to normalize the behavior, but it was blocked by some needed Mono work and not picked back up, as other work became higher priority.

@jandupej
Copy link
Member Author

All CI errors are now explained. Merging.

@jandupej jandupej merged commit a9e7717 into dotnet:main Apr 26, 2023
@jandupej jandupej deleted the arm64-simd-cvt branch April 26, 2023 14:03
@ghost ghost locked as resolved and limited conversation to collaborators May 26, 2023
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants