[mono][jit] Adding Vector128.ConvertXX as intrinsic on arm64. #85163
It seems that arm64 saturates when doing f->i conversions, but this is inconsistent with how our scalar conversions work; this is the cause of the CI errors. E.g. a vector conversion of
|
This worked with llvm, so we should generate the same opcodes that llvm does. |
I think I see what's going on. Looking at the disassembly:
private static Vector128<float> W(Vector128<float> y)
{
    Vector128<int> z = Vector128.ConvertToInt32(y);
    return Vector128.ConvertToSingle(z);
}
generates
...
0000000000000014 fcvtzs.4s v0, v0
0000000000000018 scvtf.4s v0, v0
...
While the scalar case
private static float Z(float y) => (float)(int)y;
emits
...
000000000000001c fcvtzs x0, s0
0000000000000020 sxtw x0, w0
0000000000000024 scvtf s0, w0
...
Note the operand |
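For anyone who wants to reproduce the comparison locally, here is a minimal sketch that runs the vector round trip and the scalar round trip on the same input. The class name `ConversionComparison` and the helper `LanesMatchScalar` are hypothetical and not part of this PR; `W` and `Z` mirror the two methods quoted above. For in-range inputs both paths should agree; for out-of-range inputs the result may depend on the runtime and platform, which is the point of the discussion.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ConversionComparison
{
    // Vector path: float -> int -> float on all four lanes.
    public static Vector128<float> W(Vector128<float> y)
    {
        Vector128<int> z = Vector128.ConvertToInt32(y);
        return Vector128.ConvertToSingle(z);
    }

    // Scalar path: the same round trip on a single value.
    public static float Z(float y) => (float)(int)y;

    // Hypothetical helper: broadcasts x to every lane and checks whether
    // each lane of the vector result equals the scalar result.
    public static bool LanesMatchScalar(float x)
    {
        Vector128<float> v = W(Vector128.Create(x));
        float s = Z(x);
        for (int i = 0; i < Vector128<float>.Count; i++)
        {
            if (v.GetElement(i) != s)
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling `ConversionComparison.LanesMatchScalar(x)` with a value well outside the `int` range is one way to check whether a given backend treats the two paths consistently.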
The TL;DR is that we want to saturate on overflow here. The longer explanation is that you have to be careful when testing the conversions as there are 3 potentially different behaviors you can see:
In general, overflow caused by conversion of floating-point to integral values is undefined behavior. C# currently constant folds to
The general desire, long term, is for us to normalize our behavior to be more consistent. Newer platforms are moving towards "saturation" as the correct approach for this, and we previously reviewed and approved the "break" for .NET to make the same transition; it just hasn't happened yet and may require coordination with Roslyn to end up consistent everywhere. This change will allow most platforms (Wasm, Arm64, etc.) to be much more efficient and emit a "single instruction". It will also allow us to match the many specs that do require or implement saturating behavior. It will slightly pessimize x64, but we approved some platform-specific casting methods for where perf really does matter, so those will still be available to use. |
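To make "saturate on overflow" concrete, here is a minimal sketch of the semantics in portable C#. It assumes the arm64 `fcvtzs` rules (NaN converts to zero, out-of-range values clamp to the nearest representable integer, in-range values truncate toward zero). The helper `ToInt32Saturating` is hypothetical and only illustrates the behavior; it is not the code this PR emits.

```csharp
using System;

public static class SaturatingConversion
{
    // Hypothetical helper, not part of the PR: spells out saturating
    // float -> int32 conversion step by step.
    public static int ToInt32Saturating(float value)
    {
        if (float.IsNaN(value))
        {
            return 0;               // NaN converts to zero
        }
        if (value >= 2147483648.0f)
        {
            return int.MaxValue;    // 2^31 and above clamp to the maximum
        }
        if (value <= -2147483648.0f)
        {
            return int.MinValue;    // -2^31 and below clamp to the minimum
        }
        return (int)value;          // in range: truncate toward zero
    }
}
```

On hardware that saturates natively, all of these branches collapse into the single conversion instruction, which is where the efficiency gain mentioned above comes from.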
#61761 was the initial attempt to normalize the behavior, but it was blocked by some needed Mono work and not picked back up, as other work became higher priority. |
All CI errors are now explained. Merging. |
This adds Vector128 conversions that maintain element width as intrinsic on arm64. These emit a single instruction.
Macros for ucvtf, scvtf, fcvtns, fcvtnu are converted to the general variant.
Contributes to #80566