Implement Vector.AddSaturate/SubtractSaturate #107193

lilinus · 2024-08-30T16:31:03Z

Implement #82559

dotnet-issue-labeler · 2024-08-30T16:31:09Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

dotnet-policy-service · 2024-08-30T16:31:47Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector.cs

xtqqczze · 2024-09-15T18:52:21Z

Could the existing internal AddSaturate and SubtractSaturate methods be removed?

runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs, line 3904 in 4c10eff:

    internal static Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<byte> right)

lilinus · 2024-09-16T08:42:03Z

> Could the existing internal AddSaturate and SubtractSaturate methods be removed?

I removed the existing methods that I could find in this PR, but perhaps there are additional methods I have missed.

tannergooding · 2024-09-16T14:54:21Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs

+            if (AdvSimd.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return AdvSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return AdvSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+                if (typeof(T) == typeof(int))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt32(), right.AsInt32()).As<int, T>();
+                }
+                if (typeof(T) == typeof(uint))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt32(), right.AsUInt32()).As<uint, T>();
+                }
+                if (typeof(T) == typeof(long))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt64(), right.AsInt64()).As<long, T>();
+                }
+                if (typeof(T) == typeof(ulong))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt64(), right.AsUInt64()).As<ulong, T>();
+                }
+            }
+
+            if (Sse2.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return Sse2.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return Sse2.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return Sse2.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return Sse2.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+            }
+
+            if (PackedSimd.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return PackedSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return PackedSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return PackedSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return PackedSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+            }
+
+            if (IsHardwareAccelerated)
+            {
+                return VectorMath.AddSaturate<Vector128<T>, T>(left, right);
+            }
+
+            return Create(
+                Vector64.AddSaturate(left._lower, right._lower),
+                Vector64.AddSaturate(left._upper, right._upper)
+            );


This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Instead, we want them to be implemented in the JIT so that they don't eat away at the inlining budget or run into other issues.

Doing this requires adding an AddSaturate entry to https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistxarch.h and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistarm64.h for the relevant vector sizes (mostly mirroring the entry for op_Addition).

You'd then add handling for that in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L1387-L1402 and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicarm64.cpp#L700-L710, mostly following op_Addition again; but since we don't have a general GT_* kind, you'd instead use gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize) where intrinsic is NI_ISA_Name, such as NI_SSE2_AddSaturate

For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well. Unsigned is simple, as it's effectively just the following: x + y will always be greater than or equal to either input unless it overflows.

    var tmp = x + y;
    return Vector.ConditionalSelect(
        Vector.LessThan(tmp, x),
        MaxValue,
        tmp
    );
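The unsigned case can be tried out directly in managed code. The following is a minimal sketch, assuming .NET 7 or later and using `Vector128<uint>` rather than `Vector<T>` so the lane count is fixed; `AddSaturateUnsigned` is a hypothetical helper name, not an API from this PR.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper (not part of the PR): unsigned saturating add per lane.
static Vector128<uint> AddSaturateUnsigned(Vector128<uint> x, Vector128<uint> y)
{
    // Wrapping add: on overflow the sum ends up smaller than either input.
    Vector128<uint> tmp = x + y;

    // AllBitsSet in lanes where the add wrapped, Zero elsewhere.
    Vector128<uint> overflowed = Vector128.LessThan(tmp, x);

    // Pick MaxValue in wrapped lanes, otherwise keep the sum.
    return Vector128.ConditionalSelect(overflowed, Vector128.Create(uint.MaxValue), tmp);
}

Vector128<uint> a = Vector128.Create(1u, 0xFFFF_FFF0u, uint.MaxValue, 0u);
Vector128<uint> b = Vector128.Create(2u, 0x20u, 1u, 0u);
Console.WriteLine(AddSaturateUnsigned(a, b));
```

The second and third lanes wrap, so they are the ones replaced with `uint.MaxValue`.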

Signed is a bit trickier, but it basically boils down to (there may be a more efficient way, but this is the basics):

    var z = x + y;
    return Vector.ConditionalSelect(
        (((x ^ y) ^ SignMask) & (x ^ z)) >> (sizeof(T) * 8 - 1),
        SignMask ^ (z >> (sizeof(T) * 8 - 1)),
        z
    );

This works because x + y for differing signs cannot overflow, while for same signs it can. In general, given two bools you can detect equality via x ^ y ^ 1 and inequality via x ^ y. Since we want (signX == signY) && (signX != signZ), that gives us (x ^ y ^ 1) & (x ^ z), which is the expression above with SignMask standing in for 1 so that only the sign bit is flipped. We then arithmetic right shift to propagate the sign bit, so we get AllBitsSet (overflow occurred) or Zero (no overflow) per element.

If overflow did occur, then we know that a negative result means it should be MaxValue while a positive result means it should be MinValue. Arithmetic shifting z gives us AllBitsSet (negative) or Zero (positive) on a per-element basis; we just need to xor with the sign mask. This gives us 0xFFFF_FFFF ^ 0x8000_0000 or 0x0000_0000 ^ 0x8000_0000, so negative results become 0x7FFF_FFFF (MaxValue) and positive results become 0x8000_0000 (MinValue).
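The signed logic can be checked the same way. This is a sketch under the same assumptions (.NET 7 or later, `Vector128<int>` instead of `Vector<T>`); `AddSaturateSigned` is a hypothetical name, and the shift count 31 is `sizeof(int) * 8 - 1`.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper (not part of the PR): signed saturating add per lane.
static Vector128<int> AddSaturateSigned(Vector128<int> x, Vector128<int> y)
{
    Vector128<int> signMask = Vector128.Create(int.MinValue); // 0x8000_0000 per lane
    Vector128<int> z = x + y;                                 // wrapping add

    // Sign bit is set only where x and y share a sign and z's sign differs from x's.
    Vector128<int> overflow = ((x ^ y) ^ signMask) & (x ^ z);

    // Arithmetic shift propagates the sign bit: AllBitsSet on overflow, Zero otherwise.
    Vector128<int> overflowMask = Vector128.ShiftRightArithmetic(overflow, 31);

    // Negative z becomes 0x7FFF_FFFF (MaxValue); non-negative z becomes 0x8000_0000 (MinValue).
    Vector128<int> clamped = signMask ^ Vector128.ShiftRightArithmetic(z, 31);

    return Vector128.ConditionalSelect(overflowMask, clamped, z);
}

Vector128<int> a = Vector128.Create(int.MaxValue, int.MinValue, 5, -5);
Vector128<int> b = Vector128.Create(1, -1, -3, -3);
Console.WriteLine(AddSaturateSigned(a, b));
```

The first two lanes overflow in opposite directions; the last two cover a mixed-sign pair and a same-sign pair that do not overflow.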

lilinus · 2024-09-17

> This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Understood. Thanks for the clear instructions on how to implement this in the JIT instead 👍.

> For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well.

There is already a "fallback" algorithm in this PR, in the VectorMath class.
Should the substitution also be done in the JIT for the x86/x64 case, or does it suffice to leave those cases as they are? If handled in the JIT, should the fallback in VectorMath be kept?

I'll try setting this PR to draft until I have successfully made the necessary changes.

tannergooding · 2025-01-11

> If handled in JIT, should the fallback in VectorMath be kept?

We're generally using VectorMath for cases that we can't implement in the JIT and which are unlikely to ever be inlined. Otherwise, we do the more naive thing for the managed implementation (often just decomposing into operations on the lower/upper halves and allowing the loop to exist only as part of Vector64<T>).
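As an illustration of the "naive" managed shape described here, the following sketch keeps the per-element loop at the 64-bit size and builds the 128-bit version from halves. It uses only public APIs (`GetLower`, `GetUpper`, `GetElement`, `WithElement`); the helper names `AddSaturate64` and `AddSaturate128` are hypothetical and this is not the code from the PR, which works on internal fields.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper: the only place a per-element loop exists.
static Vector64<short> AddSaturate64(Vector64<short> left, Vector64<short> right)
{
    Vector64<short> result = Vector64<short>.Zero;
    for (int i = 0; i < Vector64<short>.Count; i++)
    {
        // Widen to int so the sum cannot wrap, then clamp back into short range.
        int sum = left.GetElement(i) + right.GetElement(i);
        short clamped = (short)Math.Clamp(sum, (int)short.MinValue, (int)short.MaxValue);
        result = result.WithElement(i, clamped);
    }
    return result;
}

// Hypothetical helper: the wider size just splits into halves and recombines.
static Vector128<short> AddSaturate128(Vector128<short> left, Vector128<short> right) =>
    Vector128.Create(
        AddSaturate64(left.GetLower(), right.GetLower()),
        AddSaturate64(left.GetUpper(), right.GetUpper()));

Vector128<short> a = Vector128.Create(new short[] { short.MaxValue, short.MinValue, 1, 2, 3, 4, 5, 6 });
Vector128<short> b = Vector128.Create(new short[] { 1, -1, 1, 1, 1, 1, 1, 1 });
Console.WriteLine(AddSaturate128(a, b));
```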

am11 · 2024-09-19T06:43:52Z

@lilinus in case you didn't know, there is a patch created by the format leg: https://github.com/dotnet/runtime/actions/runs/10928828860?pr=107193 (under artifacts)

$ cd /path/to/runtime
$ unzip ~/Downloads/format.linux.patch.zip
$ git apply format.patch
$ rm format.patch
# commit and push

tannergooding · 2024-11-08T17:35:27Z

I should be getting to this soon, just working through the backlog of PRs now that I can start focusing on things for .NET 10

tannergooding · 2025-01-10T20:07:05Z

src/coreclr/jit/hwintrinsicarm64.cpp

+            if (simdSize == 8 && varTypeIsLong(simdBaseType))
+            {
+                break;
+            }


This shouldn't be skipped for TYP_LONG, it should just use AddSaturateScalar which is already exposed.

tannergooding · 2025-01-10T20:07:22Z

src/coreclr/jit/hwintrinsicarm64.cpp

+            if (simdSize == 8 && varTypeIsLong(simdBaseType))
+            {
+                break;
+            }


Same here, this should just use SubtractSaturateScalar

tannergooding · 2025-01-10T20:27:29Z

src/coreclr/jit/hwintrinsicxarch.cpp

+                    op1     = impSIMDPopStack();
+                    retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize);
+                }
+            }


Handling for int, uint, long, and ulong should likely still be added.

For unsigned this is simply:

    var z = x + y;
    return ConditionalSelect(LessThan(z, x), Create(MaxValue), z);

For signed this is (I believe, did this ad-hoc so didn't double check the logic is 100% accurate):

    var z = x + y;
    var o = ((x ^ y) ^ AllBitsSet) & (x ^ z);
    return ConditionalSelect(
        IsNegative(o),
        ConditionalSelect(IsNegative(z), Create(MaxValue), Create(MinValue)),
        z
    );

For detecting overflow, we only really care whether both inputs had the same sign and whether the output's sign differs from it. When overflow does occur, a negative result should have clamped to MaxValue (it overflowed from positive and wrapped to negative) and a positive result should have clamped to MinValue. The logic detects exactly that: if x and y have the same sign, xoring the two clears the sign bit, and xoring that with AllBitsSet sets it. We then AND that with x ^ z, whose sign bit is set when x and the result z differ in sign. Overflow occurred for an element if the resulting mask is negative.
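Since the comment above notes the logic was not double-checked, a scalar model of one lane can be compared exhaustively against a widen-and-clamp reference. This sketch uses 8-bit lanes so every input pair can be covered; `AddSaturateLane` is a hypothetical name and `-1` plays the role of AllBitsSet.

```csharp
using System;

// Hypothetical scalar model of a single lane, following the formula quoted above.
static sbyte AddSaturateLane(sbyte x, sbyte y)
{
    sbyte z = unchecked((sbyte)(x + y));   // wrapping 8-bit add
    int o = ((x ^ y) ^ -1) & (x ^ z);      // negative only when overflow occurred
    if (o >= 0)
    {
        return z;                          // no overflow: keep the sum
    }
    // Overflow: a negative wrapped result clamps to MaxValue, a positive one to MinValue.
    return z < 0 ? sbyte.MaxValue : sbyte.MinValue;
}

int mismatches = 0;
for (int a = sbyte.MinValue; a <= sbyte.MaxValue; a++)
{
    for (int b = sbyte.MinValue; b <= sbyte.MaxValue; b++)
    {
        // Reference: add in int (cannot wrap) and clamp into sbyte range.
        sbyte expected = (sbyte)Math.Clamp(a + b, (int)sbyte.MinValue, (int)sbyte.MaxValue);
        if (AddSaturateLane((sbyte)a, (sbyte)b) != expected)
        {
            mismatches++;
        }
    }
}
Console.WriteLine($"mismatches: {mismatches}");
```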

Subtraction is much the same, just with a GreaterThan comparison for unsigned and the same logic for signed.
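A sketch of the subtraction variants, again assuming .NET 7 or later and `Vector128`, with hypothetical helper names. One detail this sketch assumes for the signed case: x - y can only overflow when x and y have different signs, so the overflow test here is `(x ^ y) & (x ^ z)`, without the extra xor used for addition; the select and clamp steps are unchanged.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper: unsigned saturating subtract clamps to zero on underflow.
static Vector128<uint> SubtractSaturateUnsigned(Vector128<uint> x, Vector128<uint> y)
{
    Vector128<uint> z = x - y; // wrapping subtract: on underflow z ends up greater than x
    return Vector128.ConditionalSelect(Vector128.GreaterThan(z, x), Vector128<uint>.Zero, z);
}

// Hypothetical helper: signed saturating subtract.
static Vector128<int> SubtractSaturateSigned(Vector128<int> x, Vector128<int> y)
{
    Vector128<int> z = x - y; // wrapping subtract

    // Sign bit set where x and y differ in sign and z's sign differs from x's.
    Vector128<int> overflow = (x ^ y) & (x ^ z);
    Vector128<int> overflowMask = Vector128.ShiftRightArithmetic(overflow, 31);

    // Negative z becomes MaxValue; non-negative z becomes MinValue.
    Vector128<int> clamped = Vector128.Create(int.MinValue) ^ Vector128.ShiftRightArithmetic(z, 31);

    return Vector128.ConditionalSelect(overflowMask, clamped, z);
}

Console.WriteLine(SubtractSaturateUnsigned(Vector128.Create(1u, 5u, 10u, 0u), Vector128.Create(2u, 5u, 3u, 1u)));
Console.WriteLine(SubtractSaturateSigned(Vector128.Create(int.MaxValue, int.MinValue, 5, -5), Vector128.Create(-1, 1, 3, 3)));
```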

lilinus · 2025-01-11

This fallback is already implemented in VectorMath.cs. Should I try to move it to the JIT?

tannergooding · 2025-01-11

Yes. We try to avoid having relatively simple code for the core xplat APIs in managed code, as there is a non-trivial cost to inlining, doing dead code elimination, and generally eating into the JIT budget for importing all the IR.

dotnet-policy-service · 2025-02-13T14:13:34Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

lilinus added 2 commits August 30, 2024 18:29

Implement Vector.AddSaturate / SubtractSaturate

3a12fe6

Add tests

ad5f344

dotnet-issue-labeler bot added area-System.Numerics new-api-needs-documentation labels Aug 30, 2024

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 30, 2024

lilinus changed the title ~~Add sub saturate~~ Implement.AddSaturate/SubtractSaturate Aug 30, 2024

Cleanup VectorMath

9ee124c

build-analysis bot mentioned this pull request Sep 4, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

Open

3 tasks

lilinus changed the title ~~Implement.AddSaturate/SubtractSaturate~~ Implement Vector.AddSaturate/SubtractSaturate Sep 16, 2024

lilinus marked this pull request as ready for review September 16, 2024 08:48

tannergooding reviewed Sep 16, 2024

lilinus and others added 3 commits September 17, 2024 11:26

Optimize add/sub saturate fallback

0e69a94

Implement intrinsics in runtime

6cf7033

Merge branch 'main' into add-sub-saturate

3defc25

build-analysis bot mentioned this pull request Sep 18, 2024

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Open

3 tasks

Fixes to Vector.Add/SubSaturate

e7a637b

lilinus and others added 2 commits September 19, 2024 09:22

Apply format patch

435a7be

Merge branch 'main' into add-sub-saturate

d357672

build-analysis bot mentioned this pull request Sep 19, 2024

ProcessThreadTests.TestStartTimeProperty failure in CI #105526

Open

build-analysis bot mentioned this pull request Sep 5, 2024

System.Runtime.Serialization.Formatters CI failure. #107309

Closed

Merge branch 'main' into add-sub-saturate

ec2b8c4

build-analysis bot mentioned this pull request Nov 8, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

tannergooding reviewed Jan 10, 2025

lilinus marked this pull request as draft January 14, 2025 10:17

dotnet-policy-service bot closed this Feb 13, 2025
