[wasm] Improve SIMD vector equality operator #79719

Merged

@radekdoulik radekdoulik commented Dec 15, 2022

Improve the code we emit for vector equality. Instead of reducing the comparison result with multiple shuffles, use the `all_true` instructions:

i8x16.all_true(a: v128) -> i32
i16x8.all_true(a: v128) -> i32
i32x4.all_true(a: v128) -> i32
i64x2.all_true(a: v128) -> i32
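
For readers less familiar with these instructions, here is a minimal Python model of their semantics as I understand them from the WebAssembly SIMD proposal: the result is 1 when every lane is non-zero and 0 otherwise. The byte-string representation of a `v128` and the helper names are assumptions made for illustration; this is not code from the runtime.

```python
# Illustrative model only: a v128 value is represented as 16 raw bytes.

def all_true(v: bytes, lane_bytes: int) -> int:
    """Return 1 if every lane of the 16-byte vector is non-zero, else 0."""
    assert len(v) == 16 and lane_bytes in (1, 2, 4, 8)
    lanes = (v[i:i + lane_bytes] for i in range(0, 16, lane_bytes))
    # A lane is non-zero when at least one of its bytes is non-zero.
    return int(all(any(lane) for lane in lanes))

def i8x16_all_true(v: bytes) -> int:
    return all_true(v, 1)

def i16x8_all_true(v: bytes) -> int:
    return all_true(v, 2)

def i32x4_all_true(v: bytes) -> int:
    return all_true(v, 4)

def i64x2_all_true(v: bytes) -> int:
    return all_true(v, 8)

# The lane width matters: alternating 0x01, 0x00 bytes contain zero
# 8-bit lanes, but every 16-bit lane is non-zero.
alternating = bytes([1, 0] * 8)
```

With `alternating` as input, `i8x16_all_true` sees zero lanes while `i16x8_all_true` does not, which is why the instruction has to match the lane width of the preceding `eq`.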

This change reduces code size and greatly improves performance. For example, `Span`'s `SequenceEqual` becomes roughly 4x faster on Chrome:

| measurement | old | new |
|-:|-:|-:|
| Span, SequenceEqual bytes | 0.0087ms | 0.0021ms |
| Span, SequenceEqual chars | 0.0174ms | 0.0042ms |

The `dotnet.wasm` size drops by about 20 KB for the bench sample.

The code diff:

```
> wa-diff -d -f corlib_System_SpanHelpers_SequenceEqual_byte__byte__uintptr dotnet.old.wasm dotnet.new.wasm
...
          v128.load    [SIMD]
          i8x16.eq    [SIMD]
-         local.tee $4
+         i8x16.all.true    [SIMD]
-         local.get $4
-         i8x16.shuffle 0x00000000000000000f0e0d0c0b0a0908    [SIMD]
-         local.get $4
-         v128.and    [SIMD]
-         local.tee $4
-         local.get $4
-         i8x16.shuffle 0x00000000000000000000000007060504    [SIMD]
-         local.get $4
-         v128.and    [SIMD]
-         local.tee $4
-         local.get $4
-         i8x16.shuffle 0x00000000000000000000000000000302    [SIMD]
-         local.get $4
-         v128.and    [SIMD]
-         local.tee $4
-         local.get $4
-         i8x16.shuffle 0x00000000000000000000000000000001    [SIMD]
-         local.get $4
-         v128.and    [SIMD]
-         i8x16.extract.lane.u 0    [SIMD]
          i32.eqz
          if
...
```
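
To illustrate why the two sequences in the diff are interchangeable, here is a small Python simulation of both reductions. It is a sketch under assumptions, not the code the runtime emits: vectors are modelled as lists of 16 bytes, the helper names are made up, and the lane lists assume the hex shuffle immediate printed by `wa-diff` holds lane 0 in its lowest byte.

```python
# Illustrative simulation only: a v128 value is a list of 16 ints in 0..255.

def i8x16_eq(a, b):
    # Lane-wise compare: 0xFF where the bytes match, 0x00 where they differ.
    return [0xFF if x == y else 0x00 for x, y in zip(a, b)]

def i8x16_shuffle(a, b, lanes):
    # Each output lane selects a byte from the 32-byte concatenation a ++ b.
    both = a + b
    return [both[i] for i in lanes]

def v128_and(a, b):
    return [x & y for x, y in zip(a, b)]

def i8x16_all_true(v):
    return int(all(x != 0 for x in v))

# Lane indices read from the four shuffle immediates in the diff
# (assumption: the lowest byte of the hex constant is lane 0).
SHUFFLES = [
    [8, 9, 10, 11, 12, 13, 14, 15] + [0] * 8,
    [4, 5, 6, 7] + [0] * 12,
    [2, 3] + [0] * 14,
    [1] + [0] * 15,
]

def old_reduce(mask):
    # Old sequence: four shuffle + and steps fold the vector in half each
    # time, so lane 0 ends up as the AND of all 16 lanes.
    v = mask
    for lanes in SHUFFLES:
        v = v128_and(i8x16_shuffle(v, v, lanes), v)
    return v[0]  # extract lane 0 as an unsigned byte

def new_reduce(mask):
    # New sequence: a single all_true on the comparison mask.
    return i8x16_all_true(mask)

def vectors_equal(a, b, reduce):
    # Both sequences feed i32.eqz, so only zero vs non-zero matters.
    return reduce(i8x16_eq(a, b)) != 0
```

The two reductions agree only because the input comes from `i8x16.eq`, whose lanes are always 0x00 or 0xFF. For arbitrary bytes the AND-based fold can yield zero even when every lane is non-zero (for example 0x01 & 0x02), so this substitution is specific to equality masks.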

@radekdoulik radekdoulik merged commit af6b1bd into dotnet:main Dec 16, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jan 15, 2023