Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

🚧 POC - support NaNs for SSE & AVX2 f32 #18

Closed
wants to merge 12 commits into from
Closed

🚧 POC - support NaNs for SSE & AVX2 f32 #18

wants to merge 12 commits into from

Conversation

jvdd
Copy link
Owner

@jvdd jvdd commented Feb 4, 2023

This PR aims at handling Nans, #16.

For the scalar implementation I check whether the scalar value is not equal to itself. This check will only be true if the scalar value is NAN, as the following is correct in Rust:

let v = f32::NAN;
assert!(v != v);

For the SIMD implementation I used a transformation similar to the one used in #1 - this transformation projects the NANs to integer values that are either higher / lower than the "real" floating point values. The transformation leverages the 2-complement https://observablehq.com/@rreusser/half-precision-floating-point-visualized

Some remarks:

  • + inf & - inf will get projected as well
    • validate if these are inbetween the NaNs & the "real" floating point values (pretty sure this is the case)
      => this is indeed the case - see plot ⬇️
      image
  • there are a plethora of ways to represent NaNs in floating point representation: exponent should be 11111 and the fraction should be non-zero. Thus the sign bit may be 1 or 0 -> resulting in half of the NaNs getting projected above and the other half below the "real" floating point values. Thus, only 1 of the 2 checks (either > or <) will fire & thus detect the NaN. This might be a problem when we want to implement argmin and argmax as separate functions..

Paths I looked into but did not seem worthwhile:

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 4, 2023

CodSpeed Performance Report

Merging #18 NaNs (df40466) will degrade performances by 2.4%.

Summary

🔥 6 improvements
❌ 5 regressions
✅ 33 untouched benchmarks

🆕 0 new benchmarks
⁉️ 0 dropped benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main NaNs Change
scalar_random_long_f32 1.7 ms 2.2 ms -29.87%
sse_random_long_f32 712.1 µs 946.8 µs -32.96%
avx_random_long_f32 402.9 µs 467.1 µs -15.91%
impl_random_long_f32 403.2 µs 467.2 µs -15.88%
🔥 sse_random_long_i32 797.3 µs 776 µs 2.68%
scalar_random_long_f64 1.9 ms 2.4 ms -27.06%
🔥 avx2_random_long_i64 868.4 µs 847.1 µs 2.45%
🔥 impl_random_long_i64 868.6 µs 847.3 µs 2.45%
🔥 sse_random_long_u32 861.4 µs 840 µs 2.48%
🔥 avx2_random_long_u64 868.4 µs 847.1 µs 2.45%
🔥 impl_random_long_u64 868.7 µs 847.3 µs 2.46%

@jvdd
Copy link
Owner Author

jvdd commented Feb 4, 2023

@varon I just wrote out very quickly what I pursued in this PR (is more a POC than a proper PR).
I'm interested in hearing your feedback on this! You are - of course - free to start pursuing totally different paths on how we can tackle this issue (& thus ignore what I just tried here).

P.S.: I'll improve the phrasing tomorrow - just wanted to quickly push & share the code 🙃

@varon varon mentioned this pull request Feb 5, 2023
@jvdd
Copy link
Owner Author

jvdd commented Feb 6, 2023

CodSpeed Performance Report

| ❌ | sse_random_long_f32 | 712.1 µs | 946.8 µs | -32.96% |

When I run the benchmarks on my local machine I notice only a 3-4% regression 🤔

-> main branch - ignores NaN unless NaN present in first lane
sse_random_long_f32     time:   [41.074 µs 41.109 µs 41.148 µs]
-> this branch - returns NaN index if NaN present
sse_random_long_f32     time:   [42.572 µs 42.660 µs 42.816 µs]

@jvdd jvdd changed the title 🚧 partially supported NaNs for SSE f32 🚧 suppor NaNs for SSE & AVX2 f32 Feb 6, 2023
@jvdd jvdd changed the title 🚧 suppor NaNs for SSE & AVX2 f32 🚧 support NaNs for SSE & AVX2 f32 Feb 6, 2023
@jvdd jvdd mentioned this pull request Feb 6, 2023
1 task
@jvdd jvdd mentioned this pull request Feb 12, 2023
23 tasks
@varon
Copy link
Contributor

varon commented Feb 17, 2023

Is this superseded by #21 ?

@jvdd jvdd changed the title 🚧 support NaNs for SSE & AVX2 f32 🚧 POC - support NaNs for SSE & AVX2 f32 Feb 17, 2023
@jvdd
Copy link
Owner Author

jvdd commented Feb 17, 2023

Is this superseded by #21 ?

Yes! This was some sort of a proof of concept, showcasing the utility of ord_transform (ordinal transformation of float to int) for NaN handling.
Will close this PR now.

@jvdd jvdd closed this Feb 17, 2023
@jvdd jvdd deleted the NaNs branch February 28, 2023 21:17
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants