✨ support f16 + 🧹 some minor refactoring #1

Merged
merged 6 commits into from
Nov 5, 2022

jvdd commented Nov 5, 2022

This PR does the following:

  • ✨ add (optional) efficient support for f16 (through the half package)
  • 🧹 some minor refactoring (using macros to avoid duplicate code)
  • 🤖 add CI/CD (not yet tested)

P.S. f16 is supported by converting it to (what I call) i16ord, which is an ordinal (i.e., monotonic, order-preserving) mapping of f16 to i16.

The ord transformation:

ord_transform(v: i16) = ((v >> 15) & 0x7FFF) ^ v

(to apply this to an f16, first reinterpret its bits as an i16, e.g. via transmute; here >> is an arithmetic, sign-extending shift)
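
A minimal Rust sketch of the transformation above, operating on raw IEEE 754 binary16 bit patterns so that it does not depend on the half crate. The helper name `f16_bits_to_ord` is an illustrative assumption, not code from this PR; with the half crate, the raw bits would come from `f16::to_bits()`.

```rust
/// Ordinal transform from the PR description: maps the raw bits of an f16
/// (reinterpreted as i16) to an i16 with the same ordering.
fn ord_transform(v: i16) -> i16 {
    // `v >> 15` is an arithmetic shift on i16: 0 for a clear sign bit,
    // -1 (all ones) for a set sign bit. Masking with 0x7FFF leaves the sign
    // bit alone, so the XOR flips only the 15 magnitude bits of negatives.
    ((v >> 15) & 0x7FFF) ^ v
}

/// Hypothetical helper: reinterpret raw f16 bits as i16, then transform.
fn f16_bits_to_ord(bits: u16) -> i16 {
    ord_transform(bits as i16)
}
```

For example, the bit patterns 0xC000 (-2.0), 0xBC00 (-1.0), 0x3C00 (1.0) and 0x4000 (2.0) map to strictly increasing i16 values, even though the raw bits of the two negative numbers compare in the wrong order when read as plain integers.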

Some useful properties of this transformation

  • 🙌 As ordinality is preserved, we can use fast built-in i16 (SIMD) instructions for comparison.
  • ↔️ As the transformation is symmetric (i.e., it is its own inverse), we can, as long as we don't change the i16(ord) values, transform the outcome back to f16 without needing a lookup table.
  • ⚡ (bonus): the transformation only performs binary (bitwise) operations, ensuring minimal overhead
    => these operations can easily be implemented with SIMD instructions 🎉
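
The properties above can be sketched in scalar Rust as follows. This is an illustration under assumptions, not the PR's SIMD implementation: `argmin_f16_bits` and `is_involution` are hypothetical names, and the input is taken as raw f16 bit patterns (`u16`) containing no NaN or inf values.

```rust
/// Ordinal transform from the PR description.
fn ord_transform(v: i16) -> i16 {
    ((v >> 15) & 0x7FFF) ^ v
}

/// Hypothetical scalar argmin: compares only transformed i16 values and
/// returns (index, raw f16 bits) of the minimum, or None for empty input.
fn argmin_f16_bits(data: &[u16]) -> Option<(usize, u16)> {
    let (idx, min_ord) = data
        .iter()
        .map(|&bits| ord_transform(bits as i16))
        .enumerate()
        .min_by_key(|&(_, ord)| ord)?;
    // Applying the transform again recovers the original bits,
    // so no lookup table is needed.
    Some((idx, ord_transform(min_ord) as u16))
}

/// Exhaustively checks that applying the transform twice is the identity.
fn is_involution() -> bool {
    (i16::MIN..=i16::MAX).all(|v| ord_transform(ord_transform(v)) == v)
}
```

The second application uses the same XOR mask as the first because the sign bit is never modified, which is why the round trip is exact.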

Visualization of the transformation

Illustration of ord_transform on all possible float16 numbers.
You can observe the monotonically rising slope 🥳

image

Illustration of the symmetry property.
When applying the ord_transform twice on the same value, we get back the original value!!
image

Limitations

  • NaN values and infs are not supported in this transformation
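
Given this limitation, a caller might want to check the input first. A minimal sketch, assuming raw f16 bit patterns as input; `is_nan_or_inf` and `all_finite` are hypothetical helpers, not part of this PR. In IEEE 754 binary16, NaN and infinity are exactly the values whose five exponent bits are all set.

```rust
/// Hypothetical guard: true if the raw f16 bits encode NaN or +/-infinity,
/// i.e. all five exponent bits (mask 0x7C00) are set.
fn is_nan_or_inf(bits: u16) -> bool {
    (bits & 0x7C00) == 0x7C00
}

/// Hypothetical pre-check a caller could run before using the transform.
fn all_finite(data: &[u16]) -> bool {
    data.iter().all(|&bits| !is_nan_or_inf(bits))
}
```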

Benchmarks

image

The f16 support that leverages the ord_transform:

  • f16 SIMD ~ 2x faster than f32 SIMD 🔥
  • f16 scalar ~ 1.25x slower than f32 scalar 😮‍💨
    • 🐎 ~10x faster than generic scalar code (which f32 uses) on half::f16
    • 🤯 ~3x faster than f32 upcasting (i.e., replacing ord_transform with to_f32 in the implementation)

jvdd merged commit f2d036f into main on Nov 5, 2022
jvdd deleted the refactoring branch on November 16, 2022 20:22
jvdd mentioned this pull request on Feb 26, 2023
23 tasks