
Support for Fused Multiply-Add (FMA) #102


Closed
dylanede opened this issue Apr 21, 2021 · 16 comments · Fixed by #138
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@dylanede

dylanede commented Apr 21, 2021

From my limited testing, multiplying and then adding SimdF32/SimdF64 does not result in FMA instructions (and rightly so). However, it would be useful to have a method on these types, and a corresponding platform intrinsic, to generate calls to the llvm.fma.* intrinsics, which support vector types.

Edit: In fact, it looks like the simd_fma platform intrinsic is already available; it's just not used by the crate?

@dylanede dylanede added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Apr 21, 2021
@calebzulawski
Member

Yes, it's simply not implemented yet, but it is planned. It's easy enough to add, so I'll do it when I get a chance.

@dylanede
Author

Well, I'm looking at making a PR for it right now.

Currently it looks like this:

/// Performs `self * b + c` as a single operation.
#[inline]
pub fn fma(self, b: Self, c: Self) -> Self {
    unsafe { crate::intrinsics::simd_fma(self, b, c) }
}

But maybe it should be something like c.add_mul(a, b), which computes c + a*b? With the code above, the call would be a.fma(b, c).
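For context, the scalar f32::mul_add already in std uses the same self * a + b argument order as the fma sketch above; a minimal scalar illustration (my own example, not from the thread):

```rust
fn main() {
    // std's scalar mul_add computes self * a + b with a single rounding,
    // matching the argument order of the proposed a.fma(b, c).
    let a = 2.0f32;
    assert_eq!(a.mul_add(3.0, 1.0), 7.0); // 2 * 3 + 1
    // An alternative c.add_mul(a, b) would instead read as c + a * b.
}
```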

@programmerjake
Member

We will also want something like llvm.fmuladd.*, where LLVM picks whichever is more efficient: a separate mul/add or an fma.

@dylanede
Author

dylanede commented Apr 21, 2021

Yeah, I've just tested out the llvm.fma.* version. Although it improves performance in some cases when the fma target feature is available on x86_64, without that feature it falls back to a call to the fma library function, which completely tanks performance.

Presumably both options should be made available: as I understand it, fused multiply-add normally comes with the guarantee of a single rounding for the whole operation, whereas llvm.fmuladd.* does not provide that guarantee.

It doesn't look like a platform intrinsic for fmuladd is exposed yet.
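The single-rounding guarantee is observable even in scalar code; a small sketch (my own example, using std's f32::mul_add, which is documented to round only once):

```rust
fn main() {
    // a = 1 + 2^-12 is exactly representable; a*a = 1 + 2^-11 + 2^-24 exactly.
    let a = 1.0f32 + 2.0f32.powi(-12);
    // Separate multiply then subtract: rounding a * a to f32 drops the
    // 2^-24 term (a tie, rounded to even), so the result is exactly 2^-11.
    let separate = a * a - 1.0;
    // A fused multiply-add rounds only once, so the 2^-24 term survives.
    let fused = a.mul_add(a, -1.0);
    assert_eq!(separate, 2.0f32.powi(-11));
    assert_ne!(separate, fused);
}
```

This is exactly the precision difference that makes llvm.fma.* (always one rounding) distinct from llvm.fmuladd.* (either one or two roundings, at the compiler's discretion).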

@calebzulawski
Member

I believe the function should be called mul_add to match the equivalent scalar functions in std.

Exposing fast versions of intrinsics is probably a separate issue, and it definitely affects other functions too.

@calebzulawski
Member

@dylanede regarding the procedure call: this is related to #76; we have a path forward to fixing it, though it's non-trivial.

@workingjubilee
Member

> /// Performs `self * b + c` as a single operation.
> #[inline]
> pub fn fma(self, b: Self, c: Self) -> Self {
>     unsafe { crate::intrinsics::simd_fma(self, b, c) }
> }
>
> But maybe it should be something like c.add_mul(a, b) which is c + a*b? With the above code it would be a.fma(b, c).

Sounds good, with appropriate tests!
I agree with Caleb that it should be mul_add. Unfortunately it needs to be gated under the std feature due to the current fallback to a procedure call.

I would rather not have two functions that do the same thing, nor another f32 type (SimdFastF32? what?). Resolving this is beyond the scope of this issue, however.

@Lokathor
Contributor

Yeah, people know the term "multiply add", and f32 already has a mul_add method, so the SIMD types should stick with mul_add as the method name.

@programmerjake
Member

maybe:

impl<const N: usize> f32x<N> {
    pub fn mul_add<const STRICT: bool = true>(self, multiplier: Self, addend: Self) -> Self {
        if STRICT {
            // llvm.fma.*
        } else {
            // llvm.fmuladd.*
        }
    }
}

@Lokathor
Contributor

Since we're not tied to being an exact hardware impl, I think we should just define it from the start to always let llvm pick what's best. If people absolutely want strict ops they can transmute the value to a platform version and use stdarch.

In particular, I think const generics on methods and functions, when they aren't simply passed in from the type, are very bad ergonomically at this time and should be avoided if possible.

@calebzulawski
Member

IEEE 754-2008 specifies FMA to have a single rounding step.

@Lokathor
Contributor

I can't tell whether that means you're for or against having a strict fma.

@programmerjake
Member

I'm for having strict fma by default, alongside another function with these semantics: use separate mul & add (each correctly rounded) or use fma (correctly rounded), at the compiler's choice -- no other options (in particular, the afn fast-math flag is not allowed). The non-strict function matches the semantics of llvm.fmuladd.*.
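A sketch of that non-strict contract (the helper below is hypothetical, not part of the crate): a conforming result must be one of exactly two values, the twice-rounded mul/add or the once-rounded fma:

```rust
// Hypothetical check mirroring the llvm.fmuladd.* contract: the result
// must equal either the separately rounded a * b + c (two roundings)
// or the correctly rounded fused result (one rounding) -- nothing else.
fn fmuladd_contract_holds(a: f32, b: f32, c: f32, result: f32) -> bool {
    let separate = a * b + c;
    let fused = a.mul_add(b, c);
    result == separate || result == fused
}

fn main() {
    // Inputs where the two allowed answers actually differ.
    let a = 1.0f32 + 2.0f32.powi(-12);
    let (b, c) = (a, -1.0f32);
    assert!(fmuladd_contract_holds(a, b, c, a * b + c));
    assert!(fmuladd_contract_holds(a, b, c, a.mul_add(b, c)));
    // A fast-math (afn) transformation could produce some third value;
    // that is exactly what this contract rules out.
}
```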

@gilescope

@dylanede happy to test your PR out if you've got something that hangs together. I am currently using _mm_maddubs_epi16 - I'm assuming that mul_add will be able to replace this.

@workingjubilee
Member

ping @dylanede to check if you're still interested in followup or have any questions ❤️

@gilescope

@dylanede if you have something halfway but don't have time to finish it feel free to pop the branch on your fork and I can have a crack at finishing it.

workingjubilee added a commit that referenced this issue Jun 23, 2021
Add various fns
- Sum/Product traits
- recip/to_degrees/to_radians/min/max/clamp/signum/copysign; #14
- mul_add: #14, fixes #102
6 participants