Support for Fused Multiply-Add (FMA) #102
Comments
Yes, it's simply not implemented yet, but is planned. It's easy enough to add so I'll do it when I get a chance |
Well, I'm looking at making a PR for it right now. Currently it looks like this:
/// Performs `self * b + c` as a single operation.
#[inline]
pub fn fma(self, b: Self, c: Self) -> Self {
    unsafe { crate::intrinsics::simd_fma(self, b, c) }
}
But maybe it should be something like |
we also will want something like |
Yeah, I've just tested the Presumably both options should be made available, as I understand that fused multiply add normally comes with the guarantee of machine precision for the whole operation, whereas Doesn't look like a platform intrinsic for fmuladd is exposed yet. |
I believe the function should be called mul_add to match the equivalent scalar functions in std. Exposing fast versions of intrinsics is probably a separate issue and definitely affects other functions too |
Sounds good, with appropriate tests! I would rather not have two functions that do the same thing, nor another f32 type (SimdFastF32? what?). Resolving this is beyond the scope of this issue, however. |
yeah people know the term "multiply add" and |
maybe:
impl<const N: usize> f32x<N> {
    pub fn mul_add<const STRICT: bool = true>(self, multiplier: Self, addend: Self) -> Self {
        if STRICT {
            // llvm.fma.*
        } else {
            // llvm.fmuladd.*
        }
    }
} |
Since we're not tied to being an exact hardware impl, I think we should just define it from the start to always let llvm pick what's best. If people absolutely want strict ops they can transmute the value to a platform version and use stdarch. Particularly, I think that const-generics in methods and functions that aren't simply passed in from the type are very bad ergonomically at this time, and should be avoided if possible. |
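To make the ergonomics concern concrete, here is a minimal sketch of the const-generic variant using scalar std operations only. `F32Lanes` is a hypothetical stand-in for the crate's vector type, and the non-strict branch uses a plain multiply and add as a placeholder, since no `fmuladd` intrinsic is assumed to be available. Stable Rust does not accept a default value for a const parameter on a function, so the `= true` default from the proposal is dropped here.

```rust
/// Hypothetical stand-in for a SIMD vector of `N` f32 lanes (not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32Lanes<const N: usize>(pub [f32; N]);

impl<const N: usize> F32Lanes<N> {
    /// Lane-wise `self * multiplier + addend`.
    /// `STRICT = true` always fuses (one rounding per lane);
    /// `STRICT = false` uses a separate multiply and add as a placeholder.
    pub fn mul_add<const STRICT: bool>(self, multiplier: Self, addend: Self) -> Self {
        Self(core::array::from_fn(|i| {
            if STRICT {
                self.0[i].mul_add(multiplier.0[i], addend.0[i])
            } else {
                self.0[i] * multiplier.0[i] + addend.0[i]
            }
        }))
    }
}
```

Every call site then has to spell out the mode with a turbofish, for example `a.mul_add::<true>(b, c)`, which is the kind of friction the comment above is pointing at.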
IEEE 754-2008 specifies FMA to have a single rounding step. |
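A small scalar sketch of what the single rounding step changes, using std's `f64::mul_add`. The function name and the input values are chosen for illustration and do not come from the crate. With a = 1 + 2^-27, the exact product a*a is 1 + 2^-26 + 2^-54, and the last term is lost when the product is rounded to f64 on its own.

```rust
/// Returns `(unfused, fused)` for `a * a + c`, with inputs chosen so that
/// rounding the intermediate product discards the low-order term.
pub fn fma_rounding_demo() -> (f64, f64) {
    let eps = 1.0 / (1u64 << 27) as f64; // 2^-27, exactly representable
    let a = 1.0 + eps;                   // exact
    let c = -(1.0 + 2.0 * eps);          // -(1 + 2^-26), exact

    // Two roundings: the product rounds to 1 + 2^-26, so the sum is 0.
    let unfused = a * a + c;
    // One rounding: the exact value 2^-54 survives.
    let fused = a.mul_add(a, c);

    (unfused, fused)
}
```

Under these assumptions the unfused result is 0 while the fused result is 2^-54, so the two forms are not interchangeable for code that depends on the low-order bits.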
I can't tell if that means that you're for or against having strict fma |
I'm for having strict fma by default, alongside another function with semantics of: use separate mul & add (correctly rounded) or use fma (correctly rounded) at compiler's choice -- no other options (in particular the |
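One way to read the relaxed semantics described above is as a predicate on results: a value is acceptable only if it equals either the separately rounded multiply-then-add or the fused result. The helper below is a hypothetical scalar sketch of that reading, not an API from the crate.

```rust
/// Returns true if `r` is a permitted result of a "compiler's choice"
/// multiply-add of `a * b + c`: either the unfused result (two roundings)
/// or the fused result (one rounding), and nothing else.
pub fn is_allowed_relaxed_result(a: f64, b: f64, c: f64, r: f64) -> bool {
    let unfused = a * b + c;
    let fused = a.mul_add(b, c);
    r == unfused || r == fused
}
```

For most inputs the two candidates coincide, so the predicate admits a single value; it only admits two distinct values when the intermediate rounding actually changes the answer.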
@dylanede happy to test your PR out if you've got something that hangs together? I am using |
ping @dylanede to check if you're still interested in followup or have any questions ❤️ |
@dylanede if you have something halfway but don't have time to finish it feel free to pop the branch on your fork and I can have a crack at finishing it. |
From my limited testing, multiplying and then adding SimdF32/SimdF64 does not result in FMA instructions, and rightly so. However, it would be useful to have a method on these types and a corresponding platform intrinsic to generate LLVM calls to the llvm.fma.* intrinsics, which support vector types.
Edit: In fact it looks like the simd_fma platform intrinsic is already available, just not used by the crate?