clang: _mm512_reduce_add_ps lowers to LLVM IR that does not reflect correct reduce order #82813

Open
@RalfJung

This

#include <immintrin.h>
float foo(__m512 x) {
    return _mm512_reduce_add_ps(x);
}

produces the following LLVM IR:

define dso_local noundef float @foo(float vector[16])(<16 x float> noundef %x) local_unnamed_addr #0 {
entry:
  %0 = tail call reassoc noundef float @llvm.vector.reduce.fadd.v16f32(float -0.000000e+00, <16 x float> %x)
  ret float %0
}

According to the LangRef, the reassoc flag here means that the additions may be performed in any order. That is not what Intel documents: they specify a particular, "tree-like" order.
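To make the difference concrete, here is a minimal scalar sketch in plain C (no AVX-512 needed). It is not Clang's or Intel's implementation. The function names are hypothetical, and the halving pattern (lane j combined with lane j + len at each step) is my reading of the "tree-like" order in Intel's documentation, so treat it as an assumption. The second function models just one of the many orders that reassoc would permit: a plain left-to-right sum starting from -0.0, the start value used in the IR above.

```c
#include <stddef.h>

/* Scalar model of a halving ("tree-like") reduction over 16 floats.
   Assumption: at each step, lane j is combined with lane j + len,
   and len is halved until a single value remains. */
float tree_reduce_add16(const float x[16]) {
    float lanes[16];
    for (size_t i = 0; i < 16; ++i) {
        lanes[i] = x[i];
    }
    for (size_t len = 8; len >= 1; len /= 2) {
        for (size_t j = 0; j < len; ++j) {
            lanes[j] = lanes[j] + lanes[j + len];
        }
    }
    return lanes[0];
}

/* One of the orders a reassociable reduction could legally pick:
   strictly left to right, starting from -0.0f. */
float sequential_reduce_add16(const float x[16]) {
    float acc = -0.0f;
    for (size_t i = 0; i < 16; ++i) {
        acc = acc + x[i];
    }
    return acc;
}
```

For an input with x[0] = 1e8f, x[1] = 1.0f, x[8] = -1e8f and all other lanes zero, the tree order cancels the two large values first and should return 1.0f, while the left-to-right order absorbs the 1.0f into 1e8f and should return 0.0f. This assumes float arithmetic is evaluated in single precision and that the code is compiled without fast-math options.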

Even worse, we can chain two of these operations:

#include <immintrin.h>
float foo(__m512 x) {
    float xr = _mm512_reduce_add_ps(x);
    __m512 y = _mm512_set_ps(
        xr, 1.8, 9.3, 0.0, 2.5, 0.0, 6.7, 9.0,
        0.0, 1.8, 9.3, 0.0, 2.5, 0.0, 6.7, 9.0
    );
    return _mm512_reduce_add_ps(y);
}

Now the second reduction may be arbitrarily re-associated with the first one. As far as I understand, nothing about reassoc constrains the re-association to happen only "inside" a single operation; indeed, as a fast-math flag it is explicitly intended to apply across multiple consecutive operations that all carry reassoc.
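The following scalar sketch illustrates why merging the two reductions matters. It is a model under stated assumptions, not a claim about what LLVM actually emits: the function names are hypothetical, the halving order is assumed as described in Intel's documentation, and chained_merged shows only one hand-picked ordering that would be allowed if all 31 addends were treated as a single reassociable sum. As in the chained example above, the first result goes into lane 15 of the second vector, since the first argument of _mm512_set_ps is the highest lane.

```c
#include <stddef.h>

/* Scalar model of a halving ("tree-like") reduction (assumed order:
   lane j is combined with lane j + len at each step). */
static float tree_reduce_add16(const float x[16]) {
    float lanes[16];
    for (size_t i = 0; i < 16; ++i) {
        lanes[i] = x[i];
    }
    for (size_t len = 8; len >= 1; len /= 2) {
        for (size_t j = 0; j < len; ++j) {
            lanes[j] = lanes[j] + lanes[j + len];
        }
    }
    return lanes[0];
}

/* Two separate reductions: reduce x, place the result in lane 15 of y
   (lanes 0..14 come from c), then reduce y. */
float chained_separate(const float x[16], const float c[15]) {
    float y[16];
    for (size_t i = 0; i < 15; ++i) {
        y[i] = c[i];
    }
    y[15] = tree_reduce_add16(x);
    return tree_reduce_add16(y);
}

/* Hypothetical merged ordering of the same 31 addends: c[7] is added
   to x[0] before the remaining lanes of x, then the rest of c. */
float chained_merged(const float x[16], const float c[15]) {
    float acc = x[0] + c[7];
    for (size_t i = 1; i < 16; ++i) {
        acc += x[i];
    }
    for (size_t i = 0; i < 15; ++i) {
        if (i != 7) {
            acc += c[i];
        }
    }
    return acc;
}
```

With x[0] = 1e8f, x[1] = 1.0f, c[7] = -1e8f and everything else zero, the first reduction on its own yields 1e8f in any order (the 1.0f is always absorbed), so chained_separate should return 0.0f. The merged ordering cancels the large values before the 1.0f is added and should return 1.0f, a result that no per-operation reordering can produce for this input. This again assumes single-precision evaluation and no fast-math options.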

_mm512_reduce_add_ps should probably lower to a vendor-specific intrinsic; alternatively, LLVM IR needs a version of vector.reduce.fadd that explicitly specifies the "tree-like" reduction order documented by Intel.
