Skip to content

target_feature doesn't trickle down to closures and internal fns #58279

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
KyleSiefring opened this issue Feb 8, 2019 · 10 comments
Open

target_feature doesn't trickle down to closures and internal fns #58279

KyleSiefring opened this issue Feb 8, 2019 · 10 comments
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. F-target_feature_11 target feature 1.1 RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@KyleSiefring
Copy link

KyleSiefring commented Feb 8, 2019

Leaves poorly optimized assembly in its wake.

use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Creates non inlined calls to intrinsics
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn foo(input: &[__m256]) -> f32 {
    let accum = |val: __m256| {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for i in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    };
    // Once we call a complex internal closure or fn multiple
    //  times, we find that the compiler hasn't told them that
    //  they can inline or use avx2 intrinsics. Not the sharpest.
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}

// Works as expected
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn bar(input: &[__m256]) -> f32 {
    // When we pull this tool out of the shed every thing works
    #[target_feature(enable = "avx2")]
    unsafe fn accum(val: __m256) -> __m256 {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for i in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    }
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}

https://rust.godbolt.org/z/cIr7qS

I found this bug by triggering this one with closures. I wasn't able to trigger it from godbolt. I'm using the latest stable, so if I copied the code in it would work (as in not work).
#50154

Making a separate issue since this one is a performance bug.

@KyleSiefring KyleSiefring changed the title target_feature doesn't trickle down to closures internal fns target_feature doesn't trickle down to closures andinternal fns Feb 8, 2019
@KyleSiefring KyleSiefring changed the title target_feature doesn't trickle down to closures andinternal fns target_feature doesn't trickle down to closures and internal fns Feb 8, 2019
@scottmcm
Copy link
Member

scottmcm commented Feb 8, 2019

I think this is expected for internal fns, as items in general don't get anything from their enclosures (most well known is that they don't get access to generic parameters).

I think I agree for closures, though.

@jonas-schievink jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Feb 9, 2019
@hanna-kruppe
Copy link
Contributor

One problem with closures is that functions with target_feature have to be unsafe, but there's no such thing as unsafe closures.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 11, 2019
@workingjubilee workingjubilee added the A-codegen Area: Code generation label Oct 5, 2021
@workingjubilee workingjubilee added F-target_feature_11 target feature 1.1 RFC A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Mar 3, 2023
@est31
Copy link
Member

est31 commented Mar 9, 2023

I think this has been implemented in the meantime for the target_feature 1.1 feature, as otherwise #108655 wouldn't have been filed.

@LeSeulArtichaut
Copy link
Contributor

(See #73631)

@est31
Copy link
Member

est31 commented Mar 13, 2023

@LeSeulArtichaut maybe we can close this in favour of #73631?

@LeSeulArtichaut
Copy link
Contributor

#73631 doesn't discuss inner functions at all, but if it seems uncontroversial to not allow them on inner functions then I guess this issue can be closed.

@est31
Copy link
Member

est31 commented Mar 15, 2023

Yeah, applying target_feature to inner functions is not good. There are ways to call inner functions from outside the function. It's different for closures.

@RalfJung
Copy link
Member

RalfJung commented Nov 23, 2024

So, can this be closed then? Closures have been handled by #58279, and inner functions generally don't inherit things from the outer function so it'd be surprising if they inherited target_feature.

EDIT: But the inheriting behavior only gets enabled with target_feature 1.1.

@KyleSiefring
Copy link
Author

The current implementation is still a bit of a blemish. I wonder if a subset of potential "target_feature" options are reasonable to pass down to inner functions. I would say no, since this could make things even more confusing.

Maybe a new issue should be created?

@RalfJung
Copy link
Member

AFAIK basically no attributes are inherited by inner functions. Why should target_feature be special?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. F-target_feature_11 target feature 1.1 RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants