Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Support avx512 bitmasks with dynamic feature detection #332

Open
reinerp opened this issue Mar 7, 2023 · 3 comments · May be fixed by #333
Open

Support avx512 bitmasks with dynamic feature detection #332

reinerp opened this issue Mar 7, 2023 · 3 comments · May be fixed by #333
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@reinerp
Copy link

reinerp commented Mar 7, 2023

The choice between full_masks.rs and bitmask.rs seems to be made at compile time (

) based on compile-time presence of the avx512f target feature.

When compiling for general x86-64 targets but doing runtime feature detection for avx512f, the current approach will generate suboptimal code for avx512, because it will use the full_masks.rs implementation instead of the bitmask.rs implementation.

I'd like to be able to use bitmasks in avx512 even when using runtime feature detection. It's probably out of scope (and, I think, undesirable) for std::simd to be responsible for runtime feature detection, so one option would be for std::simd to be refactored to expose both Mask and Bitmask as two separate types -- perhaps both implementing the same set of traits. This way, callers could choose between a full-mask implementation and a bitmask implementation based on whatever logic they choose, including e.g. runtime feature detection.

@reinerp reinerp added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Mar 7, 2023
@reinerp
Copy link
Author

reinerp commented Mar 8, 2023

Actually looking through the implementation of bitmask.rs, I see that in avx512 mode, Mask<T, LANES> is just a wrapper for LANES::BitMask, which is already accessible to users. So I guess the answer to my request is already available: it is "use LANES::BitMask if you want a guarantee of single-bit-per-lane". Perhaps some additional documentation is all that is required.

@calebzulawski
Copy link
Member

It's worth noting that when compiling for instruction sets that use bitmasks, the compiler often optimizes these lane-width masks to bitmasks (as far as LLVM is concerned, all masks are truncated to bitmasks anyway). The layout matters less than you might think--only when writing masks to memory or general purpose registers to use them outside of SIMD operations, which is relatively rare.

@reinerp
Copy link
Author

reinerp commented Mar 13, 2023

That's a really great point Caleb, and I think it pretty much entirely addresses my concern. Thanks!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants