#BONUS intrinsics that might be useful #84

Open
8 of 33 tasks
p0nce opened this issue Oct 18, 2021 · 2 comments

Comments

@p0nce
Collaborator

p0nce commented Oct 18, 2021

Add one here every time you wish for one:

  • _mm_cvtpd_epi64, which would convert 2x double to 2x 64-bit integers using the MXCSR rounding mode; it would speed things up on ARM and non-AVX x86 => actually an existing AVX512DQ + AVX512VL instruction
  • _mm_abs_ps
  • _mm_movemask_epi16
  • _mm_cmpge_epi8
  • _mm_cmpge_epi16 (twice)
  • _mm_cmple_epi8
  • _mm_cmple_epi16
  • _mm_not_si128
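Most of the wishes above can be emulated with plain SSE2. The following is only a sketch in C, not how this library implements or plans to implement them: the `bonus_` prefix is a hypothetical naming choice made here to avoid clashing with real intrinsic names, and the 8-bit comparisons would follow the same pattern as the 16-bit ones using `_mm_cmplt_epi8` / `_mm_cmpgt_epi8`.

```c
#include <emmintrin.h> /* SSE2 */

/* Hypothetical _mm_abs_ps: clear the sign bit of each float lane. */
static inline __m128 bonus_mm_abs_ps(__m128 a)
{
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    return _mm_and_ps(a, mask);
}

/* Hypothetical _mm_not_si128: bitwise NOT, done as XOR with all-ones. */
static inline __m128i bonus_mm_not_si128(__m128i a)
{
    return _mm_xor_si128(a, _mm_set1_epi32(-1));
}

/* Hypothetical _mm_cmpge_epi16: a >= b is the complement of a < b. */
static inline __m128i bonus_mm_cmpge_epi16(__m128i a, __m128i b)
{
    return bonus_mm_not_si128(_mm_cmplt_epi16(a, b));
}

/* Hypothetical _mm_cmple_epi16: a <= b is the complement of a > b. */
static inline __m128i bonus_mm_cmple_epi16(__m128i a, __m128i b)
{
    return bonus_mm_not_si128(_mm_cmpgt_epi16(a, b));
}

/* Hypothetical _mm_movemask_epi16: one result bit per 16-bit lane.
   Signed saturating pack keeps each lane's sign, so the byte-wise
   movemask of the packed low half yields the 8 sign bits. */
static inline int bonus_mm_movemask_epi16(__m128i a)
{
    return _mm_movemask_epi8(_mm_packs_epi16(a, _mm_setzero_si128()));
}
```

Building the comparisons on top of the NOT helper costs one extra XOR per call; a caller that only needs the mask for a blend could instead swap the blend operands and skip the complement.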

Ideas from Alfred Klomp

  • mm_absdiff_epu16
  • mm_absdiff_epu8
  • mm_blendv_si128
  • mm_bswap_epi16
  • mm_bswap_epi32
  • mm_bswap_epi64
  • mm_bswap_si128
  • mm_cmpge_epu16
  • mm_cmpge_epu8
  • mm_cmpgt_epu16
  • mm_cmpgt_epu8
  • mm_cmple_epu16
  • mm_cmple_epu8
  • mm_cmplt_epu16
  • mm_cmplt_epu8
  • mm_div255_epu16
  • mm_div_epu8
  • mm_divfast_epu16
  • mm_divfast_epu8
  • mm_max_epu16
  • mm_min_epu16
  • mm_not_si128
  • mm_scale_epu8
  • _mm256_unpacklo_si128
  • _mm256_unpackhi_si128
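Several of the unsigned helpers in this list have well-known SSE2 formulations. The sketch below, in C, shows a few of them under stated assumptions: the `sketch_` names are hypothetical, the code is not taken from Alfred Klomp's implementation, and the divide-by-255 helper is only claimed exact for inputs up to 65025 (the product of two bytes).

```c
#include <emmintrin.h> /* SSE2 */

/* |a - b| per unsigned byte: one of the two saturating differences is 0. */
static inline __m128i sketch_absdiff_epu8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}

/* Unsigned a >= b per byte: true exactly when max(a, b) == a. */
static inline __m128i sketch_cmpge_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

/* Unsigned a > b per byte: flip the sign bits, then compare as signed. */
static inline __m128i sketch_cmpgt_epu8(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Swap the two bytes inside every 16-bit lane. */
static inline __m128i sketch_bswap_epi16(__m128i a)
{
    return _mm_or_si128(_mm_slli_epi16(a, 8), _mm_srli_epi16(a, 8));
}

/* Unsigned 16-bit max without SSE4.1: a + saturating(b - a). */
static inline __m128i sketch_max_epu16(__m128i a, __m128i b)
{
    return _mm_add_epi16(a, _mm_subs_epu16(b, a));
}

/* Unsigned 16-bit min without SSE4.1: a - saturating(a - b). */
static inline __m128i sketch_min_epu16(__m128i a, __m128i b)
{
    return _mm_sub_epi16(a, _mm_subs_epu16(a, b));
}

/* x / 255 for x in [0, 65025], computed as (x + ((x + 257) >> 8)) >> 8. */
static inline __m128i sketch_div255_epu16(__m128i x)
{
    __m128i t = _mm_srli_epi16(_mm_add_epi16(x, _mm_set1_epi16(257)), 8);
    return _mm_srli_epi16(_mm_add_epi16(x, t), 8);
}
```

The remaining unsigned comparisons follow from these two by swapping operands (`cmple(a, b)` is `cmpge(b, a)`, `cmplt(a, b)` is `cmpgt(b, a)`), and the 16-bit variants can reuse the sign-flip trick with a `0x8000` bias.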
@p0nce
Collaborator Author

p0nce commented Jan 24, 2022

@p0nce
Collaborator Author

p0nce commented Oct 2, 2022

complex multiply, complex add, complex sub, complex divide
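For illustration, here is one way a complex multiply could look with SSE only, in C. It assumes a layout that the wish above does not specify: two single-precision complex numbers per `__m128`, interleaved as `[re0, im0, re1, im1]`. The name `sketch_cmul_ps` is hypothetical. With that layout, complex add and sub are simply `_mm_add_ps` and `_mm_sub_ps`.

```c
#include <xmmintrin.h> /* SSE */

/* Multiply two pairs of complex floats stored as [re0, im0, re1, im1].
   For x = a + bi and y = c + di: x*y = (ac - bd) + (bc + ad)i. */
static inline __m128 sketch_cmul_ps(__m128 x, __m128 y)
{
    /* [c0, c0, c1, c1] and [d0, d0, d1, d1] */
    __m128 y_re = _mm_shuffle_ps(y, y, _MM_SHUFFLE(2, 2, 0, 0));
    __m128 y_im = _mm_shuffle_ps(y, y, _MM_SHUFFLE(3, 3, 1, 1));

    /* [b0, a0, b1, a1]: real and imaginary parts of x swapped */
    __m128 x_sw = _mm_shuffle_ps(x, x, _MM_SHUFFLE(2, 3, 0, 1));

    __m128 t1 = _mm_mul_ps(x, y_re);    /* [ac, bc, ...] */
    __m128 t2 = _mm_mul_ps(x_sw, y_im); /* [bd, ad, ...] */

    /* Negate the even lanes of t2 so that the add yields ac - bd there. */
    const __m128 sign = _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f);
    return _mm_add_ps(t1, _mm_xor_ps(t2, sign));
}
```

As a usage example, multiplying `_mm_set_ps(4, 3, 2, 1)` (that is, 1+2i and 3+4i) by `_mm_set_ps(8, 7, 6, 5)` (5+6i and 7+8i) gives -7+16i and -11+52i. On SSE3 targets the sign flip and add could be replaced by `_mm_addsub_ps`.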
