Michael R. Crusoe edited this page Feb 1, 2025 · 40 revisions

Here we draft the release notes for the next release.

Note: format is [summary] [commit hash or PR#] [author(s)]

Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for entries marked ????

Usually the first line of the commit message is a good summary, but please think through each entry and (re)write a summary that helps users quickly decide whether the change is interesting or useful to them. For example, include the name of the intrinsic or function in the summary so that users don't have to click through each commit themselves.

SIMDe 0.8.4

Summary

Details

NEON

  • avoid warnings when "__ARM_NEON_FP" is not defined. f046ab7 @clopez

  • Rename ARM ROL/ROR functions with a SIMDE prefix. cb846d9 @Syonyk

  • define native aliases only under the inverse of the pass-through conditions 2b450c0 @mr-c

  • cmla{_rot{90,180,270},}_lane: fix implementations with correct tests (confirmed on an ARMv8.3 system) 00ea77e @wewe5215

  • crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it 01470d2 @mr-c

  • qdmlal: fix saturation (#1194) cf1db25 @Ryo-not-rio

  • qdmlsl: fix instructions to use saturation correctly 44a748a @Ryo-not-rio

  • qdmulh: Fix vqdmulhs_s32 native alias. 403e942 @Syonyk

  • qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255) 948b236 @Syonyk

  • qrdmulh: Remove incorrect SSE code. 8e27139 @Syonyk

  • qrshl: Fix incorrect UQRSHL implementation. 2c6adb6 @Syonyk

  • qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256) e5d5064 @Syonyk

  • qshlu: Fix vqshlud_n_s64 implementation to be 64-bit. 3527e86 @Syonyk

  • sli_n: Fix invalid shifts (#1253) 8067442 @Syonyk

  • vminnmv_f16: remove duplicate statement (#1208) d1d9f82 @mr-c
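Several of the NEON fixes above (qdmulh, qdmull, qdmlal, qdmlsl) concern saturation in the signed saturating doubling multiply family. As a rough illustration only, and not SIMDe's actual code, here is a scalar reference model of the 32-bit "doubling multiply, return high half" operation behind vqdmulhs_s32. The helper name ref_qdmulh_s32 is hypothetical. The model shows the one input pair that has to saturate.

```c
#include <stdint.h>

/* Scalar reference model of SQDMULH on one 32-bit lane:
   saturate((2 * a * b) >> 32). Hypothetical helper, not a SIMDe function. */
static int32_t ref_qdmulh_s32(int32_t a, int32_t b) {
  /* INT32_MIN * INT32_MIN is the only pair whose doubled product
     (2^63) does not fit in int64_t; the hardware saturates it. */
  if (a == INT32_MIN && b == INT32_MIN)
    return INT32_MAX;

  /* Here |a * b| <= 2^62 - 2^31, so doubling cannot overflow int64_t. */
  int64_t doubled = (int64_t)a * (int64_t)b * 2;

  /* Keep the high 32 bits. Right-shifting a negative value is
     implementation-defined in C; GCC, Clang and MSVC shift arithmetically. */
  return (int32_t)(doubled >> 32);
}
```

Without the special case, doubling INT32_MIN * INT32_MIN overflows int64_t, which is undefined behavior in C; a correct implementation has to clamp that case to INT32_MAX instead.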

WASM intrinsics

x86 intrinsics

  • avx512f: new intrinsics family: fmaddsub (#1246) 6daf535 @robinchrist
  • fma: implement the 256-bit fnmadd_pd using the 128-bit fnmadd_pd (#1197) bd05320 @AlexK-BD
  • avx: implement _mm256_storeu_pd and _mm256_loadu_pd using 128-bit lanes 96054b8 @AlexK-BD
  • avx: use INT64_C when the destination is i64 (#1238) 60a3a24 @jinboson
  • sse4.2: apply the half tabular method in the _mm_crc32 family for the best trade-off between performance and lookup-table size 0f68b62 @Cuda-Chen
  • sse2: move definition of 'value' to correct branch in simde_mm_loadl_epi64 b8e468a @K-os
  • sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high 1a9d47f @mr-c
  • some better implementations for MSVC and other compilers without SIMDE_STATEMENT_EXPR_ 1691ae0 @mr-c
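The sse4.2 entry trades lookup-table size against speed. A minimal sketch of that trade-off follows, assuming "half tabular" means a 16-entry (half-byte) table consumed four bits at a time; the names below are hypothetical and this is not the SIMDe implementation. Like _mm_crc32_u8, it uses the reflected CRC-32C polynomial with no pre- or post-inversion. A bit-at-a-time version is included for comparison.

```c
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial, as used by the SSE4.2 crc32 instructions. */
#define CRC32C_POLY_REFLECTED UINT32_C(0x82F63B78)

/* Bit-at-a-time reference: eight shift/conditional-XOR steps per byte. */
static uint32_t crc32c_u8_bitwise(uint32_t crc, uint8_t v) {
  crc ^= v;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY_REFLECTED : crc >> 1;
  return crc;
}

/* Hypothetical 16-entry table: the effect of four bit-steps on each low nibble. */
static uint32_t nibble_table[16];

static void init_nibble_table(void) {
  for (uint32_t n = 0; n < 16; n++) {
    uint32_t c = n;
    for (int i = 0; i < 4; i++)
      c = (c & 1) ? (c >> 1) ^ CRC32C_POLY_REFLECTED : c >> 1;
    nibble_table[n] = c;
  }
}

/* Half-byte table version: two 4-bit lookups per byte instead of eight 1-bit steps.
   Works because the CRC update is linear over XOR. */
static uint32_t crc32c_u8_nibble(uint32_t crc, uint8_t v) {
  crc ^= v;
  crc = (crc >> 4) ^ nibble_table[crc & 0xF];
  crc = (crc >> 4) ^ nibble_table[crc & 0xF];
  return crc;
}
```

The 16-entry table occupies 64 bytes instead of the 1 KiB a full 256-entry byte table needs, at the cost of two lookups per byte instead of one.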

SVML

XOP

Arch support

Altivec

  • wasm: add u16x8 and u8x16 avgr AltiVec optimized implementations f9bf637 @wrv

arm / arm64

  • wasm: add u16x8 and u8x16 avgr NEON optimized implementations 7e65734 @wrv
  • wasm simd128: fix a FAST_NANS error on arm64 a9ebb8a @mr-c
  • arm neon native: FCMLA with 16-bit floats, requires the FP16 feature 4936149 @mr-c
  • arm neon native: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks. afd77a9 @mr-c

LoongArch

  • float16: use a portable version to avoid compilation errors 600050d @XiWeiGu
  • x86/sse2: add lsx support b331ea2 @HecaiYuan
  • x86/sse2: small fixes for loongarch d344e3c @jinboson
  • x86/sse4.2: add loongarch lsx optimized implementations fa6a869 @HecaiYuan
  • x86/sse4.1: add loongarch lsx optimized implementations f85ad3b @HecaiYuan
  • x86/ssse3: add loongarch lsx optimized implementations 879be03 @HecaiYuan
  • x86/sse3: add loongarch lsx optimized implementations 8fdc0e8 @HecaiYuan
  • x86/sse: Fix type conversion error for LSX. a6d4207 @yinshiyou
  • x86/sse: add loongarch lsx optimized implementations 49f73d9 @HecaiYuan
  • x86/avx2: add loongarch lasx optimized implementations (#1241) d62ab5a @jinboson
  • x86/avx2: small fixes for loongarch 1bbb5af @jinboson
  • x86/avx: add loongarch lasx optimized implementations (#1239) 5e406dc @jinboson
  • x86/avx: re-optimize simde_mm256_addsub_ps/pd with LASX 4242de3 @jinboson
  • x86/clmul: _x_bitreverse_u64: add loongarch implementation (#1249) 866cc57 @jinboson
  • x86/fma: add loongarch lasx optimized implementations d2cd71b @jinboson
  • x86/f16c: add loongarch lasx optimized implementations a70fca2 @jinboso

RISCV64

  • arm: improve performance of vqadd and vmvn on RISC-V 17416b1 @zengdage
  • arm/neon: additional RVV implementations (43 instructions) - part 1 (#1188) 6346405 @Ruhung
  • arm/neon: additional RVV implementations (34 instructions) - part 2. (#1189) c903416 @wewe5215
  • x86 sse2: fix _mm_pause for RISCV systems ed042d5 @mr-c

WASM

  • arm neon st2: add vst2_u8 WASM optimized implementation 9aeb89e @wrv
  • arm neon shll_n: add vshll WASM optimized implementations 1fdca85 @wrv
  • arm neon st4: add vst4_u8 WASM optimized implementation 7f47244 @wrv
  • sse2: remove redundant mm_add_pd optimized implementation for WASM (#1190) 8ee42f6 @wrv
  • sse2: Wasm SIMD version of _mm_sad_epu8 bc37d4b @wrv

z/Arch

  • neon/cvz: stop using deprecated functions. 776d0b6 @mr-c

Compiler Specific

Clang

  • Don't use _Float16 on s390x a1ce45c @mcatanzaro
  • Don't use _Float16 on non-SSE2 x86 40f4d28 @mcatanzaro
  • x86 avx512: fix clang type redef error f4daa86 @bd-jahn

GCC

  • Use _Float16 in C++ on aarch64 with GCC 13+ e30e6ec @mcatanzaro
  • arm neon: fix "excess elements in vector" build failure with GCC 11 on arm64 d370f28 @Qingwu-Li
  • arm neon: avoid vst1_*_x4 built-in functions in GCC 11 and before 557fd6d @Qingwu-Li
  • arm neon sm3: avoid GCC 14 -O3 warnings about possibly uninitialized values 99ac62b @mr-c
  • arm neon _vext_p6: reverse logic to avoid GCC14 i586 bug (#1251) e958b0a @mr-c
  • riscv64 gcc-14: Disable uninitialized variable warnings for some ARM NEON SM3 functions b2ad094 @Syonyk
  • simde-aes: gcc 13.2+ ignore unused variable warnings f4f5904 @mr-c
  • arm neon gcc-12 FRINT workaround e5605e9 @mr-c

Emscripten

MSVC

  • add simde_MemoryBarrier to avoid including <windows.h> f47e3c5 @Epixu

Testing with Docker/Podman & CI

  • meson: 0.55.1 is needed for Python 3.12+ 030c07c @mr-c
  • x86/avx: Add several overflow tests for various AVX functions e8c881d @qvd808
  • arm neon qdmlsl: unroll SIMDE_CONSTIFY for testing macro implemented functions 858b005 @mr-c
  • native-aliases test: allow running on macos 6b6e4ef @mr-c
  • arm neon abd & cvt tests: add missing import ab5c3e5 @mr-c
  • Add tests for vqdmulhs_s32. f56ef45 @Syonyk
  • x86 sse2: skip two extreme test cases for mm_cvtps_epi32 if SIMDE_FAST_ROUND_TIES is active. 0e6756b @mr-c
  • stop testing with MSVC 2022 until they fix their regressions b6ea9ba @mr-c
  • switch container for gcc11 i686 -O2 test 56b7c7a @mr-c
  • run on the primary development branch to prime the cache f0de562 @mr-c
  • always save ccache cache 02cc09b 6eabe36 @mr-c
  • add linux arm64 native aliases testing b036110 @mr-c
  • use ccache consistently ab758b5 @mr-c
  • GitHub has retired the macos-11 runners, add some more -13 (x86-64) and -14 (arm64) testing 32c959c @mr-c
  • ensure that gcov is present when needed 6f52a1d @mr-c
  • upgrade to Ubuntu 24.04 LTS; upgrade/add GCC 13 / clang 18 d67c190 @mr-c
  • test loongson + lsx with gcc14 from Ubuntu Oracular 59bf8de @mr-c
  • add CI testing for gcc 11 aarch64/arm64 4b96738 @mr-c
  • upgrade gcc-qemu to gcc-14 561556c @mr-c
  • test aarch64 without extra features 6686232 @mr-c
  • add loongarch64 clang-18 test ac3870b @mr-c
  • clean up install list 9cbeced @mr-c
  • pin emsdk to earlier version until https://github.com/llvm/llvm-project/issues/117200 is fixed and released 3257054 @mr-c
  • upgrade Ubuntu Mantic to Ubuntu Noble (24.04) e1bc420 @mr-c
  • macos: xcode 14.3.1 is no longer available, switch to macos-15 to test xcode 16.0 7035777 @mr-c
  • msvc-arm64: turn off due to compiler issue 6802efa @mr-c
  • macos 12: deprecated, going offline on 2024-12-03 2bb7f48 @mr-c
  • update CI test for loongarch 0cf3528 @jinboson
  • Add some native Linux arm64 clang builds 2f0c939 @mr-c
  • aarch64 qemu testing: increase arm levels and features targeted. 067ab5d @mr-c
  • Add more native Linux arm64 builds 693337a @mr-c
  • more ccache 17b2cbf @mr-c

Misc

  • pow: consistently use simde_math_pow 8f727c0 @mr-c
  • math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF 40567df @mr-c
  • math: add a missing comma 73e43dd @Dave-Lowndes
  • remove extraneous semicolons from many macro-defined functions 01f7a4f @mr-c
Template for next time

# Summary
## [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
### Newly added function families
### Additions to existing families
## [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
## [MSA](https://github.com/simd-everywhere/implementation-status/blob/main/msa.md)
# Details
## Implementation of Arm intrinsics
### NEON
### SVE Intrinsics
## WASM intrinsics
## x86 intrinsics
### SSE*
### AVX
### AVX2
### AVX512
### GFNI 
### XOP
### F16C
### FMA
### SVML
## MIPS MSA intrinsics
## Arch support
### arm64
### z/Arch
### AltiVec
### e2k (Elbrus)
### Power
## Testing with Docker/Podman & CI
### [Appveyor](https://ci.appveyor.com/project/nemequ/simde/history)
### [Azure](https://dev.azure.com/simd-everywhere/SIMDe/_build?definitionId=3)
### [Circle CI](https://app.circleci.com/pipelines/github/simd-everywhere/simde)
### [Cirrus CI](https://cirrus-ci.com/github/simd-everywhere/simde)
### [Local testing with Docker/Podman](https://github.com/simd-everywhere/simde/tree/master/docker#readme)
### [Drone.io](https://cloud.drone.io/simd-everywhere/simde)
### [GitHub Actions](https://github.com/simd-everywhere/simde/actions)
### [Netlify](https://app.netlify.com/sites/simde/)
### [Packit CI](https://dashboard.packit.dev/projects/github.com/simd-everywhere/simde)
### [Semaphore CI](https://nemequ.semaphoreci.com/projects/simde)
### [Travis](https://app.travis-ci.com/github/simd-everywhere/simde)
## Misc