Release Notes

Here we draft the release notes for the next release.

Note: format is [summary] [commit hash or PR#] [author(s)]

Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for any remaining ???? placeholders.

The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change is interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.

SIMDe 0.8.4

Summary

Details

NEON

avoid warnings when "__ARM_NEON_FP" is not defined. f046ab7 @clopez
Rename ARM ROL/ROR functions with a SIMDE prefix. cb846d9 @Syonyk
define native alias only under the inverse of the conditions of a pass-through 2b450c0 @mr-c
cmla{_rot{90,180,270},}_lane: fix implementations with correct tests (confirmed on an ARMv8.3 system) 00ea77e @wewe5215
crc32: define SIMDE_ARCH_ARM_CRC32 and consistently use it 01470d2 @mr-c
qdmlal: fix saturation (#1194) cf1db25 @Ryo-not-rio
qdmlsl: fix instructions to use saturation correctly 44a748a @Ryo-not-rio
qdmulh: Fix vqdmulhs_s32 native alias. 403e942 @Syonyk
qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255) 948b236 @Syonyk
qrdmulh: Remove incorrect SSE code. 8e27139 @Syonyk
qrshl: Fix incorrect UQRSHL implementation. 2c6adb6 @Syonyk
qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256) e5d5064 @Syonyk
qshlu: Fix vqshlud_n_s64 implementation to be 64-bit. 3527e86 @Syonyk
sli_n: Fix invalid shifts (#1253) 8067442 @Syonyk
vminnmv_f16: remove duplicate statement (#1208) d1d9f82 @mr-c

WASM intrinsics

x86 intrinsics

avx512f: new intrinsics family: fmaddsub (#1246) 6daf535 @robinchrist
fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197) bd05320 @AlexK-BD
avx: _mm256_storeu_pd and _mm256_loadu_pd using 128 bit lanes 96054b8 @AlexK-BD
avx: use INT64_C when the destination is i64 (#1238) 60a3a24 @jinboson
sse4.2: Apply half tabular method in _mm_crc32 family for the best trade-off between performance and lookup table size 0f68b62 @Cuda-Chen
sse2: move definition of 'value' to correct branch in simde_mm_loadl_epi64 b8e468a @K-os
sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high 1a9d47f @mr-c
some better implementations for MSVC and others without SIMDE_STATEMENT_EXPR_ 1691ae0 @mr-c

SVML

XOP

Arch support

AltiVec

wasm: add u16x8 and u8x16 avgr AltiVec optimized implementations f9bf637 @wrv

arm / arm64

wasm: add u16x8 and u8x16 avgr NEON optimized implementations 7e65734 @wrv
wasm simd128: fix a FAST_NANS error on arm64 a9ebb8a @mr-c
arm neon native: FCMLA with 16-bit floats, requires the FP16 feature 4936149 @mr-c
arm neon native: replace use of SIMDE_ARCH_ARM_CHECK(8+) with feature checks. afd77a9 @mr-c

LoongArch

float16: use a portable version to avoid compilation errors 600050d @XiWeiGu
x86/sse2: add lsx support b331ea2 @HecaiYuan
x86/sse2: small fixes for loongarch d344e3c @jinboson
x86/sse4.2: add loongarch lsx optimized implementations fa6a869 @HecaiYuan
x86/sse4.1: add loongarch lsx optimized implementations f85ad3b @HecaiYuan
x86/ssse3: add loongarch lsx optimized implementations 879be03 @HecaiYuan
x86/sse3: add loongarch lsx optimized implementations 8fdc0e8 @HecaiYuan
x86/sse: fix type conversion error for LSX. a6d4207 @yinshiyou
x86/sse: add loongarch lsx optimized implementations 49f73d9 @HecaiYuan
x86/avx2: add loongarch lasx optimized implementations (#1241) d62ab5a @jinboson
x86/avx2: small fixes for loongarch 1bbb5af @jinboson
x86/avx: add loongarch lasx optimized implementations (#1239) 5e406dc @jinboson
x86/avx: reoptimized simde_mm256_addsub_ps/d with lasx 4242de3 @jinboson
x86/clmul: _x_bitreverse_u64: add loongarch implementation (#1249) 866cc57 @jinboson
x86/fma: add loongarch lasx optimized implementations d2cd71b @jinboson
x86/f16c: add loongarch lasx optimized implementations a70fca2 @jinboson

RISCV64

arm: improve performance of vqadd and vmvn on RISC-V 17416b1 @zengdage
arm/neon: additional RVV implementations (43 instructions) - part 1 (#1188) 6346405 @Ruhung
arm/neon: additional RVV implementations (34 instructions) - part 2. (#1189) c903416 @wewe5215
x86 sse2: fix _mm_pause for RISC-V systems ed042d5 @mr-c

WASM

arm neon st2: add vst2_u8 WASM optimized implementation 9aeb89e @wrv
arm neon shll_n: add vshll WASM optimized implementations 1fdca85 @wrv
arm neon st4: add vst4_u8 WASM optimized implementation 7f47244 @wrv
sse2: remove redundant _mm_add_pd optimized implementation for WASM (#1190) 8ee42f6 @wrv
sse2: Wasm SIMD version of _mm_sad_epu8 bc37d4b @wrv

z/Arch

neon/cvz: stop using deprecated functions. 776d0b6 @mr-c

Compiler Specific

Clang

Don't use _Float16 on s390x a1ce45c @mcatanzaro
Don't use _Float16 on non-SSE2 x86 40f4d28 @mcatanzaro
x86 avx512: fix clang type redef error f4daa86 @bd-jahn

GCC

Use _Float16 in C++ on aarch64 with GCC 13+ e30e6ec @mcatanzaro
arm neon: fix arm64 GCC 11 build failure ("excess elements in vector") d370f28 @Qingwu-Li
arm neon: avoid vst1_*_x4 built-in functions in GCC 11 and before 557fd6d @Qingwu-Li
arm neon sm3: gcc-14 -O3 complained about some possible uninitialized values 99ac62b @mr-c
arm neon _vext_p6: reverse logic to avoid GCC14 i586 bug (#1251) e958b0a @mr-c
riscv64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions b2ad094 @Syonyk
simde-aes: gcc 13.2+ ignore unused variable warnings f4f5904 @mr-c
arm neon gcc-12 FRINT workaround e5605e9 @mr-c

Emscripten

MSVC

add simde_MemoryBarrier to avoid including <windows.h> f47e3c5 @Epixu

Testing with Docker/Podman & CI

meson: 0.55.1 is needed for Python 3.12+ 030c07c @mr-c
x86/avx: add several overflow tests for various AVX functions e8c881d @qvd808
arm neon qdmlsl: unroll SIMDE_CONSTIFY for testing macro implemented functions 858b005 @mr-c
native-aliases test: allow running on macos 6b6e4ef @mr-c
arm neon abd & cvt tests: add missing import ab5c3e5 @mr-c
Add tests for vqdmulhs_s32. f56ef45 @Syonyk
x86 sse2: skip two extreme test cases for mm_cvtps_epi32 if SIMDE_FAST_ROUND_TIES is active. 0e6756b @mr-c

Appveyor

stop testing with MSVC 2022 until they fix their regressions b6ea9ba @mr-c

Circle CI

switch container for gcc11 i686 -O2 test 56b7c7a @mr-c
run on the primary development branch to prime the cache f0de562 @mr-c
always save ccache cache 02cc09b 6eabe36 @mr-c
add linux arm64 native aliases testing b036110 @mr-c
use ccache consistently ab758b5 @mr-c

GitHub Actions

GitHub has retired the macos-11 runners, add some more -13 (x86-64) and -14 (arm64) testing 32c959c @mr-c
ensure that gcov is present when needed 6f52a1d @mr-c
upgrade to Ubuntu 24.04 LTS; upgrade/add GCC 13 / clang 18 d67c190 @mr-c
test loongson + lsx with gcc14 from Ubuntu Oracular 59bf8de @mr-c
add CI testing for gcc 11 aarch64/arm64 4b96738 @mr-c
upgrade gcc-qemu to gcc-14 561556c @mr-c
test aarch64 without extra features 6686232 @mr-c
add loongarch64 clang-18 test ac3870b @mr-c
clean up install list 9cbeced @mr-c
pin emsdk to earlier version until https://github.com/llvm/llvm-project/issues/117200 is fixed and released 3257054 @mr-c
upgrade Ubuntu Mantic to Ubuntu Noble (24.04) e1bc420 @mr-c
macos: xcode 14.3.1 is no longer available, switch to macos-15 to test xcode 16.0 7035777 @mr-c
msvc-arm64: turn off due to compiler issue 6802efa @mr-c
macos 12: deprecated, going offline on 2024-12-03 2bb7f48 @mr-c
update CI test for loongarch 0cf3528 @jinboson
Add some native Linux arm64 clang builds 2f0c939 @mr-c
aarch64 qemu testing: increase arm levels and features targeted. 067ab5d @mr-c
Add more native Linux arm64 builds 693337a @mr-c
more ccache 17b2cbf @mr-c

Packit CI

Semaphore CI

Misc

pow: consistently use simde_math_pow 8f727c0 @mr-c
math: typo fix, check SIMDE_MATH_NANF instead of the old-style SIMDE_NANF 40567df @mr-c
math: Whoops, missing comma 73e43dd @Dave-Lowndes
remove extraneous semicolons from many macro-defined functions 01f7a4f @mr-c

Template for next time

# Summary
## [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
### Newly added function families
### Additions to existing families
## [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
## [MSA](https://github.com/simd-everywhere/implementation-status/blob/main/msa.md)
# Details
## Implementation of Arm intrinsics
### NEON
### SVE Intrinsics
## WASM intrinsics
## x86 intrinsics
### SSE*
### AVX
### AVX2
### AVX512
### GFNI 
### XOP
### F16C
### FMA
### SVML
## MIPS MSA intrinsics
## Arch support
### arm64
### z/Arch
### AltiVec
### e2k (Elbrus)
### Power
## Testing with Docker/Podman & CI
### [Appveyor](https://ci.appveyor.com/project/nemequ/simde/history)
### [Azure](https://dev.azure.com/simd-everywhere/SIMDe/_build?definitionId=3)
### [Circle CI](https://app.circleci.com/pipelines/github/simd-everywhere/simde)
### [Cirrus CI](https://cirrus-ci.com/github/simd-everywhere/simde)
### [Local testing with Docker/Podman](https://github.com/simd-everywhere/simde/tree/master/docker#readme)
### [Drone.io](https://cloud.drone.io/simd-everywhere/simde)
### [GitHub Actions](https://github.com/simd-everywhere/simde/actions)
### [Netlify](https://app.netlify.com/sites/simde/)
### [Packit CI](https://dashboard.packit.dev/projects/github.com/simd-everywhere/simde)
### [Semaphore CI](https://nemequ.semaphoreci.com/projects/simde)
### [Travis](https://app.travis-ci.com/github/simd-everywhere/simde)
## Misc
