Release Notes
Here we draft the release notes for the next release.
Note: format is [summary] [commit hash or PR#] [author(s)]
Use the release notes helper script to generate the preliminary list. Then group the changes, review the descriptions, and look out for ????
The first line of the commit message is usually a good summary, but please think through each entry and (re)write a summary that helps users quickly determine whether the change would be interesting or useful to them. For example, include the name of the intrinsic/function in the summary so that users don't have to click through each commit themselves.
- avoid warnings when "__ARM_NEON_FP" is not defined. f046ab7 @clopez
- Rename ARM ROL/ROR functions with a SIMDE prefix. cb846d9 @Syonyk
- define native alias only under the inverse of the conditions of a pass-through 2b450c0 @mr-c
- cmla{_rot{90,180,270},}_lane: fix implementations with correct tests (confirmed on an ARMv8.3 system) 00ea77e @wewe5215
- crc32: define `SIMDE_ARCH_ARM_CRC32` and consistently use it 01470d2 @mr-c
- qdmlal: fix saturation (#1194) cf1db25 @Ryo-not-rio
- qdmlsl: fix instructions to use saturation correctly 44a748a @Ryo-not-rio
- qdmulh: Fix vqdmulhs_s32 native alias. 403e942 @Syonyk
- qdmull: Fix SQDMULL implementation for 32-bit inputs. (#1255) 948b236 @Syonyk
- qrdmulh: Remove incorrect SSE code. 8e27139 @Syonyk
- qrshl: Fix incorrect UQRSHL implementation. 2c6adb6 @Syonyk
- qshl: Fix UQSHL to match hardware. Add extensive test vectors. (#1256) e5d5064 @Syonyk
- qshlu: Fix vqshlud_n_s64 implementation to be 64-bit. 3527e86 @Syonyk
- sli_n: Fix invalid shifts (#1253) 8067442 @Syonyk
- vminnmv_f16: remove duplicate statement (#1208) d1d9f82 @mr-c
- avx512f: new intrinsics family: fmaddsub (#1246) 6daf535 @robinchrist
- fma: Use 128 bit fnmadd_pd to do 256 bit fnmadd_pd (#1197) bd05320 @AlexK-BD
- avx: `_mm256_storeu_pd` and `_mm256_loadu_pd` using 128 bit lanes 96054b8 @AlexK-BD
- avx: use INT64_C when the destination is i64 (#1238) 60a3a24 @jinboson
- sse4.2: Apply half tabular method in `_mm_crc32` family for the best trade-off between performance and lookup table size 0f68b62 @Cuda-Chen
- sse2: move definition of 'value' to correct branch in `simde_mm_loadl_epi64` b8e468a @K-os
- sse2: fix overflow error detected by clang scan-build in simde_mm_srl_epi{16,32,64} when count is too high 1a9d47f @mr-c
- some better implementations for MSVC and others without `SIMDE_STATEMENT_EXPR_` 1691ae0 @mr-c
- wasm: add u16x8 and u8x16 avgr AltiVec optimized implementations f9bf637 @wrv
- wasm: add u16x8 and u8x16 avgr NEON optimized implementations 7e65734 @wrv
- wasm simd128: fix a FAST_NANS error on arm64 a9ebb8a @mr-c
- arm neon native: FCMLA with 16-bit floats, requires the FP16 feature 4936149 @mr-c
- arm neon native: replace use of `SIMDE_ARCH_ARM_CHECK(8+)` with feature checks. afd77a9 @mr-c
- float16: use a portable version to avoid compilation errors 600050d @XiWeiGu
- x86/sse2: add lsx support b331ea2 @HecaiYuan
- x86/sse2: small fixes for loongarch d344e3c @jinboson
- x86/sse4.2: add loongarch lsx optimized implementations fa6a869 @HecaiYuan
- x86/sse4.1: add loongarch lsx optimized implementations f85ad3b @HecaiYuan
- x86/ssse3: add loongarch lsx optimized implementations 879be03 @HecaiYuan
- x86/sse3: add loongarch lsx optimized implementations 8fdc0e8 @HecaiYuan
- x86/sse: Fix type convert error for LSX. a6d4207 @yinshiyou
- x86/sse: add loongarch lsx optimized implementations 49f73d9 @HecaiYuan
- x86/avx2: add loongarch lasx optimized implementations (#1241) d62ab5a @jinboson
- x86/avx2: small fixes for loongarch 1bbb5af @jinboson
- x86/avx: add loongarch lasx optimized implementations (#1239) 5e406dc @jinboson
- x86/avx: reoptimized `simde_mm256_addsub_ps/d` with lasx 4242de3 @jinboson
- x86/clmul: `_x_bitreverse_u64`: add loongarch implementation (#1249) 866cc57 @jinboson
- x86/fma: add loongarch lasx optimized implementations d2cd71b @jinboson
- x86/f16c: add loongarch lasx optimized implementations a70fca2 @jinboson
- arm: improve performance in vqadd and vmvn in risc-v 17416b1 @zengdage
- arm/neon: additional RVV implementations (43 instructions) - part 1 (#1188) 6346405 @Ruhung
- arm/neon: additional RVV implementations (34 instructions) - part 2. (#1189) c903416 @wewe5215
- x86 sse2: fix `_mm_pause` for RISCV systems ed042d5 @mr-c
- arm neon st2: add vst2_u8 WASM optimized implementation 9aeb89e @wrv
- arm neon shll_n: add vshll WASM optimized implementations 1fdca85 @wrv
- arm neon st4: add vst4_u8 WASM optimized implementation 7f47244 @wrv
- sse2: remove redundant `mm_add_pd` optimized implementation for WASM (#1190) 8ee42f6 @wrv
- sse2: Wasm SIMD version of `_mm_sad_epu8` bc37d4b @wrv
- neon/cvz: stop using deprecated functions. 776d0b6 @mr-c
- Don't use `_Float16` on s390x a1ce45c @mcatanzaro
- Don't use `_Float16` on non-SSE2 x86 40f4d28 @mcatanzaro
- x86 avx512: fix clang type redef error f4daa86 @bd-jahn
- Use `_Float16` in C++ on aarch64 with GCC 13+ e30e6ec @mcatanzaro
- arm neon: fix arm64 gcc11 build excess elements in vector failure d370f28 @Qingwu-Li
- arm neon: avoid vst1_*_x4 built-in functions in GCC 11 and before 557fd6d @Qingwu-Li
- arm neon sm3: gcc-14 -O3 complained about some possible uninitialized values 99ac62b @mr-c
- arm neon `_vext_p6`: reverse logic to avoid GCC14 i586 bug (#1251) e958b0a @mr-c
- risc64 gcc-14: Disable uninitialized variable warnings for some ARM neon SM3 functions b2ad094 @Syonyk
- simde-aes: gcc 13.2+ ignore unused variable warnings f4f5904 @mr-c
- arm neon gcc-12 FRINT workaround e5605e9 @mr-c
- add `simde_MemoryBarrier` to avoid including `<windows.h>` f47e3c5 @Epixu
- meson: 0.55.1 is needed for Python 3.12+ 030c07c @mr-c
- x86/avx: Adding several overflow tests for various avx functions e8c881d @qvd808
- arm neon qdmlsl: unroll SIMDE_CONSTIFY for testing macro implemented functions 858b005 @mr-c
- native-aliases test: allow running on macos 6b6e4ef @mr-c
- arm neon abd & cvt tests: add missing import ab5c3e5 @mr-c
- Add tests for vqdmulhs_s32. f56ef45 @Syonyk
- x86 sse2: skip two extreme test cases for `mm_cvtps_epi32` if `SIMDE_FAST_ROUND_TIES` is active. 0e6756b @mr-c
- stop testing with MSVC 2022 until they fix their regressions b6ea9ba @mr-c
- switch container for gcc11 i686 -O2 test 56b7c7a @mr-c
- run on the primary development branch to prime the cache f0de562 @mr-c
- always save ccache cache 02cc09b 6eabe36 @mr-c
- add linux arm64 native aliases testing b036110 @mr-c
- use ccache consistently ab758b5 @mr-c
- GitHub has retired the macos-11 runners, add some more -13 (x86-64) and -14 (arm64) testing 32c959c @mr-c
- ensure that gcov is present when needed 6f52a1d @mr-c
- upgrade to Ubuntu 24.04 LTS; upgrade/add GCC 13 / clang 18 d67c190 @mr-c
- test loongson + lsx with gcc14 from Ubuntu Oracular 59bf8de @mr-c
- add CI testing for gcc 11 aarch64/arm64 4b96738 @mr-c
- upgrade gcc-qemu to gcc-14 561556c @mr-c
- test aarch64 without extra features 6686232 @mr-c
- add loongarch64 clang-18 test ac3870b @mr-c
- clean up install list 9cbeced @mr-c
- pin emsdk to earlier version until https://github.com/llvm/llvm-project/issues/117200 is fixed and released 3257054 @mr-c
- upgrade Ubuntu Mantic to Ubuntu Noble (24.04) e1bc420 @mr-c
- macos: xcode 14.3.1 is no longer available, switch to macos-15 to test xcode 16.0 7035777 @mr-c
- msvc-arm64: turn off due to compiler issue 6802efa @mr-c
- macos 12: deprecated, going offline on 2024-12-03 2bb7f48 @mr-c
- update CI test for loongarch 0cf3528 @jinboson
- Add some native Linux arm64 clang builds 2f0c939 @mr-c
- aarch64 qemu testing: increase arm levels and features targeted. 067ab5d @mr-c
- Add more native Linux arm64 builds 693337a @mr-c
- more ccache 17b2cbf @mr-c
- pow: consistently use simde_math_pow 8f727c0 @mr-c
- math: typo fix, check `SIMDE_MATH_NANF` instead of the old-style `SIMDE_NANF` 40567df @mr-c
- math: Whoops, missing comma 73e43dd @Dave-Lowndes
- remove extraneous semicolons from many macro-defined functions 01f7a4f @mr-c
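
For readers wondering what the "half tabular method" in the sse4.2 `_mm_crc32` entry above might look like, here is a minimal sketch. It assumes the term refers to the well-known half-byte (nibble) lookup technique, which uses a 16-entry table instead of a 256-entry one. This is not SIMDe's actual code; all `sketch_*` names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial, as used by the SSE4.2 CRC32 instruction. */
#define SKETCH_CRC32C_POLY UINT32_C(0x82F63B78)

/* Reference update: one bit per iteration, no lookup table. */
static uint32_t sketch_crc32c_u8_bitwise(uint32_t crc, uint8_t v) {
  crc ^= v;
  for (int i = 0; i < 8; i++) {
    /* If the low bit is set, shift and XOR in the polynomial; otherwise just shift. */
    crc = (crc >> 1) ^ (SKETCH_CRC32C_POLY & (UINT32_C(0) - (crc & 1)));
  }
  return crc;
}

/* Build the 16-entry table: entry n is the result of four bit steps applied to n. */
static void sketch_crc32c_build_table(uint32_t table[16]) {
  for (uint32_t n = 0; n < 16; n++) {
    uint32_t r = n;
    for (int i = 0; i < 4; i++) {
      r = (r >> 1) ^ (SKETCH_CRC32C_POLY & (UINT32_C(0) - (r & 1)));
    }
    table[n] = r;
  }
}

/* Half-byte update: two table lookups per input byte, one per nibble. */
static uint32_t sketch_crc32c_u8_nibble(const uint32_t table[16], uint32_t crc, uint8_t v) {
  crc ^= v;
  crc = (crc >> 4) ^ table[crc & 0x0F];
  crc = (crc >> 4) ^ table[crc & 0x0F];
  return crc;
}

/* Conventional CRC-32C of a buffer (initial value and final XOR of all ones). */
static uint32_t sketch_crc32c_buffer(const uint32_t table[16], const uint8_t *data, size_t len) {
  uint32_t crc = UINT32_C(0xFFFFFFFF);
  for (size_t i = 0; i < len; i++) {
    crc = sketch_crc32c_u8_nibble(table, crc, data[i]);
  }
  return ~crc;
}
```

The 16-entry table costs 64 bytes versus 1 KiB for a full byte-indexed table, at the price of two lookups per byte, which is the kind of size/speed trade-off the entry describes.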
Template for next time
# Summary
## [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
### Newly added function families
### Additions to existing families
## [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
## [MSA](https://github.com/simd-everywhere/implementation-status/blob/main/msa.md)
# Details
## Implementation of Arm intrinsics
### NEON
### SVE Intrinsics
## WASM intrinsics
## x86 intrinsics
### SSE*
### AVX
### AVX2
### AVX512
### GFNI
### XOP
### F16C
### FMA
### SVML
## MIPS MSA intrinsics
## Arch support
### arm64
### z/Arch
### AltiVec
### e2k (Elbrus)
### Power
## Testing with Docker/Podman & CI
### [Appveyor](https://ci.appveyor.com/project/nemequ/simde/history)
### [Azure](https://dev.azure.com/simd-everywhere/SIMDe/_build?definitionId=3)
### [Circle CI](https://app.circleci.com/pipelines/github/simd-everywhere/simde)
### [Cirrus CI](https://cirrus-ci.com/github/simd-everywhere/simde)
### [Local testing with Docker/Podman](https://github.com/simd-everywhere/simde/tree/master/docker#readme)
### [Drone.io](https://cloud.drone.io/simd-everywhere/simde)
### [GitHub Actions](https://github.com/simd-everywhere/simde/actions)
### [Netlify](https://app.netlify.com/sites/simde/)
### [Packit CI](https://dashboard.packit.dev/projects/github.com/simd-everywhere/simde)
### [Semaphore CI](https://nemequ.semaphoreci.com/projects/simde)
### [Travis](https://app.travis-ci.com/github/simd-everywhere/simde)
## Misc