optimize neon loadu_128/storeu_128 #384

Merged 3 commits on Mar 12, 2024

divinity76
Contributor

vld1q_u8 and vst1q_u8 have no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input, from 13920 nanoseconds down to 13800 nanoseconds (approx)
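For readers who have not seen the diff, here is a minimal sketch of what byte-wise unaligned load/store helpers can look like. It is not copied from the PR: the helper names `loadu_128`/`storeu_128` come from the title, but their signatures, the `vec128` typedef, the `memcpy` fallback for non-Arm hosts, and the `roundtrip_unaligned` check are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical alias so both branches expose the same type name. */
typedef uint32x4_t vec128;

/* vld1q_u8 loads 16 bytes with only byte (element) alignment required,
   so it is safe on an arbitrary pointer. The reinterpret is a no-op cast. */
static inline vec128 loadu_128(const uint8_t src[16]) {
  return vreinterpretq_u32_u8(vld1q_u8(src));
}

/* vst1q_u8 is the matching byte-wise store. */
static inline void storeu_128(vec128 src, uint8_t dest[16]) {
  vst1q_u8(dest, vreinterpretq_u8_u32(src));
}

#else
/* Assumed portable fallback so the sketch also compiles on non-Arm hosts. */
typedef struct { uint32_t w[4]; } vec128;

static inline vec128 loadu_128(const uint8_t src[16]) {
  vec128 x;
  memcpy(&x, src, 16);
  return x;
}

static inline void storeu_128(vec128 src, uint8_t dest[16]) {
  memcpy(dest, &src, 16);
}
#endif

/* Hypothetical check: load 16 bytes at an arbitrary byte offset (0..32),
   store them at the same offset in another buffer, and compare.
   Returns 1 if the bytes survived the round trip. */
static int roundtrip_unaligned(size_t offset) {
  uint8_t in[48], out[48];
  for (size_t i = 0; i < sizeof in; i++) {
    in[i] = (uint8_t)(i * 7 + 1);
    out[i] = 0;
  }
  vec128 v = loadu_128(in + offset);
  storeu_128(v, out + offset);
  return memcmp(in + offset, out + offset, 16) == 0;
}
```

Calling `roundtrip_unaligned(1)` or `roundtrip_unaligned(3)` exercises pointers that are deliberately not 4-byte or 16-byte aligned, which is the case the byte-wise intrinsics are meant to handle.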

@divinity76 divinity76 changed the title slightly optimize neon loadu_128/storeu_128 optimize neon loadu_128/storeu_128 Feb 9, 2024
divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 9, 2024
vld1q_u8 and vst1q_u8 have no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input,
 from 13920 nanoseconds down to 13800 nanoseconds (approx)

ref BLAKE3-team/BLAKE3#384
@oconnor663 oconnor663 merged commit 58bea0b into BLAKE3-team:master Mar 12, 2024
50 checks passed
@oconnor663
Member

I see a ~1% improvement on the Graviton2 CPU on my AWS instance too. Thanks!

oconnor663 added a commit that referenced this pull request Mar 12, 2024
Changes since 1.5.0:
- The Rust crate is now compatible with Miri.
- ~1% performance improvement on Arm NEON contributed by @divinity76 (#384).
- Various fixes and improvements in the CMake build.
oconnor663 added a commit that referenced this pull request Mar 12, 2024
Changes since 1.5.0:
- The Rust crate is now compatible with Miri.
- ~1% performance improvement on Arm NEON contributed by @divinity76 (#384).
- Various fixes and improvements in the CMake build.
- The MSRV of b3sum is now 1.74.1. (The MSRV of the library crate is
  unchanged, 1.66.1.)
@oconnor663
Member

Released as part of v1.5.1.

@divinity76
Contributor Author

divinity76 commented Mar 31, 2024

I wonder if this might have made big endian work too 🤔 (doesn't really matter, nothing runs big endian)
