arm: optimize decoder on Arm SVE2 platform #92

Merged
merged 1 commit into from
Sep 9, 2024

cyb70289
Contributor

@cyb70289 cyb70289 commented Aug 5, 2024

This patch improves sonic JSON decoder performance on Arm SVE2 CPUs.
It leverages the SVE2 MATCH instruction (the svmatch intrinsic) to locate
multiple tokens in a vector efficiently.

Enable this feature by specifying the CMake option "-DENABLE_SVE2_128=ON".
Please note that the resulting binary can only run on hardware that supports
SVE2 with a 128-bit vector length, such as Neoverse-N2. Otherwise,
the code behaviour is undefined.
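
Because running on unsupported hardware is undefined, a caller might want to check the CPU before using such a build. The following is only a sketch under assumptions: it targets Linux on AArch64, the helper name `has_sve2_128` is made up, and it is not part of this patch. It relies on `getauxval(AT_HWCAP2)` and `prctl(PR_SVE_GET_VL)`, and conservatively returns false wherever those are unavailable.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP2, HWCAP2_SVE2 (recent glibc)
#include <sys/prctl.h>  // prctl, PR_SVE_GET_VL, PR_SVE_VL_LEN_MASK
#endif

// Hypothetical helper: true only if the kernel reports SVE2 and the current
// SVE vector length is 16 bytes (128 bits).
inline bool has_sve2_128() {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP2_SVE2) && \
    defined(PR_SVE_GET_VL) && defined(PR_SVE_VL_LEN_MASK)
  if ((getauxval(AT_HWCAP2) & HWCAP2_SVE2) == 0) {
    return false;  // no SVE2 on this CPU or kernel
  }
  const int vl = prctl(PR_SVE_GET_VL);  // vector length in bytes plus flags
  return vl >= 0 && (vl & PR_SVE_VL_LEN_MASK) == 16;
#else
  // Other platforms, or headers too old to know about SVE2.
  return false;
#endif
}
```

Such a check has to live in code compiled without SVE2 flags, since the compiler is free to emit SVE2 instructions anywhere in a translation unit built with them.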

As shown in the table below, tested on a Bluewhale server, the sonic
decoder benchmarks show a clear performance uplift (up to 24.32%).
No side effects were observed for other benchmarks.

| Benchmark                      | Original | SVE2    | Improvement |
|--------------------------------|----------|---------|-------------|
| gsoc-2018/Decode_SonicDyn      | 2.38736  | 2.76677 | 15.89%      |
| citm_catalog/Decode_SonicDyn   | 1.41729  | 1.76191 | 24.32%      |
| otfcc/Decode_SonicDyn          | 399.916  | 413.417 | 3.38%       |
| fgo/Decode_SonicDyn            | 691.597  | 716.301 | 3.57%       |
| twitter/Decode_SonicDyn        | 1.33604  | 1.58737 | 18.81%      |
| twitterescaped/Decode_SonicDyn | 1.24759  | 1.30216 | 4.37%       |
| github_events/Decode_SonicDyn  | 1.38961  | 1.65635 | 19.20%      |
| canada/Decode_SonicDyn         | 526.145  | 524.517 | -0.31%      |
| poet/Decode_SonicDyn           | 2.06297  | 2.40383 | 16.52%      |
| lottie/Decode_SonicDyn         | 419.902  | 438.824 | 4.51%       |
| book/Decode_SonicDyn           | 456.615  | 487.196 | 6.70%       |

@xiegx94 xiegx94 requested a review from liuq19 August 5, 2024 06:09
@xiegx94
Collaborator

xiegx94 commented Aug 5, 2024

@cyb70289 What's the unit of your benchmark results? HIB or LIB?

@cyb70289
Contributor Author

cyb70289 commented Aug 5, 2024

> @cyb70289 What's the unit of your benchmark results? HIB or LIB?

Gi/s and Mi/s, bytes per second.

As an example

$ build/benchmark/bench --benchmark_filter=Decode_Sonic
gsoc-2018/Decode_SonicDyn         1299148 ns      1299146 ns          537 bytes_per_second=2.38563Gi/s testdata/gsoc-2018.json
citm_catalog/Decode_SonicDyn      1136378 ns      1136290 ns          617 bytes_per_second=1.41565Gi/s testdata/citm_catalog.json
otfcc/Decode_SonicDyn           158508828 ns    158472460 ns            4 bytes_per_second=399.646Mi/s testdata/otfcc.json
fgo/Decode_SonicDyn              67084470 ns     67084360 ns            9 bytes_per_second=692.246Mi/s testdata/fgo.json
......

@xiegx94
Collaborator

xiegx94 commented Aug 5, 2024

See #56, support SVE as a different arch.

@cyb70289
Contributor Author

cyb70289 commented Aug 5, 2024

Thanks, will try to refactor following that PR.
Instead of adding a complete SVE implementation, I'm thinking about "inheriting" from NEON and only overriding the code that can benefit from SVE. It looks to me like much of the code will be the same for NEON and SVE.

@cyb70289
Contributor Author

cyb70289 commented Aug 6, 2024

@xiegx94, the sve2-128 implementation is added. Arm common code is moved to common/arm_common/.
I checked the sonic decoder benchmarks; no performance regression was found.

@cyb70289
Contributor Author

cyb70289 commented Aug 6, 2024

Is there a convenient way to run the clang-format job locally?

@xiegx94
Collaborator

xiegx94 commented Aug 6, 2024

> Is there a convenient way to run the clang-format job locally?

Could you install clang on your machine? If you have clang-format, run `git clang-format`.

@cyb70289
Contributor Author

cyb70289 commented Aug 6, 2024

> > Is there a convenient way to run the clang-format job locally?
>
> Could you install clang on your machine? If you have clang-format, run `git clang-format`.

Thanks, the formatting should be fixed now.

@cyb70289
Contributor Author

cyb70289 commented Aug 6, 2024

"Test coverage" runs successfully on my local x86 server. Not sure why the CI job fails. It looks like it's only for x86?

@xiegx94
Collaborator

xiegx94 commented Aug 29, 2024

@cyb70289 pls update cmake/set_arch_flags.cmake.

@xiegx94
Collaborator

xiegx94 commented Aug 29, 2024

#93 FYI @cyb70289

@cyb70289
Contributor Author

> @cyb70289 pls update cmake/set_arch_flags.cmake.

@xiegx94 updated

Collaborator

@xiegx94 xiegx94 left a comment

LGTM

@xiegx94 xiegx94 merged commit 70821cb into bytedance:master Sep 9, 2024
@cyb70289 cyb70289 deleted the sve2-128 branch September 9, 2024 09:24