Strong performance regression with target-cpu=native #190

Open
fr0staman opened this issue Dec 22, 2023 · 11 comments

@fr0staman

On my setup, ahash built with target-cpu=native shows a significant performance regression.
This may be a Rust/LLVM issue, but I'm opening an issue here first.

Repro:
https://github.com/fr0staman/rust-ahash-target-native-performance-issue

My setup

Rust:

rustc 1.74.1 (a28077b28 2023-12-04)
binary: rustc
commit-hash: a28077b28a02b92985b3a3faecf92813155f1ea1
commit-date: 2023-12-04
host: x86_64-unknown-linux-gnu
release: 1.74.1
LLVM version: 17.0.4

System:

CPU: AMD Ryzen 5 4500U
OS: Ubuntu 22.04.3 LTS

Results

Standard target

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ cargo bench
    Finished bench [optimized] target(s) in 36.18s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [21.672 µs 21.698 µs 21.727 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
Performance/ahash/(256, 1024)
                        time:   [983.01 µs 983.94 µs 984.92 µs]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
Performance/ahash/(1024, 4096)
                        time:   [15.256 ms 15.298 ms 15.341 ms]

target-cpu=native

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
    Finished bench [optimized] target(s) in 46.42s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-4df22a78d1110619)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/ahash-3b7ae86a7bc7bacb)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.734 µs 37.761 µs 37.789 µs]
                        change: [+73.336% +73.657% +73.980%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.4681 ms 2.4698 ms 2.4717 ms]
                        change: [+150.51% +150.90% +151.29%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.308 ms 38.369 ms 38.433 ms]
                        change: [+149.98% +150.82% +151.60%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
@puuuuh

puuuuh commented Dec 24, 2023

The example can be reduced to target-feature=+aes.

@tkaitchuck
Owner

It looks like this bench is only hashing char, which SHOULD be specialized in both cases (ideally to identical instructions). I'll take a look at this.
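For readers without the repro repository at hand, the workload described above (hashing individual `char` values) can be sketched roughly as follows. This is only an illustration, not the benchmark from the repro: the function name `hash_chars` is hypothetical, and the standard library's `RandomState` stands in for ahash so the snippet has no external dependencies.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes every `char` of `input` with a fresh hasher and folds the
/// results into one value so the work cannot be optimized away.
/// `hash_chars` is a hypothetical helper, not part of ahash or the repro.
fn hash_chars<S: BuildHasher>(state: &S, input: &str) -> u64 {
    let mut acc = 0u64;
    for c in input.chars() {
        let mut hasher = state.build_hasher();
        c.hash(&mut hasher);
        // Rotate before mixing so repeated characters do not cancel out.
        acc = acc.rotate_left(5) ^ hasher.finish();
    }
    acc
}

fn main() {
    // Stand-in hasher: std's RandomState (an assumption for this sketch).
    let state = RandomState::new();
    let first = hash_chars(&state, "target-cpu=native");
    let second = hash_chars(&state, "target-cpu=native");
    // The same state and input must give the same result.
    assert_eq!(first, second);
    println!("{first:#018x}");
}
```

Because the function is generic over `BuildHasher`, a project that depends on the `ahash` crate could presumably pass `ahash::RandomState::new()` instead and compare builds with and without `-C target-cpu=native`.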

@tkaitchuck
Owner

This does not appear to happen on my Intel i9. There must be something odd in the assembly for the Ryzen.
If +aes gives identical performance to native, it is possible that it's not picking up the sse2 instructions for some reason.
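One way to test that hypothesis is to print which target features the compiler actually enabled for a given set of flags and compare them with what the CPU reports at run time. The sketch below is only an assumption about how one might check this; the function names and the particular feature list are illustrative and do not come from ahash.

```rust
/// Features the compiler was allowed to use for this build, as decided by
/// flags such as `-C target-cpu=native` or `-C target-feature=+aes`.
fn compile_time_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", cfg!(target_feature = "sse2")),
        ("ssse3", cfg!(target_feature = "ssse3")),
        ("aes", cfg!(target_feature = "aes")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

/// Features the running CPU reports, independent of compiler flags.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", std::is_x86_feature_detected!("sse2")),
        ("ssse3", std::is_x86_feature_detected!("ssse3")),
        ("aes", std::is_x86_feature_detected!("aes")),
        ("avx2", std::is_x86_feature_detected!("avx2")),
    ]
}

/// On non-x86 targets there is nothing to detect in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    println!("compile time:");
    for (name, enabled) in compile_time_features() {
        println!("  {name}: {enabled}");
    }
    println!("run time:");
    for (name, detected) in runtime_features() {
        println!("  {name}: {detected}");
    }
}
```

Building this once with default flags, once with `RUSTFLAGS='-C target-feature=+aes'`, and once with `RUSTFLAGS='-C target-cpu=native'` should show whether sse2 and aes are both enabled in each configuration.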

@tkaitchuck
Owner

@fr0staman If you run rustc --print=target-cpus what does it indicate the detected CPU target is?

@tkaitchuck
Owner

This might be related to rust-lang/rust#80633.

@0xdeafbeef

rustc --print=target-cpus

Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver4).
    alderlake
    amdfam10
    athlon
    athlon-4
    athlon-fx
    athlon-mp
    athlon-tbird
    athlon-xp
    athlon64
    athlon64-sse3
    atom
    atom_sse4_2
    atom_sse4_2_movbe
    barcelona
    bdver1
    bdver2
    bdver3
    bdver4
    bonnell
    broadwell
    btver1
    btver2
    c3
    c3-2
    cannonlake
    cascadelake
    cooperlake
    core-avx-i
    core-avx2
    core2
    core_2_duo_sse4_1
    core_2_duo_ssse3
    core_2nd_gen_avx
    core_3rd_gen_avx
    core_4th_gen_avx
    core_4th_gen_avx_tsx
    core_5th_gen_avx
    core_5th_gen_avx_tsx
    core_aes_pclmulqdq
    core_i7_sse4_2
    corei7
    corei7-avx
    emeraldrapids
    generic
    geode
    goldmont
    goldmont-plus
    goldmont_plus
    grandridge
    graniterapids
    graniterapids-d
    graniterapids_d
    haswell
    i386
    i486
    i586
    i686
    icelake-client
    icelake-server
    icelake_client
    icelake_server
    ivybridge
    k6
    k6-2
    k6-3
    k8
    k8-sse3
    knl
    knm
    lakemont
    meteorlake
    mic_avx512
    nehalem
    nocona
    opteron
    opteron-sse3
    penryn
    pentium
    pentium-m
    pentium-mmx
    pentium2
    pentium3
    pentium3m
    pentium4
    pentium4m
    pentium_4
    pentium_4_sse3
    pentium_ii
    pentium_iii
    pentium_iii_no_xmm_regs
    pentium_m
    pentium_mmx
    pentium_pro
    pentiumpro
    prescott
    raptorlake
    rocketlake
    sandybridge
    sapphirerapids
    sierraforest
    silvermont
    skx
    skylake
    skylake-avx512
    skylake_avx512
    slm
    tigerlake
    tremont
    westmere
    winchip-c6
    winchip2
    x86-64                  - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
    x86-64-v2
    x86-64-v3
    x86-64-v4
    yonah
    znver1
    znver2
    znver3
    znver4

This machine also shows the regression.

@fr0staman
Author

rustc --print=target-cpus

Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver1).
    alderlake
    amdfam10
    athlon
    athlon-4
    athlon-fx
    athlon-mp
    athlon-tbird
    athlon-xp
    athlon64
    athlon64-sse3
    atom
    atom_sse4_2
    atom_sse4_2_movbe
    barcelona
    bdver1
    bdver2
    bdver3
    bdver4
    bonnell
    broadwell
    btver1
    btver2
    c3
    c3-2
    cannonlake
    cascadelake
    cooperlake
    core-avx-i
    core-avx2
    core2
    core_2_duo_sse4_1
    core_2_duo_ssse3
    core_2nd_gen_avx
    core_3rd_gen_avx
    core_4th_gen_avx
    core_4th_gen_avx_tsx
    core_5th_gen_avx
    core_5th_gen_avx_tsx
    core_aes_pclmulqdq
    core_i7_sse4_2
    corei7
    corei7-avx
    emeraldrapids
    generic
    geode
    goldmont
    goldmont-plus
    goldmont_plus
    grandridge
    graniterapids
    graniterapids-d
    graniterapids_d
    haswell
    i386
    i486
    i586
    i686
    icelake-client
    icelake-server
    icelake_client
    icelake_server
    ivybridge
    k6
    k6-2
    k6-3
    k8
    k8-sse3
    knl
    knm
    lakemont
    meteorlake
    mic_avx512
    nehalem
    nocona
    opteron
    opteron-sse3
    penryn
    pentium
    pentium-m
    pentium-mmx
    pentium2
    pentium3
    pentium3m
    pentium4
    pentium4m
    pentium_4
    pentium_4_sse3
    pentium_ii
    pentium_iii
    pentium_iii_no_xmm_regs
    pentium_m
    pentium_mmx
    pentium_pro
    pentiumpro
    prescott
    raptorlake
    rocketlake
    sandybridge
    sapphirerapids
    sierraforest
    silvermont
    skx
    skylake
    skylake-avx512
    skylake_avx512
    slm
    tigerlake
    tremont
    westmere
    winchip-c6
    winchip2
    x86-64                  - This is the default target CPU for the current build target (currently x86_64-unknown-linux-gnu).
    x86-64-v2
    x86-64-v3
    x86-64-v4
    yonah
    znver1
    znver2
    znver3
    znver4

@Pzixel

Pzixel commented Dec 30, 2023

@tkaitchuck I actually think this issue might be relevant: https://internals.rust-lang.org/t/slower-code-with-c-target-cpu-native/17315/7

@0xdeafbeef

https://share.firefox.dev/3RWEHk5 without aes flag
https://share.firefox.dev/48D3E9Y with aes flag

[two screenshots]

The AES feature is indeed detected.

@tkaitchuck
Owner

@fr0staman Can you check whether this is fixed on the 0.9 prerelease branch?

@fr0staman
Author

Certainly!

Unfortunately, nothing has changed:

fr0staman@kotobook:~/source/repos/rust/rust-ahash-target-native-performance-issue$ RUSTFLAGS='-C target-cpu=native' cargo bench
   ...
   Compiling ahash v0.9.0 (https://github.com/tkaitchuck/aHash?branch=0.9-prerelease#af37d79e)
   ...
    Finished bench [optimized] target(s) in 43.16s
     Running unittests src/main.rs (target/release/deps/rust_ahash_target_native_performance_issue-a98c230d15dcf9ae)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/issue.rs (target/release/deps/issue-a3d835f7ef64d9be)
Gnuplot not found, using plotters backend
Performance/ahash/(32, 128)
                        time:   [37.539 µs 37.543 µs 37.546 µs]
                        change: [+97.437% +97.897% +98.305%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Performance/ahash/(256, 1024)
                        time:   [2.3726 ms 2.3733 ms 2.3740 ms]
                        change: [+156.12% +156.46% +156.76%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Performance/ahash/(1024, 4096)
                        time:   [38.066 ms 38.109 ms 38.153 ms]
                        change: [+154.20% +155.09% +155.95%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
