
Evaluate Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #1039

zamazan4ik opened this issue Nov 20, 2023 · 0 comments


Hi!

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects; the results are collected here. According to the tests, PGO can help achieve better performance, so I think trying to optimize tokei with PGO is a good idea.

I already did some benchmarks and want to share my results.

Test environment

  • Fedora 39
  • Linux kernel 6.5.11
  • AMD Ryzen 9 5900x
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler: rustc 1.74
  • tokei version: latest from the master branch at commit c8e4d0703252c87b1df45382b365c6bb00769dbe
  • Turbo Boost disabled

Benchmark

For benchmark purposes, I run tokei with the tokei llvm-project command, where llvm-project is a full checkout of the LLVM project. For PGO optimization I use the cargo-pgo tool. The same workload was used for the PGO training phase with tokei built via cargo pgo build, and the PGO-optimized results were obtained with tokei built via cargo pgo optimize build.
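For reference, here is a minimal sketch of that cargo-pgo workflow (the target triple, paths, and training directory are illustrative; cargo-pgo also needs llvm-profdata, e.g. from the llvm-tools-preview rustup component):

# One-time setup
cargo install cargo-pgo
rustup component add llvm-tools-preview

# 1. Build an instrumented tokei binary
cargo pgo build

# 2. Run the training workload; the instrumented binary writes .profraw profiles
./target/x86_64-unknown-linux-gnu/release/tokei ../llvm-project > /dev/null

# 3. Merge the profiles and rebuild with them applied
cargo pgo optimize build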

Results

I got the following results:

hyperfine --warmup 10 --min-runs 50 './tokei_release ../../llvm-project >> /dev/null' './tokei_optimized ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_release ../../llvm-project >> /dev/null
  Time (mean ± σ):     630.2 ms ±  15.5 ms    [User: 4380.7 ms, System: 1760.9 ms]
  Range (min … max):   582.3 ms … 666.2 ms    50 runs

Benchmark 2: ./tokei_optimized ../../llvm-project >> /dev/null
  Time (mean ± σ):     576.7 ms ±  16.5 ms    [User: 3227.9 ms, System: 1820.6 ms]
  Range (min … max):   521.0 ms … 608.9 ms    50 runs

Summary
  ./tokei_optimized ../../llvm-project >> /dev/null ran
    1.09 ± 0.04 times faster than ./tokei_release ../../llvm-project >> /dev/null

Just for reference, the timings for tokei in instrumented mode:

hyperfine --warmup 1 --min-runs 1 './tokei_instrumented ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_instrumented ../../llvm-project >> /dev/null
  Time (abs ≡):        27.329 s               [User: 623.284 s, System: 1.771 s]

At least in the scenario above, PGO helps improve tokei's performance.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on tokei. If they show improvements, add a note to the documentation about the possible performance gains from building tokei with PGO.
  • Provide an easier way (e.g. a build option or a documented build recipe) to build tokei with PGO; this would help end users and maintainers optimize tokei for their own workloads (see the sketch after this list).
  • Optimize the pre-built binaries with PGO.
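
For maintainers who would rather not depend on cargo-pgo, here is a hedged sketch of the underlying rustc flags (the profile directory and training workload are illustrative, not a settled recipe):

# 1. Build with PGO instrumentation
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a representative training workload
./target/release/tokei ../llvm-project > /dev/null

# 3. Merge the raw profiles (llvm-profdata should match rustc's LLVM version)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release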

Testing Post-Link Optimization techniques such as LLVM BOLT would also be interesting (Clang and rustc already use BOLT on top of PGO), but I recommend starting with regular PGO.
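If BOLT is evaluated later, a rough sketch of the instrumentation-based flow could look like this (exact flag names vary between LLVM versions; the binary should keep relocations, e.g. via -Wl,--emit-relocs, so BOLT can rearrange code layout):

# Build tokei with relocations preserved for BOLT
RUSTFLAGS="-Clink-arg=-Wl,--emit-relocs" cargo build --release

# Instrument the binary
llvm-bolt ./target/release/tokei -instrument -o ./tokei_bolt_inst

# Training run; the instrumented binary writes /tmp/prof.fdata by default
./tokei_bolt_inst ../llvm-project > /dev/null

# Apply the profile and optimize code layout
llvm-bolt ./target/release/tokei -o ./tokei_bolt -data=/tmp/prof.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -split-all-cold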

Here are some examples of how PGO is integrated into other projects:
