Add a section to the documentation explaining that PGO can help up substantially (25%) and maybe offer some tips for users to use it? #9561

alamb · 2024-03-11T19:27:41Z

This ticket tracks adding a profile-guided optimization (PGO) section to the documentation and linking it to #9507

Many thanks to @zamazan4ik for this wonderful content

Add a section to the documentation explaining that PGO can help up substantially (25%) and maybe offer some tips for users to use it?

Yes, it would be a great option. It requires almost no resources to maintain (write once and link to this discussion for the results). In this case, users who are interested in optimizing arrow-datafusion further will be able to use this information as an additional optimization opportunity. I have several examples of how such documentation can be written (they are for applications, but for a library it should look similar):

ClickHouse: https://clickhouse.com/docs/en/operations/optimizing-performance/profile-guided-optimization
Databend: https://databend.rs/doc/contributing/pgo
Vector: https://vector.dev/docs/administration/tuning/pgo/
Nebula: https://docs.nebula-graph.io/3.5.0/8.service-tuning/enable_autofdo_for_nebulagraph/
GCC: Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
Clang:
- https://llvm.org/docs/HowToBuildWithPGO.html
- https://llvm.org/docs/AdvancedBuilds.html
Rustc: https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization
tsv-utils: https://github.com/eBay/tsv-utils/blob/master/docs/BuildingWithLTO.md

Provide pre-gathered PGO data somehow, so users could build DataFusion with profiles guided from TPCH (or clickbench).

Unfortunately, this way is a bit trickier in practice. Pre-gathered PGO profiles have multiple issues, e.g. incompatibilities between different compiler versions and profile skew (when a PGO profile was gathered for an older version of the code). Over time, pre-gathered PGO profiles become less and less efficient, so some kind of regular PGO profile regeneration is required.
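To make the two failure modes concrete, here is a minimal sketch of a check a project could run before reusing a pre-gathered profile. Everything in it is an assumption for illustration: the `ProfileInfo` record, the `profile_problems` helper, and the `max_commits_behind` threshold are hypothetical names and values, not part of DataFusion or of any compiler tooling. Treating any compiler version difference as a problem is a deliberately conservative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileInfo:
    """Hypothetical metadata stored next to a pre-gathered PGO profile."""
    compiler_version: str  # compiler version string recorded when the profile was gathered
    commits_behind: int    # how many commits the profiled source lags behind the current code


def profile_problems(profile, current_compiler, max_commits_behind=200):
    """Return the reasons (possibly none) why reusing the profile looks risky.

    The default threshold is an arbitrary placeholder, not a recommendation.
    """
    problems = []
    if profile.compiler_version != current_compiler:
        # Conservative assumption: any version difference may make the profile unusable.
        problems.append("compiler mismatch")
    if profile.commits_behind > max_commits_behind:
        # The profile describes code that has since changed a lot (profile skew).
        problems.append("profile skew")
    return problems
```

A CI job could regenerate the profile whenever this list is non-empty, which is the "regular regeneration" cost mentioned above.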

I could suggest a similar alternative: integrate a PGO build into the build scripts (based on some workload like TPCH, Clickbench, any other target workload, or any combination of them; it's up for discussion). On the one hand, users will be able to build a PGO-optimized version of the library. On the other hand, you won't spend maintenance resources on keeping pre-gathered PGO profiles up to date (however, that process can be simplified with CI).
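As a rough illustration of what such a build-script integration could look like for a Rust project, the sketch below lists the usual four PGO steps: instrumented build, workload run, profile merge, optimized rebuild. The `-Cprofile-generate` and `-Cprofile-use` flags and the `llvm-profdata merge` command are the ones described in the rustc book; the helper names, the profile directory, and the workload command are hypothetical placeholders, and this is not an existing DataFusion script.

```python
import os
import shlex
import subprocess


def pgo_build_steps(profile_dir, workload_cmd):
    """Return the ordered (env_overrides, argv) steps of a PGO build."""
    profile_dir = profile_dir.rstrip("/")
    merged = f"{profile_dir}/merged.profdata"
    return [
        # 1. Build an instrumented binary that writes raw profiles into profile_dir.
        ({"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"},
         ["cargo", "build", "--release"]),
        # 2. Run a representative workload with the instrumented build.
        ({}, shlex.split(workload_cmd)),
        # 3. Merge the raw profiles into one file the compiler can read.
        ({}, ["llvm-profdata", "merge", "-o", merged, profile_dir]),
        # 4. Rebuild, this time letting the compiler use the merged profile.
        ({"RUSTFLAGS": f"-Cprofile-use={merged}"},
         ["cargo", "build", "--release"]),
    ]


def run_steps(steps):
    """Execute each step in order, stopping at the first failure."""
    for env_overrides, argv in steps:
        subprocess.run(argv, env={**os.environ, **env_overrides}, check=True)


if __name__ == "__main__":
    # Print the plan instead of running it; the workload command is a placeholder.
    for env, argv in pgo_build_steps("/tmp/pgo-data", "./run-workload.sh"):
        prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
        print((prefix + " " if prefix else "") + shlex.join(argv))
```

Because the profile is regenerated from the current source and the current compiler on every run, this approach avoids shipping a profile that can go stale.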

Some examples of PGO build integration into the build scripts:

Rustc: a CI tool for the multi-stage build
GCC:
- Official docs, section "Building with profile feedback" (even AutoFDO build is supported)
- A part in a "wonderful" configure script.
Clang:
- Docs
- MinGW build script
Python:
- CPython: README
- Pyston: README
Go: Bash script
Swift: CMake script
V8: Bazel flag
ChakraCore: Scripts
Chromium: Script
Firefox: Docs
- Thunderbird has PGO support too
PHP: Makefile command and old Centminmod scripts
MySQL: CMake script
YugabyteDB: GitHub commit
FoundationDB: Script
Zstd: Makefile
Foot: Scripts
Windows Terminal: GitHub PR
Pydantic-core: GitHub PR
file.d: GitHub PR
OceanBase: CMake flag
ISPC: CMake scripts
NodeJS: Configure script
Android Open Source Project (AOSP):
- Official documentation
- Committed PGO profiles: repository
DMD: Custom build rule
LDC: GitHub action
tsv-utils: Makefile
Erlang OTP: Makefile
Clingo (PGO enabled only in Spack): Package recipe
SWI-Prolog:
- Script
- CMake module
hck: Justfile

If you have some prebuilt versions of the library (e.g. a Python wheel), you can think about pre-optimizing these prebuilt binaries with PGO (based on TPCH, Clickbench, etc.). As an example - Pydantic-core: GitHub PR.

Originally posted by @zamazan4ik in #9507 (reply in thread)


alamb added the documentation label (Improvements or additions to documentation) on Mar 11, 2024

alamb mentioned this issue Mar 11, 2024

[Epic] Better / Improved Documentation, Tutorials and Examples #7013

Open

32 tasks
