
Add a section to the documentation explaining that PGO can help up substantially (25%) and maybe offer some tips for users to use it? #9561

Open
alamb opened this issue Mar 11, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

alamb (Contributor) commented Mar 11, 2024

This ticket tracks adding a profile-guided optimization (PGO) section to the documentation and linking it to #9507.

Many thanks to @zamazan4ik for this wonderful content.

Add a section to the documentation explaining that PGO can speed things up substantially (25%) and maybe offer some tips for users to use it?

Yes, it would be a great option. It requires almost no maintenance (write it once and link to this discussion for the results). Users who want to get more performance out of arrow-datafusion can then use this information as an additional optimization opportunity. I have several examples of how such documentation can be written (they are for applications, but for a library it should look similar):

Provide pre-gathered PGO data somehow, so users could build DataFusion with profiles gathered from TPCH (or ClickBench).

Unfortunately, this approach is a bit trickier in practice. Pre-gathered PGO profiles have multiple issues, e.g. incompatibilities between different compiler versions and profile skew (when a PGO profile was gathered on an older version of the code). As the code evolves, pre-gathered profiles become less and less effective, so some kind of regular profile regeneration is required.
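To make the regeneration point concrete, here is a minimal Python sketch of a staleness check for a pre-gathered profile. It assumes a hypothetical metadata record (`ProfileMetadata`) stored next to the `.profdata` file, holding the compiler version and git commit the profile was gathered on. The commit count is assumed to come from something like `git rev-list --count <commit>..HEAD`, and the 200-commit threshold is arbitrary. Neither the record nor the function exists in DataFusion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileMetadata:
    # Hypothetical sidecar data saved next to a pre-gathered .profdata file.
    rustc_version: str  # e.g. the "release:" line of `rustc -vV`
    source_commit: str  # git commit the workload was profiled on


def regeneration_reason(meta: ProfileMetadata,
                        current_rustc: str,
                        commits_since_profile: int,
                        max_drift: int = 200) -> Optional[str]:
    """Return why the profile should be regenerated, or None if it looks usable."""
    if meta.rustc_version != current_rustc:
        # Profile data written for one LLVM version may be rejected by another.
        return "compiler version changed"
    if commits_since_profile > max_drift:
        # Profile skew: the code has moved on, so the recorded hot paths are stale.
        return f"profile skew: {commits_since_profile} commits since profiling"
    return None


# Placeholder versions and commit hash, for illustration only.
meta = ProfileMetadata(rustc_version="1.76.0", source_commit="abc1234")
print(regeneration_reason(meta, "1.76.0", 12))   # -> None
print(regeneration_reason(meta, "1.77.0", 12))   # -> compiler version changed
print(regeneration_reason(meta, "1.76.0", 500))  # -> profile skew: 500 commits since profiling
```

A compiler mismatch is treated as a hard reason to regenerate because the profile may not load at all, while drift is only a threshold, since an older profile may still help, just less.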

I could suggest another, similar approach: integrate into the build scripts a way to build the library with PGO enabled (based on some workload such as TPCH, ClickBench, any other target workload, or any combination of them; that is up for discussion). This way, users will be able to build a PGO-optimized version of the library, and you won't spend maintenance effort on keeping pre-gathered PGO profiles up to date (although that process could also be simplified with CI).
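A build-script integration could follow the standard rustc PGO workflow: build with `-Cprofile-generate`, run representative workloads, merge the raw profiles with `llvm-profdata merge`, then rebuild with `-Cprofile-use`. The Python sketch below only plans and prints those commands (it executes them when `dry_run=False`). The workload command, binary name `my-bench`, target triple, and directories are placeholders. `llvm-profdata` is assumed to be on `PATH` and to match rustc's LLVM version (for example, the one shipped in rustup's `llvm-tools-preview` component). Several workloads (e.g. TPCH plus ClickBench) can be combined simply by listing them, since each run drops its `.profraw` files into the same directory and the merge step picks them all up.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Sequence, Tuple

# One step = (extra environment variables, argv).
Step = Tuple[Dict[str, str], List[str]]


def pgo_build_plan(target: str,
                   workloads: Sequence[Sequence[str]],
                   profile_dir: str = "/tmp/pgo-data",
                   merged_path: str = "/tmp/pgo-merged.profdata") -> List[Step]:
    """Plan an instrument -> profile -> merge -> optimize build."""
    # Passing --target keeps RUSTFLAGS away from build scripts and proc macros.
    cargo = ["cargo", "build", "--release", "--target", target]
    # 1. Instrumented build: the binaries write .profraw files into profile_dir.
    steps: List[Step] = [({"RUSTFLAGS": f"-Cprofile-generate={profile_dir}"}, cargo)]
    # 2. Run each workload against the instrumented binaries.
    steps += [({}, list(cmd)) for cmd in workloads]
    # 3. Merge every .profraw file in profile_dir into one indexed profile.
    steps.append(({}, ["llvm-profdata", "merge", "-o", merged_path, profile_dir]))
    # 4. Optimized rebuild using the merged profile.
    steps.append(({"RUSTFLAGS": f"-Cprofile-use={merged_path}"}, cargo))
    return steps


def run(steps: Sequence[Step], dry_run: bool = True) -> None:
    """Print each step as a shell line; execute it only when dry_run is False."""
    for extra_env, argv in steps:
        env_prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in extra_env.items())
        print(f"{env_prefix} {shlex.join(argv)}".strip())
        if not dry_run:
            subprocess.run(argv, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    plan = pgo_build_plan(
        "x86_64-unknown-linux-gnu",
        workloads=[
            # Hypothetical benchmark binary and arguments. Invoke the already
            # built binary directly: `cargo run` without the same RUSTFLAGS
            # would rebuild it uninstrumented.
            ["./target/x86_64-unknown-linux-gnu/release/my-bench", "tpch"],
        ],
    )
    run(plan)  # prints one line per step
```

The merged profile is written outside `profile_dir` so that rerunning the plan does not feed an old `.profdata` file back into the next merge.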

Some examples of PGO build integration into the build scripts:

If you ship prebuilt versions of the library (e.g. a Python wheel), you could consider pre-optimizing those binaries with PGO (based on TPCH, ClickBench, etc.). One example is Pydantic-core: GitHub PR.

Originally posted by @zamazan4ik in #9507 (reply in thread)

alamb added the documentation label Mar 11, 2024