Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Change python invocation #1908

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Change python invocation #1908

wants to merge 6 commits into from

Conversation

8W9aG
Copy link
Contributor

@8W9aG 8W9aG commented Aug 26, 2024

  • Use -B to prevent python from writing pyc files, this is wasted effort due to the container being
    ephemeral
  • Set check-hash-based-pycs to never, this prevents python from scanning the entire file and
    calculating its hash to check on pyc hits, instead forces it into the timestamp validation step
  • Use -OO to force runtime optimizations such as ignorance of assets, debug flags and docstrings

* Use -B to prevent python from writing pyc files,
this is wasted effort due to the container being
ephemeral
* Set check-hash-based-pycs to never, this
prevents python from scanning the entire file and
calculating its hash to check on pyc hits, instead
forces it into the timestamp validation step
* Use -OO to force runtime optimizations such as
ignorance of assets, debug flags and docstrings
@nickstenning
Copy link
Member

I'm sure these optimizations improve things, but how do we know that and how are we tracking regressions? Can you add some benchmarks to the test suite that demonstrate these are improvements?

@8W9aG
Copy link
Contributor Author

8W9aG commented Aug 27, 2024

I'm sure these optimizations improve things, but how do we know that and how are we tracking regressions? Can you add some benchmarks to the test suite that demonstrate these are improvements?

In this case this is part of the "Compile All" tests from the Python Interpreter Speed Tests spreadsheet. It should be noted that this is a split up PR out of that, with the follow up PR focused on generating compiled python bytecode using find, however first we need to generate base images with the findutils installed (so hence the split of the original PR). Going from no precompiled bytecode to compiled bytecode yields a 50% decrease in interpreter boot time. The flags in this PR force the interpreter to act a certain way that is conducive to not stomping on the precompiled bytecode (although they should in general have other benefits as well in our case, but that has yet to be studied in isolation).

Sadly I don't have any great ideas around preventing regressions on this front, since general noise can be louder than the signal, and the study above does this 100 times to establish the signal. We could theoretically run the same test in our own test suites however it would take on the order of hours, and running on peoples dev machines might introduce even more noise making it very non-deterministic. Let me know if you have any ideas on how to prevent regressions here.

@technillogue
Copy link
Contributor

this won't help in prod as we override the entrypoint, the same change needs to be repeated elsewhere

@8W9aG
Copy link
Contributor Author

8W9aG commented Aug 27, 2024

this won't help in prod as we override the entrypoint, the same change needs to be repeated elsewhere

I'll make sure the change is repeated in the k8s, I do think they should be kept as similar as possible though to minimise drift between prod and dev.

@8W9aG 8W9aG mentioned this pull request Aug 27, 2024
8W9aG added 5 commits August 28, 2024 16:19
* PYTHONDONTWRITEBYTECODE is the same as
specifying -B
* PYTHONOPTIMIZE is the same as specifying -OO
Signed-off-by: Will Sackfield <sackfield@replicate.com>
* Mirroring the server side
Signed-off-by: Will Sackfield <sackfield@replicate.com>
@8W9aG
Copy link
Contributor Author

8W9aG commented Sep 5, 2024

Removed the optimize flags in line with the cluster discussion.

Copy link
Contributor

@tempusfrangit tempusfrangit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything that stands out as wrong with this but I'm going to tag in @mattt for a second set of eyes.

Copy link
Contributor

@mattt mattt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems well-reasoned. I also don't have a good answer to @nickstenning's question about how to track regressions other than to try rolling this out and looking for aggregate error rates and performance metrics.

Copy link
Contributor

@mattt mattt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so that we can test this out internally before cutting a release. Following the same general approach described in #1858 (review).

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants