-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Pod-startup regression due to probing periods #10973
Comments
/assign |
Added a rough sketch of what may be a additional workaround for this (amongst other things) in #10978. |
/area autoscale |
@evankanderson: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/triage accepted |
We should separate 'cold-start' with Pod 'Readiness' as the activator doesn't wait for the Pod to be 'Ready' since it has it's own 200ms polling tight loop. Is there a follow up issue that captures the changes we should make when we bump to |
I agree in principle @dprotaso. Both measures are important for different circumstances. There's no followup issue (yet) but the code currently says
Do we still want to open a tracker? |
Made the issue - should be good :) |
I've been doing some investigation into pod startup times since that's something the autoscaling WG wants to optimize, and that made me stumble into wildly variant pod-startup time data. (Note: All data is captured using https://github.com/markusthoemmes/podspeed, I might totally have a bug there).
v0.21 and prior
Our assumption was, that the
exec
readiness probe would start running after the container has been started initially. That assumption is wrong. Generally, the probes run on their own timer async to pod startup (until recently, but more on that later). The readiness probe's period was bumped to 10s in #8147. If the container is not yet started when the timer ticks first, it'll only tick again 10s later, making for... well.. very slow startup times:HEAD
@julz swapped us over to using a startup probe that's running on a tight-ish timer. That in theory fixes the above, but the readiness probe is still running on a 10s period. Even with the startup probe passing, the readiness probe is required to mark the container ready. K8s v1.21 has a fix that kicks the readiness probe as soon as the startup probe passes, making that process very quick. As it stands however, the 10s timer causes the same friction as above, causing super variable startup times.
A few datapoints to illustrate
HEAD
HEAD (reducing the readiness probe period to 1s)
So... we should do something here! We definitely need E2E test coverage for cases like this, so we don't regress as badly in the future. Since #8147 is no longer an issue, I think we can safely reduce the period of our readiness probe (which is now an HTTP probe) to 1s to at least get us back down to more sensible and stable levels.
The text was updated successfully, but these errors were encountered: