Would it be terrible if: we default to no readiness probe if the user doesn't ask for one #10978
Comments
FWIW, I toyed a bit with a PostStart hook executing our binary and that seems to perform in the same ballpark as containers without probes at all.
This has some great content on the tradeoffs we'd buy into when using lifecycle hooks: https://livebook.manning.com/book/kubernetes-in-action/chapter-17/64. In a nutshell: error reporting seems worse, so we probably need to adjust the way we report errors from pods, and the insight into why stuff fails seems to not be great. It'd also be interesting to double-check whether the pod's IP appears as a NonReadyAddress before the lifecycle hook runs, since the pod goes into a Pending state rather than a NotReady state. I haven't checked on these pieces yet at all.
Triple-posting 😂 I did investigate the behavior of the postStart hook briefly, and sadly, I was right above: the pod doesn't get an IP while the post-start hook is running, so that would defeat other optimizations we have in the activator to completely bypass the readiness signals, probe the container on our own, and send traffic to it ASAP (without waiting for signals to propagate through the API, for instance).
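For context, the experiment described above amounts to a pod spec roughly like the following. This is a minimal sketch with a placeholder image and a placeholder shell loop, not Knative's actual queue-proxy invocation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: poststart-demo
spec:
  containers:
    - name: user-container
      image: example.com/app:latest   # placeholder image
      lifecycle:
        postStart:
          exec:
            # Placeholder check: block until the app accepts TCP
            # connections. While this command runs, the pod does not
            # publish an IP, which is the problem noted in the comment
            # above.
            command: ["/bin/sh", "-c", "until nc -z localhost 8080; do sleep 0.1; done"]
```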
It sounds like this doesn't work right now? /close (Feel free to reopen and add more notes if this should stay open)
@evankanderson: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/reopen The bit that doesn't work is the postStart idea, which was a little side comment, not the main point of the issue (the thread's a bit confusing, I guess, since we sidetracked ourselves on a side issue). TL;DR: I still think it might make sense to avoid the unnecessary readiness check (every second × every pod) that the user didn't ask for, since we only really need a startup check to do what we want (avoid flakes on scale-up).
@julz: Reopened this issue. In response to this:
/kind enhancement Does this need a proposal before implementation? /triage assigned
@evankanderson: The label(s) could not be applied. In response to this:
/triage accepted
This issue is stale because it has been open for 90 days with no activity.
Not defaulting in a readiness probe would be useful for the container-freezer, as a readiness probe on a frozen container isn't likely to be very useful (and I'm pretty sure it doesn't make sense to wake up a frozen container every second for a probe :) ) |
I don't have an intrinsic concern with this on second thought (my first thought was "oop, don't do that"). It does seem like the container-freezer might want to run the startup probe on un-freeze, and would still need to be able to handle containers with an explicit readiness check. |
Copying @julz 's comment on potentially swapping out the readiness probe for a startup probe: #12281 (comment) |
/assign |
Is there any documentation on this behavior or the reasons behind it? I found it quite surprising, and only figured it out after debugging why a Pod revision was not switching over to ready despite the service container reporting as ready. Our service needs several minutes during startup to load a large data model, and the readinessProbe parameters we had chosen were causing funny behavior. For my use case, the extra latency of checking the container status via the API would not be a problem. It also seems that asking the API for the ready state would allow exec probes to work, without changing the behavior of the container's probe config.
It's a bit of a long story (most of which was before my time, but there are a number of issues from early 2021 (?) that provide more details), but the short version is: containers are often "ready" before the API shows they are Ready, so checking the container directly ends up being faster.
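The "check the container directly" idea can be sketched as a plain TCP dial against the pod's address, skipping the API entirely. This is a simplified illustration of the approach, not Knative's actual activator probing code (the function name and parameters are made up for the example):

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    This mirrors the idea above: rather than waiting for readiness to
    propagate through the Kubernetes API, dial the container directly
    and route traffic as soon as the handshake succeeds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A prober like this is why losing the pod IP during a postStart hook (mentioned earlier in the thread) matters: with no IP, there is nothing to dial.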
FWIW - If I was a user (using container-freezer) and I configure a readiness probe I would expect it to probe while the container is active. But I would expect it to be paused when the container is paused. I'm unsure if that's possible in K8s right now |
Yeah, the case where a user specifically added a probe would be.... tricky, to say the least. I think the order of doing probe updates would go something like:
/area API
/area autoscale
Describe the feature
Today, if a user doesn't specify a readiness probe, we introduce a default (TCP) probe for them. Since #10741 this runs both as a startup probe (using an intermediate exec probe in queue-proxy) and as a readiness probe that runs after the startup probe succeeds. We do this to avoid flakes when scaling up, by ensuring the pod is at least ready to accept TCP connections before routing traffic to it.
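In pod-spec terms, today's defaulted probing looks roughly like this (a sketch with illustrative field values and a placeholder image; the real probes are injected by Knative and the startup check is an exec probe in queue-proxy, not a plain tcpSocket on the user container):

```yaml
spec:
  containers:
    - name: user-container
      image: example.com/app:latest   # placeholder
      startupProbe:
        tcpSocket:
          port: 8080        # sketch: really an exec probe in queue-proxy
        periodSeconds: 1
      readinessProbe:
        tcpSocket:
          port: 8080
        periodSeconds: 1    # keeps running for the pod's whole lifetime
```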
Before startupProbe was available, running this as a readinessProbe was really the only way to accomplish this (readiness was the only game in town, even though we only really, I think, cared about startup flakes). However, now that we have startupProbe, it might be possible to run the TCP probe only as the startup probe, and then not run any readiness probe at all. This is technically a behaviour change, since today we would detect the pod going unready after startup, but it:

(a) is a closer match to what the user actually asked for -- they didn't specify a probe at all, and wouldn't get one in raw k8s;
(b) would be a partial workaround for the issue in #10973, since we would avoid potentially missing the window to run the readiness probe after the startup probe completes;
(c) is simpler and more efficient (especially since, due to (b), we currently need the probe period to be as short as possible).

We also (d) might be able to do even better and run the startup check in a PostStart lifecycle hook rather than an actual startupProbe, which might fix #10973 entirely (while still avoiding flakes on scale-up).
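The startup-probe-only proposal could be expressed roughly as follows (again a sketch with illustrative values and a placeholder image, not the actual defaulting the webhook would emit):

```yaml
spec:
  containers:
    - name: user-container
      image: example.com/app:latest   # placeholder
      startupProbe:
        tcpSocket:
          port: 8080
        periodSeconds: 1      # probe aggressively, but only during startup
        failureThreshold: 120 # illustrative: allow up to ~2 minutes to start
      # No readinessProbe: once startup succeeds, the pod is simply
      # treated as ready, matching what the user asked for (no probe).
```

With no readinessProbe at all, kubelet stops probing once the startup check passes, which is where the per-second, per-pod savings in the proposal come from.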
Note: this is a rough thought for discussion. It may (well) have a fatal flaw I haven't spotted yet.