Would it be terrible if: we default to no readiness probe if the user doesnt ask for one #10978

julz · 2021-03-18T13:33:43Z

/area API
/area autoscale

Describe the feature

Today, if a user doesnt specify a readiness probe we introduce a default (tcp) probe for them. Since #10741 this is run via both a startup probe (using an intermediate exec probe in queue proxy), and as a readiness probe which runs after the startup probe succeeds. The reason we do this is to avoid flakes when scaling up, ensuring the pod is at least ready to accept tcp connections before routing to it.

Before StartupProbe was available, running this as a ReadinessProbe was really the only way to accomplish this (readiness was the only game in town even tho we only really, I think, cared about startup flakes). However, now that we have StartupProbe it might be possible to only run the tcp probe for the startupprobe, and then not run any readiness probe at all. This is technically a behaviour change since we would currently detect the pod going unready in this way after startup, but it (a) is a closer match to what the user actually asked for -- they didnt specify a probe at all, and wouldnt get one in raw k8s (b) would be a partial workaround for the issue in #10973 since we would avoid potentially missing the window to run the readiness probe after the startup probe completes (c) is simpler and more efficient (especially since, due to (b) we need the probe period to be as short as possible..). We also (d) might be able to do even better and do the startup probe in a PostStart lifecycle hook rather than an actual StartupProbe which might fix #10973 entirely (while still avoiding flakes on scale-up).

Note: this is a rough thought for discussion. It may (well) have a fatal flaw I haven't spotted yet.

markusthoemmes · 2021-03-18T14:20:15Z

FWIW, I toyed a bit with a PostStart hook executing our binary and that seems to perform in the same ballpark as containers without probes at all.

markusthoemmes · 2021-03-18T14:37:56Z

This has some great content on the tradeoffs we'd buy into when using lifecycle hooks: https://livebook.manning.com/book/kubernetes-in-action/chapter-17/64.

In a nutshell: Error reporting seems worse, so we prolly need to adjust the way we report errors from pods and the insight into why stuff fails seems to not be great. It'd also be interesting to doublecheck if the pod's IP is appearing as NonReadyAddress before the lifecycle runs, since it's going into a Pending state rather a NotReady state. I haven't checked on these pieces yet at all.

markusthoemmes · 2021-03-18T15:08:09Z

Tripleposting 😂

I did investigate the behavior of the postStart hook briefly, and sadly, I was right above: The pod isn't getting any IP while the post-start hook is running, so that'd destroy other optimizations that we have in the activator to completely bypass the readiness signals and just probe the container on our own and send traffic to it ASAP (without waiting for signals to propagate through the API for instance).

evankanderson · 2021-03-19T17:43:24Z

It sounds like this doesn't work for right now?

/close

(Feel free to reopen and add more notes if this should stay open)

knative-prow-robot · 2021-03-19T17:43:29Z

@evankanderson: Closing this issue.

In response to this:

It sounds like this doesn't work for right now?

/close

(Feel free to reopen and add more notes if this should stay open)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

julz · 2021-03-19T17:56:54Z

/reopen

The bit that doesn't work is the postStart idea, which was a little side comment, not the main point of the issue (the thread's a bit confusing I guess since we side tracked ourselves a bit on a side issue). Tldr I still think it might make sense to avoid the unnecessary readiness check (every second x every pod) that the user didn't ask for since we only really need a startup check to do what we want (avoid flakes on scale up).

knative-prow-robot · 2021-03-19T17:56:58Z

@julz: Reopened this issue.

In response to this:

/reopen

The bit that doesn't work is the postStart idea, which was a little side comment, not the main point of the issue (the thread's a bit confusing I guess since we side tracked ourselves a bit on a side issue). Tldr I still think it might make sense to avoid the unnecessary readiness check (every second x every pod) that the user didn't ask for since we only really need a startup check to do what we want (avoid flakes on scale up).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

evankanderson · 2021-03-22T00:23:51Z

/kind enhancement
/remove-kind feature

Does this need a proposal before implementation?

/triage assigned

knative-prow-robot · 2021-03-22T00:23:52Z

@evankanderson: The label(s) triage/assigned cannot be applied, because the repository doesn't have them.

In response to this:

/kind enhancement
/remove-kind feature

Does this need a proposal before implementation?

/triage assigned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

evankanderson · 2021-03-22T00:23:56Z

/triage accepted

github-actions · 2021-06-20T01:25:33Z

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

psschwei · 2021-11-01T19:02:16Z

Not defaulting in a readiness probe would be useful for the container-freezer, as a readiness probe on a frozen container isn't likely to be very useful (and I'm pretty sure it doesn't make sense to wake up a frozen container every second for a probe :) )

evankanderson · 2021-11-02T13:45:47Z

I don't have an intrinsic concern with this on second thought (my first thought was "oop, don't do that"). It does seem like the container-freezer might want to run the startup probe on un-freeze, and would still need to be able to handle containers with an explicit readiness check.

psschwei · 2021-11-16T20:13:51Z

Copying @julz 's comment on potentially swapping out the readiness probe for a startup probe: #12281 (comment)

psschwei · 2022-02-04T19:30:16Z

/assign

xtaje · 2022-07-14T12:54:19Z

... optimizations that we have in the activator to completely bypass the readiness signals and just probe the container on our own and send traffic to it ASAP (without waiting for signals to propagate through the API for instance).

Is there any documentation on this behavior or reasons behind it? I found it quite surprising and only figured it out after debugging why a Pod revision was not switching over to ready, despite the service container reporting as ready. Our service needs several minutes during startup to load a large data model, and the readinessProbe parameters we had chosen were causing funny behavior.

For the use case I have, the extra latency of checking the container status via API would not be a problem. It seems that asking the API for ready state would allow you to use exec probes too, and not change the behavior of the container probe config also.

psschwei · 2022-07-14T13:36:43Z

It's a bit of a long story (most of which was before my time, but there's a number of issues from early 2021 (?) that provide more details), but the short version is: containers are often "ready" before the API shows they are Ready, so checking the container directly ends up being faster.

dprotaso · 2022-08-02T19:08:46Z

Not defaulting in a readiness probe would be useful for the container-freezer,

FWIW - If I was a user (using container-freezer) and I configure a readiness probe I would expect it to probe while the container is active. But I would expect it to be paused when the container is paused.

I'm unsure if that's possible in K8s right now

psschwei · 2022-08-02T19:25:50Z

Yeah, the case where a user specifically added a probe would be.... tricky, to say the least. I think the order of doing probe updates would go something like:

add default startup probe (Support startupProbe in webhook #10037 )
remove default readiness probe (this one)
figure out how the heck to handle user readiness probes if freezer is enabled (no issue yet)

julz added the kind/feature Well-understood/specified features, ready for coding. label Mar 18, 2021

knative-prow-robot added area/API API objects and controllers area/autoscale labels Mar 18, 2021

julz mentioned this issue Mar 18, 2021

Pod-startup regression due to probing periods #10973

Closed

knative-prow-robot closed this as completed Mar 19, 2021

knative-prow-robot reopened this Mar 19, 2021

knative-prow-robot added the kind/enhancement label Mar 22, 2021

knative-prow-robot removed the kind/feature Well-understood/specified features, ready for coding. label Mar 22, 2021

knative-prow-robot added the triage/accepted Issues which should be fixed (post-triage) label Mar 22, 2021

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2021

github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2021

psschwei mentioned this issue Nov 15, 2021

[WIP] Remove readiness probe when using container freezer #12281

Closed

knative-prow-robot assigned psschwei Feb 4, 2022

dprotaso added this to the Icebox milestone Aug 2, 2022

psschwei removed their assignment Oct 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Would it be terrible if: we default to no readiness probe if the user doesnt ask for one #10978

Would it be terrible if: we default to no readiness probe if the user doesnt ask for one #10978

julz commented Mar 18, 2021

markusthoemmes commented Mar 18, 2021

markusthoemmes commented Mar 18, 2021

markusthoemmes commented Mar 18, 2021

evankanderson commented Mar 19, 2021

knative-prow-robot commented Mar 19, 2021

julz commented Mar 19, 2021

knative-prow-robot commented Mar 19, 2021

evankanderson commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

evankanderson commented Mar 22, 2021

github-actions bot commented Jun 20, 2021

psschwei commented Nov 1, 2021

evankanderson commented Nov 2, 2021

psschwei commented Nov 16, 2021

psschwei commented Feb 4, 2022

xtaje commented Jul 14, 2022

psschwei commented Jul 14, 2022

dprotaso commented Aug 2, 2022 •

edited

Loading

psschwei commented Aug 2, 2022

Would it be terrible if: we default to no readiness probe if the user doesnt ask for one #10978

Would it be terrible if: we default to no readiness probe if the user doesnt ask for one #10978

Comments

julz commented Mar 18, 2021

Describe the feature

markusthoemmes commented Mar 18, 2021

markusthoemmes commented Mar 18, 2021

markusthoemmes commented Mar 18, 2021

evankanderson commented Mar 19, 2021

knative-prow-robot commented Mar 19, 2021

julz commented Mar 19, 2021

knative-prow-robot commented Mar 19, 2021

evankanderson commented Mar 22, 2021

knative-prow-robot commented Mar 22, 2021

evankanderson commented Mar 22, 2021

github-actions bot commented Jun 20, 2021

psschwei commented Nov 1, 2021

evankanderson commented Nov 2, 2021

psschwei commented Nov 16, 2021

psschwei commented Feb 4, 2022

xtaje commented Jul 14, 2022

psschwei commented Jul 14, 2022

dprotaso commented Aug 2, 2022 • edited Loading

psschwei commented Aug 2, 2022

dprotaso commented Aug 2, 2022 •

edited

Loading