add warmup duration secs api #2153

ramaraochavali · 2021-11-15T06:28:04Z

API For istio/istio#21228

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

howardjohn · 2021-11-15T19:36:37Z

A few questions

If a pod is removed and then later added back (say, from a blip in readiness probe) does it re-warm?
If we create a subset and are just adding more endpoints to that subset, are they warmed?

ramaraochavali · 2021-11-16T04:09:35Z

If a pod is removed and then later added back (say, from a blip in readiness probe) does it re-warm?

Yes. As long as Envoy sees the membership change, it warms the new pods.

If we create a subset and are just adding more endpoints to that subset, are they warmed?

Depends on the subset traffic policy. If we enable this in the subset traffic policy, it warms subset endpoints as well.
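For illustration, a minimal sketch of what enabling the warmup at the top level and inside a subset traffic policy might look like, written as a Python dict shaped like a DestinationRule. The field name `warmupDurationSecs` under the load balancer settings comes from this PR; the host, resource name, subset name, labels, and the `60s`/`30s` values are hypothetical, and the exact serialized form of the duration is an assumption.

```python
import json


def destination_rule(host, warmup, subset_warmup=None):
    """Build a DestinationRule-shaped dict. All names and values are illustrative."""
    rule = {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": "example-warmup"},  # hypothetical resource name
        "spec": {
            "host": host,
            # Top-level policy: applies to the endpoints of the host.
            "trafficPolicy": {
                "loadBalancer": {
                    "simple": "ROUND_ROBIN",
                    "warmupDurationSecs": warmup,
                }
            },
        },
    }
    if subset_warmup is not None:
        # Subset-level policy: subset endpoints are warmed only if the
        # setting is enabled here, per the answer above.
        rule["spec"]["subsets"] = [
            {
                "name": "v2",  # hypothetical subset
                "labels": {"version": "v2"},
                "trafficPolicy": {
                    "loadBalancer": {
                        "simple": "ROUND_ROBIN",
                        "warmupDurationSecs": subset_warmup,
                    }
                },
            }
        ]
    return rule


print(json.dumps(destination_rule("reviews.default.svc.cluster.local", "60s", "30s"), indent=2))
```

The printed JSON can be converted to YAML by hand if that is the preferred form for applying it.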

howardjohn · 2021-11-16T05:54:29Z

I understand why it would warm when we have a readiness blip, but it does seem like unexpected behavior. Tbh in this scenario we should probably send the endpoint as UNHEALTHY for other reasons as well (locality lb), which would fix this issue? wdyt?

ramaraochavali · 2021-11-16T07:32:54Z

I understand why it would warm when we have a readiness blip but it does seem like unexpected behavior.

Agreed, but I think it would not create much of an issue, because the typical configured warmup window is only 30-60s.

Tbh in this scenario we should probably send the endpoint as UNHEALTHY for other reasons as well (locality lb), which would fix this issue? wdyt?

I agree. I remember we discussed this some time back; I am not sure why we did not change it.
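To make the 30-60s window concrete, here is a rough sketch of how a slow-start ramp could scale a newly added endpoint's load-balancing weight over the warmup window. It loosely follows the shape described in Envoy's slow start documentation (a time factor raised to `1 / aggression`), but the function and parameter names are hypothetical and this is a simplification, not Envoy's implementation.

```python
def slow_start_weight(base_weight, seconds_since_added, window_seconds, aggression=1.0):
    """Return the effective weight of an endpoint during a warmup window.

    Simplified sketch: outside the window the endpoint gets its full weight;
    inside it, the weight grows with the fraction of the window elapsed.
    """
    if window_seconds <= 0 or seconds_since_added >= window_seconds:
        return float(base_weight)
    # Clamp elapsed time to at least 1s so a brand-new endpoint still
    # receives a small, non-zero share of traffic.
    time_factor = max(seconds_since_added, 1.0) / window_seconds
    return base_weight * time_factor ** (1.0 / aggression)


# With a 60s window and linear ramp (aggression 1.0), an endpoint that was
# added 30s ago carries half of its normal weight.
for elapsed in (0, 15, 30, 60):
    print(elapsed, round(slow_start_weight(100, elapsed, 60), 2))
```

Under this model an unnecessary re-warm only reduces the share of traffic an endpoint receives for the length of the window; requests are not rejected.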

ramaraochavali · 2021-11-23T10:09:41Z

@howardjohn WDYT about this?

howardjohn · 2021-11-23T16:09:27Z

I am a bit concerned about warming occurring at times other than pod startup. Maybe users requesting this, like @Stono and @BradErz, can give some feedback on this?

ramaraochavali · 2021-11-24T04:00:42Z

@howardjohn While I share the concern about possible unnecessary warmups, I also think it is a safe change because:

Unnecessary warmups should be rare (affecting fewer endpoints?), may not matter much in the grand scheme of things, and the requests won't be failing anyway.
It is opt-in. People can try it, and if it creates issues they can turn it off.

hzxuzhonghu · 2021-11-25T08:29:19Z

Does it re-warm when EDS reloads because other pods go down or come up?

ramaraochavali · 2021-11-25T08:57:27Z

If the endpoint IP changes, the new endpoint will be warmed.

Stono · 2021-11-25T10:45:05Z

👋
Firstly, thanks so much for working on this, it'll be a great feature. And I totally appreciate this is a first pass implementation so my comments below are based on utopian goals.

I understand why it would warm when we have a readiness blip but it does seem like unexpected behavior.

Yes, we certainly wouldn't want to re-warm in the following scenarios:

Readiness Probe going unready
Outlier Detection triggering

We would want to warm:

When the pod first starts
If the pod restarts (crash, liveness probe, preemption)

From my perspective the purpose of this feature is to facilitate warming of JIT runtimes such as the JVM, so that only needs to happen on startup.

We would not use this feature if it warmed on readiness probe failures etc.; that would increase our MTTR during issues (for example, service 1 depends on some external service and fails its readinessProbe because that external service is down; as soon as the external service is back up, it becomes ready again, and we wouldn't want to delay recovery of the service by 60s to facilitate warming).

ramaraochavali · 2021-11-25T10:51:58Z

Readiness Probe going unready
Outlier Detection triggering

For readiness probes, I think Istio currently removes and re-adds the endpoint, which we should fix as @howardjohn mentioned here #2153 (comment). I will fix that.

For outlier detection, Envoy does not warm.

howardjohn · 2021-11-29T16:59:58Z

Tricky thing is readiness probe vs liveness probe cannot (reasonably) be determined by Istiod. Likely we will need to do the same for both of those cases unfortunately.

Stono · 2021-11-29T17:30:18Z

Tricky thing is readiness probe vs liveness probe cannot (reasonably) be determined by Istiod. Likely we will need to do the same for both of those cases unfortunately.

Because they're both observed as endpoint changes and either would remove it from the endpoint list?

If the choice was between:

1. Not warming after a liveness probe failure (bad), but also not warming after readiness probe blips (good)
2. Warming after a liveness probe failure (good), but also warming after readiness probe blips (bad)

Eg, warm once after the pod is created but never again.

howardjohn · 2021-11-29T17:44:11Z

Yeah exactly. Hypothetically we could try to read the pod spec and reverse engineer that it was killed by a liveness probe, but I suspect that would be quite tricky. I think I missed which choice you prefer: choice (1)? That would be the one I lean towards.

Stono · 2021-11-29T17:46:06Z

Haha yes, sorry, I missed it off, #1!
The logic there being that liveness probe failures are rare and usually indicate bigger issues, whereas readiness probe failures are more frequent and can be considered somewhat BAU. Happy for a pod not to warm after a liveness failure because I consider warming an optimisation anyway, and then just optimise for the happy path (new pods only).
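The "warm once after the pod is created but never again" behaviour preferred above can be sketched as a small tracker keyed by endpoint address. This is a hypothetical model, not Istio or Envoy code: it assumes that an endpoint which stays in the membership list (for example, reported as UNHEALTHY rather than removed) keeps its original first-seen time, while an endpoint that is removed and re-added starts a new warmup window.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WarmupTracker:
    """Tracks when each endpoint address first joined the membership list."""

    window_seconds: float
    first_seen: Dict[str, float] = field(default_factory=dict)

    def observe(self, address: str, now: float) -> None:
        # Only the first observation is recorded; later health changes
        # (ready -> unready -> ready) do not reset the warmup window.
        self.first_seen.setdefault(address, now)

    def remove(self, address: str) -> None:
        # Dropping the endpoint from membership forgets its history, so a
        # later re-add is treated as a brand-new endpoint.
        self.first_seen.pop(address, None)

    def is_warming(self, address: str, now: float) -> bool:
        start = self.first_seen.get(address)
        return start is not None and (now - start) < self.window_seconds


tracker = WarmupTracker(window_seconds=60)
tracker.observe("10.0.0.1", now=0)
print(tracker.is_warming("10.0.0.1", now=10))   # inside the initial window

# Readiness blip while the endpoint stays in membership: no new window.
tracker.observe("10.0.0.1", now=100)
print(tracker.is_warming("10.0.0.1", now=105))

# Removal followed by re-add: a new window starts.
tracker.remove("10.0.0.1")
tracker.observe("10.0.0.1", now=200)
print(tracker.is_warming("10.0.0.1", now=210))
```

In this model a pod restarted by a liveness probe failure would also skip warming if it keeps the same address and never leaves membership, which matches the trade-off accepted in choice 1.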

jiangshantao-dbg · 2021-12-28T07:38:27Z

Can we cherry-pick "Support slow start mode in Envoy" (#13176) to istio/envoy so that we can use this feature through an EnvoyFilter first? @howardjohn

ramaraochavali · 2021-12-28T08:03:26Z

@jiangshantao-dbg It is already available in master. Are you asking to cherry-pick it to 1.12? If so, we do not generally cherry-pick features to release branches.

jiangshantao-dbg · 2021-12-28T08:08:15Z

@jiangshantao-dbg It is already available in master. Are you asking to cherry-pick it to 1.12? If so, we do not generally cherry-pick features to release branches.

ok. i got it. thank you!

ramaraochavali · 2022-02-11T06:55:51Z

@howardjohn Now that we are sending unready endpoints, can we add this field?

add warmup duration secs api

58a03d1

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

ramaraochavali requested review from ericvn, howardjohn, linsun, louiscryan, nrjpoddar and smawson as code owners November 15, 2021 06:28

google-cla bot added the cla: yes label (set by the Google CLA bot to indicate the author of a PR has signed the Google CLA) Nov 15, 2021

istio-testing added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) Nov 15, 2021

add release notes

9f6720e

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

ramaraochavali mentioned this pull request Jan 19, 2022

send unready endpoints as unhealthy istio/istio#36274

Merged

istio-testing added the needs-rebase label (indicates a PR needs to be rebased before being merged) Feb 8, 2022

resolve conflicts

7112c67

istio-testing removed the needs-rebase label (indicates a PR needs to be rebased before being merged) Feb 10, 2022

howardjohn approved these changes Feb 11, 2022

ericvn approved these changes Feb 11, 2022

istio-testing merged commit 6ad61f9 into istio:master Feb 11, 2022

ramaraochavali deleted the fix/slow_start branch February 12, 2022 06:24

This was referenced Feb 28, 2022

Add support for SlowStart mode istio/istio#37582

Closed

Configure warmupDurationSecs for slow start mode istio/istio#37583

Merged

rsalmond mentioned this pull request May 25, 2022

warmup_duration_secs missing from LoadBalancerSettings docs istio/istio.io#11354

Closed

recarga-brubs mentioned this pull request Jan 24, 2023

Support for istio's warmupDurationSecs fluxcd/flagger#1350

Closed
