Skip to content

add warmup duration secs api #2153

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 3 commits into from
Feb 11, 2022
Merged

Conversation

ramaraochavali
Copy link
Contributor

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@google-cla google-cla bot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Nov 15, 2021
@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 15, 2021
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@howardjohn
Copy link
Member

A few questions

  • If a pod is removed and then latter added back (say, from a blip in readiness probe) does it re-warm?
  • If we use create a subset and are just adding more endpoints to that subset, are they warmed?

@ramaraochavali
Copy link
Contributor Author

If a pod is removed and then latter added back (say, from a blip in readiness probe) does it re-warm?

Yes. As long as Envoy sees the membership change it warms the new pods

If we use create a subset and are just adding more endpoints to that subset, are they warmed?

Depends on subset traffic policy. If we enable this in subset traffic policy it warms subset endpoints also.

@howardjohn
Copy link
Member

howardjohn commented Nov 16, 2021 via email

@ramaraochavali
Copy link
Contributor Author

I understand why it would warm when we have a readiness blip but it does
seem like unexpected behavior.

Agree but I think it would not create too much of an issue because it is just 30-60s which is the typical configuration used.

Tbh in this scenario we should probably send the endpoint as UNHEALTHY for
other reasons as well (locality lb),which would fix this issue? wdyt?

I agree. I remember we discussed this some time back not sure why we did not change?

@ramaraochavali
Copy link
Contributor Author

@howardjohn WDYT about this?

@howardjohn
Copy link
Member

I am a bit concerned about the behaviors of the warming occurring at times other than pod startup. Maybe users requesting this like @Stono and @BradErz can give some feedback on this?

@ramaraochavali
Copy link
Contributor Author

@howardjohn While I share the same concern about possibility of unnecessary warmups, I also think it is safe change because

  • The unnecessary warmups could be a rare scenario (affecting less endpoint?) and in the grand scheme of may not matter much and the requests wont be failing any ways
  • It is opt-in. People can try and if it creates issues they can turn it off

@hzxuzhonghu
Copy link
Member

Does it rewarm when eds reload when the other pods down/up?

@ramaraochavali
Copy link
Contributor Author

If endpoint IP changes the new endpoint will be warmed

@Stono
Copy link

Stono commented Nov 25, 2021

👋
Firstly, thanks so much for working on this, it'll be a great feature. And I totally appreciate this is a first pass implementation so my comments below are based on utopian goals.

I understand why it would warm when we have a readiness blip but it does seem like unexpected behavior.

Yes, we certainly wouldn't want to re warm in the following scenarios:

  • Readiness Probe going unready
  • Outlier Detection triggering

We would want to warm:

  • When the pod first starts
  • If the pod restarts (crash, liveness probe, preemption)

From my perspective the purpose of this feature is to facilitate warming of JIT runtimes such as the JVM, so that only needs to happen on startup.

We would not use this feature if it warmed in readiness probe failures etc, that would increase our MTR during issues (for example, service 1 depends on some external service, it fails it readinessProbe because some external service is down, as soon as some external service is up, it becomes ready again - we wouldn't want to delay recovery of the service by 60s to facilitate warming).

@ramaraochavali
Copy link
Contributor Author

Readiness Probe going unready
Outlier Detection triggering

Readiness probe I think currently Istio removes and adds the endpoint - which we should fix as @howardjohn mentioned here #2153 (comment). I will fix that.

For outlier detection - Envoy does not warm.

@howardjohn
Copy link
Member

howardjohn commented Nov 29, 2021 via email

@Stono
Copy link

Stono commented Nov 29, 2021

Tricky thing is readiness probe vs liveness probe cannot (reasonably) be determined by Istiod. Likely we will need to do the same for both of those cases unfortunately.

On Thu, Nov 25, 2021 at 2:52 AM Rama Chavali @.***> wrote: Readiness Probe going unready Outlier Detection triggering Readiness probe I think currently Istio removes and adds the endpoint - which we should fix as @howardjohn https://github.com/howardjohn mentioned here #2153 (comment) <#2153 (comment)>. I will fix that. For outlier detection - Envoy does not warm. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2153 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEYGXLQZ6ISAGFL72KTKK3UNYINTANCNFSM5IA4RLOA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Because they're both observed as endpoint changes and either would remove it from the endpoint list?

If the choice was between:

  1. Not warming after a liveness probe failure (bad), but also not warming after readiness probe blips (good)
  2. Warming after a liveness probe failure (good), but also warmed after readiness probe blips (bad)

Eg, warm once after the pod is created but never again.

@howardjohn
Copy link
Member

howardjohn commented Nov 29, 2021 via email

@Stono
Copy link

Stono commented Nov 29, 2021

Haha yes, sorry, I missed it off, #1!
The logic there being liveiness probe failures are rare as usually indicate bigger issues, readiness probes are more frequent and can be considered somewhat BAU. Happy for a pod not to warm after a liveness failure because i consider warming an optimisation anyway, and then just optimise for the happy path (new pods only).

@jiangshantao-dbg
Copy link

can we cherry pick Support slow Start mode in Envoy #13176 to istio/envoy so that we can use EnvoyFilter to use this feature first. @howardjohn

@ramaraochavali
Copy link
Contributor Author

@jiangshantao-dbg it is already available in master. Are you asking for cherry picking to 1.12 - if yes, we do not generally cherry pick features to release.

@jiangshantao-dbg
Copy link

@jiangshantao-dbg it is already available in master. Are you asking for cherry picking to 1.12 - if yes, we do not generally cherry pick features to release.

ok. i got it. thank you!

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 8, 2022
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 10, 2022
@ramaraochavali
Copy link
Contributor Author

@howardjohn Now that we are sending unready endpoints, can we add this field?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants