Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

✨ Add support for startup probes #2669

Closed
wants to merge 1 commit into from

Conversation

ramperher
Copy link

This PR address this issue: #2644, including the support for a separate endpoint for startup probe, defined in k8s docs. Current code only support readiness and liveness probes, but doesn't include support for startup probe.

Copy link

linux-foundation-easycla bot commented Jan 31, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ramperher / name: Ramon Perez (1aee555)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 31, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @ramperher!

It looks like this is your first PR to kubernetes-sigs/controller-runtime 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/controller-runtime has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 31, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @ramperher. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ramperher
Once this PR has been reviewed and has the lgtm label, please assign fillzpp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 31, 2024

// StartupEndpointName, defaults to "startz"
// +optional
StartupEndpointName string `json:"startupEndpointName,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is where the other endpoint names live but this is deprecated. Do we want to keep it here with the risk of it being moved and/or removed for configuration?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of this file being deprecated. Where should it be instead? I didn't find another place where the endpoint name is defined. I suppose that the risk for StartupEndpointName is the same than for the others, is it correct?

Copy link
Member

@troy0820 troy0820 Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ControllerHealth defines the health configs.
//
// Deprecated: The component config package has been deprecated and will be removed in a future release. Users should migrate to their own config implementation, please share feedback in #895.

There is this PR that removes this file #2648 because most of the config is deprecated. Adding this functionality there doesn't seem like it will stay long as the plan would be to get rid of this configuration.

Now that doesn't mean it doesn't have a place defined in this repo, but I would probably place it as an option and not within the deprecated type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thank you. However, from my point of view, isn't it better to remove this config from my change when #2648 is merged? Because right now it's still in progress

After merging your change, I can rebase my change to bring your changes and then I can remove my code from the deprecated files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yours may get merged first, so with that if the maintainers lgtm/approve, I can work on finding a spot for this to move it out of the deprecated struct.

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine, although it seems to me that the Kubernetes docs show https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes startup probes are just pointing to healthz endpoint? Is this even needed in that case?

@ramperher
Copy link
Author

ramperher commented Feb 8, 2024

Changes look fine, although it seems to me that the Kubernetes docs show https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes startup probes are just pointing to healthz endpoint? Is this even needed in that case?

Thanks @vincepri ! In my opinion, k8s docs are hiding the fact that each lifecycle probe has a different meaning and purpose. If startup probe is linked to the same endpoint than another probe (in this case, liveness), then, why is it defined the startup probe?

I believe k8s docs are just reflecting an example, also knowing that startup probe looks like the last lifecycle probe integrated in the code, and just for making the docs easier, the same endpoint than liveness probe was used. But the truth is, users should be able to define a specific endpoint per probe if required in their use cases (again, taking into account that each probe is different and referring to a different stage in the container lifecycle checks), and that's why this code would be needed.

@alvaroaleman
Copy link
Member

I still do not see what problem this solves other than some certification of yours. You are talking about use-cases without actually giving any concrete example of a use case that would require this. You can run a custom http server to server any endpoint of your choosing or even add custom endpoints to the webhook server, none of that requires changes in controller-runtime.

Before we merge anything like this, I want to see a specific example of a controller-manager that requires this.
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 10, 2024
@ramperher
Copy link
Author

I still do not see what problem this solves other than some certification of yours. You are talking about use-cases without actually giving any concrete example of a use case that would require this. You can run a custom http server to server any endpoint of your choosing or even add custom endpoints to the webhook server, none of that requires changes in controller-runtime.

Before we merge anything like this, I want to see a specific example of a controller-manager that requires this. /hold

@alvaroaleman , just to clarify the rationale behind the work: this appeared as a consequence of working with some cases submitted for Red Hat CNF Certification, but this is not a fix for that. This work is to provide the capability of defining a custom startup probe, decoupling it from liveness/readiness probe, for operator-sdk's controller-manager pods. These pods, which manage the lifecycle of the resources deployed by the operator, rely on the controller-runtime code to be built, and currently, it's not possible to define a custom startup probe if needed, as documented in the issue.

In our case, right now, we are working in a Telco CNF called example-cnf, where the operators are made with operator-sdk, and in the controller-manager (example here), we have to point to /healthz endpoint for startup probe because there's no endpoint defined for startup probe, currently.

In the issue opened in operator-sdk, the current status is just waiting for the resolution of this work in controller-runtime, then the endpoint would be added there.

After saying this, I'd also like to quote k8s official docs regarding this feature:

The kubelet uses startup probes to know when a container application has started. If such a probe is configured, liveness and readiness probes do not start until it succeeds, making sure those probes don't interfere with the application startup.

This means that the startup probe is the first one being executed. Liveness and readiness probes are not started till startup probe execution succeed. So, for me, it makes sense that there may be cases where a controller-manager pod from operator-sdk wants to control what happens just after starting the containers, using this startup probe feature, and managing it in a different way than liveness and readiness cases.

Hope this clarifies the purpose of this change.

@alvaroaleman
Copy link
Member

we have to point to /healthz endpoint for startup probe because there's no endpoint defined for startup probe, currently.

And what is the problem with that? Please, start with the problem description. "It doesn't have a startupProbe enndpoint" is not a problem description, its jumping to a problem in an assumed solution of a problem that wasn't described.

You need to describe why you need the startupProbe in the first place. What is not working correctly when not using a startupProbe or using the healthz endpoint for it?

@ramperher
Copy link
Author

we have to point to /healthz endpoint for startup probe because there's no endpoint defined for startup probe, currently.

And what is the problem with that? Please, start with the problem description. "It doesn't have a startupProbe enndpoint" is not a problem description, its jumping to a problem in an assumed solution of a problem that wasn't described.

You need to describe why you need the startupProbe in the first place. What is not working correctly when not using a startupProbe or using the healthz endpoint for it?

we have to point to /healthz endpoint for startup probe because there's no endpoint defined for startup probe, currently.

And what is the problem with that? Please, start with the problem description. "It doesn't have a startupProbe enndpoint" is not a problem description, its jumping to a problem in an assumed solution of a problem that wasn't described.

You need to describe why you need the startupProbe in the first place. What is not working correctly when not using a startupProbe or using the healthz endpoint for it?

As I can extract from k8s documentation, startup probe targets a different container lifecycle stage compared to readiness and liveness probes. My rationale for this change is, if the feature is different, shouldn't it have a different endpoint too?

@ramperher
Copy link
Author

I've been analyzing this KEP with @vincepri: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/950-liveness-probe-holdoff/README.md, and there, same endpoint is used for startup probe and liveness probe.

As I said, for me, it made sense to separate features in different endpoints, but, if by design, in this KEP, it was decided to do the opposite, I don't have much more arguments to justify my change.

So, having said that, I'll close the PR. Thanks for your reviews here.

@ramperher ramperher closed this Feb 13, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants