Bug 1955489: enable hard-anti affinity and PDB for Alertmanager #1489
Conversation
@simonpasquier: This pull request references Bugzilla bug 1955489, which is invalid.
/bugzilla refresh
@simonpasquier: This pull request references Bugzilla bug 1955489, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
Requesting review from QA contact.
Force-pushed from 9c5435c to 6d848f7.
See https://bugzilla.redhat.com/show_bug.cgi?id=1955489#c11: tested with the PR, the Alertmanager StatefulSet now has 2 replicas and hard anti-affinity set, but the pods cannot be started.
Force-pushed from 6d848f7 to 3dd7c1b.
@juzhao yes, the PR is still WIP and the CI fails for the same reason you've noticed.
Force-pushed from 3dd7c1b to fa637b6.
/test e2e-agnostic-operator
1 similar comment
/test e2e-agnostic-operator
Force-pushed from fa637b6 to 4fff5a0.
Force-pushed from 4fff5a0 to f86c588.
@simonpasquier: This pull request references Bugzilla bug 1955489, which is valid. 3 validation(s) were run on this bug.
Requesting review from QA contact.
/skip
Force-pushed from 130683f to 3e8ffd2.
Force-pushed from 42a6c24 to c570f13.
This change introduces hard pod anti-affinity rules and pod disruption budgets for Alertmanager to ensure maximum availability of the Alertmanager cluster when nodes go down (either due to upgrades or unexpected outages). The cluster monitoring operator sets the `Upgradeable` condition to false when it detects that the pods aren't correctly spread, so that an upgrade only happens in safe configurations.

The change also decreases the number of Alertmanager replicas from 3 to 2 to be consistent with the other monitoring components and with the HA conventions stating that, in general, OpenShift components should run with a replica count of 2 [1]. In addition, with 3 replicas it is impossible to enable hard anti-affinity on nodes, since 2 worker nodes is a supported deployment for OCP.

The initial idea of running 3 replicas was to guarantee the replication of data (silences + notifications) during pod roll-outs even if the user didn't configure persistent storage. However, given that no pod disruption budget was defined, there was no guarantee that Kubernetes would always keep one Alertmanager pod running. With hard anti-affinity and a PDB, we are now sure that at least one Alertmanager pod is kept running. We are also setting up a startup probe that waits for at least 20 seconds, meaning that Kubernetes should wait about 20 seconds after a new Alertmanager pod is running before considering rolling out the next one. This interval should be more than enough for the new Alertmanager to synchronize its data from the older peer.

[1] https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
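For illustration, here is a minimal sketch of what a hard (required) pod anti-affinity rule for the Alertmanager pods could look like when expressed with the Kubernetes Go client types; the label selector, namespace, and topology key are assumptions for this sketch, not necessarily the exact values generated by the operator's jsonnet:

```go
package manifests

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// hardPodAntiAffinity returns a required (hard) anti-affinity rule that
// forbids scheduling two Alertmanager pods on the same node, so a single
// node outage or drain can take down at most one replica.
func hardPodAntiAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
				{
					// Illustrative label; the real pods may carry different labels.
					LabelSelector: &metav1.LabelSelector{
						MatchLabels: map[string]string{"app.kubernetes.io/name": "alertmanager"},
					},
					Namespaces:  []string{"openshift-monitoring"},
					TopologyKey: "kubernetes.io/hostname",
				},
			},
		},
	}
}
```

With only 2 replicas, such a rule remains satisfiable on the minimal supported 2-worker-node topology, which is why the replica count is lowered in the same change.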
Force-pushed from c570f13 to d0fafd8.
Force-pushed from d0fafd8 to 7f745e1.
/skip
/retest
/label tide/merge-method-squash
/skip
@simonpasquier: The following test failed.
// that 20 seconds is enough for a full synchronization (this is twice
// the time Alertmanager waits before declaring that it can start
// sending notifications).
a.Spec.Containers = append(a.Spec.Containers,
Should this change be made in the prometheus operator?
Yes it makes sense to follow up upstream. BTW I notice that we have no readiness probe for Alertmanager...
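As a rough illustration of the startup-probe idea discussed in this thread, here is a hedged sketch using the Kubernetes Go client types; the endpoint, port name, and timings are assumptions, not the exact values shipped by this PR:

```go
package manifests

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// alertmanagerStartupProbe builds an illustrative startup probe that keeps a
// freshly started Alertmanager pod "not started" for roughly 20 seconds,
// which delays the rollout of the next replica long enough for silences and
// the notification log to be gossiped from the older peer.
func alertmanagerStartupProbe() *corev1.Probe {
	p := &corev1.Probe{
		InitialDelaySeconds: 20, // give the new pod ~20s before the first check
		PeriodSeconds:       5,
		FailureThreshold:    6,
	}
	// HTTPGet is promoted from the embedded handler struct, so this assignment
	// works with both older (Handler) and newer (ProbeHandler) API versions.
	p.HTTPGet = &corev1.HTTPGetAction{
		Path: "/-/ready", // Alertmanager's readiness endpoint
		Port: intstr.FromString("web"),
	}
	return p
}
```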
  name: alertmanager-main
  namespace: openshift-monitoring
spec:
  maxUnavailable: 1
`maxUnavailable` is fine, maybe `minAvailable` is better.
# oc -n openshift-monitoring get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
alertmanager-main N/A 1 1 39m
prometheus-adapter 1 N/A 1 50m
prometheus-k8s 1 N/A 1 39m
thanos-querier-pdb 1 N/A 1 38m
`maxUnavailable` comes from upstream (https://github.com/prometheus-operator/kube-prometheus/blob/6d013d4e4f980ba99cfdafa9432819d484e2f829/jsonnet/kube-prometheus/components/alertmanager.libsonnet#L154) and my understanding is that because kube-prometheus deploys 3 replicas of Alertmanager, the choice was either `maxUnavailable: 1` or `minAvailable: 2`. Ideally we should settle on one field and automatically calculate the budget.
From https://kubernetes.io/docs/tasks/run-application/configure-pdb/
The use of maxUnavailable is recommended as it automatically responds to changes in the number of replicas of the corresponding controller.
I'll check with the workloads team if there's any strong recommendation.
👍 for using `maxUnavailable` here. The note about the number of replicas is relevant here imo.
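A minimal sketch of what automatically deriving the budget from the replica count (as suggested above) could look like; the helper name and the policy/v1 API group are assumptions, not the operator's actual code:

```go
package manifests

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// pdbFor derives a PodDisruptionBudget from the replica count: no budget for
// a single replica (it would block node drains entirely), otherwise allow
// exactly one pod to be voluntarily disrupted at a time.
func pdbFor(name, namespace string, replicas int32, podLabels map[string]string) *policyv1.PodDisruptionBudget {
	if replicas < 2 {
		return nil
	}
	maxUnavailable := intstr.FromInt(1)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector:       &metav1.LabelSelector{MatchLabels: podLabels},
		},
	}
}
```

Sticking with a single `maxUnavailable` field matches the upstream kube-prometheus default and the Kubernetes documentation quoted above.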
/skip
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jan--f, simonpasquier. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@simonpasquier: All pull requests linked via external trackers have merged: Bugzilla bug 1955489 has been moved to the MODIFIED state.
Bug 1955489: enable hard-anti affinity and PDB for Alertmanager (openshift#1489)
* *: enable hard-anti affinity and PDB for Alertmanager
* assets: regenerate
* jsonnet,pkg: configure startupProbe only when no storage
* assets: regenerate
* test/e2e: add TestAlertmanagerDataReplication test
Signed-off-by: Simon Pasquier <spasquie@redhat.com>