
Bug 1955489: enable hard-anti affinity and PDB for Alertmanager #1489

Merged
merged 5 commits into openshift:master from simonpasquier:bz1955489
Dec 7, 2021

Conversation

simonpasquier
Contributor

@simonpasquier simonpasquier commented Nov 22, 2021

This change introduces hard pod anti-affinity rules and pod disruption
budgets for Alertmanager to ensure maximum availability of the
Alertmanager cluster in the event of nodes going down (either due to
upgrades or unexpected outages). The cluster monitoring operator sets
the `Upgradeable` condition to false when it detects that the pods
aren't correctly spread, so that an upgrade only happens in safe
configurations.
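
For illustration, a hard pod anti-affinity rule of this kind looks roughly as follows; this is a minimal sketch, and the exact labels used by the operator may differ:

affinity:
  podAntiAffinity:
    # "required" (hard) rather than "preferred" (soft): the scheduler
    # refuses to co-locate two Alertmanager pods on the same node.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: alertmanager
      topologyKey: kubernetes.io/hostname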

The change also decreases the number of Alertmanager replicas from 3 to
2, to be consistent with the other monitoring components as well as the
HA conventions, which state that in general OpenShift components should
run with a replica count of 2 [1]. In addition, with 3 replicas it is
impossible to enable hard anti-affinity on nodes, since a cluster with
only 2 worker nodes is a supported OCP deployment.

The initial idea of running 3 replicas was to guarantee the replication
of data (silences + notifications) during pod roll-outs even if the user
didn't configure persistent storage. However, given that no pod
disruption budget was defined, there was no guarantee that Kubernetes
would always keep one Alertmanager pod running. With hard anti-affinity
and a PDB, we are now sure that at least one Alertmanager pod is kept
running. We also set up a startup probe that waits for at least 20
seconds, meaning that Kubernetes should wait about 20 seconds after a
new Alertmanager pod is running before rolling out the next one. This
interval should be more than enough for the new Alertmanager to
synchronize its data from the older peer.

[1] https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability
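
A minimal sketch of such a startup probe is shown below; the command and timing values are illustrative assumptions, not necessarily what was merged:

startupProbe:
  exec:
    # Succeed only once Alertmanager answers on its readiness endpoint
    # and an extra 20 seconds have passed, giving the new replica time
    # to sync silences and notification state from its peer.
    command:
    - sh
    - -c
    - curl --fail http://localhost:9093/-/ready && sleep 20
  timeoutSeconds: 30   # must exceed the 20s sleep, or the probe times out
  periodSeconds: 10
  failureThreshold: 6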

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress and approved labels Nov 22, 2021
@openshift-ci openshift-ci bot added the bugzilla/severity-high and bugzilla/invalid-bug labels Nov 22, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 22, 2021

@simonpasquier: This pull request references Bugzilla bug 1955489, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[WIP] Bug 1955489: enable hard-anti affinity and PDB for Alertmanager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@simonpasquier
Contributor Author

/bugzilla refresh

@openshift-ci openshift-ci bot added the bugzilla/valid-bug label and removed the bugzilla/invalid-bug label Nov 23, 2021
@openshift-ci
Contributor

openshift-ci bot commented Nov 23, 2021

@simonpasquier: This pull request references Bugzilla bug 1955489, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @juzhao

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from juzhao November 23, 2021 07:53
@simonpasquier simonpasquier force-pushed the bz1955489 branch 3 times, most recently from 9c5435c to 6d848f7 November 23, 2021 09:37
@juzhao
Contributor

juzhao commented Nov 25, 2021

See https://bugzilla.redhat.com/show_bug.cgi?id=1955489#c11: tested with the PR, the Alertmanager StatefulSet now has 2 replicas and hard anti-affinity set, but the pods cannot be started.

@simonpasquier
Contributor Author

@juzhao yes the PR is still WIP and the CI fails for the same reason as you've noticed.

@simonpasquier
Contributor Author

/test e2e-agnostic-operator


@openshift-ci openshift-ci bot added the needs-rebase label Nov 30, 2021
@openshift-ci openshift-ci bot removed the needs-rebase label Dec 1, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 1, 2021

@simonpasquier: This pull request references Bugzilla bug 1955489, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @juzhao

In response to this:

[WIP] Bug 1955489: enable hard-anti affinity and PDB for Alertmanager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@simonpasquier
Contributor Author

/skip

@simonpasquier
Contributor Author

/skip
/retitle Bug 1955489: enable hard-anti affinity and PDB for Alertmanager

@openshift-ci openshift-ci bot changed the title from "[WIP] Bug 1955489: enable hard-anti affinity and PDB for Alertmanager" to "Bug 1955489: enable hard-anti affinity and PDB for Alertmanager" Dec 6, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label Dec 6, 2021
@simonpasquier
Contributor Author

/retest
/assign @jan--f

@simonpasquier
Contributor Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash label Dec 6, 2021
@simonpasquier
Contributor Author

/skip

@openshift-ci
Contributor

openshift-ci bot commented Dec 6, 2021

@simonpasquier: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit    Details   Required   Rerun command
ci/prow/e2e-aws-single-node   7f745e1   link      false      /test e2e-aws-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

// that 20 seconds is enough for a full synchronization (this is twice
// the time Alertmanager waits before declaring that it can start
// sending notifications).
a.Spec.Containers = append(a.Spec.Containers,
Contributor

Should this change be made in the prometheus operator?

Contributor Author

Yes, it makes sense to follow up upstream. BTW, I notice that we have no readiness probe for Alertmanager...
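
For context, a readiness probe for Alertmanager would typically poll its /-/ready endpoint. A rough sketch, assuming the web port (9093) is reachable over plain HTTP and with illustrative timings:

readinessProbe:
  httpGet:
    path: /-/ready
    port: 9093
  periodSeconds: 10
  failureThreshold: 3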

  name: alertmanager-main
  namespace: openshift-monitoring
spec:
  maxUnavailable: 1
Contributor

maxUnavailable is fine, maybe minAvailable is better

# oc -n openshift-monitoring get pdb 
NAME                 MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
alertmanager-main    N/A             1                 1                     39m
prometheus-adapter   1               N/A               1                     50m
prometheus-k8s       1               N/A               1                     39m
thanos-querier-pdb   1               N/A               1                     38m

Contributor Author

maxUnavailable comes from upstream (https://github.com/prometheus-operator/kube-prometheus/blob/6d013d4e4f980ba99cfdafa9432819d484e2f829/jsonnet/kube-prometheus/components/alertmanager.libsonnet#L154) and my understanding is that because kube-prometheus deploys 3 replicas of Alertmanager, the choice was either maxUnavailable: 1 or minAvailable: 2. Ideally we should settle on one field and automatically calculate the budget.

From https://kubernetes.io/docs/tasks/run-application/configure-pdb/

The use of maxUnavailable is recommended as it automatically responds to changes in the number of replicas of the corresponding controller.

I'll check with the workloads team if there's any strong recommendation.
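
For reference, a complete budget using maxUnavailable would look roughly like this; the label selector is an assumption for illustration:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: alertmanager-main
  namespace: openshift-monitoring
spec:
  # With 2 replicas, maxUnavailable: 1 and minAvailable: 1 are equivalent,
  # but maxUnavailable adapts automatically if the replica count changes.
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: alertmanager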

Contributor

👍 for using maxUnavailable here. The note about the number of replicas is relevant here imo.

@simonpasquier
Contributor Author

/skip

@jan--f
Contributor

jan--f commented Dec 7, 2021

/lgtm
This looks great!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 7, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jan--f,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 4996004 into openshift:master Dec 7, 2021
@openshift-ci
Contributor

openshift-ci bot commented Dec 7, 2021

@simonpasquier: All pull requests linked via external trackers have merged:

Bugzilla bug 1955489 has been moved to the MODIFIED state.

In response to this:

Bug 1955489: enable hard-anti affinity and PDB for Alertmanager

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@simonpasquier simonpasquier deleted the bz1955489 branch December 7, 2021 15:04
dgrisonnet pushed a commit to dgrisonnet/cluster-monitoring-operator that referenced this pull request Sep 5, 2023
…shift#1489)

* *: enable hard-anti affinity and PDB for Alertmanager
* assets: regenerate
* jsonnet,pkg: configure startupProbe only when no storage
* assets: regenerate
* test/e2e: add TestAlertmanagerDataReplication test

Signed-off-by: Simon Pasquier <spasquie@redhat.com>