more robust alert for node reports being up, while going down. #1877

Closed
amnonh opened this issue Feb 1, 2023 · 0 comments · Fixed by #1882
Labels
enhancement New feature or request

amnonh (Collaborator) commented Feb 1, 2023

When a node goes down (or, in extreme situations, when a node stops operating), it can get stuck in limbo:
it still reports metrics, so Prometheus thinks it is up, but not all metrics are reported.

We already have an alert for this situation, but it uses absent, which is only valid for a period of time.
Instead, we should move to:

sum(up{job="scylla"}>0)by(instance) unless sum(scylla_transport_requests_served{shard="0"}) by(instance)
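
For illustration, here is a minimal sketch of how the expression above could be wrapped in a Prometheus alerting rule. The left-hand side selects instances whose up value is greater than 0, and unless drops every instance that also has a shard 0 scylla_transport_requests_served series, so what remains are instances that Prometheus considers up but that are not reporting that metric. The group name, alert name, for duration, severity label, and description text are assumptions made for this example and are not taken from the issue or from the fix in #1882.

```yaml
groups:
  - name: scylla-node-state            # hypothetical group name
    rules:
      - alert: InstanceUpButNotServing # hypothetical alert name
        # Instances reported as up that have no shard 0 transport metric.
        # Both sides are aggregated by instance so that "unless" matches on that label only.
        expr: >-
          sum(up{job="scylla"} > 0) by (instance)
          unless
          sum(scylla_transport_requests_served{shard="0"}) by (instance)
        for: 1m                        # assumed duration, tune to the scrape interval
        labels:
          severity: warning            # assumed severity
        annotations:
          description: 'Instance {{ $labels.instance }} reports up but is missing transport metrics'
```

Because the expression returns one series per affected instance, the alert carries the instance label directly and keeps firing for as long as the condition holds.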
amnonh added the enhancement label Feb 1, 2023
amnonh added this to the Monitoring 4.3 milestone Feb 1, 2023