Show scylla_sstables_bloom_filter_memory_size on one of the dashboards #2219

Closed
michoecho opened this issue Mar 11, 2024 · 9 comments · Fixed by #2223
Labels
enhancement New feature or request

Comments

@michoecho
Contributor

michoecho commented Mar 11, 2024

We saw production nodes OOMing due to bloated bloom filters multiple times lately (twice in the last week).

And every time, people seem to forget that we have a metric for this, and they waste time, e.g. logging into the cluster and running du on the bloom filter files.

Since the metric has proved useful many times, maybe we should put it into one of the dashboards.

Perhaps it should be normalized by total memory. (E.g. sum by (...) (scylla_sstables_bloom_filter_memory_size) / sum by (...) (scylla_memory_total_memory)).
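
To make the normalization concrete, here is a minimal Python sketch of what such a ratio computes. It assumes the samples have already been fetched from the monitoring stack and are grouped by a hypothetical `instance` label; the grouping labels in the expression above are left unspecified, and the function name and data layout are illustrative only.

```python
from collections import defaultdict


def bloom_filter_memory_fraction(bloom_samples, total_samples, group_label="instance"):
    """Return {group: bloom filter bytes / total memory bytes}.

    Each sample is a (labels_dict, value) pair, for example one per shard.
    group_label is an assumption: the issue leaves the grouping as (...).
    """
    bloom = defaultdict(float)
    total = defaultdict(float)
    for labels, value in bloom_samples:
        bloom[labels[group_label]] += value
    for labels, value in total_samples:
        total[labels[group_label]] += value
    # Keep only groups that have a nonzero total on the denominator side;
    # this sketch skips the others instead of dividing by zero.
    return {g: bloom[g] / total[g] for g in bloom if total.get(g)}
```

Called with two lists of per-shard samples, the function returns one fraction per group, which is the quantity a dashboard panel or an alert threshold would be applied to.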

@michoecho michoecho added the enhancement New feature or request label Mar 11, 2024
@amnonh amnonh added this to the Monitoring 4.7 milestone Mar 11, 2024
@amnonh
Collaborator

amnonh commented Mar 11, 2024

@michoecho can you also suggest an alert based on that?

@amnonh
Collaborator

amnonh commented Mar 11, 2024

@michoecho can you look at the memory metrics you put? It's the same one in both places.

@michoecho
Contributor Author

> @michoecho can you also suggest an alert based on that?

@avikivity @denesb @mykaul What do you think? Should we have an alert if bloom filters exceed some fraction of memory? And if yes, what should be the threshold? 0.1, or is that too aggressive?

@avikivity
Member

> > @michoecho can you also suggest an alert based on that?
>
> @avikivity @denesb @mykaul What do you think? Should we have an alert if bloom filters exceed some fraction of memory?

yes

> And if yes, what should be the threshold? 0.1, or is that too aggressive?

I think it's a reasonable starting point.

@mykaul
Contributor

mykaul commented Mar 12, 2024

I've asked @d-helios to scan our cloud to see where we are right now. I think we can determine the alert threshold based on the results.

@amnonh
Collaborator

amnonh commented Mar 12, 2024

@michoecho ping, please look at the metric you put, one is missing

@michoecho
Contributor Author

> @michoecho ping, please look at the metric you put, one is missing

missing

If you mean the fact that I typed

sum by (...) (scylla_sstables_bloom_filter_memory_size) / sum by (...) (scylla_sstables_bloom_filter_memory_size)

then the answer is that I meant

sum by (...) (scylla_sstables_bloom_filter_memory_size) / sum by (...) (scylla_memory_total_memory)

Otherwise I don't understand what you are asking for.

@amnonh
Collaborator

amnonh commented Mar 12, 2024

@michoecho yes, that is exactly what I was asking about

@amnonh
Collaborator

amnonh commented Mar 13, 2024

I've checked the cloud, and I think 0.1 is a good threshold.
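
A check against the 0.1 threshold discussed in this thread could be sketched as follows. This is a Python illustration only, not the alert rule that was eventually merged; the function name and the shape of the input (a mapping from a node identifier to its bloom filter share of total memory) are assumptions.

```python
def nodes_over_threshold(fractions, threshold=0.1):
    """Return the sorted node names whose bloom filter share of memory exceeds threshold.

    fractions maps a node identifier (hypothetical) to
    bloom filter bytes / total memory bytes for that node.
    """
    return sorted(node for node, share in fractions.items() if share > threshold)
```

The comparison is strict, so a node sitting exactly at 0.1 would not be reported; whether the real alert should fire at or above the threshold is a choice the thread does not settle.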

amnonh added a commit to amnonh/scylla-grafana-monitoring that referenced this issue Mar 13, 2024
amnonh added a commit to amnonh/scylla-grafana-monitoring that referenced this issue Mar 13, 2024