Add a graph for scylla_io_queue_flow_ratio #2306

Closed
vladzcloudius opened this issue Jun 3, 2024 · 8 comments · Fixed by #2312
Labels
enhancement New feature or request

Comments

@vladzcloudius
Contributor

System information

  • Scylla version (you are using): 2022.x+

Describe the feature and the current behavior/state.
In light of the fix for scylladb/seastar#1641, a new metric was added: scylla_io_queue_flow_ratio.
Here is a patch with its description: scylladb/seastar@dd6b20d

Correlating with this value may be helpful when debugging I/O-related performance issues.

cc @xemul

@vladzcloudius vladzcloudius added the enhancement New feature or request label Jun 3, 2024
@amnonh
Collaborator

amnonh commented Jun 7, 2024

It has both mountpoint and iogroup labels. Do you want everything on the same panel?
This is an example from a three-node cluster:
image

What about aggregation? Naturally, the sum is meaningless, but do we want to have an option to aggregate?

@amnonh amnonh added this to the Monitoring 4.8 milestone Jun 7, 2024
@vladzcloudius
Contributor Author

It has both mountpoint and iogroup labels. Do you want everything on the same panel? This is an example from a three-node cluster: image

What about aggregation? Naturally, the sum is meaningless, but do we want to have an option to aggregate?

AFAIU for this metric any aggregation is meaningless.
@xemul, could you please confirm?

@amnonh
Collaborator

amnonh commented Jun 7, 2024

@vladzcloudius Naturally, there is no point in summing over it, but would Max or Min be helpful? Do you want to filter out values that are equal to 1?
I'm thinking about a cluster with a few hundred cores. How will you make sense of a graph like this?

@vladzcloudius
Contributor Author

@vladzcloudius Naturally, there is no point in summing over it, but would Max or Min be helpful? Do you want to filter out values that are equal to 1? I'm thinking about a cluster with a few hundred cores. How will you make sense of a graph like this?

Good point.
A special thing about this metric is that we want to see "outliers": mins and maxes.
However, seeing a single maximum or minimum value is also not very useful.
Ideally we'd be able to toggle the sort order: this way one can see all the maximum values first and then all the minimum values.

@amnonh
Collaborator

amnonh commented Jun 7, 2024

Good point. A special thing about this metric is that we want to see "outliers": mins and maxes. However, seeing a single maximum or minimum value is also not very useful. Ideally we'd be able to toggle the sort order: this way one can see all the maximum values first and then all the minimum values.

Sorry, there is still no easy way to play with the sort order in Grafana.

I want to include this panel in the next release, but I'm afraid that, as is, it will not be useful with so many lines in the graph. If we do not have a good idea, I will do it anyhow, but I hope we can come up with something better.

If you only care about the outliers, I can show the average of everything as a baseline (maybe per mountpoint and iogroup) plus anything that falls outside of, let's say, two standard deviations (with some minimal threshold to remove noise).

image
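
The outlier idea above can be sketched offline in a few lines. This is only an illustration of the arithmetic, not the query used in the dashboard: the function name `find_outliers`, the tuple layout of the samples, and the default values of `num_sd` and `min_deviation` are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_outliers(samples, num_sd=2.0, min_deviation=0.05):
    """Return {(mountpoint, iogroup): (average, [(shard, value), ...])}.

    samples is an iterable of (mountpoint, iogroup, shard, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Group the per-shard values by (mountpoint, iogroup).
    groups = defaultdict(list)
    for mountpoint, iogroup, shard, value in samples:
        groups[(mountpoint, iogroup)].append((shard, value))

    result = {}
    for key, members in groups.items():
        values = [value for _, value in members]
        avg = mean(values)
        sd = pstdev(values)  # population standard deviation of the group
        # A shard is an outlier only if it is farther from the average than
        # both num_sd standard deviations and the minimal noise threshold.
        limit = max(num_sd * sd, min_deviation)
        outliers = [(shard, v) for shard, v in members if abs(v - avg) > limit]
        result[key] = (avg, outliers)
    return result
```

With ten shards where nine report 1.0 and one reports 2.0, only the last shard is flagged. Note that in very small groups no value can be more than two population standard deviations from the mean, so this approach only makes sense with many shards per group.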

@amnonh
Collaborator

amnonh commented Jun 9, 2024

@vladzcloudius take a look at my latest comments. If we can't find something better, I'll include a panel with all graphs in it, but I'm afraid it will not be useful for clusters with many cores.

amnonh added a commit to amnonh/scylla-grafana-monitoring that referenced this issue Jun 10, 2024
This patch adds a panel that shows scylla_io_queue_flow_ratio.

Fixes scylladb#2306

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
@vladzcloudius
Contributor Author

We should probably think about it a bit more. Having a graph for all shards will be quite a hassle indeed.
We should probably show some statistical function expressing the level of noise in this metric, e.g. the SD itself.
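
As a rough sketch of that suggestion, the per-shard values could be collapsed into one standard deviation per (mountpoint, iogroup) pair, so a panel would show one line per group instead of one per shard. The function name `noise_per_group` and the input layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import pstdev


def noise_per_group(samples):
    """Map each (mountpoint, iogroup) pair to the SD of its shard values.

    samples is an iterable of (mountpoint, iogroup, value) tuples
    (an assumed layout for this sketch).
    """
    groups = defaultdict(list)
    for mountpoint, iogroup, value in samples:
        groups[(mountpoint, iogroup)].append(value)
    # One number per group: 0.0 means every shard reports the same ratio.
    return {key: pstdev(values) for key, values in groups.items()}
```

A group whose shards all report the same value yields 0.0, and the number grows as the shards diverge from each other.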

@amnonh
Collaborator

amnonh commented Jun 10, 2024

I think we should filter out all values that are close to 1 (i.e., close to 100%). I'm not sure what the threshold should be, but there is no point in scrolling through a few hundred values that are all 1.
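
A minimal sketch of such a filter, assuming a hypothetical tolerance of 0.05 around 1 (the thread leaves the actual threshold open) and a plain shard-to-value mapping as input:

```python
def drop_near_one(values_by_shard, tolerance=0.05):
    """Keep only shards whose flow ratio differs from 1 by more than tolerance.

    values_by_shard maps a shard id to its current ratio; both the mapping
    layout and the default tolerance are assumptions for this sketch.
    """
    return {
        shard: value
        for shard, value in values_by_shard.items()
        if abs(value - 1.0) > tolerance
    }
```

For example, given the values 1.0, 1.02, 0.7 and 1.4, only the shards reporting 0.7 and 1.4 remain.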
