Add percentile pause info as well as avg pause #139

Open
elyograg opened this issue Jul 8, 2015 · 9 comments

@elyograg

elyograg commented Jul 8, 2015

The average pause is useful information, but the median pause (50th percentile) would be a lot more useful. Also helpful would be 75th, 95th, and 99th percentile information.

I think most of the data needed to calculate these values is already collected in order to draw the graphs. I don't know how much additional memory would be required. For a very large GC log, it could be quite a lot.

https://en.wikipedia.org/wiki/Percentile
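
To illustrate the request, here is a minimal sketch of a nearest-rank percentile calculation over a list of pause durations. The class `PausePercentiles`, its method, and the sample values are hypothetical and are not taken from the project's code; it also assumes all pause values fit in memory.

```java
import java.util.Arrays;

/** Hypothetical helper (not project code): nearest-rank percentiles over pause durations. */
final class PausePercentiles {

    /**
     * Returns the p-th percentile (0 < p <= 100) using the nearest-rank method.
     * The input is copied before sorting, so the caller's array is left untouched.
     */
    static double percentile(double[] pausesSeconds, double p) {
        if (pausesSeconds.length == 0) {
            throw new IllegalArgumentException("no pauses recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = pausesSeconds.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least p percent of the data at or below it.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up sample: mostly short pauses plus two long outliers.
        double[] pauses = {0.012, 0.015, 0.011, 0.250, 0.014, 0.013, 0.016, 0.018, 0.017, 1.900};
        for (double p : new double[] {50, 75, 95, 99}) {
            System.out.printf("%.0fth percentile: %.3f s%n", p, percentile(pauses, p));
        }
    }
}
```

For this made-up sample the average is about 0.227 s while the median is 0.015 s, which is the kind of gap that makes the median more informative than the average.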

@ecki
Contributor

ecki commented Jul 8, 2015

I agree (especially if we get percentiles over periods). But it's quite a bit of work.

I had a quick hack back then (not sure if it still applies) which lets me specify a threshold for the STW pause times I care about and get a count of all those "outliers". That's not a percentile, but it is similarly useful for SLA monitoring. ecki@7ca9236
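
The threshold idea can be sketched roughly as follows. This is not the code from the commit referenced above; `PauseThresholdCounter`, `countAbove`, and the 0.2 s threshold are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not the referenced commit): counts pauses above an SLA threshold. */
final class PauseThresholdCounter {

    /** Returns how many pauses are strictly longer than the given threshold. */
    static long countAbove(List<Double> pausesSeconds, double thresholdSeconds) {
        return pausesSeconds.stream()
                .filter(pause -> pause > thresholdSeconds)
                .count();
    }

    public static void main(String[] args) {
        // Made-up stop-the-world pause durations in seconds.
        List<Double> pauses = Arrays.asList(0.012, 0.250, 0.014, 1.900, 0.018);
        double threshold = 0.2; // assumed SLA limit, purely illustrative
        System.out.printf("%d of %d pauses exceeded %.1f s%n",
                countAbove(pauses, threshold), pauses.size(), threshold);
    }
}
```

Unlike a percentile, this needs no sorting and no stored samples: a single counter is enough, at the cost of having to choose the threshold up front.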

@elyograg
Author

elyograg commented Jul 8, 2015

If you want to avoid huge amounts of additional memory use, you could do it in the same way that codahale metrics does. Or you could simply incorporate codahale metrics directly into your application, but it would increase the jar size. The Solr project has included codahale metrics to give users percentile information on query times in the admin UI.

https://issues.apache.org/jira/browse/SOLR-1972

From what I understand, codahale builds a 1024-element construct to store the info being tracked (don't know if it's an array or a Collection) ... when it fills up, they randomly pick one of the existing entries to evict and use that smaller data set to calculate the percentiles. It doesn't give you exactly the same information as you would get from a complete dataset, but apparently it is pretty close.
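
The bounded-memory approach described above can be sketched with uniform reservoir sampling (often called Algorithm R). This is an illustration of the general technique, not a copy of the codahale metrics implementation; the class `PauseReservoir` is hypothetical, and the capacity of 1024 simply follows the figure mentioned in this comment.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical fixed-size sample of pause durations (uniform reservoir sampling). */
final class PauseReservoir {
    private final double[] samples;
    private final Random random;
    private long seen; // total number of values offered so far

    PauseReservoir(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.samples = new double[capacity];
        this.random = new Random(seed);
    }

    void add(double pauseSeconds) {
        seen++;
        if (seen <= samples.length) {
            samples[(int) (seen - 1)] = pauseSeconds; // still filling up
        } else {
            // Pick a slot in [0, seen); only slots inside the reservoir cause an eviction,
            // so the new value is kept with probability capacity / seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < samples.length) {
                samples[(int) slot] = pauseSeconds;
            }
        }
    }

    int size() {
        return (int) Math.min(seen, samples.length);
    }

    /** Nearest-rank percentile (0 < p <= 100) over the retained samples only. */
    double percentile(double p) {
        int n = size();
        if (n == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and p in (0, 100]");
        }
        double[] sorted = Arrays.copyOf(samples, n);
        Arrays.sort(sorted);
        return sorted[(int) Math.ceil(p * n / 100.0) - 1];
    }

    public static void main(String[] args) {
        PauseReservoir reservoir = new PauseReservoir(1024, 42L);
        for (int i = 1; i <= 100_000; i++) {
            reservoir.add(i / 100_000.0); // made-up pauses spread evenly over (0, 1] seconds
        }
        System.out.printf("kept %d samples, approx. median %.3f s%n",
                reservoir.size(), reservoir.percentile(50));
    }
}
```

Memory use stays constant regardless of log size. The percentiles are estimates, and with only about a thousand samples the extreme tail (such as the 99th percentile) is the least precise.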

@duckdeer

duckdeer commented Jul 9, 2015

This would be a very nice enhancement, too. I think it can be a big help analyzing big logfiles.

Percentiles would be great, too. :-)


@chewiebug
Owner

Hmm, this looks like I should start thinking about how to implement the percentiles :-).

@chewiebug
Owner

I have made a first quick and dirty implementation of the percentile calculation. The values are shown on the "event details" tab.

It is pushed to the feature/#139/percentile branch.

What do you think?

@elyograg
Author

Now I just need to figure out how to build the project!

@duckdeer

You need Maven and Java 8 to build the project.
Just go to the main directory and run "mvn clean install". After that you'll find the jar file under "target".


@elyograg
Author

I figured out how to build it, and looked at the Event details tab. Looks pretty awesome!

One number that I didn't see is something that might be called "total pause" ... which would only be something to worry about if there are situations where a GC pause includes multiple pause types. If that's not a situation that can ever occur, then I guess it's not a number that needs to be gathered.

Nitpick, feel free to ignore: Some people might be specifically looking for "Median" rather than the 50th percentile.

@elyograg
Author

Thought of something else. The "Pause" tab has min/avg/max for "Pause interval" ... would it be possible to have statistics in "Event details" for that number?
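
A rough sketch of how pause intervals could be derived before computing statistics on them. It assumes an interval is the gap between the start timestamps of consecutive pauses, which may not match how the "Pause" tab defines it; `PauseIntervals` and the sample timestamps are hypothetical.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.stream.IntStream;

/** Hypothetical helper (not project code): gaps between consecutive pause start times. */
final class PauseIntervals {

    /** Expects start timestamps in seconds, in ascending order; returns n - 1 intervals. */
    static double[] intervals(double[] pauseStartSeconds) {
        return IntStream.range(1, pauseStartSeconds.length)
                .mapToDouble(i -> pauseStartSeconds[i] - pauseStartSeconds[i - 1])
                .toArray();
    }

    public static void main(String[] args) {
        double[] starts = {1.0, 3.0, 4.0, 8.0}; // made-up timestamps
        DoubleSummaryStatistics stats = Arrays.stream(intervals(starts)).summaryStatistics();
        System.out.printf("min %.1f s, avg %.2f s, max %.1f s%n",
                stats.getMin(), stats.getAverage(), stats.getMax());
    }
}
```

The resulting array can be passed to the same percentile calculation that is used for pause durations.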
