8341427: JFR: Adjust object sampler span handling #19334
Hi @srdo, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user srdo" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing
@srdo This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 880 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@egahlin) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
cc @egahlin, as per this thread https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006264.html Master currently does not compile, so I checked that these changes at least build when applied to the jdk-23+23 tag. Could you give me a hint about which tests to run? The full tier-1 set takes a long time and showed a few failures for me, even with no changes to the code.
@srdo This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks pass without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@srdo This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the
/open
@srdo This pull request is now open
/signed
Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
The span stored in each sample is not the calculated span; it's just the object's byte size (`allocated`). That means that as soon as any object falls out of the queue, the spans in the queue no longer sum to cover the allocation timeline. As a result, every subsequent sample is unduly prioritized for insertion into the queue, because it is given an artificially high span. In effect, future samples are weighted as if they covered both the interval between themselves and their older neighbor and all the "missing spans" from nodes discarded since the program started.
Changed object samples to store the calculated span rather than the bytes allocated for the sampled object.
When a sample is removed from the queue because a sample with a larger span is being added, the span of the removed node is not handed to the younger neighbor; that only happens when a sample is removed due to GC. This means the span is given to the next sample added to the queue. When the removed sample is the youngest sample, this is fine, but when it has a younger neighbor, the span should probably go to that neighbor rather than to the newcomer. Handing it to the newcomer gives the new sample a weight it doesn't deserve: it ends up covering not just the span back to its older neighbor, but also the span of the removed node.
When replacing a sample in the queue, give the span of the removed sample to the younger neighbor. If there is no such neighbor, because the youngest sample is being replaced, give the span to the node being added instead, as that will become the new youngest sample.
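To make the bookkeeping described above concrete, here is a minimal C++ model of the fixed behavior. It is a sketch under stated assumptions, not the HotSpot implementation: the class `SpanSampler`, its methods, and the rule that a new sample's span is "bytes allocated so far minus the spans already held by retained samples" are hypothetical, chosen so that a rejected sample's span rolls into the next sample offered. A linear scan stands in for the priority queue. The invariant this sketch preserves is that the retained spans always sum to the bytes allocated up to the last accepted sample, which is what storing `allocated` instead of the calculated span breaks.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <numeric>

// Hypothetical, simplified model of the span handling described in this PR.
// Not HotSpot code: names and structure are illustrative only.
struct Sample {
  std::uint64_t position;  // bytes allocated so far when the sample was taken
  std::uint64_t span;      // bytes of the allocation timeline this sample covers
};

class SpanSampler {
 public:
  explicit SpanSampler(std::size_t capacity) : capacity_(capacity) {}

  // Offer a sample taken after `total_allocated` bytes have been allocated.
  void add(std::uint64_t total_allocated) {
    // Calculated span (assumption): the part of the timeline not yet covered
    // by retained samples.
    std::uint64_t span = total_allocated - covered();
    if (samples_.size() == capacity_) {
      auto victim = smallest();
      if (victim->span > span) {
        return;  // rejected; covered() is unchanged, so the next sample gets this span
      }
      auto younger = std::next(victim);
      if (younger != samples_.end()) {
        younger->span += victim->span;  // younger neighbor absorbs the removed span
      } else {
        span += victim->span;  // victim was the youngest: the newcomer absorbs it
      }
      samples_.erase(victim);
    }
    samples_.push_back({total_allocated, span});
  }

  // Sum of the spans of all retained samples.
  std::uint64_t covered() const {
    return std::accumulate(samples_.begin(), samples_.end(), std::uint64_t{0},
                           [](std::uint64_t acc, const Sample& s) { return acc + s.span; });
  }

  const std::list<Sample>& samples() const { return samples_; }

 private:
  // Oldest sample among those with the smallest span (stands in for a min-heap peek).
  std::list<Sample>::iterator smallest() {
    auto best = samples_.begin();
    for (auto it = samples_.begin(); it != samples_.end(); ++it) {
      if (it->span < best->span) best = it;
    }
    return best;
  }

  std::size_t capacity_;
  std::list<Sample> samples_;  // ordered oldest to youngest
};
```

With capacity 3 and samples offered at bytes 10, 20, 30 and 40, the fourth add evicts the sample taken at byte 10, and its younger neighbor (taken at byte 20) absorbs that span, so the retained spans are 20, 10 and 10 and still sum to 40.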
/covered Employer: Crowdstrike
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
I created a small program with a constant memory leak, and I observed that there were fewer samples toward the end. I will try your patch and see if it fixes the issue once the OCA clears. I filed a bug to track the issue: If you change the title of the PR to "8341427: JFR: Adjust object sampler span handling" they will be connected.
Anything I can do to speed up the OCA check? The agreement for my employer seems to have been approved a while ago, as it's listed on https://oca.opensource.oracle.com/?ojr=contrib-list.
I've pinged the OCA signatory at your company again to verify that your account is going to be contributing on their behalf.
@robilad Thanks. He forwarded it to my corporate email, and I've replied to your email from that corporate address, which I hope covers this verification.
@egahlin For when you have time to look at this again, the OCA stuff should be handled now (thanks Dalibor)
I tried your change and it seems to fix the problem. Could you move the PR to review state, so it will be sent out on the JFR mailing list and a webrev will be generated?
Yes, done
I think there is a space missing in the title, try "8341427: JFR: Adjust object sampler span handling".
I can sponsor this change.
/integrate
Just a question: when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't it be better to emphasize the older neighbor, since it has survived longer (and so is more likely to be a leak)?
My thinking was that if it is removed, it's as if it was never sampled. It would be as if the TLAB size was larger, and the span belongs to the next sample in time. I have a vague memory that the span at some point was split between younger and older samples, but I didn't go with that solution. Let me think about it.
Thanks for the explanation. OK, I see what you mean. I think splitting it evenly between both makes sense as well.
Background thread describing how I understand this algorithm is intended to work: https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006255.html
The goal is to get samples evenly spread over the entire allocation timeline. My understanding is that we want each sample to account for the span "to its left" on the allocation timeline. A fresh sample covers the span between itself and the previous sample. By giving the span of a removed sample to the younger neighbor, we get the spans adjusted as if we never had the removed sample. That's not the case if we give the span to the older neighbor or split it between the two neighbors.
A small example:
Sample 1 (span 0...10)
If we add another sample at byte 40 on the timeline and drop sample 2, I think we'd like to get this:
Sample 1 (0...10)
The spans of the samples accurately represent which span of "time" on the allocation timeline each sample represents. In this case we'd be very likely to want to keep sample 3, because it covers a large span. If we split the span instead, we'd get:
Sample 1 (0...15)
Sample 1 now claims to represent 0...15 on the timeline, even though the sample was actually created before the end of that interval. I think this could give older samples an advantage in being kept, which might skew the samples we keep toward older ones and make the distribution of samples over the timeline uneven.
Edit: When more samples arrive, I think it is better to keep a sample taken at byte 30 (sample 3) than a sample taken at byte 10 (sample 1), if we're going for an even distribution of the samples.
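The two redistribution policies discussed in this thread can be compared with a small C++ sketch. It assumes, beyond what the example spells out, that sample 2 was taken at byte 20, so that after the sample at byte 40 is added the spans are 0...10, 10...20, 20...30 and 30...40; `drop`, `Policy` and `claims_match_positions` are hypothetical helpers for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the worked example; not HotSpot code.
struct Sample {
  int id;
  std::uint64_t taken_at;  // byte offset on the allocation timeline
  std::uint64_t span;      // bytes the sample claims to represent
};

enum class Policy { kYoungerNeighbor, kSplitEvenly };

// Remove samples[index] and redistribute its span according to `policy`.
// Assumes 0 < index < samples.size() - 1 (both neighbors exist).
std::vector<Sample> drop(std::vector<Sample> samples, std::size_t index, Policy policy) {
  const std::uint64_t span = samples[index].span;
  if (policy == Policy::kYoungerNeighbor) {
    samples[index + 1].span += span;
  } else {
    samples[index - 1].span += span / 2;
    samples[index + 1].span += span - span / 2;
  }
  samples.erase(samples.begin() + static_cast<std::ptrdiff_t>(index));
  return samples;
}

// True if every sample's claimed interval (cumulative spans, oldest first)
// ends exactly where the sample was taken.
bool claims_match_positions(const std::vector<Sample>& samples) {
  std::uint64_t end = 0;
  for (const Sample& s : samples) {
    end += s.span;
    if (end != s.taken_at) return false;
  }
  return true;
}
```

Dropping sample 2 with the younger-neighbor policy leaves sample 3 covering 10...30, and every claimed interval still ends where its sample was taken. The even split leaves sample 1 claiming 0...15 although it was taken at byte 10, which is the skew toward older samples described in the comment.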
/sponsor
Going to push as commit 822a155.
Your commit was automatically rebased without conflicts.
@srdo Thanks for your contribution!
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19334/head:pull/19334
$ git checkout pull/19334
Update a local copy of the PR:
$ git checkout pull/19334
$ git pull https://git.openjdk.org/jdk.git pull/19334/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 19334
View PR using the GUI difftool:
$ git pr show -t 19334
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19334.diff
Using Webrev
Link to Webrev Comment