8341427: JFR: Adjust object sampler span handling #19334

srdo · 2024-05-21T18:37:35Z

The span stored in each sample is not the calculated span, it's just the object's byte size (allocated). That means as soon as any object falls out of the queue, the spans in the queue no longer sum to cover the allocation timeline. This causes all future samples to be added to be unduly prioritized for adding to the queue, because they are given an artificially high span. In effect, future samples are weighted as if they cover both the interval between themselves and the older neighbor sample, plus all "missing spans" from nodes that have been discarded since the program started.

Changed object samples to store the calculated span rather than the bytes allocated for the sampled object.

When a sample is removed from the queue because a sample with a larger span is being added, the span of the removed node is not handed to the younger neighbor, this only happens when a sample is removed due to GC. This means that the span will be given to the next sample added to the queue. When the sample being removed is the youngest sample, this is fine, but when it's a sample that has a younger neighbor, the span should probably be given to that neighbor rather than the newcomer. Handing it to the newcomer gives the new sample a high weight it doesn't deserve. It ends up covering not just the span to the older neighbor, but also the span of the removed node, which is not what we want.

When replacing a sample in the queue, give the span of the removed sample to the younger neighbor. If there is no such neighbor, because the youngest sample is being replaced, give the span to the node being added instead, as that will become the new youngest sample.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8341427: JFR: Adjust object sampler span handling (Bug - P2)

Reviewers

Erik Gahlin (@egahlin - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19334/head:pull/19334
$ git checkout pull/19334

Update a local copy of the PR:
$ git checkout pull/19334
$ git pull https://git.openjdk.org/jdk.git pull/19334/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19334

View PR using the GUI difftool:
$ git pr show -t 19334

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19334.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2024-05-21T18:38:26Z

Hi @srdo, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user srdo" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

openjdk · 2024-05-21T18:39:09Z

@srdo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341427: JFR: Adjust object sampler span handling

Reviewed-by: egahlin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 880 new commits pushed to the master branch:

9769ee8: 8344652: Remove access control context text from SSLEngine and SSLSession APIs
82c3612: 8344830: [BACKOUT] JDK-8341334: CDS: Parallel relocation
64e4aa2: 8339916: AIOOBE due to Math.abs(Integer.MIN_VALUE) in tests
bf374c3: 8343453: Modernize FloatingDecimal tests
847f65c: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553
8903854: 8344718: Test runtime/cds/appcds/jigsaw/addmods/AddmodsOption.java fails on Linuxppc64le after JDK-8344239
a07b72b: 8344346: java/net/httpclient/ShutdownNow.java fails with java.lang.AssertionError: client was still running, but exited after further delay: timeout should be adjusted
2ea0364: 8343893: Test jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java failed: heap should have grown and NMT should show that: expected 0 > 0
50c099d: 8344799: Remove permissions checks from java.awt.Desktop
e21d06f: 8344338: javax/swing/JTextArea/bug4265784.java fails on Ubuntu 24.04.1
... and 870 more: https://git.openjdk.org/jdk/compare/f1bf469b4ee07b48b629a126111e307d3cab7fd7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@egahlin) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2024-05-21T18:39:26Z

@srdo The following label will be automatically applied to this pull request:

hotspot-jfr

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

srdo · 2024-05-21T18:39:52Z

cc @egahlin, as per this thread https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006264.html

Master currently does not compile, so I checked that these changes can at least build when applied to the jdk-23+23 tag. Maybe you can give me a hint about which tests to run, the full tier-1 set takes a long time, and showed a few failures for me, even with no change to the code?

bridgekeeper · 2024-07-16T20:08:05Z

@srdo This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper · 2024-09-10T21:24:09Z

@srdo This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

srdo · 2024-09-30T14:48:30Z

/open

openjdk · 2024-09-30T14:49:56Z

@srdo This pull request is now open

srdo · 2024-09-30T14:50:38Z

/signed

bridgekeeper · 2024-09-30T14:51:02Z

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

The span stored in each sample is not the calculated span, it's just the object's byte size (`allocated`). That means as soon as any object falls out of the queue, the spans in the queue no longer sum to cover the allocation timeline. This causes all future samples to be added to be unduly prioritized for adding to the queue, because they are given an artificially high span. In effect, future samples are weighted as if they cover both the interval between themselves and the older neighbor sample, plus all "missing spans" from nodes that have been discarded since the program started. Changed object samples to store the calculated span rather than the bytes allocated for the sampled object. When a sample is removed from the queue because a sample with a larger span is being added, the span of the removed node is not handed to the younger neighbor, this only happens when a sample is removed due to GC. This means that the span will be given to the next sample added to the queue. When the sample being removed is the youngest sample, this is fine, but when it's a sample that has a younger neighbor, the span should probably be given to that neighbor rather than the newcomer. Handing it to the newcomer gives the new sample a high weight it doesn't deserve. It ends up covering not just the span to the older neighbor, but also the span of the removed node, which is not what we want. When replacing a sample in the queue, give the span of the removed sample to the younger neighbor. If there is no such neighbor, because the youngest sample is being replaced, give the span to the node being added instead, as that will become the new youngest sample.

srdo · 2024-09-30T14:52:01Z

/covered

Employer: Crowdstrike

bridgekeeper · 2024-09-30T14:53:04Z

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

egahlin · 2024-10-02T15:07:55Z

I created a small program with a constant memory leak, and I observed that the samples were fewer at the end. I will try your patch and see if it fixes the issue once the OCA clears.

I filed a bug to track the issue:
https://bugs.openjdk.org/browse/JDK-8341427

If you change the title of the PR to "8341427: JFR: Adjust object sampler span handling" they will be connected.

srdo · 2024-10-15T18:02:03Z

Anything I can do to speed up the OCA check? The agreement for my employer seems like it was approved a while ago, as it's listed on https://oca.opensource.oracle.com/?ojr=contrib-list.

robilad · 2024-10-28T16:07:11Z

I've pinged the OCA signatory at your company again to verify your account is going to be contributing on their behalf.

srdo · 2024-10-29T16:41:40Z

@robilad Thanks. He forwarded it to my corporate email, and I've replied to your email from that corporate address, which I hope covers this verification.

srdo · 2024-11-14T14:24:28Z

@egahlin For when you have time to look at this again, the OCA stuff should be handled now (thanks Dalibor)

egahlin · 2024-11-21T23:48:13Z

I tried your change and it seems to fix the problem. Could you move the PR to review state, so it will be sent out on the JFR mailing list and a webrev will be generated.

srdo · 2024-11-22T12:07:54Z

Yes, done

egahlin · 2024-11-22T13:06:08Z

Yes, done

I think there is a space missing in the title, try "8341427: JFR: Adjust object sampler span handling".

mlbridge · 2024-11-22T13:11:25Z

Webrevs

00: Full (376f0475)

egahlin

I can sponsor this change.

openjdk · 2024-11-22T13:20:27Z

⚠️ @srdo the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout object-sampling
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

srdo · 2024-11-22T13:46:30Z

/integrate

openjdk · 2024-11-22T13:47:14Z

@srdo
Your change (at version 376f047) is now ready to be sponsored by a Committer.

roberttoyonaga · 2024-11-22T14:53:14Z

Just a question, when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't be better to emphasize the older neighbor since they've survived longer (and so are more likely to be a leak)?

egahlin · 2024-11-22T15:18:14Z

Just a question, when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't be better to emphasize the older neighbor since they've survived longer (and so are more likely to be a leak)?'

My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size was larger, and the span belongs to the next sample in time. I have a vague memory that the span at some point was split into younger and older samples, but I didn't go with that solution.

Let me think about it.

roberttoyonaga · 2024-11-22T15:31:33Z

My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size were larger, and the span belongs to the next sample in time

Thanks for the explanation. Ok I see what you mean. It think splitting it evenly to both makes sense as well.

srdo · 2024-11-22T15:53:14Z

@roberttoyonaga

Background thread describing how I understand this algorithm to be intended to work https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006255.html

The goal is to get samples evenly spread over the entire allocation timeline. My understanding is that we want samples to account for the span "to their left" on the allocation timeline. A fresh sample will cover the span between itself and the previous sample.

By giving the span of a removed sample to the younger neighbor, we get the spans adjusted as if we never had the removed sample.

That's not the case if we give the span to the older neighbor or split the span between the two neighbors.

A small example:

Sample 1 (span 0...10)
Sample 2 (span 10...20)
Sample 3 (span 20...30)

If we add another sample at byte 40 on the timeline and drop sample 2, I think we'd like to get this:

Sample 1 (0...10)
Sample 3 (10...30)
Sample 4 (30...40)

The spans of the samples are accurately representing which span of "time" on the allocation timeline the sample represents. In this case we'd be very likely to want to keep sample 3 because it covers a large span.

If we split the span instead, we'd get

Sample 1 (0...15)
Sample 3 (15...30)
Sample 4 (30...40)

Sample 1 now claims to represent 0...15 on the timeline, even though the sample was actually created before the end of that interval. I think the effect this could have is to allow older samples an advantage in being kept, which might skew which samples we keep toward older samples, which causes the distribution of samples over the timeline to be uneven.

Edit:
What I'm getting at is that if the goal is to keep evenly distributed samples over the allocation timeline, then sample 3 should be preferred over sample 1 when future samples arrive, and if we split the span, it won't be.

When more samples arrive, I think it is better to keep a sample taken at byte 30 (sample 3) than to keep a sample taken at byte 10 (sample 1), if we're going for an even distribution of the samples.

egahlin · 2024-11-23T15:57:56Z

/sponsor

openjdk · 2024-11-23T15:58:42Z

Going to push as commit 822a155.
Since your change was applied there have been 896 commits pushed to the master branch:

d00f311: 8343150: Change URLClassLoader.getPermissions to return empty PermissionCollection
effee12: 8344533: CTW: Add option to remove clinits before loading
70c4e2c: 8344587: Reduce number of "jdk.jpackage.internal" classes used from other packages
1114704: 6672644: JComboBox still scrolling if switch to another window and return back
98b6678: 8343741: SA jstack --mixed should print information about VM locks
1b2d9ca: 8344881: Problemlist java/awt/Robot/InfiniteLoopException.java on Linux
6aec2dc: 8344788: Specify that the access control context parameters of Subject.doAsPrivileged are ignored
079f503: 8344568: Renaming ceil_log2 to log2i_ceil
51763b6: 8344525: Fix leftover ExceptionOccurred in java.base
4b16530: 8344795: Remove uses of AccessControlContext in java.desktop module
... and 886 more: https://git.openjdk.org/jdk/compare/f1bf469b4ee07b48b629a126111e307d3cab7fd7...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-11-23T15:58:51Z

@egahlin @srdo Pushed as commit 822a155.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

egahlin · 2024-11-24T23:54:45Z

@srdo Thanks for you contribution!

bridgekeeper bot added the oca Needs verification of OCA signatory status label May 21, 2024

openjdk bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label May 21, 2024

bridgekeeper bot closed this Sep 10, 2024

openjdk bot reopened this Sep 30, 2024

bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Sep 30, 2024

srdo force-pushed the object-sampling branch from 1b2c9df to 376f047 Compare September 30, 2024 15:24

srdo changed the title ~~Adjust object sampler span handling~~ 8341427:JFR: Adjust object sampler span handling Oct 10, 2024

bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Nov 13, 2024

srdo marked this pull request as ready for review November 22, 2024 12:07

srdo changed the title ~~8341427:JFR: Adjust object sampler span handling~~ 8341427: JFR: Adjust object sampler span handling Nov 22, 2024

openjdk bot added the rfr Pull request is ready for review label Nov 22, 2024

egahlin approved these changes Nov 22, 2024

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Nov 22, 2024

openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 22, 2024

openjdk bot added the integrated Pull request has been integrated label Nov 23, 2024

openjdk bot closed this Nov 23, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 23, 2024

roberttoyonaga mentioned this pull request Nov 25, 2024

[GR-60079] JFR leak profiler span fixes oracle/graal#10154

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8341427: JFR: Adjust object sampler span handling #19334

8341427: JFR: Adjust object sampler span handling #19334

srdo commented May 21, 2024 •

edited by openjdk bot

Loading

bridgekeeper bot commented May 21, 2024

openjdk bot commented May 21, 2024 •

edited

Loading

openjdk bot commented May 21, 2024

srdo commented May 21, 2024 •

edited

Loading

bridgekeeper bot commented Jul 16, 2024

bridgekeeper bot commented Sep 10, 2024

srdo commented Sep 30, 2024

openjdk bot commented Sep 30, 2024

srdo commented Sep 30, 2024

bridgekeeper bot commented Sep 30, 2024

srdo commented Sep 30, 2024 •

edited

Loading

bridgekeeper bot commented Sep 30, 2024

egahlin commented Oct 2, 2024 •

edited

Loading

srdo commented Oct 15, 2024

robilad commented Oct 28, 2024

srdo commented Oct 29, 2024 •

edited

Loading

srdo commented Nov 14, 2024

egahlin commented Nov 21, 2024 •

edited

Loading

srdo commented Nov 22, 2024

egahlin commented Nov 22, 2024

mlbridge bot commented Nov 22, 2024

egahlin left a comment

openjdk bot commented Nov 22, 2024

srdo commented Nov 22, 2024

openjdk bot commented Nov 22, 2024

roberttoyonaga commented Nov 22, 2024

egahlin commented Nov 22, 2024 •

edited

Loading

roberttoyonaga commented Nov 22, 2024

srdo commented Nov 22, 2024 •

edited

Loading

egahlin commented Nov 23, 2024

openjdk bot commented Nov 23, 2024

openjdk bot commented Nov 23, 2024

egahlin commented Nov 24, 2024

8341427: JFR: Adjust object sampler span handling #19334

8341427: JFR: Adjust object sampler span handling #19334

Conversation

srdo commented May 21, 2024 • edited by openjdk bot Loading

Progress

Issue

Reviewers

Reviewing

bridgekeeper bot commented May 21, 2024

openjdk bot commented May 21, 2024 • edited Loading

openjdk bot commented May 21, 2024

srdo commented May 21, 2024 • edited Loading

bridgekeeper bot commented Jul 16, 2024

bridgekeeper bot commented Sep 10, 2024

srdo commented Sep 30, 2024

openjdk bot commented Sep 30, 2024

srdo commented Sep 30, 2024

bridgekeeper bot commented Sep 30, 2024

srdo commented Sep 30, 2024 • edited Loading

bridgekeeper bot commented Sep 30, 2024

egahlin commented Oct 2, 2024 • edited Loading

srdo commented Oct 15, 2024

robilad commented Oct 28, 2024

srdo commented Oct 29, 2024 • edited Loading

srdo commented Nov 14, 2024

egahlin commented Nov 21, 2024 • edited Loading

srdo commented Nov 22, 2024

egahlin commented Nov 22, 2024

mlbridge bot commented Nov 22, 2024

Webrevs

egahlin left a comment

Choose a reason for hiding this comment

openjdk bot commented Nov 22, 2024

srdo commented Nov 22, 2024

openjdk bot commented Nov 22, 2024

roberttoyonaga commented Nov 22, 2024

egahlin commented Nov 22, 2024 • edited Loading

roberttoyonaga commented Nov 22, 2024

srdo commented Nov 22, 2024 • edited Loading

egahlin commented Nov 23, 2024

openjdk bot commented Nov 23, 2024

openjdk bot commented Nov 23, 2024

egahlin commented Nov 24, 2024

srdo commented May 21, 2024 •

edited by openjdk bot

Loading

openjdk bot commented May 21, 2024 •

edited

Loading

srdo commented May 21, 2024 •

edited

Loading

srdo commented Sep 30, 2024 •

edited

Loading

egahlin commented Oct 2, 2024 •

edited

Loading

srdo commented Oct 29, 2024 •

edited

Loading

egahlin commented Nov 21, 2024 •

edited

Loading

egahlin commented Nov 22, 2024 •

edited

Loading

srdo commented Nov 22, 2024 •

edited

Loading