Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

8341427: JFR: Adjust object sampler span handling #19334

Closed
wants to merge 1 commit into from

Conversation

srdo
Copy link
Contributor

@srdo srdo commented May 21, 2024

The span stored in each sample is not the calculated span, it's just the object's byte size (allocated). That means as soon as any object falls out of the queue, the spans in the queue no longer sum to cover the allocation timeline. This causes all future samples to be added to be unduly prioritized for adding to the queue, because they are given an artificially high span. In effect, future samples are weighted as if they cover both the interval between themselves and the older neighbor sample, plus all "missing spans" from nodes that have been discarded since the program started.

Changed object samples to store the calculated span rather than the bytes allocated for the sampled object.

When a sample is removed from the queue because a sample with a larger span is being added, the span of the removed node is not handed to the younger neighbor, this only happens when a sample is removed due to GC. This means that the span will be given to the next sample added to the queue. When the sample being removed is the youngest sample, this is fine, but when it's a sample that has a younger neighbor, the span should probably be given to that neighbor rather than the newcomer. Handing it to the newcomer gives the new sample a high weight it doesn't deserve. It ends up covering not just the span to the older neighbor, but also the span of the removed node, which is not what we want.

When replacing a sample in the queue, give the span of the removed sample to the younger neighbor. If there is no such neighbor, because the youngest sample is being replaced, give the span to the node being added instead, as that will become the new youngest sample.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8341427: JFR: Adjust object sampler span handling (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19334/head:pull/19334
$ git checkout pull/19334

Update a local copy of the PR:
$ git checkout pull/19334
$ git pull https://git.openjdk.org/jdk.git pull/19334/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19334

View PR using the GUI difftool:
$ git pr show -t 19334

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19334.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label May 21, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented May 21, 2024

Hi @srdo, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user srdo" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk
Copy link

openjdk bot commented May 21, 2024

@srdo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341427: JFR: Adjust object sampler span handling

Reviewed-by: egahlin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 880 new commits pushed to the master branch:

  • 9769ee8: 8344652: Remove access control context text from SSLEngine and SSLSession APIs
  • 82c3612: 8344830: [BACKOUT] JDK-8341334: CDS: Parallel relocation
  • 64e4aa2: 8339916: AIOOBE due to Math.abs(Integer.MIN_VALUE) in tests
  • bf374c3: 8343453: Modernize FloatingDecimal tests
  • 847f65c: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553
  • 8903854: 8344718: Test runtime/cds/appcds/jigsaw/addmods/AddmodsOption.java fails on Linuxppc64le after JDK-8344239
  • a07b72b: 8344346: java/net/httpclient/ShutdownNow.java fails with java.lang.AssertionError: client was still running, but exited after further delay: timeout should be adjusted
  • 2ea0364: 8343893: Test jdk/jfr/event/runtime/TestNativeMemoryUsageEvents.java failed: heap should have grown and NMT should show that: expected 0 > 0
  • 50c099d: 8344799: Remove permissions checks from java.awt.Desktop
  • e21d06f: 8344338: javax/swing/JTextArea/bug4265784.java fails on Ubuntu 24.04.1
  • ... and 870 more: https://git.openjdk.org/jdk/compare/f1bf469b4ee07b48b629a126111e307d3cab7fd7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@egahlin) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk
Copy link

openjdk bot commented May 21, 2024

@srdo The following label will be automatically applied to this pull request:

  • hotspot-jfr

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label May 21, 2024
@srdo
Copy link
Contributor Author

srdo commented May 21, 2024

cc @egahlin, as per this thread https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006264.html

Master currently does not compile, so I checked that these changes can at least build when applied to the jdk-23+23 tag. Maybe you can give me a hint about which tests to run, the full tier-1 set takes a long time, and showed a few failures for me, even with no change to the code?

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 16, 2024

@srdo This pull request has been inactive for more than 8 weeks and will be automatically closed if another 8 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 10, 2024

@srdo This pull request has been inactive for more than 16 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Sep 10, 2024
@srdo
Copy link
Contributor Author

srdo commented Sep 30, 2024

/open

@openjdk openjdk bot reopened this Sep 30, 2024
@openjdk
Copy link

openjdk bot commented Sep 30, 2024

@srdo This pull request is now open

@srdo
Copy link
Contributor Author

srdo commented Sep 30, 2024

/signed

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Sep 30, 2024
@bridgekeeper
Copy link

bridgekeeper bot commented Sep 30, 2024

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

The span stored in each sample is not the calculated span, it's just the
object's byte size (`allocated`). That means as soon as any object falls
out of the queue, the spans in the queue no longer sum to cover the
allocation timeline. This causes all future samples to be added to be
unduly prioritized for adding to the queue, because they are given an
artificially high span. In effect, future samples are weighted as if they
cover both the interval between themselves and the older neighbor sample,
plus all "missing spans" from nodes that have been discarded since the
program started.

Changed object samples to store the calculated span rather than the bytes
allocated for the sampled object.

When a sample is removed from the queue because a sample with a larger
span is being added, the span of the removed node is not handed to the
younger neighbor, this only happens when a sample is removed due to GC.
This means that the span will be given to the next sample added to the
queue. When the sample being removed is the youngest sample, this is fine,
but when it's a sample that has a younger neighbor, the span should
probably be given to that neighbor rather than the newcomer. Handing it to
the newcomer gives the new sample a high weight it doesn't deserve. It ends
up covering not just the span to the older neighbor, but also the span of
the removed node, which is not what we want.

When replacing a sample in the queue, give the span of the removed sample
to the younger neighbor. If there is no such neighbor, because the youngest
sample is being replaced, give the span to the node being added instead,
as that will become the new youngest sample.
@srdo
Copy link
Contributor Author

srdo commented Sep 30, 2024

/covered

Employer: Crowdstrike

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 30, 2024

Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@egahlin
Copy link
Member

egahlin commented Oct 2, 2024

I created a small program with a constant memory leak, and I observed that the samples were fewer at the end. I will try your patch and see if it fixes the issue once the OCA clears.

I filed a bug to track the issue:
https://bugs.openjdk.org/browse/JDK-8341427

If you change the title of the PR to "8341427: JFR: Adjust object sampler span handling" they will be connected.

@srdo srdo changed the title Adjust object sampler span handling 8341427:JFR: Adjust object sampler span handling Oct 10, 2024
@srdo
Copy link
Contributor Author

srdo commented Oct 15, 2024

Anything I can do to speed up the OCA check? The agreement for my employer seems like it was approved a while ago, as it's listed on https://oca.opensource.oracle.com/?ojr=contrib-list.

@robilad
Copy link

robilad commented Oct 28, 2024

I've pinged the OCA signatory at your company again to verify your account is going to be contributing on their behalf.

@srdo
Copy link
Contributor Author

srdo commented Oct 29, 2024

@robilad Thanks. He forwarded it to my corporate email, and I've replied to your email from that corporate address, which I hope covers this verification.

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Nov 13, 2024
@srdo
Copy link
Contributor Author

srdo commented Nov 14, 2024

@egahlin For when you have time to look at this again, the OCA stuff should be handled now (thanks Dalibor)

@egahlin
Copy link
Member

egahlin commented Nov 21, 2024

I tried your change and it seems to fix the problem. Could you move the PR to review state, so it will be sent out on the JFR mailing list and a webrev will be generated.

@srdo srdo marked this pull request as ready for review November 22, 2024 12:07
@srdo
Copy link
Contributor Author

srdo commented Nov 22, 2024

Yes, done

@egahlin
Copy link
Member

egahlin commented Nov 22, 2024

Yes, done

I think there is a space missing in the title, try "8341427: JFR: Adjust object sampler span handling".

@srdo srdo changed the title 8341427:JFR: Adjust object sampler span handling 8341427: JFR: Adjust object sampler span handling Nov 22, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 22, 2024
@mlbridge
Copy link

mlbridge bot commented Nov 22, 2024

Webrevs

Copy link
Member

@egahlin egahlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can sponsor this change.

@openjdk
Copy link

openjdk bot commented Nov 22, 2024

⚠️ @srdo the full name on your profile does not match the author name in this pull requests' HEAD commit. If this pull request gets integrated then the author name from this pull requests' HEAD commit will be used for the resulting commit. If you wish to push a new commit with a different author name, then please run the following commands in a local repository of your personal fork:

$ git checkout object-sampling
$ git commit --author='Preferred Full Name <you@example.com>' --allow-empty -m 'Update full name'
$ git push

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 22, 2024
@srdo
Copy link
Contributor Author

srdo commented Nov 22, 2024

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 22, 2024
@openjdk
Copy link

openjdk bot commented Nov 22, 2024

@srdo
Your change (at version 376f047) is now ready to be sponsored by a Committer.

@roberttoyonaga
Copy link
Contributor

Just a question, when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't be better to emphasize the older neighbor since they've survived longer (and so are more likely to be a leak)?

@egahlin
Copy link
Member

egahlin commented Nov 22, 2024

Just a question, when a node is removed, why is the span pushed onto the younger neighbor? Wouldn't be better to emphasize the older neighbor since they've survived longer (and so are more likely to be a leak)?'

My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size was larger, and the span belongs to the next sample in time. I have a vague memory that the span at some point was split into younger and older samples, but I didn't go with that solution.

Let me think about it.

@roberttoyonaga
Copy link
Contributor

My thinking was that if it is removed, it's like it was never sampled. It would be as if the TLAB size were larger, and the span belongs to the next sample in time

Thanks for the explanation. Ok I see what you mean. It think splitting it evenly to both makes sense as well.

@srdo
Copy link
Contributor Author

srdo commented Nov 22, 2024

@roberttoyonaga

Background thread describing how I understand this algorithm to be intended to work https://mail.openjdk.org/pipermail/hotspot-jfr-dev/2024-May/006255.html

The goal is to get samples evenly spread over the entire allocation timeline. My understanding is that we want samples to account for the span "to their left" on the allocation timeline. A fresh sample will cover the span between itself and the previous sample.

By giving the span of a removed sample to the younger neighbor, we get the spans adjusted as if we never had the removed sample.

That's not the case if we give the span to the older neighbor or split the span between the two neighbors.

A small example:

Sample 1 (span 0...10)
Sample 2 (span 10...20)
Sample 3 (span 20...30)

If we add another sample at byte 40 on the timeline and drop sample 2, I think we'd like to get this:

Sample 1 (0...10)
Sample 3 (10...30)
Sample 4 (30...40)

The spans of the samples are accurately representing which span of "time" on the allocation timeline the sample represents. In this case we'd be very likely to want to keep sample 3 because it covers a large span.

If we split the span instead, we'd get

Sample 1 (0...15)
Sample 3 (15...30)
Sample 4 (30...40)

Sample 1 now claims to represent 0...15 on the timeline, even though the sample was actually created before the end of that interval. I think the effect this could have is to allow older samples an advantage in being kept, which might skew which samples we keep toward older samples, which causes the distribution of samples over the timeline to be uneven.

Edit:
What I'm getting at is that if the goal is to keep evenly distributed samples over the allocation timeline, then sample 3 should be preferred over sample 1 when future samples arrive, and if we split the span, it won't be.

When more samples arrive, I think it is better to keep a sample taken at byte 30 (sample 3) than to keep a sample taken at byte 10 (sample 1), if we're going for an even distribution of the samples.

@egahlin
Copy link
Member

egahlin commented Nov 23, 2024

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 23, 2024

Going to push as commit 822a155.
Since your change was applied there have been 896 commits pushed to the master branch:

  • d00f311: 8343150: Change URLClassLoader.getPermissions to return empty PermissionCollection
  • effee12: 8344533: CTW: Add option to remove clinits before loading
  • 70c4e2c: 8344587: Reduce number of "jdk.jpackage.internal" classes used from other packages
  • 1114704: 6672644: JComboBox still scrolling if switch to another window and return back
  • 98b6678: 8343741: SA jstack --mixed should print information about VM locks
  • 1b2d9ca: 8344881: Problemlist java/awt/Robot/InfiniteLoopException.java on Linux
  • 6aec2dc: 8344788: Specify that the access control context parameters of Subject.doAsPrivileged are ignored
  • 079f503: 8344568: Renaming ceil_log2 to log2i_ceil
  • 51763b6: 8344525: Fix leftover ExceptionOccurred in java.base
  • 4b16530: 8344795: Remove uses of AccessControlContext in java.desktop module
  • ... and 886 more: https://git.openjdk.org/jdk/compare/f1bf469b4ee07b48b629a126111e307d3cab7fd7...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 23, 2024
@openjdk openjdk bot closed this Nov 23, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 23, 2024
@openjdk
Copy link

openjdk bot commented Nov 23, 2024

@egahlin @srdo Pushed as commit 822a155.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@egahlin
Copy link
Member

egahlin commented Nov 24, 2024

@srdo Thanks for you contribution!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
hotspot-jfr hotspot-jfr-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants