Bazel Memory Tracker shows incorrect memory consumption #24782

Open
AlexanderGolovlev opened this issue Dec 20, 2024 · 2 comments · May be fixed by #24783
Labels
team-Performance Issues for Performance teams type: bug untriaged

Comments

@AlexanderGolovlev
Contributor

Description of the bug:

According to the documentation, Bazel can instrument memory allocations. This feature currently appears to report incorrect information.
All memory consumption values for rules are either 0 or roughly multiples of 256 kB. For example, the text dump of prof.gz shows:

Showing nodes accounting for 5888.52kB, 100% of 5888.52kB total
      flat  flat%   sum%        cum   cum%
 4352.59kB 73.92% 73.92%  4352.59kB 73.92%  repository_rule <builtin>
  512.02kB  8.70% 82.61%   512.02kB  8.70%  config_setting <builtin>
  256.07kB  4.35% 86.96%   256.07kB  4.35%  create_linking_context_from_compilation_outputs <builtin>
  255.99kB  4.35% 91.31%   255.99kB  4.35%  cc_toolchain_features <builtin>
  255.93kB  4.35% 95.65%   255.93kB  4.35%  cc_object_library <builtin>
...

The total memory consumption is also far from the expected value.
It looks as if the memory consumption values are read from a shifted position in memory, which might be caused by changes in internal Java structures.
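
The pattern in the dump above can be checked with a few lines of arithmetic. This is only a sketch: the flat values are copied from the dump, while the 256 kB step and the helper name `nearest_multiple` are assumptions chosen for illustration.

```python
# Flat values (in kB) copied from the pprof text dump above.
flat_kb = {
    "repository_rule": 4352.59,
    "config_setting": 512.02,
    "create_linking_context_from_compilation_outputs": 256.07,
    "cc_toolchain_features": 255.99,
    "cc_object_library": 255.93,
}

STEP_KB = 256.0  # assumed granularity, taken from the observed pattern


def nearest_multiple(value_kb, step_kb=STEP_KB):
    """Return (n, deviation) where n * step_kb is the multiple closest to value_kb."""
    n = round(value_kb / step_kb)
    return n, value_kb - n * step_kb


report = {name: nearest_multiple(value) for name, value in flat_kb.items()}
for name, (n, deviation) in report.items():
    print(f"{name}: {n} x {STEP_KB:.0f} kB, off by {deviation:+.2f} kB")
```

Every listed value lands within 1 kB of a whole multiple of 256 kB, which is what makes the numbers look quantized rather than measured per allocation.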

I believe the problem is the outdated java-allocation-instrumenter-3.3.0.jar used for instrumentation. According to the release notes, support for Java 17 was added in 3.3.2 and support for Java 21 in 3.3.4.
However, my attempts to use a newer version of the instrumenter were unsuccessful: Bazel crashes with any version higher than 3.3.0, and the error message is unclear:

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
	at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(Unknown Source)
	at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(Unknown Source)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException: Cannot invoke "java.net.URL.openConnection()" because "url" is null
	at com.google.monitoring.runtime.instrumentation.AllocationInstrumenterBootstrap.premain(AllocationInstrumenterBootstrap.java:50)
	... 4 more
Caused by: java.lang.NullPointerException: Cannot invoke "java.net.URL.openConnection()" because "url" is null
	at com.google.monitoring.runtime.instrumentation.AllocationInstrumenterBootstrap.premain(AllocationInstrumenterBootstrap.java:40)
	... 4 more
*** java.lang.instrument ASSERTION FAILED ***: "!errorOutstanding" with message Outstanding error when calling method in invokeJavaAgentMainMethod at s\src\java.instrument\share\native\libinstrument\JPLISAgent.c line: 627
*** java.lang.instrument ASSERTION FAILED ***: "success" with message invokeJavaAgentMainMethod failed at s\src\java.instrument\share\native\libinstrument\JPLISAgent.c line: 466
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at s\src\java.instrument\share\native\libinstrument\JPLISAgent.c line: 429
FATAL ERROR in native method: processing of -javaagent failed, processJavaStart failed

Which category does this issue belong to?

Performance

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Follow the steps in the documentation:

  1. Pass STARTUP_FLAGS to any Bazel invocation:
  STARTUP_FLAGS=\
  --host_jvm_args=-javaagent:<path to java-allocation-instrumenter-3.3.0.jar> \
  --host_jvm_args=-DRULE_MEMORY_TRACKER=1
  2. Run any bazel build
  3. Run bazel dump --rules or bazel dump --skylark_memory=$HOME/prof.gz
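
The steps above can be sketched as a small helper that only assembles the command lines, without running them. The function name `tracker_commands`, the jar path, the target `//foo:foo`, and the profile location are hypothetical placeholders. Repeating the startup flags on every invocation is an assumption made here so that all three commands reach the same instrumented server.

```python
import shlex


def tracker_commands(agent_jar, target, profile_path):
    """Build the command lines from the steps above as argv lists."""
    startup_flags = [
        f"--host_jvm_args=-javaagent:{agent_jar}",
        "--host_jvm_args=-DRULE_MEMORY_TRACKER=1",
    ]
    return [
        # Startup flags go between "bazel" and the command name.
        ["bazel", *startup_flags, "build", target],
        ["bazel", *startup_flags, "dump", "--rules"],
        ["bazel", *startup_flags, "dump", f"--skylark_memory={profile_path}"],
    ]


commands = tracker_commands(
    agent_jar="/path/to/java-allocation-instrumenter-3.3.0.jar",  # placeholder path
    target="//foo:foo",  # hypothetical target
    profile_path="/tmp/prof.gz",  # hypothetical output location
)
for argv in commands:
    print(shlex.join(argv))
```

Only one of the two dump commands is needed to see the problem; both are listed because the steps mention either.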

Which operating system are you running Bazel on?

Windows, Linux, Mac

What is the output of bazel info release?

release 7.4.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Collaborator

fmeum commented Dec 20, 2024

AllocationTrackerModule forces a sample size of 256 KB, which results in all measurements being a multiple of that. I updated the version in #24783, but I can't reproduce the crash with it (on macOS). Could you try again with Bazel from that PR?
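
To illustrate why a fixed sample size quantizes the results, here is a toy model of threshold-based allocation sampling. It is not Bazel's tracker code: the function `sampled_totals`, the rule names, and the workload are hypothetical, and the model assumes each sample is charged the bytes accumulated since the previous sample.

```python
from collections import defaultdict

SAMPLE_PERIOD = 256 * 1024  # bytes; the 256 KB sample size mentioned above


def sampled_totals(allocations, period=SAMPLE_PERIOD):
    """Attribute memory to owners, taking one sample per `period` allocated bytes.

    `allocations` is an iterable of (owner, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    pending = 0  # bytes allocated since the last sample
    for owner, size in allocations:
        pending += size
        if pending >= period:
            # The allocation that crosses the threshold is charged everything
            # accumulated so far, including bytes allocated by other owners.
            totals[owner] += pending
            pending = 0
    return dict(totals)


# Hypothetical workload: one rule allocates 64 KiB at a time, another 1 KiB.
workload = []
for _ in range(40):
    workload.append(("big_rule", 64 * 1024))
    workload.append(("small_rule", 1024))

totals = sampled_totals(workload)
for owner in ("big_rule", "small_rule"):
    print(owner, totals.get(owner, 0) / 1024, "KiB")
```

In this model the small allocator is never the one that crosses the threshold, so it is reported as 0, while the large allocator receives a total that is close to a whole number of sample periods.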

@AlexanderGolovlev
Contributor Author

AlexanderGolovlev commented Dec 23, 2024

Thank you, @fmeum.
I tried your changes applied on top of Bazel 7.4.0 with java-allocation-instrumenter-3.3.4.jar. It no longer crashes, and the reported memory consumption now looks much more realistic. For the same target as in the initial comment, it shows 106.42MB in total instead of 5888.52kB:

Showing nodes accounting for 94.42MB, 88.72% of 106.42MB total
Dropped 362 nodes (cum <= 0.53MB)
      flat  flat%   sum%        cum   cum%
   58.94MB 55.38% 55.38%    58.94MB 55.38%  create_compilation_context <builtin>
      15MB 14.10% 69.48%       15MB 14.10%  repository_rule <builtin>
    3.76MB  3.54% 73.01%     3.76MB  3.54%  create_linking_context_from_compilation_outputs <builtin>
    3.71MB  3.48% 76.50%     3.71MB  3.48%  depset <builtin>
    3.50MB  3.29% 79.79%     3.50MB  3.29%  glob <builtin>
    1.50MB  1.41% 81.20%     1.50MB  1.41%  format <builtin>
    1.25MB  1.17% 82.37%     1.25MB  1.17%  register_toolchains <builtin>
    1.25MB  1.17% 83.55%     1.25MB  1.17%  declare_file <builtin>
    1.25MB  1.17% 84.72%     1.25MB  1.17%  filegroup <builtin>
    1.01MB  0.94% 85.67%     2.51MB  2.36%  compile <builtin>
       1MB  0.94% 86.61%        1MB  0.94%  precompiled_headers <builtin>
...

It seems that this PR fully resolves the issue.
