Isolated agent classloader #2109

Merged
merged 29 commits into from
Sep 20, 2021

felixbarny
Member

@felixbarny felixbarny commented Sep 7, 2021

What does this PR do?

Module changes

  • Renames elastic-apm-agent to apm-agent.
    • This module doesn't contain Premain-Class/Agent-Class manifest entries, so it cannot be used via -javaagent.
    • It gets published to Maven Central for the sole purpose of making it easier to debug the agent.
    • This module contains the bulk of the agent, including the core and all instrumentation plugins.
    • Because this module will be loaded from an isolated class loader hierarchy, we don't need to shade/relocate dependencies anymore.
  • Renames apm-agent-premain to elastic-apm-agent.
    • This one does have Premain-Class/Agent-Class manifest entries.
    • This module contains just the minimal set of classes needed for bootstrapping the agent.
    • It contains the classes from apm-agent as shaded resources in the agent folder.
    • All .class files are renamed to .esclass so that they can't be loaded by regular class loaders.
    • It creates a ShadedClassLoader, aka the agent class loader, which can load the .esclass files.
    • This makes all the agent classes, besides java.lang.IndyBootstrapDispatcher, invisible to the regular class loader hierarchy.
      • This ensures that there can be no conflicts with equally named classes in the regular class loader hierarchy, so shading/relocating is not necessary anymore.
      • Also, class path scanning tools don't get slowed down or trip over agent classes.
      • In addition, this provides a better solution for the OSGi problem (or similar filtering class loaders), one that relies neither on instrumenting all class loaders nor on sun.misc.Unsafe-based injection (InjectionStrategy.UsingUnsafe / InjectionStrategy.UsingReflection).
  • Adds a new apm-agent-common module.
    • Contains classes that are needed during agent startup, in the agent itself, and in the attacher module.
    • One example is the ResourceExtractionUtil, which extracts a class path resource to the temp dir, adds an MD5 hash, and reuses already extracted resources if the hashes match.
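
The extraction behaviour described for ResourceExtractionUtil could look roughly like the following. This is a minimal sketch, not the code from this PR: the class name `ResourceExtractionSketch`, its method names, and the `prefix-<md5>suffix` file naming scheme are assumptions made for illustration, and it assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the agent's actual ResourceExtractionUtil.
final class ResourceExtractionSketch {

    /** Returns the lower-case, zero-padded hex MD5 of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /**
     * Writes content to dir/prefix-<md5>suffix (assumed naming scheme).
     * If a file with that name already exists and its hash matches, it is reused.
     */
    static Path extract(byte[] content, String prefix, String suffix, Path dir) throws IOException {
        String hash = md5Hex(content);
        Path target = dir.resolve(prefix + "-" + hash + suffix);
        if (Files.exists(target) && md5Hex(Files.readAllBytes(target)).equals(hash)) {
            return target; // already extracted and intact: reuse it
        }
        // Write to a temp file first, then move it into place.
        Path tmp = Files.createTempFile(dir, prefix, ".tmp");
        Files.write(tmp, content);
        return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Extracts a class path resource into the JVM's temp dir. */
    static Path extractResource(ClassLoader loader, String resource, String prefix, String suffix)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resource);
            }
            Path tempDir = Path.of(System.getProperty("java.io.tmpdir"));
            return extract(in.readAllBytes(), prefix, suffix, tempDir);
        }
    }
}
```

Putting the hash into the file name means that a second JVM started with the same agent version finds the extracted file by name and can skip the write.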

This is what the new structure looks like in detail:
[Screenshot: Screen Shot 2021-09-10 at 14 57 22]

More details about the indy-dispatching approach can be found in the Javadoc of IndyBootstrap.

Checklist

What has previously been named elastic-apm-agent is now apm-agent.
The new elastic-apm-agent module is just responsible for the startup (agentmain/premain) and loads the actual agent from an isolated class loader hierarchy.
The log shading config is more peculiar
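
To illustrate how a small bootstrap module can load the actual agent from an isolated class loader, here is a minimal sketch of a loader that resolves classes from renamed `.esclass` resources. It is not the ShadedClassLoader from this PR: the name `EsClassLoaderSketch`, the constructor parameters, and the plain parent-first delegation are assumptions made for illustration (Java 9 or later).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; the PR's ShadedClassLoader may work differently.
final class EsClassLoaderSketch extends ClassLoader {

    private final ClassLoader resourceSource; // where the .esclass resources live, e.g. the -javaagent jar's loader
    private final String folder;              // e.g. "agent"

    EsClassLoaderSketch(ClassLoader parent, ClassLoader resourceSource, String folder) {
        super(parent);
        this.resourceSource = resourceSource;
        this.folder = folder;
    }

    /** Maps "a.b.C" to "<folder>/a/b/C.esclass". */
    static String resourceName(String folder, String className) {
        return folder + "/" + className.replace('.', '/') + ".esclass";
    }

    /**
     * Called by loadClass only after the parent could not find the class.
     * Regular class loaders look for "a/b/C.class", so they never see these resources.
     */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try (InputStream in = resourceSource.getResourceAsStream(resourceName(folder, name))) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

A bootstrap class could create such a loader with the platform class loader as its parent and then reflectively invoke the real agent entry point through it, so that no agent class is ever defined by the application's own class loaders.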
@felixbarny felixbarny self-assigned this Sep 7, 2021
@apmmachine
Contributor

apmmachine commented Sep 7, 2021

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

Build stats

  • Reason: Aborted from #38

  • Start Time: 2021-09-20T06:37:52.831+0000

  • Duration: 10 min 26 sec

  • Commit: 9cfd1ac

Log output

[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-spring-webmvc-plugin ............ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-urlconnection-plugin ............ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-scheduled-annotation-plugin ..... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-quartz-job-plugin ............... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-process-plugin .................. SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-dubbo-plugin .................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-scala-concurrent-plugin ......... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-struts-plugin ................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-vertx ........................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-vertx-common .................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-vertx3-plugin ................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-vertx4-plugin ................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-sparkjava-plugin ................ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-jdk-httpserver-plugin ........... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-javalin-plugin .................. SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-agent ........................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:elastic-apm-agent ................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-agent-benchmarks ................ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-spring-resttemplate-test ........ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-logback-plugin-legacy-tests ..... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-es-restclient-plugin-7_x ........ SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-jms-spring-plugin ............... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-jedis-2-tests ................... SKIPPED
[2021-09-20T06:46:33.487Z] [INFO] co.elastic.apm:apm-jedis-3-tests ................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-lettuce-3-tests ................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-grpc-test-1.6.1 ................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-grpc-test-latest ................ SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-rabbitmq-test-3 ................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-rabbitmq-test-4 ................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-okhttp-test ..................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-vertx3-test-latest .............. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-opentracing ..................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:integration-tests ................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:simple-webapp ....................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:jsf-app ............................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:jsf-app-dependent ................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:jsf-app-standalone .................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:cdi-app ............................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:cdi-app-dependent ................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:cdi-app-standalone .................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:soap-test ........................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:external-plugin-test ................ SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:plugin-instrumentation-target ....... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:external-plugin-app ................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:external-plugin ..................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-agent-attach .................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:apm-agent-attach-cli ................ SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:application-server-integration-tests  SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:spring-boot-1-5 ..................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:spring-boot-2 ....................... SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:spring-boot-2-base .................. SKIPPED
[2021-09-20T06:46:33.488Z] [INFO] co.elastic.apm:spring-boot-2-jetty ................. SKIPPED
[2021-09-20T06:46:33.489Z] [INFO] co.elastic.apm:spring-boot-2-tomcat ................ SKIPPED
[2021-09-20T06:46:33.489Z] [INFO] co.elastic.apm:spring-boot-2-undertow .............. SKIPPED
[2021-09-20T06:46:33.489Z] [INFO] ------------------------------------------------------------------------
[2021-09-20T06:46:33.489Z] [INFO] BUILD SUCCESS
[2021-09-20T06:46:33.489Z] [INFO] ------------------------------------------------------------------------
[2021-09-20T06:46:33.489Z] [INFO] Total time:  25.700 s
[2021-09-20T06:46:33.489Z] [INFO] Finished at: 2021-09-20T06:46:33Z
[2021-09-20T06:46:33.489Z] [INFO] ------------------------------------------------------------------------
[2021-09-20T06:48:17.717Z] Stage "Tests" skipped due to earlier failure(s)
[2021-09-20T06:48:17.789Z] Stage "Unit Tests" skipped due to earlier failure(s)
[2021-09-20T06:48:17.791Z] Stage "Smoke Tests 01" skipped due to earlier failure(s)
[2021-09-20T06:48:17.793Z] Stage "Smoke Tests 02" skipped due to earlier failure(s)
[2021-09-20T06:48:17.795Z] Stage "Benchmarks" skipped due to earlier failure(s)
[2021-09-20T06:48:17.796Z] Stage "Javadoc" skipped due to earlier failure(s)
[2021-09-20T06:48:17.872Z] Failed in branch Unit Tests
[2021-09-20T06:48:17.873Z] Failed in branch Smoke Tests 01
[2021-09-20T06:48:17.874Z] Failed in branch Smoke Tests 02
[2021-09-20T06:48:17.874Z] Failed in branch Benchmarks
[2021-09-20T06:48:17.875Z] Failed in branch Javadoc
[2021-09-20T06:48:17.935Z] Stage "Integration Tests" skipped due to earlier failure(s)
[2021-09-20T06:48:17.972Z] Stage "JDK Compatibility Tests" skipped due to earlier failure(s)
[2021-09-20T06:48:18.029Z] Stage "Matrix - JAVA_VERSION = 'openjdk12'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.029Z] Stage "Matrix - JAVA_VERSION = 'openjdk13'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.030Z] Stage "Matrix - JAVA_VERSION = 'openjdk14'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.030Z] Stage "Matrix - JAVA_VERSION = 'openjdk15'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.031Z] Stage "Matrix - JAVA_VERSION = 'openjdk16'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.063Z] Stage "Matrix - JAVA_VERSION = 'openjdk12'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.064Z] Stage "Matrix - JAVA_VERSION = 'openjdk13'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.065Z] Stage "Matrix - JAVA_VERSION = 'openjdk14'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.066Z] Stage "Matrix - JAVA_VERSION = 'openjdk15'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.067Z] Stage "Matrix - JAVA_VERSION = 'openjdk16'" skipped due to earlier failure(s)
[2021-09-20T06:48:18.197Z] Failed in branch Matrix - JAVA_VERSION = 'openjdk12'
[2021-09-20T06:48:18.198Z] Failed in branch Matrix - JAVA_VERSION = 'openjdk13'
[2021-09-20T06:48:18.199Z] Failed in branch Matrix - JAVA_VERSION = 'openjdk14'
[2021-09-20T06:48:18.200Z] Failed in branch Matrix - JAVA_VERSION = 'openjdk15'
[2021-09-20T06:48:18.201Z] Failed in branch Matrix - JAVA_VERSION = 'openjdk16'
[2021-09-20T06:48:18.261Z] Stage "Stable" skipped due to earlier failure(s)
[2021-09-20T06:48:18.300Z] Stage "AfterRelease" skipped due to earlier failure(s)
[2021-09-20T06:48:18.317Z] Stage "AfterRelease" skipped due to earlier failure(s)
[2021-09-20T06:48:18.648Z] Running on Jenkins in /var/lib/jenkins/workspace/_java_apm-agent-java-mbp_PR-2109
[2021-09-20T06:48:18.753Z] [INFO] getVaultSecret: Getting secrets
[2021-09-20T06:48:18.802Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-09-20T06:48:19.523Z] + chmod 755 generate-build-data.sh
[2021-09-20T06:48:19.524Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-agent-java/apm-agent-java-mbp/PR-2109/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-agent-java/apm-agent-java-mbp/PR-2109/runs/37 ABORTED 626429
[2021-09-20T06:48:19.524Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-agent-java/apm-agent-java-mbp/PR-2109/runs/37/steps/?limit=10000 -o steps-info.json
[2021-09-20T06:48:19.774Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-agent-java/apm-agent-java-mbp/PR-2109/runs/37/tests/?status=FAILED -o tests-errors.json
[2021-09-20T06:48:19.774Z] Retry 1/3 exited 22, retrying in 1 seconds...
[2021-09-20T06:48:21.117Z] Retry 2/3 exited 22, retrying in 2 seconds...

@felixbarny felixbarny force-pushed the isolated-agent-classloader branch from 956268d to 775e97b Compare September 8, 2021 07:52
@felixbarny felixbarny linked an issue Sep 8, 2021 that may be closed by this pull request
Contributor

@raphw raphw left a comment

Looks good!

@felixbarny
Member Author

One remaining issue is that the LatentKeys in WeakConcurrentMap aren't reused anymore, because WeakConcurrentMap does not detect the agent class loader as a persistent class loader.
Currently, the agent does not support being detached, so the agent CL is effectively persistent. While it's possible to pass a flag to WeakConcurrentMap that overrides the auto-detection of persistent class loaders, it's not possible to provide a custom WeakConcurrentMap to WeakConcurrentSet and DetachedThreadLocal. DetachedThreadLocal in particular would be difficult to change, as its constructor creates an anonymous subclass of WeakConcurrentMap to override defaultValue.

@raphw
Contributor

raphw commented Sep 13, 2021

Are you using the latest version of it? I factored out an AbstractWeakConcurrentMap at some point to make it easier to achieve this for custom class loader scenarios.

@felixbarny
Member Author

Could you elaborate on what a custom implementation would look like in our case, one that reuses lookup keys but doesn't rely on the agent class loader being persistent? Does that include creating a persistent class loader as the parent of the agent class loader?

@raphw
Contributor

raphw commented Sep 13, 2021

Yes, you would need a separate class loader that serves as the parent for your agent. In your trampoline, you would create this class loader and store it in a field. If the field value is already set, you skip this step and reuse this class loader with the weak keys as a parent to your agent class loader. This way, you can get rid of the agent and still have your keys sticky. As the key class loader is referenced by the system class loader, it is persistent just as the system loader itself. (The key loader would in your case still have the platform loader as its parent.)
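
The arrangement described above can be sketched as follows. This is only an illustration under assumptions: the names `TrampolineSketch`, `keyLoader`, and `newAgentLoader` are hypothetical, and plain `URLClassLoader` instances stand in for whatever loaders the agent would really use (Java 9 or later).

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a trampoline that keeps the "key" class loader alive
// across agent re-attachments. Not code from this PR.
final class TrampolineSketch {

    // Stored in a static field of a class loaded by the system class loader,
    // so this loader lives as long as the system class loader does.
    private static ClassLoader keyLoader;

    /** Creates the key loader on first use and reuses it afterwards. */
    static synchronized ClassLoader keyLoader(URL[] keyClassPath) {
        if (keyLoader == null) {
            // The key loader has the platform loader as its parent.
            keyLoader = new URLClassLoader(keyClassPath, ClassLoader.getPlatformClassLoader());
        }
        return keyLoader;
    }

    /**
     * Creates a fresh agent loader whose parent is the persistent key loader.
     * The agent loader can be discarded later while the key classes stay loaded.
     */
    static ClassLoader newAgentLoader(URL[] keyClassPath, URL[] agentClassPath) {
        return new URLClassLoader(agentClassPath, keyLoader(keyClassPath));
    }
}
```

Each call to `newAgentLoader` yields a distinct agent loader, but all of them share the same parent, which is what keeps the weak-key classes stable if the agent itself is replaced.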

@felixbarny felixbarny force-pushed the isolated-agent-classloader branch from 6727b0d to 74c9132 Compare September 16, 2021 13:23
@felixbarny
Member Author

felixbarny commented Sep 16, 2021

> Yes, you would need a separate class loader that serves as the parent for your agent.

I've implemented that in this commit: felixbarny@771c27e

I'll create a PR once this one is merged.
Detachment is not supported yet, but I sketched out what it may look like (in AgentMain::detach) and what needs to be done to fully support it.
However, the class loader architecture is ready for it now.

The biggest shift will be that the trampoline agent can't ship with the main agent. Instead, the main agent needs to be on the file system, and updating the agent needs to happen by dropping a new agent version into the agents folder (similar to hot-deploying a webapp).
One alternative may be to append the version/git hash to the directory containing the shaded agent classes.

Contributor

@eyalkoren eyalkoren left a comment

LGTM!
One request for extending the CL test.
Also, please add to the PR description which manual tests were performed.

@felixbarny felixbarny mentioned this pull request Sep 20, 2021
2 tasks
@felixbarny felixbarny merged commit c5fede1 into master Sep 20, 2021
@felixbarny felixbarny deleted the isolated-agent-classloader branch September 20, 2021 07:09
@apmmachine
Contributor

💚 Build Succeeded

Build stats

  • Start Time: 2021-09-20T06:45:18.094+0000

  • Duration: 61 min 36 sec

  • Commit: 3cef676

Test stats 🧪

Test Results
Failed 0
Passed 2403
Skipped 19
Total 2422

💚 Flaky test report

Tests succeeded.
