Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add AnnotationOverlay option to cache all annotation instances #494

Closed
wants to merge 1 commit into from

Conversation

Postremus
Copy link

@Postremus Postremus commented Feb 9, 2025

Instead of only the transformed ones of a declaration site

When activated, this saves about 20ms of hot reload startup time.

During investigation for quarkusio/quarkus#45631, I noticed that getOriginalAnnotations was called a lot of times (99k times).
After the sentinel value, and hard coding this getAnnotationsFor to always cache everything, this reduced to 4,8k times, and showed about 20ms of improvement.
My guess is, that this reduction happens mostly because, getOriginalAnnotations construct a hashset for the annotations of a declaration. Additionally, e.g. ClassInfo#declaredAnnotations also constructs an additional arraylist of the ClassInfos annotation.

The original mission statement (#255) of the AnnotationOverlay stated for Sentinel:

In case no annotation transformation affected given declaration, we should be able to store a sentinel value that means "just look it up from the passed annotation target". That would conserve memory.

I decided to still preserve this implementation detail. Could be usefull for e.g. Environments, where jandex is still accessible at runtime (maybe wildfly? no idea).

Therefore I implemented this change as an addtional option for the builder, called for now cacheAll. Please suggest other names!
When not activated, the annotation overlay caches only the annotations of declarations, where an annotation transformer had any impact.
When activated, the annotation overlay additonally also cached any non transformer affected declarations annotations.

I took the about 20ms improvement measurement, by starting quarkus in dev mode (mvn clean compile quarkus:dev), and then pressing s 6 times, which forces hot reloads.
Number is taken from the "(Aesh InputStream Reader) Live reload total time:" line since that is the time that matters to users.

For Quarkus 3.18.2:

673ms
618ms
610ms
583ms
590ms
655ms
avg 621,5ms

For Quarkus 3.18.2 with this patch:

639ms
613ms
657ms
578ms
543ms
539ms
avg 594,8333333ms

My plan would be to activate this option for now only in quarkus dev mode.

@Postremus
Copy link
Author

/cc @gsmet would love to hear your thoughts about this.


FieldInfo field = clazz.field("field");
assertNotNull(field);
for (boolean cacheAll : Arrays.asList(true, false)) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea how to test my changes properly, any ideas?
For now just expand the assertions, to make sure nothing breaks with cacheAll activated

Copy link
Contributor

@gsmet gsmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, first, thanks for having a look at this. It's always good to have new eyes shaking things and questioning the status quo (even if we end up with different patches in the end).

Here are a few thoughts. Note that I don't know this code at all so it's more questions than bright ideas. I'm pretty sure @Ladicek will have a lot more to say about this.

From your use case, does it happen often that there are no annotations? Because if so:

  1. We could optimize getOriginalAnnotations to not create the HashSet if there are no annotations - and I think even if there is only one. I wouldn't make the code unreadable for that but... if in some easy cases (such as declaration.declaredAnnotations() being empty or size 1), we could improve things, that might help
  2. Caching a Set.of() for these the case where we don't have any annotations shouldn't take much memory - and could be done always. Now I have no idea if it happens often. You still store the key though, might need some measurements.
  3. It's a bit unfortunate that runtime annotations are in the same bucket because we could have returned the annotations without having to filter them. But it might get a bit complex to change that now and I suppose there are good reasons for that, @Ladicek?
  4. Is it just me or for the first call, we trigger getOriginalAnnotations twice? We should probably try to avoid that.
  5. From my experience, when storing things like that, it might be a good idea to check if results are null or only one element to use empty sets or Set.of(element) for the storage.
  6. Given we don't use the index at runtime in Quarkus, we could use this cache always. Now probably a good idea to measure the cost.

All this might be quite naive and probably best to wait for @Ladicek to chime in.

I think it would also make sense to take a step back and check if we aren't calling this infrastructure too much i.e. do we have things in Quarkus that should be optimized to avoid doing so many calls. Note that I have no idea if it's the case but if you have an asyncprofiler profile, maybe we could have a better idea.

Instead of only the transformed ones of a declaration site
@Postremus Postremus force-pushed the issues/45631-all-cache branch from 3e70e62 to b9de48e Compare February 10, 2025 05:43
@Postremus
Copy link
Author

We could optimize getOriginalAnnotations to not create the HashSet if there are no annotations - and I think even if there is only one. I wouldn't make the code unreadable for that but... if in some easy cases (such as declaration.declaredAnnotations() being empty or size 1), we could improve things, that might help

Yep, that makes sense. I pushed a change for this.
declaredAnnotations returns an empty set about 3,2k times, even with cacheAll. So this should also benefit everyone.

In light of these numbers, I also did:

Caching a Set.of() for these the case where we don't have any annotations shouldn't take much memory - and could be done always. Now I have no idea if it happens often. You still store the key though, might need some measurements.

@Ladicek
Copy link
Contributor

Ladicek commented Feb 10, 2025

Interesting. Could you please share the application you used to get those numbers? I'm wondering if those memory savings I was thinking about are actually significant. It's quite possible most of the cached objects already exist anyway, so maybe we don't really conserve as much as I thought.

@gsmet
Copy link
Contributor

gsmet commented Feb 10, 2025

Yeah my advice would be to:

  • Try to reduce the size of the cache (by using Set.of() and Set.of(annotation) for small sizes - it made quite a difference in HV)
  • Try to reduce allocations of getOriginalAnnotations in the simple cases

Now, the approach of not caching the entries when they are identical makes perfect sense if the operation is transparent when identical but currently the cost is not exactly null given you create a new HashSet.

Also interested in your feedback on point 4. above.

@Postremus
Copy link
Author

@Ladicek I used the reproducer from quarkusio/quarkus#45631
And added quarkus.keycloak.devservices.enabled=false to application.properties, just to prevent some of the noise.

@Ladicek
Copy link
Contributor

Ladicek commented Feb 10, 2025

Ah I see the reproducer now. Thanks, will take a look.

@Ladicek
Copy link
Contributor

Ladicek commented Feb 10, 2025

Also interested in your feedback on point 4. above.

You mean

4. Is it just me or for the first call, we trigger getOriginalAnnotations twice? We should probably try to avoid that.

?

If so, yes, that seems correct in case there's no actual transformation and the result is SENTINEL. It's possible we don't need that at all, and if we do, there should be a way to get rid of the second invocation. I'll check.

@Ladicek
Copy link
Contributor

Ladicek commented Feb 10, 2025

I tried dumping some heap dumps using jmap and analyze them using MAT, but that can hardly be considered precise, because all the Quarkus build-time stuff comes and goes. I avoided the live option of jmap and configured MAT to keep unreachable objects, but there's still a lot of noise. In any case, The basic reproducer from above results in annotation overlays of retained heap size of about 2 MB each, with virtually no difference between the previous behavior and the behavior introduced in this patch. So I'll just drop the concept of a sentinel value altogether, because it doesn't seem to add any value.

@Ladicek
Copy link
Contributor

Ladicek commented Feb 11, 2025

Thanks for this PR! I took the liberty of creating a simple (simplistic, rather) microbenchmark and did several simple optimizations (including the one from this PR) that show even somewhat better numbers than this PR. They also significantly lower the allocation rate. This is in #495.

Hence, closing this PR. Thanks again!

@Ladicek Ladicek closed this Feb 11, 2025
@Postremus
Copy link
Author

Thank you for taking over, @Ladicek <3

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants