Evaluator Optimizer not optimizing as good as it could #5444
Comments
Just dumping my findings here: the first image does not come from the cache because the current implementation only considers caching when it encounters the same element again. Only that second copy is cached; the first one is not.
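To make that behaviour concrete, here is a minimal model of a lazy, last-image-only cache. It is a sketch under assumptions, not the pdf.js source: `lazyLastImageCache` and `labelByIdentity` are hypothetical names, and each string stands in for the raw bytes of one image.

```javascript
// Hypothetical model of the lazy caching described above (not pdf.js code).
// Each input string stands in for the raw bytes of one image.
function lazyLastImageCache(images) {
  let lastBytes = null; // content of the most recently seen image
  let cached = null; // the single object that can be handed out on a hit
  return images.map((bytes) => {
    if (cached !== null && cached.bytes === bytes) {
      return cached; // cache hit: same object identity as before
    }
    const obj = { bytes }; // "decode" a fresh object
    if (lastBytes === bytes) {
      cached = obj; // second sighting: only now does the image get cached
    }
    lastBytes = bytes;
    return obj;
  });
}

// Assigns a letter to each distinct object identity, in order of appearance.
function labelByIdentity(objs) {
  const seen = new Map();
  return objs
    .map((o) => {
      if (!seen.has(o)) {
        seen.set(o, String.fromCharCode(65 + seen.size));
      }
      return seen.get(o);
    })
    .join(" ");
}

console.log(labelByIdentity(lazyLastImageCache(["x", "x", "x", "x"])));
console.log(labelByIdentity(lazyLastImageCache(["a", "b", "a", "b"])));
```

In this model the first copy never shares identity with the later ones, and alternating images never hit the cache at all, so an identity check on consecutive images cannot succeed for the whole run.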
OK, if caching is fixed, I assume the optimizer improvements are less important. I will send a PR for caching soon.
As described in mozilla#5444, the evaluator performs identity checking of paintImageMaskXObjects to decide whether it can use paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup. This can only ever work if the entry is a cache hit. However, the previous caching implementation was lazy: it only considered an image cache-worthy once it was repeated, and only the repeated instance was then cached. As a result, a sequence of identical images (instances A B C D) would be seen as A B B B by the evaluator, which prevents the "repeat" optimization. The previous cache implementation also only checked the last used image. Thus the sequence A1 B1 A2 B2 A3 B3 would yield six image instances, even though there are only two different images. The new implementation drops the "lazy" init of the cache. The threshold for an image to be cacheable is rather small, so the potential waste in storage and adler32 calculation is rather low. This implementation also keeps hold of all cacheable images. The two examples above would now be A A A A and A1 B1 A1 B1 A1 B1, which not only saves temporary storage but also avoids computing identical masks over and over again (which is the main performance impact of mozilla#2618).
As described in mozilla#5444, the evaluator performs identity checking of paintImageMaskXObjects to decide whether it can use paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup. This can only ever work if the entry is a cache hit. However, the previous caching implementation was lazy: it only considered an image cache-worthy once it was repeated, and only the repeated instance was then cached. As a result, the sequence of identical images A1 A2 A3 A4 would be seen as A1 A2 A2 A2 by the evaluator, which prevents the "repeat" optimization. Also, only the last encountered image was cached, so A1 B1 A2 B2 would stay A1 B1 A2 B2. The new implementation drops the "lazy" init of the cache. The threshold for an image to be cacheable is rather small, so the potential waste in storage and adler32 calculation is rather low. It also caches any eligible image by its adler32. The two examples above would now be A1 A1 A1 A1 and A1 B1 A1 B1, which not only saves temporary storage but also avoids computing identical masks over and over again (which is the main performance impact of mozilla#2618).
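The eager, checksum-keyed cache described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `makeEagerImageCache` and `MAX_CACHEABLE_BYTES` (including its value) are assumptions, and the sketch ignores checksum collisions. `adler32` follows the standard Adler-32 definition.

```javascript
// Standard Adler-32 checksum over an array of byte values.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

// Assumed size threshold; the real value is not stated in the thread.
const MAX_CACHEABLE_BYTES = 1000;

// Hypothetical eager cache: every eligible image is stored on first sight,
// keyed by its checksum, so later copies get the same object identity.
// Note: this sketch trusts the checksum and does not handle collisions.
function makeEagerImageCache() {
  const byChecksum = new Map();
  return function getImage(bytes) {
    if (bytes.length > MAX_CACHEABLE_BYTES) {
      return { bytes }; // too large to be worth caching
    }
    const key = adler32(bytes);
    let entry = byChecksum.get(key);
    if (entry === undefined) {
      entry = { bytes };
      byChecksum.set(key, entry); // cached immediately, no "lazy" init
    }
    return entry;
  };
}

const getImage = makeEagerImageCache();
const a = Uint8Array.of(1, 2, 3);
const b = Uint8Array.of(4, 5, 6);
const seen = [a, b, a.slice(), b.slice()].map(getImage);
console.log(seen[0] === seen[2], seen[1] === seen[3]);
```

Because the map keeps every eligible image rather than only the last one, the interleaved case A1 B1 A2 B2 resolves to two shared objects.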
Closing as resolved for now since the caching has been fixed. Feel free to create PRs for follow-up performance fixes.
It's fine, I did not realize the real cause of the optimizer not working. While reordering could help, I assume it is not that common a real-world use case.
I was experimenting with the PDF from #2618, which showcases quite a few issues nicely (not only in pdf.js; all readers struggle heavily with it).
It turns out that this PDF uses large sequences of "(save, transform, paintImageMaskXObject, restore)" to draw the road pattern.
When I looked into where calculations could be reused, I noticed that the optimizer does build groups.
However, in almost all cases in this document these groups contain a large number of identical images.
For example, when setting a breakpoint in that optimize method, only the first image of the group seems to be different.
@nnethercote already noticed something similar on this document.
I am opening this issue to allow others who have more experience with evaluator.js to comment on the issue.
It seems to me that it should be possible to optimize subsequences of the same image a lot better.
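As a rough illustration of that idea, the sketch below scans an operator list for back-to-back "(save, transform, paintImageMaskXObject, restore)" blocks and splits them into maximal runs that reuse the same image object. It is not the evaluator.js optimizer: `collectImageMaskRuns` and the `{ fn, args }` operator shape are assumptions made for the example.

```javascript
// The four-operator block described above.
const PATTERN = ["save", "transform", "paintImageMaskXObject", "restore"];

// Hypothetical helper: ops is an array of { fn: string, args: any[] }.
// Returns one { image, count } entry per maximal run of adjacent blocks
// whose paintImageMaskXObject argument is the very same object.
function collectImageMaskRuns(ops) {
  const runs = [];
  let i = 0;
  while (i < ops.length) {
    const matches = PATTERN.every(
      (fn, k) => ops[i + k] !== undefined && ops[i + k].fn === fn
    );
    if (!matches) {
      i++;
      continue;
    }
    const image = ops[i + 2].args[0];
    const last = runs[runs.length - 1];
    if (last !== undefined && last.end === i && last.image === image) {
      last.count++; // same image, directly adjacent: extend the run
    } else {
      runs.push({ image, count: 1, end: i + PATTERN.length });
    }
    if (runs[runs.length - 1].end < i + PATTERN.length) {
      runs[runs.length - 1].end = i + PATTERN.length;
    }
    i += PATTERN.length;
  }
  return runs.map(({ image, count }) => ({ image, count }));
}

// One block for a given image object.
function block(image) {
  return [
    { fn: "save", args: [] },
    { fn: "transform", args: [1, 0, 0, 1, 0, 0] },
    { fn: "paintImageMaskXObject", args: [image] },
    { fn: "restore", args: [] },
  ];
}

const first = { name: "first" };
const road = { name: "road" };
const ops = [...block(first), ...block(road), ...block(road), ...block(road)];
console.log(collectImageMaskRuns(ops).map((r) => r.count));
```

A run with a count above one is a candidate for a "repeat"-style operation, while the leading odd image can be handled on its own. This only works when identical images share object identity, which is why the caching behaviour matters.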