Evaluator Optimizer not optimizing as good as it could #5444

Closed
CodingFabian opened this issue Oct 26, 2014 · 5 comments

@CodingFabian
Contributor

I was experimenting with the PDF from #2618, which showcases quite a few issues nicely (not only in pdf.js; all readers struggle heavily with it).

It turns out that this PDF uses large sequences of "(save, transform, paintImageMaskXObject, restore)" to draw the road pattern.
When I looked into where reuse of calculations could be performed, I noticed that the optimizer does build groups:

    fnArray.splice(iFirstSave, count * 4, OPS.paintImageMaskXObjectGroup);
    argsArray.splice(iFirstSave, count * 4, [images]);

However, in almost all cases in this document these groups contain a large number of identical images.
For example, when breaking in that optimize method, only the first image of a group seems to be different (see the sketch at the end of this comment).
@nnethercote already noticed something similar on this document.

I am opening this issue so that others with more experience with evaluator.js can comment.
It seems to me that subsequences of the same image could be optimized a lot better.
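
For context, here is a minimal sketch of how that splice collapses a run of quadruples into one group operation, reconstructed from the snippet above. The loop, the index arithmetic, and the shape of the entries pushed into images (data, width, height, transform) are assumptions, not the exact evaluator.js code:

    // Collapse `count` (save, transform, paintImageMaskXObject, restore)
    // quadruples, starting at iFirstSave, into a single group operation.
    var images = [];
    for (var q = 0; q < count; q++) {
      var transformArgs = argsArray[iFirstSave + (q * 4) + 1];
      var maskParams = argsArray[iFirstSave + (q * 4) + 2][0];
      images.push({
        data: maskParams.data,     // assumed field names
        width: maskParams.width,
        height: maskParams.height,
        transform: transformArgs,
      });
    }
    fnArray.splice(iFirstSave, count * 4, OPS.paintImageMaskXObjectGroup);
    argsArray.splice(iFirstSave, count * 4, [images]);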

@CodingFabian
Contributor Author

Also:

      if (argsArray[iPIMXO][0] !== firstPIMXOArg0 ||
          transformArgs[0] !== firstTransformArg0 ||
          transformArgs[1] !== 0 ||
          transformArgs[2] !== 0 ||
          transformArgs[3] !== firstTransformArg3) {

will do a strict equality check on the objects:

argsArray[iPIMXO][0] !== firstPIMXOArg0

which will fail even if they contain the same content:
[screenshot, 2014-10-26 at 15:28:06]

Either this should check for content equality (like my JSON compare, but faster), or the code generating the ops should ensure object identity in the first place.
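
A rough sketch of what such a content-equality check could look like (the data, width, and height field names are assumptions about the argument objects; a faster variant could compare cheap checksums before falling back to the byte loop):

    function maskContentEquals(a, b) {
      if (a === b) {
        return true; // identity implies equal content
      }
      if (!a || !b || a.width !== b.width || a.height !== b.height ||
          a.data.length !== b.data.length) {
        return false;
      }
      // Byte-wise comparison of the mask data.
      for (var i = 0, ii = a.data.length; i < ii; i++) {
        if (a.data[i] !== b.data[i]) {
          return false;
        }
      }
      return true;
    }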

@CodingFabian
Contributor Author

Just dumping my findings here:
The first paintImageMaskXObject does not have a cacheKey, so it is not cached. The second one does have one; it is cached and reused.
Thus the === check works for later occurrences, but not for the first one.

The reason the first image does not come from the cache is that the current implementation only considers caching when it encounters the same element again; only that second copy is cached, the first one is not:

  // trying to cache repeat images, first we are trying to "warm up" caching
  // using length, then comparing adler32
  var MAX_LENGTH_TO_CACHE = 1000;
  var cacheImage = false, adler32;
  if (length < MAX_LENGTH_TO_CACHE && this.imageCache.length === length) {
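
In other words, the cache only "warms up" on the second occurrence: the first image merely leaves its length behind, so the first instance is never served from the cache. An eager variant, sketched below, would cache every small enough image on first sight, keyed by its checksum (computeAdler32, buildImageArgs, and the imageCacheByChecksum map are hypothetical names, not the actual pdf.js API):

    var MAX_LENGTH_TO_CACHE = 1000;
    if (length < MAX_LENGTH_TO_CACHE) {
      var checksum = computeAdler32(bytes); // hypothetical helper
      var entry = this.imageCacheByChecksum[checksum];
      if (entry === undefined) {
        // First sighting: build the image and cache it immediately,
        // so every later occurrence reuses the identical object.
        entry = this.imageCacheByChecksum[checksum] = buildImageArgs();
      }
      // Reusing entry means the optimizer's === identity checks
      // now succeed for every repetition, including the first pair.
      args = entry;
    }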

@CodingFabian
Contributor Author

OK, if caching is fixed, I assume the optimizer improvements are less important. I will send a PR for the caching soon.

CodingFabian added a commit to CodingFabian/pdf.js that referenced this issue Oct 28, 2014
As described in mozilla#5444, the evaluator performs identity checking of
paintImageMaskXObjects to decide if it can use
paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup.

This can only ever work if the entry is a cache hit. However, the
previous caching implementation was lazy: it would only consider an
image cache-worthy if it was repeated, and only then would the repeated
instance be cached. As a result, the sequence of identical images
A1 A2 A3 A4 would be seen as A1 A2 A2 A2 by the evaluator, which
prevents using the "repeat" optimization. Also, only the last
encountered image was cached, so A1 B1 A2 B2 would stay A1 B1 A2 B2.

The new implementation drops the "lazy" init of the cache. The threshold
for enabling an image to be cached is rather small, so the potential
waste in storage and adler32 calculation is low. It also caches any
eligible image by its adler32.

The two examples from above now become A1 A1 A1 A1 and A1 B1 A1 B1,
which not only saves temporary storage but also prevents computing
identical masks over and over again (which is the main performance cost
in mozilla#2618).
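
For reference, the adler32 the cache keys on is the standard Adler-32 checksum; a textbook sketch (not the pdf.js implementation):

    function adler32(bytes) {
      // Standard Adler-32 over a byte array; 65521 is the largest
      // prime below 2^16.
      var a = 1, b = 0;
      for (var i = 0, ii = bytes.length; i < ii; i++) {
        a = (a + (bytes[i] & 0xff)) % 65521;
        b = (b + a) % 65521;
      }
      return ((b << 16) | a) >>> 0; // unsigned 32-bit result
    }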
@timvandermeij
Contributor

Closing as resolved for now since the caching has been fixed. Feel free to create PRs for follow-up performance fixes.

@CodingFabian
Contributor Author

It's fine; I did not realize the real cause of the optimizer not working. While reordering could help, it's not that common a real-world use case, I assume.
