PDF.js slow at rendering complex image #2618

fmms · 2013-01-27T21:34:54Z

I tried to look at http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471. The progress bar moves to 100% and then nothing happens.

I am using Firefox 19 and pdf.js 0.7.99 on Ubuntu 13.04.

gigaherz · 2013-01-27T21:56:38Z

[22:56:16.345] Error: bad XRef entry @ resource://pdf.js/build/pdf.js:656

gigaherz · 2013-01-27T21:57:35Z

Same as #2388?

fmms · 2013-05-26T09:12:32Z

Still there with today's version:

[11:10:46.743] An error occured while loading the file
PDF.js Version 0.8.180 (build: 3641c22)
Message: bad XRef entry

timvandermeij · 2013-05-26T11:44:32Z

More console output. Error is the same as above, only in Dutch ;-)

error@resource://pdf.js/build/pdf.js:1184
XRef_fetch@resource://pdf.js/build/pdf.js:5554
XRef_fetchIfRef@resource://pdf.js/build/pdf.js:5521
Dict_get@resource://pdf.js/build/pdf.js:4768
pdfjsWrapper/Catalog.prototype.documentOutline@resource://pdf.js/build/pdf.js:4900
LocalPdfManager_ensure@resource://pdf.js/build/pdf.js:542
BasePdfManager_ensureCatalog@resource://pdf.js/build/pdf.js:497
parseSuccess@resource://pdf.js/build/pdf.js:36266
Promise_then@resource://pdf.js/build/pdf.js:1923
pdfjsWrapper/wphSetup/loadDocument//@resource://pdf.js/build/pdf.js:36297
Promise_then@resource://pdf.js/build/pdf.js:1923
pdfjsWrapper/wphSetup/loadDocument/@resource://pdf.js/build/pdf.js:36298
Promise_then@resource://pdf.js/build/pdf.js:1923
loadDocument@resource://pdf.js/build/pdf.js:36299
pdfManagerReady@resource://pdf.js/build/pdf.js:36441
Promise_then@resource://pdf.js/build/pdf.js:1923
wphSetupDoc@resource://pdf.js/build/pdf.js:36440
messageHandlerComObjOnMessage@resource://pdf.js/build/pdf.js:36220
An error occurred while loading the PDF file. PDF.js version 0.8.180 (build 3641c22) Message: bad XRef entry (translated from Dutch)

timvandermeij · 2013-06-28T19:10:22Z

Still renders forever, but there are fewer (or no) errors in the console now:

Warning: Unable to read document outline
PDF c79c336f9cc7a242a4cd36f2317f7b59 [1.4 PDFsharp 1.31.1789-g (www.pdfsharp.com) / PDFsharp 1.31.1789-g (www.pdfsharp.com)] (PDF.js: 0.8.291)

Note that this PDF, http://march12.rsf.org/i/Report_EnemiesoftheInternet_2012.pdf, has the same console warnings, but does render. Maybe that helps to track down the problem.

Snuffleupagus · 2013-06-30T10:54:19Z

An update: using PDF.js 0.8.296 the file (http://bugzilla-attachments.gnome.org/attachment.cgi?id=226471) actually renders, but it takes approximately 1.5 minutes before it finishes.
For comparison, it's slow even in Adobe Reader, taking 15 seconds to finish rendering.

Snuffleupagus · 2013-08-05T19:31:59Z

After #3461 landed, this file renders somewhat quicker than before. In my testing, the rendering time has gone down from approximately 90 to 75 seconds. More important than the speed-up is that, with the latest version of PDF.js, the text is rendered immediately and you only have to wait for the figure to finish rendering.

timvandermeij · 2013-08-05T19:57:05Z

Renders in 37 seconds for me using PDF.js 0.8.396. Maybe the difference with your results is a difference in configuration, @Snuffleupagus? Indeed, the text is now available immediately, so that is a big advantage.

fmms · 2013-08-11T20:18:17Z

This is indeed just a performance problem now; somebody should either rename or close this bug.

timvandermeij · 2013-08-12T09:41:39Z

@fmms You can rename it yourself since you opened the issue ;-) The issue title should have an Edit button next to it. Just rename it to 'PDF.js slow at rendering complex image' or something like that.

fmms · 2013-12-14T08:55:18Z

Still there with today's version:

"Warning: Unable to read document outline" pdf.js:198
 "PDF c79c336f9cc7a242a4cd36f2317f7b59 [1.4 PDFsharp 1.31.1789-g (www.pdfsharp.com) / PDFsharp 1.31.1789-g (www.pdfsharp.com)] (PDF.js: 0.8.773)"

nnethercote · 2014-08-14T15:09:41Z

This document is notable for having millions of inline images, which are used to build up the textured parts of the picture.

nnethercote · 2014-08-14T15:19:32Z

And disabling QueueOptimizer slows it down a lot, because the paintImageMaskXObjectGroup optimization can't be performed.
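To make the grouping idea concrete, here is a rough sketch, not the pdf.js implementation. It assumes a hypothetical, simplified operator list of `{ fn, args }` entries and a hypothetical `MIN_GROUP` threshold, and collapses any run of at least that many consecutive `paintImageMaskXObject` entries into a single `paintImageMaskXObjectGroup` entry. The real optimizer matches longer operator patterns on its own data structures, so treat this only as an illustration of what "grouping" means here.

```javascript
// Sketch only: a hypothetical, simplified operator list of { fn, args } entries.
const MIN_GROUP = 3; // hypothetical threshold for when grouping is worthwhile

function groupImageMasks(ops) {
  const out = [];
  let i = 0;
  while (i < ops.length) {
    // Find the end of a run of consecutive paintImageMaskXObject entries.
    let j = i;
    while (j < ops.length && ops[j].fn === "paintImageMaskXObject") {
      j++;
    }
    const runLength = j - i;
    if (runLength >= MIN_GROUP) {
      // Replace the whole run with one grouped op that carries every mask.
      out.push({
        fn: "paintImageMaskXObjectGroup",
        args: [ops.slice(i, j).map((op) => op.args[0])],
      });
      i = j;
    } else if (runLength > 0) {
      // Run too short to be worth grouping: copy it through unchanged.
      out.push(...ops.slice(i, j));
      i = j;
    } else {
      // Any other operator is passed through as-is.
      out.push(ops[i]);
      i++;
    }
  }
  return out;
}

const ops = [
  { fn: "setFillRGBColor", args: [0, 0, 0] },
  { fn: "paintImageMaskXObject", args: ["maskA"] },
  { fn: "paintImageMaskXObject", args: ["maskB"] },
  { fn: "paintImageMaskXObject", args: ["maskC"] },
];
console.log(groupImageMasks(ops).map((op) => op.fn));
// [ 'setFillRGBColor', 'paintImageMaskXObjectGroup' ]
```

Without such a pass, every one of the document's many small masks stays a separate paint operation, which is consistent with the slowdown described above when the optimizer is disabled.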

makeInlineImage() has an "are the next five chars ASCII?" check which is run after an "EI" sequence has been found. This check involves the creation of a new object because peekBytes() calls subarray(). Unfortunately, the check is currently run on whitespace chars even when an "EI" sequence has not yet been found, i.e. when it's not needed. For the PDF in mozilla#2618, there are over 820,000 such checks. This change reworks the relevant loop so that the check is only done once an "EI" sequence has been seen. This reduces the number of checks to 157,000, and speeds up rendering by somewhere between 2% and 7% (the measurements are noisy).
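The reworked loop can be sketched as a small state machine. This is a hypothetical simplification, not the actual makeInlineImage() code: it assumes the stream is a `Uint8Array`, treats an image as ending at an "EI" followed by whitespace, and reads the five-byte lookahead in place instead of through a subarray. `findInlineImageEnd`, `nextBytesAreAscii` and the `stats` counter are invented for illustration; the point is that the ASCII check only runs in the state reached after a complete "EI".

```javascript
// Sketch only: a simplified model of scanning for the end of an inline image.
const E = 0x45; // "E"
const I = 0x49; // "I"
const ASCII_LOOKAHEAD = 5; // "are the next five chars ASCII?"

// PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
function isWhitespace(b) {
  return b === 0x00 || b === 0x09 || b === 0x0a || b === 0x0c || b === 0x0d || b === 0x20;
}

// Reads up to `count` bytes in place (no subarray allocation) and reports
// whether they all look like ASCII text rather than binary image data.
function nextBytesAreAscii(bytes, from, count) {
  const end = Math.min(from + count, bytes.length);
  for (let k = from; k < end; k++) {
    const b = bytes[k];
    if (b > 0x7f || (b < 0x20 && !isWhitespace(b))) {
      return false;
    }
  }
  return true;
}

// Returns the index of the "E" in the terminating "EI", or -1 if none is found.
function findInlineImageEnd(bytes, start, stats = { checks: 0 }) {
  let state = 0; // 0: looking for "E", 1: saw "E", 2: saw "EI"
  for (let i = start; i < bytes.length; i++) {
    const ch = bytes[i];
    if (state === 2 && isWhitespace(ch)) {
      // Only reached after a complete "EI": this is where the check belongs.
      stats.checks++;
      if (nextBytesAreAscii(bytes, i + 1, ASCII_LOOKAHEAD)) {
        return i - 2;
      }
    }
    if (ch === E) {
      state = 1;
    } else if (state === 1 && ch === I) {
      state = 2;
    } else {
      state = 0;
    }
  }
  return -1;
}

const enc = (s) => Uint8Array.from(s, (c) => c.charCodeAt(0));
// The binary data contains a spurious "EI " followed by non-ASCII bytes.
const data = enc("\x01\x02 \x03EI \xff\xfe\xfd\xfc\xfb EI Q\n");
const stats = { checks: 0 };
console.log(findInlineImageEnd(data, 0, stats), stats.checks); // 13 2
```

In the sample data five bytes are whitespace, but only two ASCII checks run: one rejects the "EI" that is followed by binary bytes, and one accepts the real end marker.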

As described in mozilla#5444, the evaluator will perform identity checking of paintImageMaskXObjects to decide if it can use paintImageMaskXObjectRepeat instead of paintImageMaskXObjectGroup. This can only ever work if the entry is a cache hit. However, the previous caching implementation did lazy caching: it would only consider an image cache-worthy once it was repeated, and only then would the repeated instance be cached. As a result, the sequence of identical images A1 A2 A3 A4 would be seen as A1 A2 A2 A2 by the evaluator, which prevents using the "repeat" optimization. Also, only the last encountered image was cached, so A1 B1 A2 B2 would stay A1 B1 A2 B2. The new implementation drops the "lazy" init of the cache. The threshold for enabling an image to be cached is rather small, so the potential waste in storage and adler32 calculation is rather low. It also caches any eligible image by its adler32. The two examples from above would now be A1 A1 A1 A1 and A1 B1 A1 B1, which not only saves temporary storage, but also prevents computing identical masks over and over again (which is the main performance impact of mozilla#2618).
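The difference between the old and new caching can be modelled in a few lines. This is a sketch under stated assumptions, not the code from #5445: images are plain `Uint8Array` mask data, the cache key is a standard Adler-32 checksum, and `LazyLastImageCache`, `EagerImageCache`, `MAX_CACHED_LENGTH` and `run` are hypothetical names. Like the described patch, it trusts the checksum alone as the key.

```javascript
// Standard Adler-32 checksum over a byte array.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

const MAX_CACHED_LENGTH = 1000; // hypothetical "small image" threshold

// Old behaviour as described: remember only the last image's checksum, and
// only cache an object once that same image shows up again.
class LazyLastImageCache {
  constructor() {
    this.lastKey = null; // checksum of the previous image
    this.cached = null; // object cached once a repeat was seen
  }
  get(bytes, makeObject) {
    const key = adler32(bytes);
    if (key === this.lastKey) {
      if (this.cached === null) this.cached = makeObject(bytes);
      return this.cached;
    }
    // A different image: forget everything and return a fresh object.
    this.lastKey = key;
    this.cached = null;
    return makeObject(bytes);
  }
}

// New behaviour as described: cache every small image on first sight.
class EagerImageCache {
  constructor() {
    this.byKey = new Map();
  }
  get(bytes, makeObject) {
    if (bytes.length > MAX_CACHED_LENGTH) return makeObject(bytes); // too big
    const key = adler32(bytes);
    let obj = this.byKey.get(key);
    if (obj === undefined) {
      obj = makeObject(bytes);
      this.byKey.set(key, obj);
    }
    return obj;
  }
}

// Feeds a sequence of image names through a cache and reports which
// instance label (A1, A2, ...) the evaluator would end up seeing.
function run(cache, names) {
  const images = { A: new Uint8Array([1, 2, 3]), B: new Uint8Array([9, 8, 7]) };
  const seen = {};
  return names.map((name) => {
    seen[name] = (seen[name] || 0) + 1;
    const label = name + seen[name];
    return cache.get(images[name], () => ({ label })).label;
  });
}

console.log(run(new LazyLastImageCache(), ["A", "A", "A", "A"])); // [ 'A1', 'A2', 'A2', 'A2' ]
console.log(run(new EagerImageCache(), ["A", "A", "A", "A"])); // [ 'A1', 'A1', 'A1', 'A1' ]
console.log(run(new EagerImageCache(), ["A", "B", "A", "B"])); // [ 'A1', 'B1', 'A1', 'B1' ]
```

Only the eager variant hands back the very same object for four identical masks, which is what an identity-based check needs before a "repeat" operation can replace a group.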

CodingFabian · 2014-12-15T16:52:49Z

#5445 has landed; I would go so far as to say that the rendering is now acceptable. What do you guys think?

yurydelendik · 2014-12-15T16:59:04Z

I think it is good. Other viewers (Preview and Chrome) have performance issues with that as well -- try zooming. Closing as resolved.

nnethercote · 2014-12-15T21:58:59Z

Nice work, Fabian.

…andard classes

This patch was tested using the PDF file from [issue 2618](mozilla#2618), i.e. https://bug570667.bugzilla-attachments.gnome.org/attachment.cgi?id=226471, with the following manifest file:

```
[
  {
    "id": "issue2618",
    "file": "../web/pdfs/issue2618.pdf",
    "md5": "",
    "rounds": 50,
    "type": "eq"
  }
]
```

which gave the following results when comparing this patch against the `master` branch:

```
-- Grouped By browser, stat --
browser | stat         | Count | Baseline(ms) | Current(ms) | +/- |    % | Result(P<.05)
------- | ------------ | ----- | ------------ | ----------- | --- | ---- | -------------
firefox | Overall      |    50 |         3417 |        3426 |   9 | 0.27 |
firefox | Page Request |    50 |            1 |           1 |   0 | 5.41 |
firefox | Rendering    |    50 |         3416 |        3426 |   9 | 0.27 |
```

Based on these results, there's no significant performance regression from using standard classes, and this patch should thus be OK.

1799927) *Please note:* This is a tentative patch, which only fixes the "wrong letter" part of bug 1799927.

It appears that the simple `computeAdler32` function, used when caching inline images, generates hash collisions for some (very short) TypedArrays. In this case that leads to some of the "letters", which are actually inline images, being rendered incorrectly. To avoid that, we use the `MurmurHash3_64` class instead, which is already used in various other parts of the code-base. The one disadvantage of doing this is that it's slightly slower, which in some cases will lead to a performance regression.[1] However, I believe that we'll have to accept a smaller regression here, since the alternative is much worse (i.e. broken rendering).

One small benefit of these changes is that we can avoid creating lots of `Stream`-instances for already cached inline images.

---

[1] Doing some quick benchmarking in the viewer, using `#pdfBug=Stats`, with the PDF document from issue mozilla#2618 shows at least a 10 percent regression.
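Why a bare Adler-32 key can alias two different short images is easy to check by hand. Before the modulo step, Adler-32 over n bytes x_0, ..., x_(n-1) computes A = 1 + Σ x_i and B = n + Σ (n − i)·x_i, so the three-byte inputs [1, 0, 1] and [0, 2, 0] both give A = 3 and B = 7. The sketch below demonstrates that collision and then shows one possible mitigation. The mitigation is a swapped-in technique, not what the patch does: the patch switches to pdf.js's `MurmurHash3_64`, whereas this hypothetical `VerifiedImageCache` keeps the checksum only as a bucket key and confirms each hit by comparing bytes.

```javascript
// Standard Adler-32 checksum over a byte array.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0;
}

function sameBytes(x, y) {
  if (x.length !== y.length) return false;
  for (let i = 0; i < x.length; i++) {
    if (x[i] !== y[i]) return false;
  }
  return true;
}

// Hypothetical cache: the checksum only selects a bucket, and a hit is
// accepted only if the stored bytes are identical, so collisions cannot
// make two different images share one cached object.
class VerifiedImageCache {
  constructor() {
    this.buckets = new Map();
  }
  get(bytes, makeObject) {
    const key = adler32(bytes);
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = [];
      this.buckets.set(key, bucket);
    }
    for (const entry of bucket) {
      if (sameBytes(entry.bytes, bytes)) return entry.obj;
    }
    const obj = makeObject(bytes);
    bucket.push({ bytes: bytes.slice(), obj }); // keep a private copy
    return obj;
  }
}

const p = new Uint8Array([1, 0, 1]);
const q = new Uint8Array([0, 2, 0]);
console.log(adler32(p) === adler32(q)); // true: different data, same key

const cache = new VerifiedImageCache();
const objP = cache.get(p, () => ({ name: "p" }));
const objQ = cache.get(q, () => ({ name: "q" }));
console.log(objP !== objQ); // true: the collision no longer aliases images
```

The byte comparison costs time on every hit, so whether it beats a stronger hash in practice would need measuring, for example with the same `#pdfBug=Stats` approach mentioned above.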

Snuffleupagus mentioned this issue Jun 22, 2013

Misc fixes for corrupted PDFs #3376

Merged

fmms mentioned this issue Sep 14, 2013

[17:29:56.765] TypeError: container is null @ resource://pdf.js/web/viewer.js:1348 #3690

Closed

nnethercote mentioned this issue Aug 14, 2014

Reduce ASCII checks in makeInlineImage(). #5187

Merged

CodingFabian mentioned this issue Oct 26, 2014

Evaluator Optimizer not optimizing as good as it could #5444

Closed

CodingFabian mentioned this issue Oct 26, 2014

Fixes caching of inline images during parsing. #5445

Merged

yurydelendik closed this as completed Dec 15, 2014

Snuffleupagus mentioned this issue Aug 18, 2017

Fix caching of small inline images in Parser.makeInlineImage (issue 8790) #8792

Merged

This was referenced Jan 23, 2018

Regression in issue #8496 with canvas backend #9398

Closed

Take the dictionary, and not just the image data, into account when caching inline images (issue 9398) #9420

Merged

This was referenced Jul 22, 2019

Inline the isCmd check in the Parser.shift method #11001

Merged

Reduce the number of function calls in EvaluatorPreprocessor.read #11012

Merged

Snuffleupagus mentioned this issue Aug 16, 2019

Inline the isString check in the Parser.getObj method #11070

Merged

Snuffleupagus mentioned this issue Jun 26, 2020

Attempt to detect inline images which contain "EI" sequence in the actual image data (issue 11124) #12028

Merged

Snuffleupagus mentioned this issue May 12, 2021

Convert the remaining functions in src/core/primitives.js to use standard classes #13366

Merged

Snuffleupagus mentioned this issue Nov 9, 2022

Use MurmurHash3_64 when computing the cacheKey for inline images (bug 1799927) #15677

Closed
