Full PDF doc loaded before single page could be rendered #9537

Closed
mustafa0x opened this issue Mar 6, 2018 · 12 comments

Comments

@mustafa0x

mustafa0x commented Mar 6, 2018

I understood from the FAQ that pdf.js only downloads what it needs. However, using the sample code from the docs, I noticed via the network panel in Chrome DevTools that the entire document was downloaded first, even though only a single page was drawn.

Some things I tried to no avail:

  • Run it on my server.
  • Run it on another pdf file.
  • Run qpdf --linearize on that pdf file.
  • Try HEAD (the current master build), which gave the same results, except that the doc was downloaded in 64 KB chunks.

My use case is displaying specific pages from very long PDF files (1000+ pages). If pdf.js is not the right tool, please let me know.

Related issues: #1108, #2719, #1923, #1375, #2470, #3461, #6104, #8897.

@timvandermeij
Contributor

timvandermeij commented Mar 6, 2018

The default range chunk size is

var DEFAULT_RANGE_CHUNK_SIZE = 65536; // 2^16 = 65536

Only PDF files larger than that will use range (chunked) loading. The server must support range requests and the PDF file must be optimized for web (linearized). If that is the case, then range loading should work just fine for your use case (just try it out with some of your own PDF files to make sure).
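
The conditions above can be checked before blaming pdf.js. The following is a minimal sketch, not pdf.js's own logic: `rangeLoadingLooksPossible` is a hypothetical helper, and it assumes you have already collected the response headers (for example from a HEAD request) into a plain object with lower-case names. The checks pdf.js performs internally may be stricter.

```javascript
// Same value as the DEFAULT_RANGE_CHUNK_SIZE quoted above.
const DEFAULT_RANGE_CHUNK_SIZE = 65536;

// Hypothetical helper (not part of pdf.js): given response headers as a
// plain object with lower-case keys, report whether chunked loading is
// plausible according to the two server-side conditions described above.
function rangeLoadingLooksPossible(headers, chunkSize = DEFAULT_RANGE_CHUNK_SIZE) {
  // The server must advertise byte-range support.
  const acceptsRanges =
    (headers["accept-ranges"] || "").trim().toLowerCase() === "bytes";
  // The file must be larger than one chunk; a missing or malformed
  // Content-Length parses to NaN and fails the integer check.
  const length = Number.parseInt(headers["content-length"], 10);
  const bigEnough = Number.isInteger(length) && length > chunkSize;
  return acceptsRanges && bigEnough;
}
```

This only covers the server side. Whether the file is linearized has to be checked separately.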

@mustafa0x
Author

mustafa0x commented Mar 7, 2018

Thanks @timvandermeij!

See: http://159.89.108.117/pdfjs-1.10.88/web/load-single.html. I'm using the PDF spec: http://159.89.108.117/PDF32000_2008.pdf.

Using HEAD, the first page was rendered once ~4.5 MB had been downloaded, but it then continued to download the entire file (22.5 MB). See: http://159.89.108.117/pdf.js-master/examples/helloworld/load-single.html

@mustafa0x
Author

I ran pdfinfo on PDF32000_2008.pdf, and it reported Optimized: no, so I downloaded a PDF that was optimized (http://159.89.108.117/annual_report_2009.pdf), but the results didn't change.
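
For readers who want to check the same thing from JavaScript, here is a rough heuristic sketch. It relies on the fact that a linearized PDF carries a linearization parameter dictionary containing `/Linearized` within the first 1024 bytes of the file. `looksLinearized` is a hypothetical helper, it assumes the first bytes of the file are already available as a `Uint8Array`, and it is only a quick hint, not a replacement for pdfinfo or qpdf.

```javascript
// Hypothetical helper: heuristic check for a linearized PDF.
// `bytes` is a Uint8Array holding at least the start of the file.
function looksLinearized(bytes) {
  // Only the first 1024 bytes matter for this heuristic.
  const head = bytes.subarray(0, 1024);
  let text = "";
  for (const b of head) {
    text += String.fromCharCode(b);
  }
  return text.includes("/Linearized");
}
```

A `true` result only says the marker is present; a file can still be damaged or incorrectly linearized.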

@mustafa0x
Author

@timvandermeij Mind giving some input on this?

@timvandermeij
Contributor

Your example does use range requests (indicated by response codes 206 in the network tab of the console), so that looks fine to me. I think you may want to disable auto-fetching; see:

* @property {boolean} disableAutoFetch - (optional) Disable pre-fetching of PDF

Disable pre-fetching of PDF file data. When range requests are enabled PDF.js will automatically keep fetching more data even if it isn't needed to display the current page. The default value is false. NOTE: It is also necessary to disable streaming, see above, in order for disabling of pre-fetching to work correctly.
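
Put together, the note above suggests passing both flags when only a few pages are needed. The following is a minimal sketch under that reading: `buildPartialLoadOptions` is a hypothetical helper, the URL in the usage comment is a placeholder, and `disableStream`, `disableAutoFetch`, and `rangeChunkSize` are used as `getDocument` parameters on the assumption that the pdf.js build in use accepts them there.

```javascript
// Hypothetical helper: build getDocument() parameters that ask pdf.js to
// fetch only the ranges it needs instead of pre-fetching the whole file.
function buildPartialLoadOptions(url, rangeChunkSize = 65536) {
  return {
    url,
    rangeChunkSize,         // size of each range request, in bytes
    disableStream: true,    // per the note above, required for the next flag
    disableAutoFetch: true, // do not keep fetching data that is not needed
  };
}

// Usage in a page where pdf.js is loaded (not executed in this sketch):
// pdfjsLib.getDocument(buildPartialLoadOptions("/docs/big.pdf")).promise
//   .then((pdf) => pdf.getPage(500))
//   .then((page) => { /* render the page */ });
```

Even with both flags set, some data beyond the requested page (for example the cross-reference data) still has to be fetched, so the network panel will show more than one range request.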

@pravid

pravid commented Jan 16, 2019

Will this work when the PDF is stored on another domain?
My PDF is stored on a cloud server, and all the headers are set properly, yet it loads the PDF completely before rendering. I'm using the normal function:
pdfjsLib.getDocument({ url: DEFAULT_URL, password: "abc", disableStream: false, disableAutoFetch: true, })

Am I missing any parameters here? How can I specify 'range'?

@Hao-Wu

Hao-Wu commented Jul 4, 2019

Hi @pravid, did you make it work with the PDF stored on another domain?

@pravid

pravid commented Jul 4, 2019

@Hao-Wu, yes, in a way. I used a CORS proxy to solve this issue.
So, when setting DEFAULT_URL, I had to prefix the proxy path to it.
Check this link: https://github.com/Rob--W/cors-anywhere (Node JS proxy server)
For more details on CORS: https://humanwhocodes.com/blog/2010/05/25/cross-domain-ajax-with-cross-origin-resource-sharing/

Hope this helps.
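
One possible reason a proxy helps with cross-origin files is that the browser hides most response headers from scripts unless the server lists them in `Access-Control-Expose-Headers`. The sketch below assumes that range loading needs to read the four headers listed in `NEEDED_EXPOSED`; that list is a conservative assumption, not something stated in this thread, and `missingExposedHeaders` is a hypothetical helper.

```javascript
// Assumed list of response headers a cross-origin page needs to read for
// range loading. Treat it as a conservative guess, not a specification.
const NEEDED_EXPOSED = [
  "accept-ranges",
  "content-range",
  "content-length",
  "content-encoding",
];

// Hypothetical helper: given the raw value of the server's
// Access-Control-Expose-Headers header, return the assumed-needed
// headers that are not exposed.
function missingExposedHeaders(exposeHeaderValue) {
  const exposed = (exposeHeaderValue || "")
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean);
  // A wildcard exposes everything (for requests without credentials).
  if (exposed.includes("*")) return [];
  return NEEDED_EXPOSED.filter((h) => !exposed.includes(h));
}
```

If the returned list is not empty, fixing the storage server's CORS configuration may be an alternative to running a proxy.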

@shivamsharmabtp

Hi @pravid, I am loading a PDF from a different source using the URL /pdfjs/web/viewer.html?file= . It works fine, but it loads the entire PDF before rendering it. Is it possible to start rendering the PDF before the load completes? Thank you.

@pravid

pravid commented Dec 6, 2019

Yes, check the answer above for details. #9537 (comment)
I used a CORS proxy to solve this loading issue.
You need to set up a Node.js proxy server (e.g. https://myproxyserver.com) and,
when requesting your PDF, set the file path as:
var pdfPath = "https://myproxyserver.com/" + "https://mypdffilepath.com/my.pdf";

Hope this helps.
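
The prefixing described above can be wrapped in a small function. This is only a sketch: `viaProxy` is a hypothetical helper, and it assumes a proxy that takes the target URL appended directly to its own base URL, as in the `pdfPath` line above.

```javascript
// Hypothetical helper: prefix a target URL with a proxy base URL,
// making sure exactly one "/" separates the two.
function viaProxy(proxyBase, targetUrl) {
  const base = proxyBase.endsWith("/") ? proxyBase : proxyBase + "/";
  return base + targetUrl;
}

// Same result as the pdfPath concatenation above.
const pdfPath = viaProxy("https://myproxyserver.com", "https://mypdffilepath.com/my.pdf");
```

The resulting `pdfPath` is then passed as the `url` to `getDocument` (or as the `file` query parameter of the viewer).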

@shivamsharmabtp

shivamsharmabtp commented Dec 6, 2019

Hi @pravid, sorry, I didn't quite follow. For example, you can look at this link of my website. It loads a PDF from a GCP bucket and renders it perfectly. The only issue is that it loads the complete PDF before rendering it. For large PDFs this takes some time, and the reader stays blank for minutes.

As far as I can tell, you are explaining how to render a PDF from a source on another domain, but that is not my issue; I have already resolved that by commenting out the line that throws the error. As I understand it, I should use the https://myproxyserver.com/https://storage.googleapis.com/... PDF URL instead of https://storage.googleapis.com/..., which is already working fine. I don't understand how this will help or how to implement it. I hope my question is clear. Thank you.

@pravid

pravid commented Dec 6, 2019

Have you set up your Node.js server?
Please go through the link to understand how CORS works; it also includes sample files to check. https://github.com/Rob--W/cors-anywhere (Node JS proxy server)
'https://myproxyserver.com' will be the URL of the server you set up.
Also note that your PDF should be optimized as well.


5 participants