Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improved memory usage #1032

Merged
merged 1 commit into from
Nov 6, 2021
Merged

Improved memory usage #1032

merged 1 commit into from
Nov 6, 2021

Conversation

Masterwow3
Copy link
Contributor

We have been using pdf-lib for the last 4 months to generate our PDFs on the WEB and on the server. I noticed that for a PDF with 180 pages and high resolution PNGs, the RAM usage increases a lot (~6GB).

My code change ensures that after a page.drawImage you can directly call an image.embed().

This decodes and encodes the image immediately and remove the decoded image from RAM immediately.
With this change I have the same performance but a maximum RAM usage of 220MB.

Usage without these changes:
image
Max RAM usage:
image

Usage with these changes:
image
Max RAM usage:
image

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Masterwow3! Thanks for investigating this issue and opening a PR! I suspect this is a problem lots of people have run into, so it'll be great to release a fix 😃.

I like the general approach to took to eliminate the memory leak. However, I would like to change a few of the implementation details before merging:

  • Please use undefined instead of null. The null value isn't used in the pdf-lib codebase.
  • Instead of clearing the PngEmbedder.image property, we should clear the PDFImage.embedder property after the image has been embedded.
  • Please add unit tests to ensure that (1) the property is cleared after the first embed, and (2) a single PDFDocument object with an embedded image can still be updated and saved multiple times (unless there's already a test for this).

Let me know if you have any questions!

@ritterzk
Copy link

ritterzk commented Nov 2, 2021

Hello @Hopding,

i am pleased that the amendment is accepted. I have adapted the commit as you requested.

The unneeded field "alreadyEmbedded" has been removed.
(https://github.com/Hopding/pdf-lib/pull/1032/files#diff-2e6b05e8490d2c6cc7ca78f78e8fa58bcaef509fbc97754da0808bca0f9d47ffL38)

Additionally, I made the embed() method parellel safe.
(https://github.com/Hopding/pdf-lib/pull/1032/files#diff-2e6b05e8490d2c6cc7ca78f78e8fa58bcaef509fbc97754da0808bca0f9d47ffR131)

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks @Masterwow3!

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants