PdfDocumentBuilder creates broken file when copying pages from specific source PDFs #936

Open
cremor opened this issue Nov 12, 2024 · 3 comments
Labels
bug document-editing Related to creating or editing/modifying documents

cremor commented Nov 12, 2024

I have an application that uses PdfPig to merge multiple PDF files. I'm using PdfDocumentBuilder.AddPage for this.
The users of the application have reported a case where the created (merged) PDF is invalid and contains broken/garbled text. When I open the created PDF in Acrobat Reader I even get an error message saying that the PDF page contains errors.

Sample input files:

  • Input.pdf
    According to its metadata this file was created with OpenOffice.
  • Input with Comments.pdf
    This is based on the same file as the first, but was edited with Wondershare PDFelement to add some comments as annotations.

Sample code:

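using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Writer;

string inputFile = @"C:\Data\Input.pdf";
string outputFile = @"C:\Data\Output.pdf";

// The builder writes the merged document to the target stream.
using var targetStream = File.Open(outputFile, FileMode.Create, FileAccess.Write);
using var outputDocument = new PdfDocumentBuilder(targetStream);
using var inputDocument = PdfDocument.Open(inputFile);

// Page numbers in PdfPig are 1-based.
for (int i = 1; i <= inputDocument.NumberOfPages; i++)
{
    // Copy page i of the source document into the output document.
    outputDocument.AddPage(inputDocument, i);
}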

I've tested the following versions of PdfPig, all are affected:

  • 0.1.8
  • 0.1.9
  • 0.1.10-alpha-20241103-132ad
  • 0.1.10-alpha-20241121-7db34

Input:
[image]

Output when shown in Acrobat Reader:
[image]

Output when shown in Microsoft Edge:
[image]

@BobLd BobLd self-assigned this Nov 12, 2024
@BobLd BobLd added bug document-editing Related to creating or editing/modifying documents labels Nov 12, 2024

cremor commented Nov 18, 2024

This might be related to an embedded font. Sometimes the end users get an error message from Acrobat Reader like "The embedded font 'EVPYXN+NotoSerifCJKjp-Regular-Identity-H' could not be loaded...". But that error isn't shown every time and I haven't seen it myself yet.

Also, if the PDF (either of the two) is resaved with Acrobat Reader then the problem doesn't happen any more.

BobLd commented Nov 18, 2024

@cremor thanks for the added context. I'll try to have a look soon but any help here would be really appreciated 😄

cremor commented Dec 2, 2024

There are additional details which point to a problem with fonts:
If I try to open the copied/created PDF with PdfPig again, the following exception is thrown:

System.InvalidOperationException: Could not find the font with name /TT0 in the resource store. It has not been loaded yet.
   at UglyToad.PdfPig.Graphics.BaseStreamProcessor`1.ShowPositionedText(IReadOnlyList`1 tokens)
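
A minimal sketch for reproducing that exception, assuming Output.pdf is the file written by the sample code above. It reopens the file with PdfPig and reads every page, catching the exception per page so all affected pages are listed. That the exception surfaces while a page is being read is an assumption based on the stack trace; the letter count is only printed to show that the page was parsed.

```csharp
using System;
using UglyToad.PdfPig;

// Assumption: this is the merged file produced by the sample code in this issue.
string outputFile = @"C:\Data\Output.pdf";

using var document = PdfDocument.Open(outputFile);

for (int i = 1; i <= document.NumberOfPages; i++)
{
    try
    {
        // Reading the page runs the content stream, which needs the page's fonts.
        var page = document.GetPage(i);
        Console.WriteLine($"Page {i}: {page.Letters.Count} letters read");
    }
    catch (InvalidOperationException ex)
    {
        // Expected for affected pages, e.g. a font name like /TT0 that cannot be resolved.
        Console.WriteLine($"Page {i} failed: {ex.Message}");
    }
}
```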

I'd like to help fix this, but I don't know where to start. Any tips?
