Description
Hello,
Since version 2.10.6, some PDF documents are no longer merged correctly. Version 2.10.7 has the same problem.
Previous versions (2.10.5 and below) behave correctly.
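To gate a test suite on the affected range, a small helper could classify the installed version. This is a minimal sketch. `merge_regression_status` and `parse_version` are hypothetical helpers, not part of PyPDF2. It only knows the versions named in this report (2.10.6 and 2.10.7 bad, 2.10.5 and below good) and assumes plain numeric version strings such as "2.10.6".

```python
from typing import Tuple

# Versions as reported in this issue (hypothetical helper, not part of PyPDF2).
KNOWN_BAD = {(2, 10, 6), (2, 10, 7)}
LAST_KNOWN_GOOD = (2, 10, 5)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.10.6' into (2, 10, 6); assumes a plain numeric release string."""
    return tuple(int(part) for part in version.split("."))


def merge_regression_status(version: str) -> str:
    """Classify a PyPDF2 version against the versions named in the report."""
    parsed = parse_version(version)
    if parsed in KNOWN_BAD:
        return "affected"
    if parsed <= LAST_KNOWN_GOOD:
        return "unaffected"
    # Versions newer than 2.10.7 were not covered by the report.
    return "unknown"
```

A test could then be marked with `pytest.mark.xfail(merge_regression_status(PyPDF2.__version__) == "affected", reason="merge regression")`, or `requirements.txt` could pin `PyPDF2==2.10.5` as a temporary workaround.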
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-Ubuntu-20.04-focal
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-debian-11.2
$ python3 -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.6
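The same details can be collected in a single step. This is a sketch that assumes Python 3.8+ for `importlib.metadata`. The `environment_report` function is a hypothetical helper. It reads the installed distribution's metadata instead of `PyPDF2.__version__`, so it still reports something when the package is missing.

```python
import platform
from importlib import metadata


def environment_report(package: str = "PyPDF2") -> dict:
    """Collect platform, Python and package versions for a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        package: package_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```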
Code + PDF
This is a minimal, complete example that shows the issue:
# requirements.txt
diff-pdf-visually==1.7.0
PyPDF2==2.10.6
pytest==7.1.3
# test_same_pdf.py
import io
import logging
import os
import shutil
import tempfile

import pytest
from diff_pdf_visually import pdf_similar
from PyPDF2 import PdfMerger, PdfReader

logger = logging.getLogger(__name__)

FILE_INPUT_URI = "arret_maladie.pdf"
FILE_OUTPUT_URI = "output.pdf"


def file_as_bytesio(filepath: str) -> io.BytesIO:
    """Open a file in read-only binary mode and return its content as BytesIO."""
    with open(filepath, "rb") as f:
        return io.BytesIO(f.read())


def test_pdf_merger():
    # Open the document and append it to a new merger
    merger = PdfMerger()
    merger.append(PdfReader(file_as_bytesio(FILE_INPUT_URI)))

    with tempfile.NamedTemporaryFile(suffix=".pdf") as temp_file:
        # Write the final merged document to the temporary file
        merger.write(temp_file.name)
        merger.close()

        # Compare VISUALLY the content of the newly generated file with the input file
        if not pdf_similar(temp_file.name, FILE_INPUT_URI):
            # The files don't match visually: copy the generated file next to
            # this test so it can be checked manually, then fail the test
            current_dir = os.path.dirname(__file__)
            new_file_path = os.path.join(current_dir, FILE_OUTPUT_URI)
            shutil.copy2(temp_file.name, new_file_path)
            logger.error("The newly merged file does not match the input file.")
            pytest.fail("The merged PDF differs visually from the input PDF.")
Here is the PDF that caused the issue:
input.pdf
Here is the output of a simple PdfReader -> PdfMerger round trip:
output.pdf
Traceback
There is no Python traceback; this is the complete output I see:
Converting each page of the PDFs to an image...
PDFs have same number of pages. Checking each pair of converted images...
Min sig = 13.4442, significant?=True. The PDFs are different. The most different pages are: page 1 (sgf. 13.4442), page 2 (sgf. 13.6684), page 3 (sgf. 13.9239).
ERROR test_same_pdf:test_same_pdf.py:44 The newly merged file does not match with the intput file.
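The summary line lists each page with a significance score. In this output a lower score marks a larger difference: "Min sig" equals the score of page 1, which is listed first among the most different pages. To compare these scores across PyPDF2 versions, a small regex sketch could extract them. It assumes the message format printed by diff-pdf-visually 1.7.0 stays exactly as shown above. `parse_page_significance` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches entries such as "page 1 (sgf. 13.4442)" in the summary line.
PAGE_SIG = re.compile(r"page (\d+) \(sgf\. ([0-9.]+)\)")


def parse_page_significance(summary: str) -> Dict[int, float]:
    """Map page number -> significance score from a diff-pdf-visually summary."""
    return {int(page): float(sig) for page, sig in PAGE_SIG.findall(summary)}


SUMMARY = (
    "Min sig = 13.4442, significant?=True. The PDFs are different. "
    "The most different pages are: page 1 (sgf. 13.4442), "
    "page 2 (sgf. 13.6684), page 3 (sgf. 13.9239)."
)
scores = parse_page_significance(SUMMARY)
worst_page = min(scores, key=scores.get)  # page with the lowest score
```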
Thank you for taking the time to investigate.
The document is an official French form. I believe it is fine to use in automated tests, but I am not sure.