Pillow 10.3.0 breaks test_filters.test_rgba #2568

Closed
stefan6419846 opened this issue Apr 2, 2024 · 7 comments
Labels: is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF), workflow-images (From a user's perspective, image handling is the affected feature/workflow)

Comments

@stefan6419846
Collaborator

Running the tests with Pillow==10.3.0 breaks test_filters.test_rgba. Pillow==10.2.0 works correctly.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('cryptography', '42.0.5'), PIL=10.3.0

Code + PDF

Just run pytest -k 'test_rgba'.

Expected image:

[image: tika-972174_p0-im0]

Actual image:

[image]

Traceback

This is the complete traceback I see:

__________________________________ test_rgba ___________________________________
[gw3] linux -- Python 3.12.2 /opt/hostedtoolcache/Python/3.12.2/x64/bin/python

    @pytest.mark.enable_socket()
    def test_rgba():
        """Decode rgb with transparency"""
        reader = PdfReader(BytesIO(get_data_from_url(name="tika-972174.pdf")))
        data = reader.pages[0].images[0]
        assert ".jp2" in data.name
        similarity = image_similarity(
            data.image, BytesIO(get_data_from_url(name="tika-972174_p0-im0.png"))
        )
>       assert similarity > 0.99
E       assert 0.6877076861263712 > 0.99

tests/test_filters.py:380: AssertionError

stefan6419846 added the is-bug and workflow-images labels on Apr 2, 2024.
@stefan6419846
Collaborator Author

An upstream fix is available as a Pillow PR and is expected to land in the next Pillow release.

The patch still slightly breaks test_filters.test_rgba and test_workflows.py.test_image_extraction[https://corpora.tika.apache.org/base/docs/govdocs1/972/972174.pdf-tika-972174.pdf], but this can be worked around by setting ImageFile.LOAD_TRUNCATED_IMAGES = True for the duration of the corresponding test method.
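A minimal sketch of limiting `ImageFile.LOAD_TRUNCATED_IMAGES = True` to a single test, assuming plain Python with no test-framework helpers. `override_attr` and `load_truncated_images` are hypothetical helper names, not part of pypdf or Pillow; only the `PIL.ImageFile.LOAD_TRUNCATED_IMAGES` flag comes from the comment above.

```python
import contextlib
from typing import Any, Iterator


@contextlib.contextmanager
def override_attr(obj: Any, name: str, value: Any) -> Iterator[None]:
    """Temporarily set obj.<name> = value, restoring the previous value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body raised, so later tests see the default again.
        setattr(obj, name, previous)


def load_truncated_images() -> contextlib.AbstractContextManager:
    """Hypothetical helper: enable Pillow's LOAD_TRUNCATED_IMAGES inside a `with` block."""
    # Imported lazily, so Pillow is only required when the helper is actually used.
    from PIL import ImageFile

    return override_attr(ImageFile, "LOAD_TRUNCATED_IMAGES", True)
```

Inside a test, `with load_truncated_images(): ...` would wrap the image extraction. With pytest, `monkeypatch.setattr(ImageFile, "LOAD_TRUNCATED_IMAGES", True)` gives the same scoping, because the `monkeypatch` fixture undoes its changes when the test finishes.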

I am not sure whether we should ban Pillow==10.3.0 from pypdf for now, or whether we should treat this as an issue that does not occur very often and that we have no control over anyway. Personally, I would not restrict the version for now.

@pubpub-zz
Collaborator

@stefan6419846 can you confirm that the transparency is correct?

@stefan6419846
Collaborator Author

@pubpub-zz The alpha masking is done in a separate step and looks correct.

This is the newly rendered image after applying the patch:

[image]

The file size differs slightly, but I could not see any real visual difference when comparing it to the reference image.

@pubpub-zz
Collaborator

Pillow 10.4 has finally been released! 🎉🎉🎉
https://github.com/python-pillow/Pillow/releases/tag/10.4.0
The test now passes, so we should close this issue.

@stefan6419846
Collaborator Author

I just did a standalone custom CI run with version 10.4.0 and it is working fine now.

@pubpub-zz
Collaborator

Should we upgrade to 10.4?

@stefan6419846
Collaborator Author

If anything, we should disallow 10.3.0. Upgrading our CI to 10.4 does not work, because we would have to drop Python 3.7 for this.
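Disallowing a single release could be written as a PEP 440 exclusion in the requirement specifier, for example `Pillow!=10.3.0`. If the release stays allowed instead, a test could detect it at runtime. The sketch below assumes Python 3.8+ for `importlib.metadata` (pypdf still supported 3.7 at the time, where the `importlib_metadata` backport would be needed); `KNOWN_BROKEN_PILLOW`, `installed_pillow_version`, and `pillow_is_known_broken` are hypothetical names, not part of pypdf.

```python
from importlib import metadata
from typing import Optional

# Per this issue: 10.3.0 breaks test_rgba, while 10.2.0 and 10.4.0 pass.
KNOWN_BROKEN_PILLOW = frozenset({"10.3.0"})


def installed_pillow_version() -> Optional[str]:
    """Return the installed Pillow version string, or None if Pillow is absent."""
    try:
        return metadata.version("Pillow")
    except metadata.PackageNotFoundError:
        return None


def pillow_is_known_broken(version: Optional[str] = None) -> bool:
    """True if `version` (default: the installed Pillow) is a known-broken release."""
    if version is None:
        version = installed_pillow_version()
    return version is not None and version in KNOWN_BROKEN_PILLOW
```

For example, `@pytest.mark.xfail(pillow_is_known_broken(), reason="Pillow 10.3.0 breaks this test")` would mark the test as an expected failure only on the affected release, without restricting which Pillow versions users can install.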
