Possible License Issue #1063

Open
JohnnyVX opened this issue Jan 6, 2025 · 6 comments
Comments

@JohnnyVX

JohnnyVX commented Jan 6, 2025

Describe the bug
You are using PyMuPDF, which is licensed under AGPL-3.0.

The AGPL requires anyone who links to the library, dynamically or statically, and offers the program over a network (e.g. as a SaaS product) to distribute their whole code base.

To Reproduce
Steps to reproduce the behavior:

  1. Review Snyk Report for your library
  2. You will see that there are 2 High License issues
  3. You can click on "Get Started Free" and run your own scan

Expected behavior
There should be no AGPL-licensed dependencies (either direct or transitive) in an Apache-licensed project.

Additional context
There should be several replacement libraries that one can use

@assafelovic
Owner

Thank you for this. It's a great point, and we'll look for alternative libraries. It's simply used for converting markdown to PDF and is not critical for gptr use.

@JohnnyVX
Author

JohnnyVX commented Jan 6, 2025

Do you know if it is possible to exclude that library if people are not using it? We'd like to use this, but AGPL is a big problem.

@assafelovic
Owner

@JohnnyVX you can simply run it without having the library installed. The missing import should be caught by an exception handler, which just disables the PDF conversion.
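
For readers wondering what that looks like in practice, here is a minimal sketch of the general optional-import pattern. It is not taken from this project's code: `load_optional` and `available_formats` are hypothetical names, and the default module name `fitz` is only assumed here because it is the import name PyMuPDF has commonly used.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The dependency is missing, so the caller should disable the feature.
        return None


def available_formats(pdf_module="fitz"):
    """List export formats, offering PDF only when the optional module imports.

    Both this function and its default argument are illustrative assumptions.
    """
    formats = ["markdown"]
    if load_optional(pdf_module) is not None:
        formats.append("pdf")
    return formats
```

With this shape, leaving the AGPL-licensed package out of the environment removes PDF from the offered formats instead of raising at import time.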

@assafelovic
Owner

There actually might be another issue to research here. I'll get to it later in the week

@kga245
Contributor

kga245 commented Jan 9, 2025

@assafelovic This seems like the most important upstream dependency: https://python.langchain.com/docs/how_to/document_loader_pdf/

Here's the list of options: https://python.langchain.com/docs/integrations/document_loaders/#pdfs

If I had to recommend one, I really like converting PDFs to base64-encoded images and relying on multimodal models from there on.

Solution: use PyMuPDF for local installs and notebooks. Use multimodal conversion for SaaS.
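
A rough sketch of the base64 step, using only the standard library. It assumes each PDF page has already been rendered to PNG bytes by some other tool (that rendering step is not shown and would need its own license check); `image_to_data_url` and `pages_to_data_urls` are hypothetical helper names, and how the resulting strings are passed to a multimodal model depends on the provider.

```python
import base64


def image_to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def pages_to_data_urls(page_images):
    """Convert a list of rendered page images (bytes) to data URLs, in order."""
    return [image_to_data_url(img) for img in page_images]
```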

@Mizokuiam

Thank you for raising this issue! I'll look into it and try to help if I can.
