Update Tika and wkhtmltopdf #239

Merged 2 commits on Sep 12, 2024

tagliala
Member

No description provided.

tagliala (Member, Author) commented May 14, 2024

This run:

I, [2024-05-14T16:40:17.312218 #6903]  INFO -- : [7455] spawn 'tika --text /tmp/heathen20240514-6903-1kk65sx/heathen20240514-6903-7weqel'
I, [2024-05-14T16:40:18.788413 #6903]  INFO -- : [7455] completed in 1.4779

master run:

I, [2024-04-23T09:48:23.868395 #6596]  INFO -- : [7150] spawn 'tika --text /tmp/heathen20240423-6596-1pqo4f3/heathen20240423-6596-1rj7sx2'
I, [2024-04-23T09:48:25.297595 #6596]  INFO -- : [7150] completed in 1.4304
I, [2024-04-23T09:48:25.297653 #6596]  INFO -- : [7150] stderr: 'Apr 23, 2024 9:48:24 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Apr 23, 2024 9:48:24 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless
you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
Apr 23, 2024 9:48:24 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.'
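
A minimal sketch, not part of this PR, for comparing the two runs above programmatically. It assumes the Ruby Logger line layout seen in the excerpts (`I, [timestamp #pid]  INFO -- : [job] message`); the names `LINE_RE` and `summarize` are hypothetical, and the sample strings are copied from the logs above with the stderr body shortened.

```python
import re

# Assumed layout, taken from the log excerpts above:
#   I, [2024-05-14T16:40:18.788413 #6903]  INFO -- : [7455] completed in 1.4779
LINE_RE = re.compile(
    r"^[A-Z], \[(?P<ts>\S+) #(?P<pid>\d+)\]\s+(?P<level>\w+) -- : "
    r"\[(?P<job>\d+)\] (?P<msg>.*)$"
)


def summarize(log_text):
    """Return {job_id: {"seconds": float or None, "has_stderr": bool}}."""
    jobs = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            # Continuation lines of a multi-line stderr dump do not match.
            continue
        job = jobs.setdefault(m["job"], {"seconds": None, "has_stderr": False})
        msg = m["msg"]
        if msg.startswith("completed in "):
            job["seconds"] = float(msg[len("completed in "):])
        elif msg.startswith("stderr:"):
            job["has_stderr"] = True
    return jobs


this_run = """\
I, [2024-05-14T16:40:17.312218 #6903]  INFO -- : [7455] spawn 'tika --text /tmp/heathen20240514-6903-1kk65sx/heathen20240514-6903-7weqel'
I, [2024-05-14T16:40:18.788413 #6903]  INFO -- : [7455] completed in 1.4779
"""

master_run = """\
I, [2024-04-23T09:48:23.868395 #6596]  INFO -- : [7150] spawn 'tika --text /tmp/heathen20240423-6596-1pqo4f3/heathen20240423-6596-1rj7sx2'
I, [2024-04-23T09:48:25.297595 #6596]  INFO -- : [7150] completed in 1.4304
I, [2024-04-23T09:48:25.297653 #6596]  INFO -- : [7150] stderr: 'Apr 23, 2024 9:48:24 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
"""

if __name__ == "__main__":
    print("this run:  ", summarize(this_run))
    print("master run:", summarize(master_run))
```

In the excerpts above, only the master run carries a `stderr:` line, while the reported durations are close (1.4779 s versus 1.4304 s).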

tagliala force-pushed the feature/upgrade-tika-and-wkhtmltopdf branch from ed613c7 to 3a6615b on September 12, 2024 07:23
tagliala marked this pull request as ready for review on September 12, 2024 07:23
tagliala merged commit d1f5c89 into master on Sep 12, 2024
1 check passed
tagliala deleted the feature/upgrade-tika-and-wkhtmltopdf branch on September 12, 2024 07:28