Run tesseract with OMP_THREAD_LIMIT=1 #203

Open
3 tasks
kirkkwang opened this issue Mar 28, 2023 · 1 comment

kirkkwang commented Mar 28, 2023

Story

Indiana University's code is set up to run Tesseract with OMP_THREAD_LIMIT=1. It would be nice to bring this over to IIIF Print.

Acceptance Criteria

  • Tesseract runs with OMP_THREAD_LIMIT=1
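
One way the criterion above could be met is sketched below. OMP_THREAD_LIMIT is a standard OpenMP environment variable that caps the number of threads a process may use. The helper name `run_single_threaded` and the file names in the usage comment are hypothetical and are not taken from IIIF Print or Indiana University's code; this is only a minimal illustration of setting the variable for a child process, assuming `tesseract` is on the PATH.

```ruby
require 'open3'

# Hypothetical helper (not part of IIIF Print): runs a command with
# OMP_THREAD_LIMIT set for the child process only, so the parent
# process environment is left unchanged.
def run_single_threaded(*command, thread_limit: 1)
  env = { 'OMP_THREAD_LIMIT' => thread_limit.to_s }
  stdout, stderr, status = Open3.capture3(env, *command)
  raise "Command failed: #{stderr}" unless status.success?

  stdout
end

# Example usage (placeholder file names; assumes tesseract is installed):
# run_single_threaded('tesseract', 'page.tiff', 'page', 'hocr')
```

Passing the variable through the environment hash keeps the setting scoped to the Tesseract call rather than to the whole application, which may or may not be what the deployed projects want; setting it globally in the deployment configuration is another option.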

Testing Instructions and Sample Files

GOAL: Test whether Tesseract runs faster in a deployed environment. Each page should take about 10-12 minutes instead of 30+ minutes.

sample pdf: service-rbc-rbc0001-2015-2015gen56010-2015gen56010 (1).pdf

  • Ingest the sample PDF into UTK (I believe UTK is the last project that doesn't have OMP_THREAD_LIMIT=1).
  • Observe that each page takes no more than 10-12 minutes to run.

Notes

Close this ticket after verifying it works.


ShanaLMoore commented Apr 3, 2023

TODO: To test this work, we need to upgrade the iiif_print version in the various applications and deploy them to staging.
