diskspace leak when extracting text from pdf #151

Open
KHMtravel opened this issue Mar 28, 2019 · 1 comment

Comments

@KHMtravel

I am trying to extract the text from this PDF: https://gofile.io/?c=6U8qE8. I have a Rack application inside a Docker container running on Ubuntu 18.04.

After calling Docsplit.extract_text('spec/test.pdf', ocr: true, language: 'eng', output: 'spec/output.txt'), I see that the gs process uses the most CPU, and I lose 1 GB of disk space every 5 seconds until there is no space left.

Maybe someone has an idea what is going wrong here?

@justinperkins

While investigating a long-running Docsplit job on a PDF that contained no text, I ran into this same issue on my local dev machine: a Rails app running on a Vagrant instance running Ubuntu. After the job had run for 10+ minutes, I ran out of disk space. I killed the job and restarted my host machine to get 40 GB back.
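
Until the cause is found, one possible stopgap is to run the extraction in a child process and kill it once a watched directory grows past a limit. The sketch below is not a fix for the leak and is not part of Docsplit. The helper names `dir_size_bytes` and `run_with_disk_guard` are made up for illustration, and it is only an assumption that the runaway files land under the watched directory (for example `Dir.tmpdir`), since nothing in this thread confirms where gs writes them. It relies on `fork`, so it is limited to Unix-like systems.

```ruby
require "find"
require "tmpdir"

# Total size in bytes of all regular files below +dir+.
def dir_size_bytes(dir)
  total = 0
  Find.find(dir) do |path|
    begin
      total += File.size(path) if File.file?(path)
    rescue SystemCallError
      # The file disappeared or became unreadable while scanning; skip it.
    end
  end
  total
end

# Runs the block in a forked child that leads its own process group.
# If +watch_dir+ grows by more than +max_growth_bytes+ while the child
# is running, the whole group is killed (so helper processes started by
# the block die too). Returns :finished or :killed.
def run_with_disk_guard(watch_dir, max_growth_bytes, poll_seconds: 1.0)
  baseline = dir_size_bytes(watch_dir)
  pid = fork do
    Process.setpgrp
    begin
      yield
    rescue StandardError => e
      warn "guarded block failed: #{e.class}: #{e.message}"
    ensure
      exit!(0) # leave without running the parent's at_exit handlers
    end
  end
  begin
    Process.setpgid(pid, pid) # also set from the parent to avoid a race
  rescue SystemCallError
    # The child already changed its group or has exited.
  end

  loop do
    return :finished if Process.wait(pid, Process::WNOHANG)
    if dir_size_bytes(watch_dir) - baseline > max_growth_bytes
      begin
        Process.kill("-KILL", pid) # leading minus: signal the process group
      rescue Errno::ESRCH
        # The group is already gone.
      end
      Process.wait(pid)
      return :killed
    end
    sleep poll_seconds
  end
end

# Hypothetical usage with the call quoted earlier in this thread
# (requires the docsplit gem; the 2 GB limit is an arbitrary example):
#   run_with_disk_guard(Dir.tmpdir, 2 * 1024**3) do
#     Docsplit.extract_text('spec/test.pdf', ocr: true, language: 'eng', output: 'spec/output.txt')
#   end
```

Killing the process group rather than only the forked child matters here because gs runs as a separate process. Any files already written are left in place and still need to be cleaned up by hand.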
