Docsplit::TextExtractor#extract_text should return the path of the output text file? #139

Open
nruth opened this issue Jan 25, 2016 · 2 comments

nruth commented Jan 25, 2016

related to #42

After extracting the text from a PDF or Doc file, I need to do something with it. I understand not loading the string into Ruby (it could be huge), but it would be helpful to get the output file path as a return value. Otherwise we have to use different output dirs or try to reconstruct the path from other information, which feels wrong.

Currently Docsplit::TextExtractor#extract_text returns the source file paths. For transparent doc(x) conversion it returns the intermediate tempfile PDF instead.
E.g. when I map over an array containing a pdf and a doc in my project's tmp dir, I get back

[
"/var/folders/_j/q3pr8b3s1vj85mhqvyb06gr40000gn/T/docsplit/sample.docx20160125-29577-go3upi.pdf",
"/Users/nruth/dev/monitor/tmp/AISB08.pdf20160125-29577-1svhpfo.pdf"
]

Instead I'd like to be given the path of the output text files, so I can open them.

Would this be a good PR, or is there a deliberate reason to return these other file paths that could be documented?
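
For illustration, the "reconstruct its path" approach could look roughly like the sketch below. `expected_text_path` is a hypothetical helper, not part of Docsplit, and it assumes the extractor writes a single `<basename>.txt` per document into the chosen output directory (per-page output would be named differently). The `tmp/` paths are made up for the example.

```ruby
# Hypothetical helper, not part of Docsplit: guess where the text output lands.
# Assumption: one "<basename>.txt" file per source document in output_dir.
def expected_text_path(source_path, output_dir)
  name = File.basename(source_path, File.extname(source_path))
  File.join(output_dir, "#{name}.txt")
end

# Example inputs; these paths are illustrative only.
sources = ["tmp/sample.docx", "tmp/AISB08.pdf"]
text_paths = sources.map { |src| expected_text_path(src, "tmp/text") }
```

This duplicates naming logic that lives inside the library, which is why returning the output path directly would be cleaner.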

harssh commented Mar 18, 2016

👍 Are we going ahead with this, or is it already implemented?

nruth commented Mar 20, 2016

I didn't make a PR. I worked around the problem by putting the document into its own temporary subdirectory then using ls. I do think it's something that can be fixed, as it's just a forgot-to-think-about-the-return-value problem. But the PR backlog is growing.
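
A minimal sketch of that workaround, assuming the extraction step accepts an output directory. `extract_into_own_dir` is a hypothetical wrapper, and the block stands in for the real extraction call; the Docsplit call in the comment is an assumption about how it would be wired up.

```ruby
require "tmpdir"

# Hypothetical wrapper: run an extraction step in a fresh directory and
# return whatever .txt files it produced there.
def extract_into_own_dir(source_path)
  dir = Dir.mktmpdir("docsplit-text")
  # The block does the actual work, e.g. (assumed usage):
  #   Docsplit.extract_text(source, output: dir)
  yield source_path, dir
  # The directory is new, so every .txt in it belongs to this document.
  Dir.glob(File.join(dir, "*.txt")).sort
end
```

The caller is responsible for removing the temporary directory once it has finished with the returned files.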
