Get pageid of a search object #62

Open
khoivan88 opened this issue Jan 11, 2018 · 0 comments
khoivan88 commented Jan 11, 2018

Hi, I am new to Python and pdfquery. I am writing a Python program that extracts information from PDF files and inserts it into a Word document. I am having trouble with one particular object: "minor spill". Specifically, I am trying to scrape the paragraph underneath "6.3 Methods and materials for containment and cleaning up" on page 2 of the PDF file. The content I want is "Contain spillage, and then collect with an electrically protected vacuum cleaner or by wet-brushing and place in container for disposal according to local regulations (see section 13). Keep in suitable, closed containers for disposal." The problem is that, for this particular PDF file, my code also extracts "Product This combustible material may be burned in a chemical incinerator equipped with an afterburner and scrubber. Offer surplus and non-recyclable solutions to a licensed disposal company." on p. 5. Because I want to work with many PDF files that might have the "6.3..." content on different pages, I figure that if I can pass the pageid to the extraction, it should be fine.
My question: is there a way to get the pageid of an object (for example, "minor_spill" in my code)?
My code is below, and I have also attached the PDF file:
https://pastebin.com/rwseBSZV

Thank you very much!
PDF file:
932-66-1.pdf
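
One possible approach, sketched under assumptions: pdfquery turns the PDF into an XML tree in which each page is an `LTPage` element carrying a `pageid` attribute, so the page of an object can be found by checking which `LTPage` contains the matching text. The helper name `page_ids_containing` and the sample XML below are hypothetical, written by hand to mimic that layout; they are not taken from the pastebin code or from 932-66-1.pdf. The sketch uses only the standard library so it runs without a PDF.

```python
import xml.etree.ElementTree as ET


def page_ids_containing(root, needle):
    """Return the pageid of every LTPage holding an element whose text contains needle."""
    hits = []
    for page in root.iter("LTPage"):
        # page.iter() walks the page element itself and all of its descendants.
        if any(needle in (el.text or "") for el in page.iter()):
            hits.append(page.get("pageid"))
    return hits


# Hand-written stand-in for the tree pdfquery builds (an assumption, not real output).
SAMPLE = """
<pdfxml>
  <LTPage pageid="2">
    <LTTextLineHorizontal>6.3 Methods and materials for containment and cleaning up</LTTextLineHorizontal>
    <LTTextLineHorizontal>Contain spillage</LTTextLineHorizontal>
  </LTPage>
  <LTPage pageid="5">
    <LTTextLineHorizontal>Product This combustible material may be burned</LTTextLineHorizontal>
  </LTPage>
</pdfxml>
"""

root = ET.fromstring(SAMPLE)
print(page_ids_containing(root, "6.3 Methods and materials"))  # ['2']
```

With a real file, the same helper should accept `pdf.tree.getroot()` after `pdf.load()`, assuming the tree is laid out as above. The returned id could then be used to scope the extraction to one page, for example with a selector such as `LTPage[pageid="2"]` as the parent; treat that as an assumption to check against the pdfquery README rather than a confirmed answer.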
