leaf-focus

Extract structured text from PDF files.

Install

Install from PyPI using pip:

pip install leaf-focus

Download the Xpdf command line tools and extract the executable files.

Pass the directory containing the executable files to leaf-focus using the --exe-dir option.
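
Before running leaf-focus, it can help to confirm that the extracted directory really contains the Xpdf executables. The following is a minimal sketch, not part of leaf-focus. The function name `find_missing_tools` is hypothetical, and the list of tool names is an assumption: `pdfinfo`, `pdftotext`, and `pdftopng` are distributed with the Xpdf command line tools, but which of them leaf-focus calls is not stated here.

```python
import sys
from pathlib import Path

# Assumption: these names ship with the Xpdf command line tools, but the
# exact set that leaf-focus calls is a guess made for this sketch.
ASSUMED_TOOLS = ("pdfinfo", "pdftotext", "pdftopng")


def find_missing_tools(exe_dir, tools=ASSUMED_TOOLS):
    """Return the names in `tools` that have no matching file in `exe_dir`."""
    # Executables carry an .exe suffix on Windows and no suffix elsewhere.
    suffix = ".exe" if sys.platform.startswith("win") else ""
    directory = Path(exe_dir)
    return [
        name
        for name in tools
        if not (directory / f"{name}{suffix}").is_file()
    ]
```

An empty list means every assumed tool was found, so the directory is a plausible value for --exe-dir.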

Usage

usage: leaf-focus [-h] [--version] --exe-dir EXE_DIR [--page-images] [--ocr]
                  [--first FIRST] [--last LAST]
                  [--log-level {debug,info,warning,error,critical}]
                  input_pdf output_dir

Extract structured text from a pdf file.

positional arguments:
  input_pdf             path to the pdf file to read
  output_dir            path to the directory to save the extracted text files

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --exe-dir EXE_DIR     path to the directory containing xpdf executable files
  --page-images         save each page of the pdf as a separate image
  --ocr                 run optical character recognition on each page of the
                        pdf
  --first FIRST         the first pdf page to process
  --last LAST           the last pdf page to process
  --log-level {debug,info,warning,error,critical}
                        the log level: debug, info, warning, error, critical

Examples

# Extract the pdf information and embedded text.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages

# Extract the pdf information, embedded text, an image of each page, and optical character recognition (OCR) results for each page.
leaf-focus --exe-dir [path-to-xpdf-exe-dir] file.pdf file-pages --ocr
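
The same commands can be driven from a Python script. The sketch below only assembles the options listed in the usage text above and passes them to the `leaf-focus` executable with the standard library `subprocess` module. The helper names `build_command` and `run_leaf_focus` are hypothetical and are not part of the leaf-focus package; the sketch assumes `leaf-focus` is available on the PATH after installation.

```python
import subprocess


def build_command(exe_dir, input_pdf, output_dir, *, page_images=False,
                  ocr=False, first=None, last=None, log_level=None):
    """Build the leaf-focus argument list from the documented options."""
    command = ["leaf-focus", "--exe-dir", str(exe_dir)]
    if page_images:
        command.append("--page-images")
    if ocr:
        command.append("--ocr")
    if first is not None:
        command += ["--first", str(first)]
    if last is not None:
        command += ["--last", str(last)]
    if log_level is not None:
        command += ["--log-level", log_level]
    # The two positional arguments come last, as in the usage text.
    command += [str(input_pdf), str(output_dir)]
    return command


def run_leaf_focus(*args, **kwargs):
    """Run leaf-focus and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, `run_leaf_focus("[path-to-xpdf-exe-dir]", "file.pdf", "file-pages", ocr=True)` corresponds to the second command above. Passing the arguments as a list avoids shell quoting problems when paths contain spaces.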

Dependencies

xpdf
keras-ocr
TensorFlow (can optionally be run more efficiently using one or more GPUs)
