CLI

Usage

The general form of the CLI usage is:

python3 -m pdfsyntax COMMAND FILE

You can get quick insights into a PDF file with these commands:

overview outputs text data about the structure and the metadata.
disasm outputs a dump of the file structure on the terminal.
text extracts the text content of all pages while preserving its spatial layout, somewhat like a scan.
fonts outputs a list of the fonts used in the file.
browse outputs static HTML that lets you browse the internal structure of the PDF file: the PDF source is pretty-printed and augmented with hyperlinks.
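
If you want to drive these commands from a script rather than a shell, the general form above can be wrapped with the standard `subprocess` module. The following is only a minimal sketch: `build_argv` and `run_command` are hypothetical helper names, not part of pdfsyntax, and the sketch assumes the package is installed for the current interpreter.

```python
import subprocess
import sys

# Commands documented in this article.
COMMANDS = ("overview", "disasm", "text", "fonts", "browse")


def build_argv(command, pdf_path):
    """Return the argument list for `python3 -m pdfsyntax COMMAND FILE`."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    # sys.executable stands in for `python3` so the same interpreter is reused.
    return [sys.executable, "-m", "pdfsyntax", command, str(pdf_path)]


def run_command(command, pdf_path):
    """Run the CLI and return its standard output as text (hypothetical helper)."""
    result = subprocess.run(
        build_argv(command, pdf_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output using the locale encoding
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `print(run_command("overview", "file.pdf"))` would print the same text as running the `overview` command in a terminal.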

`overview`

The output shows information about:

the structure: Version, Pages, Revisions, etc.
the metadata: Title, Author, Subject, etc.

`disasm`

The output shows a terse and greppable view of the file's internal structure. Please refer to the Disassembler article for details.

`text`

The output shows a full extract of the text content, with spatial awareness: the algorithm tries to respect the original layout, as if characters of all sizes were approximately rendered on a fixed-size grid.
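
To make the fixed-size grid idea more concrete, here is a small illustrative sketch. It is not pdfsyntax's actual algorithm: the function name `render_on_grid`, the `(x, y, text)` fragment format and the default cell sizes are all assumptions chosen for the illustration. Each fragment is snapped to the nearest grid cell, regardless of its font size, and the rows are then emitted from the top of the page to the bottom.

```python
def render_on_grid(fragments, cell_width=6.0, cell_height=12.0):
    """Place (x, y, text) fragments on a fixed-size character grid.

    x and y are page coordinates in points, with y growing upward as in PDF.
    The cell sizes are arbitrary illustrative values.
    """
    rows = {}
    for x, y, text in fragments:
        row = round(y / cell_height)
        col = round(x / cell_width)
        line = rows.setdefault(row, [])
        # Pad the line with spaces up to the target column.
        if len(line) < col:
            line.extend(" " * (col - len(line)))
        # Write the characters, overwriting anything already in those cells.
        line[col:col + len(text)] = text
    # A higher y is higher on the page, so emit rows from top to bottom.
    return "\n".join("".join(rows[r]).rstrip() for r in sorted(rows, reverse=True))
```

With fragments such as `(0, 720, "Title")`, `(0, 696, "Name")` and `(120, 696, "Value")`, the two lower fragments land on the same row, and `Value` starts at column 20, which keeps columns aligned the way they were on the page.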

`fonts`

The output shows a list of fonts used in the file, with the following tabular data:

Name
Type
Encoding
Object number and generation number, comma-separated
Number of pages where it occurs

`browse`

This command generates HTML output that looks like the raw PDF file, with additional hyperlinks and information that expose its internal structure and the relations between its objects. Redirect the standard output to a file that you can open in your browser:

python3 -m pdfsyntax browse file.pdf > inspection_file.html

Please refer to the Browse article for details.
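
The same redirection can be done from Python, which is convenient if you also want to open the result automatically. This is a sketch under assumptions: `html_output_path` and `browse_to_file` are hypothetical helpers, the `inspection_` prefix simply mirrors the example file name above, and pdfsyntax is assumed to be installed for the current interpreter.

```python
import subprocess
import sys
import webbrowser
from pathlib import Path


def html_output_path(pdf_path):
    """Derive an output name such as inspection_file.html from file.pdf."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"inspection_{pdf.stem}.html")


def browse_to_file(pdf_path, open_in_browser=True):
    """Save the output of `browse` to an HTML file and optionally open it."""
    out_path = html_output_path(pdf_path)
    # Equivalent of the shell redirection `> inspection_file.html`.
    with open(out_path, "wb") as out:
        subprocess.run(
            [sys.executable, "-m", "pdfsyntax", "browse", str(pdf_path)],
            stdout=out,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    if open_in_browser:
        # webbrowser expects a URL, so convert the absolute path to file://...
        webbrowser.open(out_path.resolve().as_uri())
    return out_path
```

Calling `browse_to_file("file.pdf")` would write `inspection_file.html` next to the PDF and ask the default browser to display it.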

TO BE CONTINUED
