
Workflow Guide: OCR evaluation


In this processing step, the text output of the OCR or of the post-correction can be evaluated by aligning it with the ground truth text and measuring the error rates.
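For orientation, the character error rate (CER) that such comparisons typically report is commonly defined as the edit distance between the OCR text and the ground truth, normalized by the length of the ground truth (the word error rate, WER, is defined analogously over word tokens); here `d_Lev` denotes the Levenshtein distance:

```math
\mathrm{CER} = \frac{d_{\mathrm{Lev}}(\mathrm{OCR}, \mathrm{GT})}{|\mathrm{GT}|}
```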

Available processors

| Processor | Parameter | Remarks | Call |
| --- | --- | --- | --- |
| ocrd-dinglehopper | | First input group should point to the ground truth. | `ocrd-dinglehopper -I OCR-D-GT,OCR-D-OCR -O OCR-D-EVAL` |
| ocrd-cor-asv-ann-evaluate | `"metric"`: one of `"Levenshtein"` (default), `"NFC"`, `"NFKC"`, `"historic-latin"`; `"confusion"`: integer | First input group should point to the ground truth. There is no output file group; the results are only logged. To save the evaluation findings to a file, add e.g. `2> eval.txt` at the end of the command. | `ocrd-cor-asv-ann-evaluate -I OCR-D-GT,OCR-D-OCR` |
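As a minimal sketch, an evaluation run could look like the following. It assumes you are inside an OCR-D workspace (i.e. a directory with a `mets.xml`); the file group names `OCR-D-GT` and `OCR-D-OCR`, the parameter file name `eval-params.json`, and the parameter values are placeholders to adapt to your data. `-p` passes a JSON parameter file, as in the general OCR-D CLI.

```sh
# Compare the OCR output against the ground truth with dinglehopper;
# the evaluation reports are written to the OCR-D-EVAL file group.
ocrd-dinglehopper -I OCR-D-GT,OCR-D-OCR -O OCR-D-EVAL

# Evaluate with cor-asv-ann-evaluate using a non-default metric.
# The processor only logs its findings, so redirect stderr to keep them in a file.
echo '{"metric": "historic-latin", "confusion": 10}' > eval-params.json
ocrd-cor-asv-ann-evaluate -I OCR-D-GT,OCR-D-OCR -p eval-params.json 2> eval.txt
```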

Notes on parameter usage

E.g.

  • which parameters do you use with what values?
  • which parameters are insufficiently documented?
  • which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

E.g. which processors worked best with what material? -- feel free to post sample images here, too.
