Konstantin Baierer edited this page Mar 22, 2018 · 4 revisions

OCRD XML API

This document describes an application programming interface to the input and output format used for processes within the OCR-D project. The format itself is based on METS as a container and for descriptive metadata and PAGE XML for the content.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Input can be either a single METS XML file or a ZIP container with a single mets.xml plus the files it references.

Conventions

fileGrp USE attribute

#9 #7

Every METS file MUST have one or more <fileGrp> elements.

At least one <fileGrp> MUST have USE="INPUT".

A METS file MAY have at most one <fileGrp USE="OUTPUT">.
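The three fileGrp rules above could be checked along the following lines. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `check_file_groups` and the wording of the messages are illustrative and not part of this specification.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def check_file_groups(mets_xml):
    """Return a list of violated fileGrp conventions (empty if all hold)."""
    root = ET.fromstring(mets_xml)
    groups = root.findall(".//mets:fileGrp", METS_NS)
    uses = [group.get("USE") for group in groups]
    problems = []
    if not groups:
        problems.append("no fileGrp found")
    if uses.count("INPUT") < 1:
        problems.append('no fileGrp with USE="INPUT"')
    if uses.count("OUTPUT") > 1:
        problems.append('more than one fileGrp with USE="OUTPUT"')
    return problems


EXAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="INPUT"/>
    <mets:fileGrp USE="OUTPUT"/>
  </mets:fileSec>
</mets:mets>"""

print(check_file_groups(EXAMPLE_METS))  # []
```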

One PAGE XML document per document page

A single PAGE XML file represents one page in the original document.

Every <pc:Page> element MUST have an attribute image, which MUST always reference the source image.

The PAGE XML root element <pc:PcGts> MUST have at least one <pc:Page>.

It MAY have more than one <pc:Page>. TODO: this is not valid in current PAGE XML.

Images and coordinates

Coordinates are always absolute, i.e. relative to the extent defined by the imageWidth and imageHeight attributes of the nearest <pc:Page>.

When a processor wants to access the image of a layout element such as a TextRegion or TextLine, the algorithm should be:

  • If the element in question has an attribute imageFilename, resolve this value.
  • Otherwise, if the element has a <pc:Coords> subelement, resolve by passing the attribute imageFilename of the nearest <pc:Page> and the text value of the <pc:Coords> element.
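The two steps above might be sketched in Python as follows. The helper names `nearest_page` and `image_reference`, the PAGE namespace version, and the sample file names are assumptions made for illustration. The sketch reads the coordinates from the text value of <pc:Coords>, as described above, and only returns what would be passed on to a resolver.

```python
import xml.etree.ElementTree as ET

# Assumption: any PAGE namespace version would do; this one is picked for illustration.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
NS = {"pc": PAGE_NS}


def nearest_page(element, parents):
    """Walk up the parent map until a pc:Page element is found."""
    node = element
    while node is not None:
        if node.tag == "{%s}Page" % PAGE_NS:
            return node
        node = parents.get(node)
    return None


def image_reference(element, parents):
    """Return (image_filename, coords); coords is None when the whole image is meant."""
    own_image = element.get("imageFilename")
    if own_image is not None:
        # Step 1: the element names its own image.
        return own_image, None
    coords = element.find("pc:Coords", NS)
    if coords is not None:
        # Step 2: fall back to the page image plus the element's coordinates.
        page = nearest_page(element, parents)
        if page is None:
            raise ValueError("element is not inside a pc:Page")
        return page.get("imageFilename"), (coords.text or "").strip()
    raise ValueError("element has neither imageFilename nor pc:Coords")


EXAMPLE_PAGE = """<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <pc:Page imageFilename="page1.tif" imageWidth="100" imageHeight="200">
    <pc:TextRegion id="r1">
      <pc:Coords>10,10 50,10 50,30 10,30</pc:Coords>
      <pc:TextLine id="l1" imageFilename="line1.png"/>
    </pc:TextRegion>
  </pc:Page>
</pc:PcGts>"""

root = ET.fromstring(EXAMPLE_PAGE)
# ElementTree keeps no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
region = root.find(".//pc:TextRegion", NS)
line = root.find(".//pc:TextLine", NS)

print(image_reference(region, parents))  # ('page1.tif', '10,10 50,10 50,30 10,30')
print(image_reference(line, parents))    # ('line1.png', None)
```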

API

📦TODO📦 https://github.com/PRImA-Research-Lab/prima-core-libs and its apidocs.


Resolver

📦TODO📦 Describe

  • Data Repository
  • backend for the transparency in handling input and output
  • cutting out images
  • etc.

new Ocrd.Resolver()

Creates a resolver and configures it, e.g. with the ZIP container in which file URLs should be resolved.

OcrdPage resolvePage(String url)

Resolve a URL to an OcrdPage.

OcrdMets resolveMets(String url)

Resolve a URL to an OcrdMets.

OcrdImage resolveImage(String url)

Resolve a URL to an OcrdImage.

OcrdImage resolveImage(String url, OcrdCoords coords)

Resolve a URL to an image, then crop it to the coordinates provided.
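The cropping step could look roughly like the sketch below, which avoids any imaging library by treating an image as a list of pixel rows. It assumes the coordinates arrive as a string of the form "x1,y1 x2,y2 ...", crops to the polygon's bounding box rather than masking the polygon itself, and treats the right and bottom edges as exclusive. All of these are illustrative choices, not requirements of this document.

```python
def parse_points(points):
    """Parse 'x1,y1 x2,y2 ...' into a list of (x, y) integer tuples."""
    return [tuple(int(value) for value in pair.split(",")) for pair in points.split()]


def bounding_box(points):
    """Return (left, top, right, bottom) enclosing all polygon points."""
    xs, ys = zip(*parse_points(points))
    return min(xs), min(ys), max(xs), max(ys)


def crop(pixels, box):
    """Crop a list of pixel rows to box; right and bottom are exclusive here."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# A 4x4 dummy "image" whose pixel values encode their position (row * 4 + column).
pixels = [[row * 4 + col for col in range(4)] for row in range(4)]
box = bounding_box("1,1 3,1 3,3 1,3")

print(box)                # (1, 1, 3, 3)
print(crop(pixels, box))  # [[5, 6], [9, 10]]
```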


OcrdMets

Represents the METS file as used for input and output of the processors.

List<OcrdPage> listInputPages()

If the fileGrp with USE="INPUT" contains files with mimetype="text/xml", parse each of them into an OcrdPage and list them.

Otherwise, if the fileGrp with USE="INPUT" contains files with mimetype="image/*", generate an empty PAGE XML document for each image by

  • creating a pc:PcGts element and therein
  • an empty pc:Page element with image="<URL>"
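A minimal Python sketch of this decision and of the empty PAGE skeleton follows. The function names `input_sources` and `empty_page_for_image`, the PAGE namespace version, and the sample file URL are hypothetical. Parsing existing PAGE files into OcrdPage objects is left out, so the sketch only reports which URLs would be parsed.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
# Assumption: the PAGE namespace version is picked for illustration only.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"


def input_sources(mets_root):
    """Return ('page', urls) if INPUT holds PAGE XML files, else ('image', urls)."""
    files = mets_root.findall(".//mets:fileGrp[@USE='INPUT']/mets:file", METS_NS)

    def url_of(file_element):
        return file_element.find("mets:FLocat", METS_NS).get(XLINK_HREF)

    page_urls = [url_of(f) for f in files if f.get("MIMETYPE") == "text/xml"]
    if page_urls:
        return "page", page_urls
    image_urls = [url_of(f) for f in files if (f.get("MIMETYPE") or "").startswith("image/")]
    return "image", image_urls


def empty_page_for_image(image_url):
    """Build a pc:PcGts element containing one empty pc:Page that points at the image."""
    ET.register_namespace("pc", PAGE_NS)
    pcgts = ET.Element("{%s}PcGts" % PAGE_NS)
    ET.SubElement(pcgts, "{%s}Page" % PAGE_NS, {"image": image_url})
    return pcgts


# The file URL below is a made-up example.
EXAMPLE_INPUT_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="INPUT">
      <mets:file ID="f1" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="file:///data/page1.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

kind, urls = input_sources(ET.fromstring(EXAMPLE_INPUT_METS))
pages = [empty_page_for_image(url) for url in urls] if kind == "image" else []
print(kind, urls)
print(ET.tostring(pages[0], encoding="unicode"))
```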

listVariants

📦TODO📦 Wrong here

Lists all variants, i.e. nested METS files used as INPUT. In the common case that there is no nesting, this will return just one variant with all the files listed in INPUT.

OcrdPage getInputPage(i)

List<OcrdPage> listOutputs()

addOutput(OcrdPage page)


OcrdPage

Should be generated by the resolver.

Image getImage()

Image getAlternativeImage(type)


TextRegion

Image getImage()


TextLine

Image getAlternativeImage(type)

Glossary

Processor

A processor is a tool that accepts METS/PAGE input and produces METS/PAGE output according to this spec.
