How do we serve/reference the resulting PAGE XML? (Serving METS is trivial: It is the return value of the POST request to the webserver)
Option 1: Alter the URLs in the full text section to point to the modified PAGE on the server.
Option 2: Add another file section (USE=FULLTEXT_RESULT) for the modified PAGE.
Option 3: Leave out the FULLTEXT file section right from the beginning.
I would propose to have just two USE labels, INPUT and OUTPUT. The mets:file elements within a mets:fileGrp must have their MIMETYPE attribute set, so that it is clear which files are images and which are PAGE XML (or, in the future, ALTO, TEI or other formats).
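A minimal sketch of how a consumer could tell images from PAGE XML using only the MIMETYPE attribute, written with Python's standard-library ElementTree. The sample METS, the example.org URLs and the function name `files_by_mimetype` are made up for illustration, and `text/vnd.ocrd.pagexml` is the vendor-specific media type suggested in this thread, not a registered one.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the METS and XLink standards.
METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS, "xlink": XLINK_NS}

# Assumption: the vendor-specific media type proposed in this thread.
PAGE_MIMETYPE = "text/vnd.ocrd.pagexml"

# Hypothetical input METS with one image and one PAGE XML file.
SAMPLE = f"""<mets:mets xmlns:mets="{METS_NS}" xmlns:xlink="{XLINK_NS}">
  <mets:fileSec>
    <mets:fileGrp USE="INPUT">
      <mets:file ID="IMG_0001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/0001.tif"/>
      </mets:file>
      <mets:file ID="PAGE_0001" MIMETYPE="{PAGE_MIMETYPE}">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def files_by_mimetype(mets_root, use):
    """Group the URLs of all mets:file elements in fileGrp[@USE=use] by MIMETYPE."""
    grouped = {}
    for f in mets_root.iterfind(f".//mets:fileGrp[@USE='{use}']/mets:file", NS):
        mimetype = f.get("MIMETYPE")
        if mimetype is None:
            # The proposal makes MIMETYPE mandatory, so its absence is an error.
            raise ValueError(f"mets:file {f.get('ID')} has no MIMETYPE")
        flocat = f.find("mets:FLocat", NS)
        if flocat is None:
            raise ValueError(f"mets:file {f.get('ID')} has no mets:FLocat")
        grouped.setdefault(mimetype, []).append(flocat.get(f"{{{XLINK_NS}}}href"))
    return grouped
```

For example, `files_by_mimetype(ET.fromstring(SAMPLE), "INPUT")` separates the TIFF image from the PAGE XML file without relying on any FULLTEXT/IMAGE-style USE label.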
(Aside: We should use a more specific media type for PAGE XML than text/xml. If no registered one exists, let's use a vendor-specific one, such as text/vnd.ocrd.pagexml.)
The result of passing a METS file to a processor is another METS file containing a fileGrp with USE="OUTPUT", which should contain only references to PAGE XML files.
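A sketch of the processor side under this convention: take the input METS and return a new tree with an added fileGrp USE="OUTPUT" whose files all carry the PAGE media type. It assumes the input already has a mets:fileSec and uses `text/vnd.ocrd.pagexml` as suggested in this thread; the function name `with_output_group` and any file IDs passed to it are hypothetical. The input tree is deep-copied rather than modified, in keeping with processors being stateless.

```python
import copy
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS, "xlink": XLINK_NS}

# Assumption: vendor-specific media type for PAGE XML, as proposed in this thread.
PAGE_MIMETYPE = "text/vnd.ocrd.pagexml"


def with_output_group(mets_root, page_hrefs):
    """Return a copy of the METS tree with a new fileGrp USE="OUTPUT".

    page_hrefs maps a (hypothetical) file ID to the URL of a PAGE XML file
    produced by the processor. The input tree is left untouched.
    """
    result = copy.deepcopy(mets_root)
    file_sec = result.find("mets:fileSec", NS)
    if file_sec is None:
        # Assumption: the input METS always carries its INPUT files in a fileSec.
        raise ValueError("input METS has no mets:fileSec")
    if file_sec.find("mets:fileGrp[@USE='OUTPUT']", NS) is not None:
        raise ValueError("input METS already has an OUTPUT fileGrp")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="OUTPUT")
    for file_id, href in page_hrefs.items():
        # Only PAGE XML goes into OUTPUT, so every file gets the PAGE media type.
        f = ET.SubElement(grp, f"{{{METS_NS}}}file", ID=file_id, MIMETYPE=PAGE_MIMETYPE)
        ET.SubElement(f, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
    return result
```

Returning a fresh tree keeps the input METS unchanged, so the same input can be handed to other processors or cached by the caller.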
The provenance of the files (such as the URL of the job that created them, runtime and logging information) is outside the scope of the fileGrp. It should be set as part of the metadata of the METS file.
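One possible way to record provenance as METS metadata rather than in the fileGrp is a mets:agent inside mets:metsHdr, which METS already provides for describing who or what created a document. The sketch below assumes ROLE="CREATOR" with TYPE="OTHER" OTHERTYPE="SOFTWARE" for a processor, stores the job URL and an optional runtime as mets:note entries, and uses a hypothetical function name `add_provenance`; none of these choices is fixed by the proposal.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}


def add_provenance(mets_root, processor_name, job_url, runtime_seconds=None):
    """Record a processor run as a mets:agent in mets:metsHdr (modifies in place).

    Returns the newly created mets:agent element.
    """
    hdr = mets_root.find("mets:metsHdr", NS)
    if hdr is None:
        # The METS schema places metsHdr first among the children of mets:mets.
        hdr = ET.Element(f"{{{METS_NS}}}metsHdr")
        mets_root.insert(0, hdr)
    # Agents precede altRecordID/metsDocumentID inside metsHdr, so insert
    # the new agent directly after the existing ones.
    position = len(hdr.findall("mets:agent", NS))
    agent = ET.Element(
        f"{{{METS_NS}}}agent",
        {"ROLE": "CREATOR", "TYPE": "OTHER", "OTHERTYPE": "SOFTWARE"},
    )
    ET.SubElement(agent, f"{{{METS_NS}}}name").text = processor_name
    ET.SubElement(agent, f"{{{METS_NS}}}note").text = f"job: {job_url}"
    if runtime_seconds is not None:
        ET.SubElement(agent, f"{{{METS_NS}}}note").text = f"runtime: {runtime_seconds:.3f}s"
    hdr.insert(position, agent)
    return agent
```

Logging information could be attached the same way as further notes, or referenced by URL to keep the header small.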
Functionality such as caching or performance analysis, which requires knowledge of the full chain of processors, must happen in the orchestration layer. Since the processors are stateless, their output is fully defined by their input, i.e. mostly by the METS. Keeping provenance info out of the METS should make it easier to cache results.
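Because processors are stateless, the orchestration layer can cache a result under a key derived from the processor's identity and its input METS. A minimal sketch, assuming an in-memory dict as the store and a hypothetical `processor_id` string that names the processor and its parameters; `ResultCache` is likewise an illustrative name. If any provenance does end up in `mets:metsHdr`, it would change the input bytes from run to run, so the sketch drops the header and canonicalizes the rest before hashing. That is one way to keep provenance from defeating the cache, not something the proposal prescribes.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}


class ResultCache:
    """In-memory cache of processor results, owned by the orchestration layer."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(processor_id, mets_xml):
        root = ET.fromstring(mets_xml)
        # Drop the header so provenance (agents, dates) does not affect the key.
        for hdr in root.findall("mets:metsHdr", NS):
            root.remove(hdr)
        # C14N gives a stable serialization of the remaining METS content.
        canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
        digest = hashlib.sha256()
        digest.update(processor_id.encode("utf-8"))
        digest.update(b"\0")  # separator between processor id and METS
        digest.update(canonical.encode("utf-8"))
        return digest.hexdigest()

    def run(self, processor_id, process, mets_xml):
        """Return the cached result, calling process(mets_xml) only on a miss."""
        k = self.key(processor_id, mets_xml)
        if k not in self._store:
            self._store[k] = process(mets_xml)
        return self._store[k]
```

Here `process` stands for whatever call sends the METS to a processor (for instance the POST request to its webserver) and returns the resulting METS.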