METS: Result section? #7

Closed
wrznr opened this issue Feb 23, 2018 · 4 comments

Contributor

wrznr commented Feb 23, 2018

How do we serve/reference the resulting PAGE XML? (Serving the METS is trivial: it is the return value of the POST request to the web server.)

Option 1: Alter the URLs in the full text section to point to the modified PAGE on the server.
Option 2: Add another file section (USE=FULLTEXT_RESULT) for the modified PAGE.
Option 3: Leave out the FULLTEXT file section right from the beginning.
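
A minimal sketch of what Option 2 could look like, using Python's standard `xml.etree.ElementTree`. The helper name `add_result_group`, the file ID pattern, the example URL and the `text/xml` media type are assumptions made for illustration; they are not taken from the project.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def add_result_group(mets_root, page_urls, use="FULLTEXT_RESULT"):
    """Append a new mets:fileGrp referencing the modified PAGE files.

    Existing file groups (e.g. USE="FULLTEXT") are left untouched.
    """
    file_sec = mets_root.find(f"{{{METS_NS}}}fileSec")
    if file_sec is None:
        file_sec = ET.SubElement(mets_root, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": use})
    for index, url in enumerate(page_urls, start=1):
        file_el = ET.SubElement(
            group,
            f"{{{METS_NS}}}file",
            # ID pattern and media type are hypothetical choices.
            {"ID": f"{use}_{index:04d}", "MIMETYPE": "text/xml"},
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url},
        )
    return group


# Usage: a METS skeleton that already has a FULLTEXT group.
root = ET.Element(f"{{{METS_NS}}}mets")
file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "FULLTEXT"})
add_result_group(root, ["https://example.org/page/0001.xml"])
print(ET.tostring(root, encoding="unicode"))
```

The original FULLTEXT group keeps pointing at the unmodified files, which is what distinguishes this from Option 1.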

Contributor Author

wrznr commented Feb 23, 2018

FYI @haoess

Member

cneud commented Mar 4, 2018

IMHO, the only practical and non-intrusive approach is to use multiple file sections for the modified images, so option 2.

Member

kba commented Mar 22, 2018

I would propose to have just two USE labels, INPUT and OUTPUT. The mets:file elements within a mets:fileGrp must have their MIMETYPE set, so it is clear from that which files are images and which are PAGE XML (or ALTO in the future, or TEI or whatever).

(Aside: We should use a more specific media type for PAGE XML than text/xml. If there is none, let's use a vendor-specific one, like text/vnd.ocrd.pagexml)
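
To illustrate how a consumer could tell images from PAGE XML by media type alone, here is a rough sketch. The sample document, the function name `urls_by_mimetype` and the URLs are made up for the example, and `text/vnd.ocrd.pagexml` is only the media type suggested in the aside above, not a registered one.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical METS fragment with a single INPUT group.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="INPUT">
      <mets:file ID="IMG_0001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/img/0001.tif"/>
      </mets:file>
      <mets:file ID="PAGE_0001" MIMETYPE="text/vnd.ocrd.pagexml">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/page/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def urls_by_mimetype(mets_root, use):
    """Group the file URLs of all fileGrp elements with the given USE by MIMETYPE."""
    grouped = defaultdict(list)
    path = f"mets:fileSec/mets:fileGrp[@USE='{use}']/mets:file"
    for file_el in mets_root.findall(path, NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            grouped[file_el.get("MIMETYPE")].append(flocat.get(XLINK_HREF))
    return dict(grouped)


root = ET.fromstring(SAMPLE_METS)
by_type = urls_by_mimetype(root, "INPUT")
print(by_type["image/tiff"])             # image references
print(by_type["text/vnd.ocrd.pagexml"])  # PAGE XML references
```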

The result of passing a METS file to a processor is another METS file containing a fileGrp with USE="OUTPUT", which should contain only references to PAGE XML files.
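
One way this contract could be checked on the receiving side is sketched below. The function name `unexpected_output_files`, the sample result document and the set of accepted media types are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Assumption: PAGE XML is identified by the vendor-specific type proposed above.
PAGE_TYPES = frozenset({"text/vnd.ocrd.pagexml"})

# Hypothetical processor result containing one stray image reference.
RESULT_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OUTPUT">
      <mets:file ID="OUT_PAGE_0001" MIMETYPE="text/vnd.ocrd.pagexml"/>
      <mets:file ID="OUT_IMG_0001" MIMETYPE="image/png"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def unexpected_output_files(mets_root, allowed_types=PAGE_TYPES):
    """Return the IDs of files in USE="OUTPUT" groups that are not PAGE XML."""
    files = mets_root.findall(
        "mets:fileSec/mets:fileGrp[@USE='OUTPUT']/mets:file", NS
    )
    return [f.get("ID") for f in files if f.get("MIMETYPE") not in allowed_types]


offenders = unexpected_output_files(ET.fromstring(RESULT_METS))
print(offenders)
```

A check like this only works if the media type is specific enough; with plain text/xml it could not tell PAGE XML apart from other XML formats.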

The provenance of the files, such as the URL of the job that created them, runtime and logging information etc., is outside the scope of the fileGrp. It should be set as part of the metadata of the METS file.

Functionality like caching, performance analysis etc. that requires knowledge of the full chain of processors must happen in the orchestration layer. Since the processors are stateless, their output is fully defined by their input, i.e. mostly by the METS. Keeping provenance info out of the METS should make it easier to cache results.
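
A sketch of how an orchestration layer might exploit this statelessness for caching. The names `cache_key` and `ResultCache`, the in-memory store and the processor call signature are hypothetical. The sketch also assumes that hashing the METS bytes is enough, which ignores any change to the files behind the referenced URLs.

```python
import hashlib
import json


def cache_key(processor_name, parameters, mets_bytes):
    """Derive a key from everything that defines a stateless processor's output."""
    digest = hashlib.sha256()
    digest.update(processor_name.encode("utf-8"))
    digest.update(b"\0")
    # sort_keys makes the key independent of parameter ordering.
    digest.update(json.dumps(parameters, sort_keys=True).encode("utf-8"))
    digest.update(b"\0")
    digest.update(mets_bytes)
    return digest.hexdigest()


class ResultCache:
    """In-memory cache owned by the orchestration layer, not by the processors."""

    def __init__(self):
        self._store = {}

    def run(self, processor_name, parameters, mets_bytes, processor):
        key = cache_key(processor_name, parameters, mets_bytes)
        if key not in self._store:
            self._store[key] = processor(mets_bytes, parameters)
        return self._store[key]


# Usage with a stand-in processor that counts how often it is invoked.
calls = []


def fake_processor(mets_bytes, parameters):
    calls.append(1)
    return mets_bytes + b"<!-- processed -->"


cache = ResultCache()
mets = b"<mets/>"
first = cache.run("binarize", {"threshold": 0.5}, mets, fake_processor)
second = cache.run("binarize", {"threshold": 0.5}, mets, fake_processor)
print(len(calls))  # the processor ran once for two identical requests
```

If job URLs or timestamps were written into the METS, two otherwise identical inputs would hash differently and the cache would miss.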

Member

cneud commented Apr 4, 2018

Fixed in 2f8a390 according to #9 (comment)

cneud closed this as completed Apr 4, 2018