Default media type of documents #70

Closed
balmas opened this issue Sep 15, 2017 · 14 comments

@balmas
Contributor

balmas commented Sep 15, 2017

From @jonathanrobie on September 9, 2017 20:23

Thibault proposes that the default media type of a document should be application/tei+xml: "The default format of answer should be application/tei+xml", leaving other media types to the implementation.

Copied from original issue: distributed-text-services/distributed-text-services.github.io#4

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @jonathanrobie on September 9, 2017 20:26

I think that the media type should be determined by standard HTTP content negotiation. A server should give the most precise media type that a client is willing to accept for a given document, based on its knowledge of the document. If the most precise type of a document is application/tei+xml, that is the media type that the server will provide unless the client provides ACCEPT headers that do not allow this type. But not all documents are TEI.
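
As an illustration of the negotiation described above, here is a minimal sketch in Python. It is not part of any DTS specification: the function names and the "most precise first" ordering of `available` are assumptions made for this example, and the matching is deliberately simplified (it does not apply the specificity precedence or the exclusion semantics of `q=0` from the HTTP specification).

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def matches(pattern, media_type):
    """True if an Accept pattern such as */* or text/* covers media_type."""
    if pattern == "*/*":
        return True
    p_type, _, p_sub = pattern.partition("/")
    m_type, _, _ = media_type.partition("/")
    return pattern == media_type or (p_sub == "*" and p_type == m_type)


def negotiate(accept_header, available):
    """Pick a media type for one document.

    `available` lists what the server can produce, most precise first
    (hypothetical convention). Returns None when nothing is acceptable,
    in which case an HTTP server would typically answer 406.
    """
    if not accept_header:
        return available[0]
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return None
```

With `available = ["application/tei+xml", "application/xml"]`, a client sending no Accept header or `*/*` gets the TEI type, while a client that only accepts `text/html` gets no match.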

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @PonteIneptique on September 9, 2017 21:4

> I think that the media type should be determined by standard HTTP content negotiation.

I agree.

> But not all documents are TEI.

I don't "agree" :). This is where I think we need to enforce this thing.

One of the failures of CTS was to accept as many schemes as there are implementations, without enforcing any kind of correct format. The result was not knowing how to parse the content, nor being able to foresee what you would get. We have the chance to have a format for textual editions: TEI. Right now, there is not a single standard API (as far as I know) that relies on it. We discussed this a while ago, and while I think we should allow implementers to provide other MIME types, I think XML/TEI should always be accessible.

One of the reasons for this is that, unlike images (I look kindly at you, IIIF) and structured metadata, textual editing is broad, broad enough to be complex already in TEI. When you query most standard LOD APIs, you can expect to get RDF in different serializations: JSON-LD, RDF/XML, Turtle, etc. We do not have the same standard for textual editions. HTML, raw text, CSV, non-grammar-based XML, etc. are far too different to be treated the same way.

Having TEI, a scholarly accepted standard, as a standard and obligatory output is required for the API to really have an impact, for these reasons. Let's be clear in terms of implementation duties: the server must be able to reply with valid content to Accept: application/tei+xml, and can expand to other MIME types.
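
From the client side, the duty described above could look like the following sketch. The base URL, the `id` query parameter, and both function names are hypothetical, chosen only for illustration; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

TEI_MEDIA_TYPE = "application/tei+xml"


def build_tei_request(base_url, document_id):
    """Build (without sending) a request that asks for the TEI representation."""
    # The "id" parameter name is an assumption, not taken from any DTS draft.
    query = urlencode({"id": document_id})
    return Request(f"{base_url}?{query}", headers={"Accept": TEI_MEDIA_TYPE})


def is_tei_response(content_type):
    """Check a Content-Type value, ignoring parameters such as charset."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == TEI_MEDIA_TYPE
```

A client could pass the result of `build_tei_request` to `urllib.request.urlopen` and then apply `is_tei_response` to the returned Content-Type header to confirm that the server honoured the request.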

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @jonathanrobie on September 9, 2017 21:14

Suppose I want to serve syntax trees in a format that is semantically different from TEI's standard. I cannot really convert that to a TEI format, and doing so would require effort for the data producer. If both the producer and the consumer want that format for these documents, do they need to use a different protocol?

Or suppose the only document available is HTML5. Should a server be required to convert it to TEI?

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @PonteIneptique on September 10, 2017 5:41

Any format transformation in the case of text serving is a loss of
information. Serving syntax trees has never been the focus of DTS, and has
never been discussed as such. This is, for example, not a use case or user
story we had set up at the beginning of our discussion.

As such, and to be clear, I am not against tree serving, but historically,
we already discussed the format of XML TEI to be the standard. Otherwise,
it makes no sense to differentiate ourselves from other initiatives. Any
data transformation would require effort, but if you are willing to deploy
the API and do not have TEI, then you have to pay this entry fee.

I do not believe it will be that complicated to transform part of the
content, with loss or gain, into TEI.

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @jonathanrobie on September 14, 2017 13:6

DTS is a replacement for CTS, which is not restricted to TEI.

I don't know what level of design decision has been made on this. I was not aware of a firm decision, but there may have been one. We'll discuss in the meeting.

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @PonteIneptique on September 14, 2017 13:13

> DTS is a replacement for CTS, which is not restricted to TEI.

I agree. And CTS has also been a failing API by not providing a clear content type restriction (or base content type), which led to having very few APIs that can be parsed.

> I don't know what level of design decision has been made on this. I was not aware of a firm decision, but there may have been one. We'll discuss in the meeting.

There was no hard decision set in stone, but I definitely remember a meeting where this was discussed: we clearly leaned towards at least TEI as the basis, potentially to propose this whole work to TEI-C, and even assigned the poor @hcayless to chair the passage endpoint discussion (at the time when we had multiple chairs).

@balmas
Contributor Author

balmas commented Sep 15, 2017

From @PonteIneptique on September 15, 2017 6:54

To recap and build on discussion #9, following two comments from @hcayless (1, 2):

We could, at least, make the following rules:

  • A DTS API should be able to reply with application/tei+xml content on the passage/document endpoint(s)
  • A DTS API can serve any other content types (See below)
  • The response should be well-formed XML
  • The response could either be a full TEI document or a fragment (See below)
  • The response can be expected to be valid TEI, though it is not required

Fragment

A fragment could be either a new node, proposed to TEI, or an already existing node (ab was proposed on the TEI List: http://tei-l.970651.n3.nabble.com/Returning-fragments-from-TEI-documents-td4030077.html ). If we were to adopt this requirement for the API, we would need to discuss the proposal of a fragment node to the TEI list and agree on it here (i.e., open a new issue).

Other content types

There is a secondary question: do we want to limit content types, i.e., should we forbid replying with binary formats such as PDF or images (which are not "minable")?

I have no specific point of view on the question but it felt like a question that can be asked.

@balmas balmas added this to the Call 1 milestone Sep 15, 2017
@jeffreycwitt

My instinctual vote would be for default TEI response when making a passage/document response.

I would vote for default json-ld when we're talking about metadata about the passage/document.

I generally like the rules proposed immediately above, though I wonder if allowing fragments creates more confusion for a consuming client. It would be easier on a client if it could simply expect a well-formed TEI response in all cases.

If we want to respond with a small fragment of a text, can we demand that the service wrap this fragment in a TEI wrapper that includes a TEI header and text/body?
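
To make the wrapping idea concrete, here is a rough sketch of what a service could do, assuming the fragment already uses the TEI namespace. The function names and the placeholder header text are invented for this example; only the TEI namespace URI and the element names are taken from TEI itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements without a prefix (xmlns="...").
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{TEI_NS}}}{tag}"


def wrap_fragment(fragment_xml, title):
    """Wrap one well-formed fragment in TEI/teiHeader + TEI/text/body."""
    # ET.fromstring raises ParseError if the fragment is not well-formed.
    fragment = ET.fromstring(fragment_xml)
    root = ET.Element(tei("TEI"))

    # Minimal header; the placeholder prose below is illustrative only.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Excerpt served on request."
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Extracted from a longer document."

    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    body.append(fragment)
    return ET.tostring(root, encoding="unicode")
```

Calling `wrap_fragment('<l xmlns="http://www.tei-c.org/ns/1.0" n="1">Arma virumque cano</l>', "Example excerpt")` yields a single TEI document whose body contains only the requested line. Whether such a document is schema-valid depends on the fragment, which this sketch does not check.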

@jonathanrobie
Contributor

> We could, at least, make the following rules:
>
>   • A DTS API should be able to reply with application/tei+xml content on the passage/document endpoint(s)
>   • A DTS API can serve any other content types (See below)
>   • The response should be well-formed XML
>   • The response could either be a full TEI document or a fragment (See below)
>   • The response can be expected to be valid TEI, though it is not required

I think this is close, but I would either (1) require TEI for document data, or (2) add "if available" to the first point:

A DTS API should be able to reply with application/tei+xml content on the passage/document endpoint(s) if available

I would require any XML to be well-formed:

The response must be well-formed XML

I think we should specify how fragments are wrapped.

@jonathanrobie
Contributor

> A fragment could be either a new node, proposed to TEI, or an already existing node (ab was proposed on the TEI List: http://tei-l.970651.n3.nabble.com/Returning-fragments-from-TEI-documents-td4030077.html ). If we were to adopt this requirement for the API, we would need to discuss the proposal of a fragment node to the TEI list and agree on it here (i.e., open a new issue).

I think we can either:

  1. Use our own wrapper, or
  2. Use an wrapper

Neither one requires TEI to do anything new, as I understand the discussion.

@PonteIneptique
Member

PonteIneptique commented Oct 9, 2017

Clearer Proposal

After the last meeting, my* proposal is actually:

  • A DTS API must be able to reply with application/tei+xml content on the passage/document endpoint(s)
  • A DTS API can serve any other content types
  • In case of an XML media type, the response must be well-formed XML
  • In case of a TEI Media Type, the response must be well-formed XML
  • In case of a TEI Media Type, the response must either be a full TEI document or a fragment
  • In case of a TEI Media Type, the response should be valid TEI
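
The "must" rules in this list lend themselves to a mechanical check. The sketch below is one possible reading, not an agreed test suite: the function names are invented here, and TEI schema validity (the "should" rule) is left out because the Python standard library cannot validate against a schema.

```python
import xml.etree.ElementTree as ET

TEI_MEDIA_TYPE = "application/tei+xml"


def is_xml_media_type(media_type):
    """True for application/xml, text/xml, and any type with a +xml suffix."""
    base = media_type.split(";", 1)[0].strip().lower()
    return base in ("application/xml", "text/xml") or base.endswith("+xml")


def is_well_formed(body):
    """True if the body parses as a single well-formed XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return False
    return True


def check_server(supported_media_types):
    """Rule 1: the TEI media type must be among the supported types."""
    supported = {m.strip().lower() for m in supported_media_types}
    if TEI_MEDIA_TYPE in supported:
        return []
    return ["server cannot reply with application/tei+xml"]


def check_response(media_type, body):
    """Rules 3 and 4: XML and TEI responses must be well-formed."""
    if is_xml_media_type(media_type) and not is_well_formed(body):
        return ["response declared as XML is not well-formed"]
    return []
```

Both checks return a list of problems, so an empty list means no "must" rule was broken. A body with two sibling root elements, such as `<l/><l/>`, is reported as a problem here even though each element is balanced on its own.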

Fragment

As for the fragment, there are multiple choices from the discussion:

  • the use of <ab type="extractedFragment"> or similar
  • the use of an existing wrapper outside of TEI
  • the use of a new wrapper for DTS
  • the request for a new tag in TEI

I definitely lean towards the new tag (specifically, thinking about all this, an element that would live at the XPath /TEI/fragment or /TEI/excerpt, would allow any tag allowed in /TEI/text and their direct children, and would also allow people to fill in teiHeader if needed), but we should not be tied to this option. What we can agree on, though, is to rank the options by preference (e.g. new tag in TEI, new wrapper in DTS, outside wrapper, ab) and, depending on TEI-C's choice, fall back on our second choice.

My personal ranking would be 1. new tag in TEI, 2. ab, 3. DTS.

@PonteIneptique
Member

Note, I think we could have a good thing for the ab fragment, to avoid a root node :

<TEI>
<text><body><ab /></body></text>
</TEI>

The only thing I don't like with <ab /> is the lack of option to share teiHeader information (hence why I like /TEI/fragment)

@balmas
Contributor Author

balmas commented Oct 19, 2017

@jonathanrobie
Contributor

We closed this at the last meeting - see decisions marked with the string ** Decision ** in this decision tree:

https://github.com/distributed-text-services/collection-api/wiki/Decision-Tree:-Issue-70

** Decision **: Support for TEI is mandatory, we can open it up later if there is a reason to.
** Decision **: We will allow a DTS API to serve other content types.
** Decision **: Restrict to well-formed for #3 and #4; we can loosen this later if need be (i.e., all responses must be well-formed, not merely well-balanced).

We will make up our own wrapper and propose it to TEI, but will not wait for them.
