What are we Modeling

Jonathan Robie edited this page Jul 22, 2016 · 6 revisions

Things

Anything that can be identified with an identity URN is called a thing. A thing contains both metadata and data.

This section lists the basic things we need to model. Each entity has an identity URN, which is a URN with the following format:

urn:dts:{component}:{id}
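As an illustration of the format above, here is a minimal Python sketch that splits an identity URN into its component and id. The names `IdentityUrn`, `parse_identity_urn`, and `format_identity_urn` are hypothetical, as are the example ids. The sketch assumes that everything after the component belongs to the id, so the extra language segment of a translation URN is kept as part of it.

```python
from typing import NamedTuple


class IdentityUrn(NamedTuple):
    component: str  # e.g. "work", "manuscript", "image"
    ident: str      # everything after the component, kept verbatim


def parse_identity_urn(urn: str) -> IdentityUrn:
    """Split 'urn:dts:{component}:{id}' into its two variable parts."""
    # Split at most three times so that colons inside the id survive.
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "dts":
        raise ValueError(f"not a urn:dts identity URN: {urn!r}")
    component, ident = parts[2], parts[3]
    if not component or not ident:
        raise ValueError(f"empty component or id in {urn!r}")
    return IdentityUrn(component, ident)


def format_identity_urn(component: str, ident: str) -> str:
    """Inverse of parse_identity_urn."""
    return f"urn:dts:{component}:{ident}"


# Hypothetical ids, for illustration only.
print(parse_identity_urn("urn:dts:manuscript:p66"))
print(format_identity_urn("work", "john"))
```

Because the URN carries only a component and an id, nothing in it needs to change when the thing moves to another server or is filed under another ontology.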

The identity URN is completely orthogonal to ontologies or physical locations. Any thing can exist in any number of ontologies. Ontologies are represented using collections (see below).

In addition to the identity URN, a thing can be identified by any number of URNs or URLs assigned to it by a system, e.g. to identify its location on a server or in an ontology.

Relationships to other things are described using metadata. In the descriptions below, the phrase "of a" is used to indicate such relationships. For instance, a transcription of an image would contain metadata identifying the image that it transcribes.

  • Abstract work (no digital instances), optional in some domains.

    • Examples: The Gospel of John. Homer’s Odyssey.

    • URN: urn:dts:work:{id}

  • Manuscript. Corresponds to a physical artifact such as a papyrus or a scroll.

    • Example: P66

    • URN: urn:dts:manuscript:{id}

  • Image - of a manuscript

    • Example: A digital image of P66

    • URN: urn:dts:image:{id}

  • Transcription

    • of a manuscript or an image

    • may or may not have positional data

    • may or may not support aligned texts (e.g. using milestones or a table)

    • Example: a transcription of P66

    • URN: urn:dts:transcription:{id}

  • Critical text

    • of one or more manuscripts, e.g. Sinaiticus, Papyri.info, NA28

    • modeling the relationship to manuscripts is optional

    • Example: Nestle-Aland 28

    • URN: urn:dts:criticaltext:{id}

    • Apparatus is an optional part of a critical text

      • The Apparatus is not always captured in OCR, even when it does exist

  • Translation

    • of an abstract work or critical edition

    • Example: The HCSB translation of the Gospel of John

    • URN: urn:dts:translation:{language}:{id}

  • Commentary

    • of an abstract work or critical edition or translation

    • URN: urn:dts:commentary:{id}
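The "of a" relationships in the list above could be carried in metadata roughly as follows. This is a sketch under stated assumptions: the `Thing` class, the `"of"` metadata key, the `sources_of` helper, and all ids are hypothetical, since this page does not define a metadata vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    urn: str                                      # identity URN
    metadata: dict = field(default_factory=dict)  # name/value pairs
    data: bytes = b""                             # the content itself


def sources_of(thing, registry):
    """Return the things that `thing` is 'of', looked up by identity URN."""
    # "of" is an assumed metadata key holding a list of identity URNs.
    return [registry[u] for u in thing.metadata.get("of", []) if u in registry]


# Hypothetical ids, for illustration only.
manuscript = Thing("urn:dts:manuscript:p66", {"name": "P66"})
image = Thing("urn:dts:image:p66-001", {"of": [manuscript.urn]})
transcription = Thing("urn:dts:transcription:p66-t1", {"of": [image.urn]})

registry = {t.urn: t for t in (manuscript, image, transcription)}
print([t.urn for t in sources_of(transcription, registry)])
```

The relationship points at an identity URN and not at a location, so it stays valid wherever the image or manuscript record is stored.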

Collection

A collection is also a thing, and has an identity URN of the form:

urn:dts:collection:{id}

It can also have any number of URLs.

  • Collections mirror ontologies. Nested collections represent the levels of hierarchy in an ontology.

  • We don’t tell you what ontology to use in the system. Each domain can use an ontology appropriate to its needs.

  • Domains can agree on ontologies that are standard within a given domain.

  • One item can exist in any number of ontologies.
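A minimal sketch of how nested collections might mirror an ontology, and how one item can appear in several of them. The `Collection` class, the `paths_to` helper, and every collection id are hypothetical. Members are stored by identity URN, which is what lets the same thing sit in more than one hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    urn: str
    members: list = field(default_factory=list)   # identity URNs of member things
    children: list = field(default_factory=list)  # nested Collection objects

    def walk(self, path=()):
        """Yield (path, member_urn) for every member at every nesting level."""
        here = path + (self.urn,)
        for urn in self.members:
            yield here, urn
        for child in self.children:
            yield from child.walk(here)


def paths_to(roots, thing_urn):
    """List every collection path under `roots` that contains `thing_urn`."""
    return [path for root in roots for path, urn in root.walk() if urn == thing_urn]


# Two independent, hypothetical ontologies that both contain the same work.
john = "urn:dts:work:john"
new_testament = Collection("urn:dts:collection:new-testament", members=[john])
canon = Collection("urn:dts:collection:biblical-canon", children=[new_testament])
greek = Collection("urn:dts:collection:greek-texts", members=[john])

for path in paths_to([canon, greek], john):
    print(" / ".join(path))
```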

API

The following operations are likely to be needed in the API. We should identify the highest priority forms of search and specify them first.

  • Navigation of collections and resources, each of which has metadata

  • Queries on metadata - name/value pairs

    • globally

    • in collections

    • in collections with given metadata

  • Full text search

    • in practice, Solr or Elasticsearch, ideally with knowledge of ancient Greek

  • Search by lemma

  • Search by morphology

  • Syntactic search

  • Search by annotations

  • XPath / XQuery API?
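The three scopes listed above for metadata queries (globally, in collections, in collections with given metadata) can be sketched over in-memory dictionaries. This is an illustration only, not a proposed API: the `query` function, its parameters, the data layout, and all ids and metadata names are assumptions.

```python
def matches(metadata, criteria):
    """True if every name/value pair in `criteria` is present in `metadata`."""
    return all(metadata.get(name) == value for name, value in criteria.items())


def query(things, criteria, collections=None,
          in_collections=None, collection_criteria=None):
    """Return sorted URNs of things whose metadata matches `criteria`.

    things:      {thing_urn: metadata_dict}
    collections: {collection_urn: {"metadata": {...}, "members": [thing_urn, ...]}}
    """
    if in_collections is None and collection_criteria is None:
        candidates = set(things)  # global search
    else:
        candidates = set()
        for curn, coll in (collections or {}).items():
            if in_collections is not None and curn not in in_collections:
                continue
            if (collection_criteria is not None
                    and not matches(coll["metadata"], collection_criteria)):
                continue
            candidates.update(coll["members"])
    return sorted(u for u in candidates if u in things and matches(things[u], criteria))


# Hypothetical data, for illustration only.
things = {
    "urn:dts:manuscript:p66": {"language": "grc"},
    "urn:dts:criticaltext:na28": {"language": "grc"},
    "urn:dts:translation:en:hcsb-john": {"language": "en"},
}
collections = {
    "urn:dts:collection:papyri": {
        "metadata": {"material": "papyrus"},
        "members": ["urn:dts:manuscript:p66"],
    },
    "urn:dts:collection:editions": {
        "metadata": {"material": "print"},
        "members": ["urn:dts:criticaltext:na28", "urn:dts:translation:en:hcsb-john"],
    },
}

print(query(things, {"language": "grc"}))
print(query(things, {"language": "grc"}, collections,
            in_collections={"urn:dts:collection:papyri"}))
print(query(things, {"language": "grc"}, collections,
            collection_criteria={"material": "print"}))
```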

Citation resolution service

Any service needs to be able to translate citations between its formats and the canonical format.