What are we Modeling
Anything that can be identified with an identity URN is called a thing. A thing contains both metadata and data.
This section lists the basic things we need to model. Each thing has an identity URN of the following form:
urn:dts:{naming authority}:{id}
The identity URN is completely orthogonal to ontologies or physical locations. Any thing can exist in any number of ontologies. Ontologies are represented using collections (see below).
In addition to the identity URN, a thing can be identified by any number of URNs or URLs assigned to it by a system, e.g. to identify its location on a server or in an ontology.
Relationships to other things are described using metadata. In the descriptions below, the phrase "of a" is used to indicate such relationships. For instance, a transcription of an image would contain metadata identifying the image that it transcribes.
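The identity URN and the "of a" relationship can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the `transcription_of` metadata key, the `example` naming authority and the ids are all hypothetical, and the sketch assumes that everything after the third colon belongs to the id.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityUrn:
    naming_authority: str
    id: str

    @classmethod
    def parse(cls, urn: str) -> "IdentityUrn":
        # Assumption: the id may itself contain colons, so split at most 3 times.
        parts = urn.split(":", 3)
        if len(parts) != 4 or parts[:2] != ["urn", "dts"] or not parts[2] or not parts[3]:
            raise ValueError(f"not a DTS identity URN: {urn!r}")
        return cls(naming_authority=parts[2], id=parts[3])

    def __str__(self) -> str:
        return f"urn:dts:{self.naming_authority}:{self.id}"


@dataclass
class Thing:
    identity: IdentityUrn
    # Relationships to other things live in metadata (hypothetical key names).
    metadata: dict = field(default_factory=dict)
    # Additional URNs or URLs assigned by a system, e.g. a server location.
    aliases: list = field(default_factory=list)


# Hypothetical naming authority and ids, for illustration only.
image = Thing(IdentityUrn.parse("urn:dts:example:p66-image-1"))
transcription = Thing(
    IdentityUrn.parse("urn:dts:example:p66-transcription"),
    metadata={"transcription_of": str(image.identity)},
)
```

Keeping the relationship in metadata, rather than in the URN itself, matches the statement above that the identity URN is orthogonal to ontologies and locations.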
- Abstract work (no digital instances). Optional in some domains.
  - Examples: The Gospel of John. Homer’s Odyssey.
- Manuscript. Corresponds to a physical artifact such as a papyrus or a scroll.
  - Example: P66
  - URN: urn:dts:manuscript:{id}
- Image
  - an image can be of anything
  - an image of a manuscript identifies the corresponding manuscript (see above)
  - Example: A digital image of P66
- Transcription
  - of a manuscript or an image
  - may or may not have positional data
  - may contain milestones which can be addressed directly or retrieved in ranges
  - may or may not support aligned texts (e.g. using milestones or a table)
  - may contain multiple representations of the base text, e.g. raw, with morphological analysis, with syntactic analysis
  - Example: a transcription of P66
- Critical text
  - of one or more manuscripts, e.g. Sinaiticus, Papyri.info, NA28
  - modeling the relationship to manuscripts is optional
  - may contain multiple representations of the base text, e.g. raw, with morphological analysis, with syntactic analysis
  - may contain milestones which can be addressed directly or retrieved in ranges
  - Example: Nestle-Aland 28
  - Apparatus is an optional part of a critical text
    - The Apparatus is not always captured in OCR, even when it does exist
- Translation
  - of an abstract work or critical edition
  - Example: The HCSB translation of the Gospel of John
  - URN: urn:dts:translation:{language}:{id}
- Commentary
  - of an abstract work or critical edition or translation
- Lexicon
- Grammar
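Transcriptions and critical texts in the list above may contain milestones that can be addressed directly or retrieved in ranges. One possible way to picture that is an ordered list of labelled segments. The class name, the method names, the milestone labels and the placeholder texts below are all assumptions made for this sketch; the model itself does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class MilestonedText:
    # (milestone label, text) pairs kept in document order
    segments: list = field(default_factory=list)

    def _position(self, milestone: str) -> int:
        for position, (label, _text) in enumerate(self.segments):
            if label == milestone:
                return position
        raise KeyError(f"unknown milestone: {milestone}")

    def at(self, milestone: str) -> str:
        """Address a single milestone directly."""
        return self.segments[self._position(milestone)][1]

    def between(self, start: str, end: str) -> list:
        """Retrieve every segment from start to end, both inclusive."""
        first, last = self._position(start), self._position(end)
        if first > last:
            raise ValueError("range start comes after range end")
        return [text for _label, text in self.segments[first:last + 1]]


# Placeholder labels and texts, not real transcription data.
sample = MilestonedText([
    ("1.1", "<text at milestone 1.1>"),
    ("1.2", "<text at milestone 1.2>"),
    ("1.3", "<text at milestone 1.3>"),
])
```

With this shape, `sample.at("1.2")` addresses one milestone and `sample.between("1.1", "1.2")` retrieves a range.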
A collection is also a thing, and has an identifier URN of the form:
urn:dts:collection:{id}
It can also have any number of URLs.
- Collections mirror ontologies. Nested collections represent the levels of hierarchy in an ontology.
- We don’t tell you what ontology to use in the system. Each domain can use an ontology appropriate to its needs.
- Domains can agree on ontologies that are standard within a given domain.
- One item can exist in any number of ontologies.
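As a rough sketch of the points above, nested collections can be modelled as a tree whose leaves hold identity URNs, so the same thing can appear under several ontologies at once. The `Collection` class, the `paths_to` helper and every collection and manuscript id below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    urn: str
    members: list = field(default_factory=list)   # identity URNs of things
    children: list = field(default_factory=list)  # nested collections


def paths_to(collection, thing_urn, trail=()):
    """Yield each chain of collection URNs that leads from `collection` to the thing."""
    trail = trail + (collection.urn,)
    if thing_urn in collection.members:
        yield trail
    for child in collection.children:
        yield from paths_to(child, thing_urn, trail)


p66 = "urn:dts:manuscript:p66"  # hypothetical id

# Two independent ontologies that both contain the same manuscript.
by_material = Collection(
    "urn:dts:collection:by-material",
    children=[Collection("urn:dts:collection:papyri", members=[p66])],
)
by_content = Collection(
    "urn:dts:collection:by-content",
    children=[Collection("urn:dts:collection:gospel-of-john", members=[p66])],
)
```

Because collections only reference the identity URN, adding the manuscript to a third ontology would not change the manuscript itself.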
The following operations are likely to be needed in the API. We should identify the highest priority forms of search and specify them first.
- Navigation: collections and resources each have metadata
- Queries and searches can be scoped
  - globally
  - in collections
  - in collections with given metadata
  - by range, identified by milestones
- Queries on metadata: name/value pairs
- Full text search
  - in practice, Solr or Elasticsearch, ideally with knowledge of ancient Greek
- Retrieve text in a range identified by milestones
- Search by lemma
- Search by morphology
- Syntactic search
- Search by annotations
- XPath / XQuery API?
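To make the scoped metadata query from the list above more concrete, here is a minimal in-memory sketch. It is not a proposed API: the `query` function, its parameters, the metadata keys and the URNs are assumptions, and a real service would presumably delegate to a search index instead of scanning a dictionary.

```python
def query(things, criteria, scope=None):
    """Return the URNs whose metadata matches every name/value pair in `criteria`.

    things:   dict mapping identity URN -> metadata dict
    criteria: dict of name/value pairs that must all match
    scope:    set of URNs (e.g. the members of a collection), or None for a global query
    """
    matches = []
    for urn, metadata in things.items():
        if scope is not None and urn not in scope:
            continue
        if all(metadata.get(name) == value for name, value in criteria.items()):
            matches.append(urn)
    return sorted(matches)


# Hypothetical things and metadata keys.
things = {
    "urn:dts:example:a": {"type": "transcription", "language": "grc"},
    "urn:dts:example:b": {"type": "translation", "language": "en"},
    "urn:dts:example:c": {"type": "transcription", "language": "la"},
}
collection_members = {"urn:dts:example:b", "urn:dts:example:c"}

everywhere = query(things, {"type": "transcription"})
in_collection = query(things, {"type": "transcription"}, scope=collection_members)
```

The global query matches both transcriptions, while the scoped one only sees members of the given collection.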
Any service needs to be able to translate citations between its formats and the canonical format. A standard API call will be provided for this service.
- Example: Given a canonical DTS URI, a CTS system needs to be able to furnish the equivalent CTS URI
- Example: Given a canonical CTS URI, a CTS system needs to be able to furnish the equivalent DTS URI
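The two examples above amount to a bidirectional mapping between a service's own citations and the canonical ones. The sketch below uses a plain lookup table to show the shape of such a translation. The class, its method names and both URNs are hypothetical; the actual API call and any rule-based mapping between CTS and DTS citations are not defined here.

```python
class CitationTranslator:
    """Bidirectional lookup between canonical URNs and one service's native citations."""

    def __init__(self) -> None:
        self._native_by_canonical = {}
        self._canonical_by_native = {}

    def register(self, canonical: str, native: str) -> None:
        # Record the pair in both directions so either side can be looked up.
        self._native_by_canonical[canonical] = native
        self._canonical_by_native[native] = canonical

    def to_native(self, canonical: str) -> str:
        # Raises KeyError when the service does not know the citation.
        return self._native_by_canonical[canonical]

    def to_canonical(self, native: str) -> str:
        return self._canonical_by_native[native]


# Hypothetical URNs on both sides; a real CTS service would use its own identifiers.
cts_service = CitationTranslator()
cts_service.register("urn:dts:example:work-1", "urn:cts:example:group.work")
```

A table is the simplest option that works for any pair of formats; a service whose identifiers follow a regular pattern could replace it with a rule while keeping the same two methods.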