The modelling includes more detail than we usually need, and less than we might want. We would usually be happy to leave the roles and some of the devices implicit, but we would like to have links to the protocols used.
NOTE: The code in the following examples has not been tested.
Chris Mungall, Alan Ruttenberg, David Osumi-Sutherland, and collaborators have built a macro-expansion system for OWL that is used for shortcut relations in a number of OBO ontologies. These are two relevant papers, one on implementation and one on its use:
The implementation is part of the OBO file format codebase, now included in the OWLAPI: code directory.
This work could be used for OBI, but first we should evaluate our specific requirements. This document lays out some ideas about what we want.
In a database or spreadsheet we might represent the glucose experiment as follows. First we would have a table linking specimens to their subjects.
| specimen | protocol | subject |
|---|---|---|
| blood specimen 1 | protocol 1 | mouse 1 |
Then we would have a table of assays and results.
| assay | protocol | evaluant | result | unit |
|---|---|---|---|---|
| analyte assay 1 | protocol 2 | blood specimen 1 | 1.2 | mg/ml |
In both of these tables we would also include an investigator, timestamp, and other information.
A simple translation of these tables to linked data would look like this:
protocol 1
  type: protocol

protocol 2
  type: protocol

collection process 1
  type: collecting specimen from organism
  executes protocol: protocol 1
  has input subject: mouse 1
  has output specimen: blood specimen 1

analyte assay 1
  type: assay
  executes protocol: protocol 2
  has evaluant: blood specimen 1
  has measured value: 1.2
  has measured unit: mg/ml
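The translation from table rows to this simplified linked data can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the predicates are plain label strings rather than IRIs, and the "collection process N" naming simply follows the example above.

```python
def specimen_row_to_triples(row, index):
    """Translate one row of the specimen table into simplified triples."""
    process = f"collection process {index}"  # assumed naming, as in the example
    return [
        (row["protocol"], "type", "protocol"),
        (process, "type", "collecting specimen from organism"),
        (process, "executes protocol", row["protocol"]),
        (process, "has input subject", row["subject"]),
        (process, "has output specimen", row["specimen"]),
    ]


def assay_row_to_triples(row):
    """Translate one row of the assay table into simplified triples."""
    assay = row["assay"]
    return [
        (row["protocol"], "type", "protocol"),
        (assay, "type", "assay"),
        (assay, "executes protocol", row["protocol"]),
        (assay, "has evaluant", row["evaluant"]),
        (assay, "has measured value", row["result"]),
        (assay, "has measured unit", row["unit"]),
    ]


specimen_rows = [
    {"specimen": "blood specimen 1", "protocol": "protocol 1", "subject": "mouse 1"},
]
assay_rows = [
    {"assay": "analyte assay 1", "protocol": "protocol 2",
     "evaluant": "blood specimen 1", "result": "1.2", "unit": "mg/ml"},
]

triples = []
for i, row in enumerate(specimen_rows, start=1):
    triples.extend(specimen_row_to_triples(row, i))
for row in assay_rows:
    triples.extend(assay_row_to_triples(row))
```

Each table column maps to exactly one shortcut relation, which is what makes this shape easy to produce and to query.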
This is a convenient "shape" to query with SPARQL, but a long way from the detailed modelling in
We could use SPARQL to expand the simplified modelling to detailed modelling, by associating a SPARQL Update query with each protocol.
NOTE: The following code is not strictly SPARQL, but uses familiar labels, like the example in, and can be automatically translated to proper SPARQL.
The WHERE block corresponds to the simplified modelling of the collection process above. The INSERT block corresponds to the first part of the detailed modelling in Many of the instances are anonymous (blank nodes). This might not be exactly the right level of detail for a given purpose, but it demonstrates the approach.
type: Mus musculus
type: glucose
part of: ?specimen
type: syringe
type: test tube
has specified input: ?subject
has specified input: _:syringe
has specified input: _:test-tube
has specified output: ?specimen
type: blood specimen
located in: _:test-tube
type: collecting specimen from organism
executes protocol: protocol 1
has subject: ?subject
has specified output: ?specimen
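The automatic translation from this label-based notation to proper SPARQL could look roughly like the following Python sketch. The `TERMS` table, the `ex:` prefixed names, the `?collection` variable, and both function names are placeholders invented for this illustration, not real OBI identifiers. One detail worth noting: SPARQL variable names may not contain hyphens, so a variable such as ?test-tube has to be renamed, while blank node labels may keep them.

```python
# Hypothetical label-to-term table; the ex: names are placeholders,
# not real OBI or RO IRIs.
TERMS = {
    "type": "rdf:type",
    "executes protocol": "ex:executes_protocol",
    "has specified output": "ex:has_specified_output",
    "collecting specimen from organism": "ex:collecting_specimen_from_organism",
    "protocol 1": "ex:protocol_1",
}


def translate_term(term):
    """Translate a subject or object written in the label notation."""
    if term.startswith("?"):
        # Hyphens are not allowed in SPARQL variable names.
        return term.replace("-", "_")
    if term.startswith("_:"):
        # Blank node labels may contain hyphens, so keep them unchanged.
        return term
    return TERMS[term]


def translate(subject, lines):
    """Turn 'label: object' lines about one subject into triple patterns."""
    patterns = []
    for line in lines:
        label, obj = line.split(": ", 1)
        patterns.append(
            f"{translate_term(subject)} {TERMS[label]} {translate_term(obj)} ."
        )
    return patterns


patterns = translate("?collection", [
    "type: collecting specimen from organism",
    "executes protocol: protocol 1",
    "has specified output: ?specimen",
])
```

This sketch does not handle property paths or typed literals; those would need extra cases in `translate`.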
Similarly, we can start with the detailed modelling and extract the simplified modelling using SPARQL.
The WHERE block becomes the INSERT block, without change. The INSERT block becomes the WHERE block, replacing blank nodes with SPARQL variables (so _: becomes ?). Because the expansion and contraction are syntactically so similar, we could specify just the expansion and automatically generate the contraction.
type: collecting specimen from organism
executes protocol: protocol 1
has subject: ?subject
has specified output: ?specimen
type: Mus musculus
type: glucose
part of: ?specimen
type: syringe
type: test tube
has specified input: ?subject
has specified input: ?syringe
has specified input: ?test-tube
has specified output: ?specimen
type: blood specimen
located in: ?test-tube
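Generating the contraction from the expansion is mechanical enough to sketch in a few lines of Python. The representation here is an assumption made for illustration: a rule is a dictionary with "insert" and "where" lists of (subject, predicate, object) strings, and the `?collection` variable and the `contract` function are hypothetical names.

```python
def contract(expansion):
    """Derive a contraction rule from an expansion rule.

    The WHERE block becomes the INSERT block unchanged, and the INSERT
    block becomes the WHERE block with blank nodes turned into variables.
    """
    def to_variable(term):
        # "_:syringe" becomes "?syringe"; everything else is left alone.
        return "?" + term[2:] if term.startswith("_:") else term

    return {
        "insert": list(expansion["where"]),
        "where": [
            tuple(to_variable(term) for term in triple)
            for triple in expansion["insert"]
        ],
    }


expansion = {
    "insert": [
        ("?collection", "has specified input", "_:syringe"),
        ("_:syringe", "type", "syringe"),
    ],
    "where": [
        ("?collection", "executes protocol", "protocol 1"),
        ("?collection", "has output specimen", "?specimen"),
    ],
}

contraction = contract(expansion)
```

The transformation is purely syntactic, which is why it stops working as soon as a rule needs something asymmetric such as a MINUS block.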
Likewise, we can expand the simple modelling into the second part of
type: glucometer
type: analyte assay
has specified input: ?specimen
has specified input: _:glucometer
has specified output: _:measurement-datum
type: evaluant role
inheres in: ?specimen
realized in: ?assay
type: analyte role
inheres in: ?glucose
realized in: ?assay
type: measurement datum
has value specification: _:value-specification
type: scalar value specification
has specified value: ?value^^xsd:real
has measurement unit label: ?unit
type: assay
executes protocol: protocol 2
has evaluant: ?specimen
has measured value: ?value
has measured unit: ?unit
has part: ?glucose
type: assay
executes protocol: protocol 2
has evaluant: ?specimen
has measured value: ?value
has measured unit: ?unit
has part: ?glucose
type: glucometer
type: analyte assay
has specified input: ?specimen
has specified input: ?glucometer
has specified output: ?measurement-datum
type: evaluant role
inheres in: ?specimen
realized in: ?assay
type: analyte role
inheres in: ?glucose
realized in: ?assay
type: measurement datum
has value specification: ?value-specification
type: scalar value specification
has specified value: ?value^^xsd:real
has measurement unit label: ?unit
This approach requires specifying SPARQL Update queries for the expansion of every protocol. Can we just specify an expansion for every shortcut relation? If so, we could add this information to an annotation on the OWL Object Property 'has evaluant' (for example), as done for the macro-expansion system linked to above.
In the WHERE clause we use a SPARQL property chain to restrict the target to an instance of a subclass of 'assay' -- the domain of 'has evaluant'. In the INSERT block we create an anonymous individual evaluant role, and link it.
has specified input: ?evaluant
type: evaluant role
inheres in: ?evaluant
realized in: ?assay
type / subClassOf*: assay
has evaluant: ?evaluant
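The effect of the `type / subClassOf*` path can be illustrated with a small Python sketch: an individual matches if its asserted class is 'assay' itself or any direct or indirect subclass. The function names and the toy class hierarchy below are assumptions for the example only.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its ancestors.

    This is the reflexive, transitive closure that subClassOf* computes.
    """
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return seen


def instances_of(target, types, subclass_of):
    """Individuals whose asserted type is target or a subclass of it."""
    return {
        individual
        for individual, cls in types.items()
        if target in superclasses(cls, subclass_of)
    }


# Toy data: direct superclass links and asserted types.
subclass_of = {"analyte assay": ["assay"], "assay": ["planned process"]}
types = {
    "analyte assay 1": "analyte assay",
    "collection process 1": "collecting specimen from organism",
}

assays = instances_of("assay", types, subclass_of)
```

Because the closure is reflexive, an individual typed directly as 'assay' would match as well.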
For the contraction, in the WHERE block we match assays that realize an evaluant role for some input, and in the INSERT block we assert 'has evaluant'.
has evaluant: ?evaluant
type / subClassOf*: assay
has specified input: ?evaluant
type: evaluant role
inheres in: ?evaluant
realized in: ?assay
Expanding 'has measured value' seems similar...
has specified output: _:measurement-datum
type: measurement datum
has value specification: _:value-specification
type: value specification
has specified value: ?value
type / subClassOf*: assay
has measured value: ?value
Now we run into some trouble. If an assay 'has measured unit' then its output is a measurement datum with a scalar value specification. If we adopt the same approach as in the previous example and expand both relations in parallel, then the _:measurement-datum blank nodes will not be identical. This is because RDF blank node identity is local to a graph, and does not extend between graphs -- in this case each SPARQL query is a different graph. So we will end up asserting that the one assay has two different output measurement datum nodes, one from the 'has measured value' expansion and another from the 'has measured unit' expansion. This is still technically correct, since RDF allows that the two nodes could be identical, but misleading because we do not assert the identity. It would make subsequent SPARQL queries more complex.
has specified output: _:measurement-datum
type: measurement datum
has value specification: _:value-specification
type: scalar value specification
has measurement unit label: ?unit
type / subClassOf*: assay
has measured unit: ?unit
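The duplicated output can be reproduced with a small Python simulation. It assumes, for illustration, that each expansion mints its own fresh blank nodes; the `fresh` helper and the two expansion functions are hypothetical stand-ins for the two queries.

```python
import itertools

_counter = itertools.count(1)


def fresh(label):
    """Mint a new blank node; every call yields a distinct node."""
    return f"_:{label}-{next(_counter)}"


def expand_measured_value(assay, value):
    """Stand-in for the 'has measured value' expansion."""
    datum = fresh("measurement-datum")
    spec = fresh("value-specification")
    return [
        (assay, "has specified output", datum),
        (datum, "type", "measurement datum"),
        (datum, "has value specification", spec),
        (spec, "type", "value specification"),
        (spec, "has specified value", value),
    ]


def expand_measured_unit(assay, unit):
    """Stand-in for the 'has measured unit' expansion."""
    datum = fresh("measurement-datum")
    spec = fresh("value-specification")
    return [
        (assay, "has specified output", datum),
        (datum, "type", "measurement datum"),
        (datum, "has value specification", spec),
        (spec, "type", "scalar value specification"),
        (spec, "has measurement unit label", unit),
    ]


graph = (expand_measured_value("analyte assay 1", "1.2")
         + expand_measured_unit("analyte assay 1", "mg/ml"))

# The one assay now has two distinct measurement datum outputs.
outputs = {o for s, p, o in graph
           if s == "analyte assay 1" and p == "has specified output"}
```

Nothing in the resulting graph says that the two outputs are the same node, so the value and the unit end up on different value specifications.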
We can work around the problem in this specific case, either with a combined query or by blocking the 'has measured value' expansion from matching the scalar value case. But it raises a general problem with interactions between the expansions and contractions. Interactions always add complexity.
One solution is to use different predicates when linking an assay to a plain value (without a unit) and a scalar value (with a unit). Other cases will have to be considered, including categorical measurements.
Another solution to the specific problem is to use SPARQL MINUS to restrict the plain value query so that it fails to match the scalar value case. The order of the queries would not matter, but we still have to coordinate the two expansion queries when writing them.
Expand for plain values, with 'has measured value' but without (MINUS) 'has measured unit':
has specified output: _:measurement-datum
type: measurement datum
has value specification: _:value-specification
type: value specification
has specified value: ?value
type / subClassOf*: assay
has measured value: ?value
has measured unit: ?unit
Now expand for scalar values, including both 'has measured value' and 'has measured unit':
has specified output: _:measurement-datum
type: measurement datum
has value specification: _:value-specification
type: scalar value specification
has specified value: ?value
has measurement unit label: ?unit
type / subClassOf*: assay
has measured value: ?value
has measured unit: ?unit
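The way the two queries partition the assays can be sketched in Python, with set difference playing the role of MINUS. The data layout (a dictionary from assay to its shortcut properties) and the function names are assumptions made for this illustration.

```python
def plain_matches(simple):
    """Assays matched by the plain value query."""
    with_value = {a for a, props in simple.items()
                  if "has measured value" in props}
    with_unit = {a for a, props in simple.items()
                 if "has measured unit" in props}
    # Set difference stands in for the MINUS block.
    return with_value - with_unit


def scalar_matches(simple):
    """Assays matched by the scalar value query."""
    return {a for a, props in simple.items()
            if "has measured value" in props and "has measured unit" in props}


simple = {
    "assay A": {"has measured value": "7"},
    "assay B": {"has measured value": "1.2", "has measured unit": "mg/ml"},
}

plain = plain_matches(simple)
scalar = scalar_matches(simple)
```

Because the two sets are disjoint, every assay is expanded by exactly one of the queries, whichever order they run in.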
Unfortunately, the use of the MINUS block means that the contraction of 'has measured value' is no longer a simple syntactic transformation of its expansion (another example of interactions adding complexity):
type: assay
has measured value: ?value
has specified output: _:measurement-datum
type / subClassOf*: measurement datum
has value specification: ?value-specification
type / subClassOf*: value specification
has specified value: ?value
has measurement unit label: ?unit
The 'has measured unit' predicate could be included in the OBI OWL file as an Object Property:
- domain: assay
- range: measurement unit label
- superPropertyOf (Chain): has specified output o has value specification o has measurement unit label
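What the property chain entails can be shown with a short Python sketch that walks the three relations from an assay to its unit label. The `follow_chain` function and the node names in the toy graph are hypothetical.

```python
def follow_chain(graph, start, chain):
    """Return every node reachable from start by following chain in order."""
    nodes = {start}
    for predicate in chain:
        nodes = {o for s, p, o in graph if s in nodes and p == predicate}
    return nodes


CHAIN = [
    "has specified output",
    "has value specification",
    "has measurement unit label",
]

graph = [
    ("analyte assay 1", "has specified output", "datum 1"),
    ("datum 1", "has value specification", "spec 1"),
    ("spec 1", "has measurement unit label", "mg/ml"),
]

# Each result u supports the inference: analyte assay 1 'has measured unit' u.
units = follow_chain(graph, "analyte assay 1", CHAIN)
```

The chain only supports the contraction direction: a reasoner can infer 'has measured unit' from the detailed modelling, but it cannot create the intermediate nodes from the shortcut.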
If the expansion query is small, the contraction query can be automatically generated, and the order of the expansions/contractions does not matter, then it would be convenient to store the expansion query in the ontology using an OWL Annotation Property. Otherwise, it would probably be better to store the queries in separate SPARQL files, with an Annotation Property linking to the query files. The SPARQL files could be numbered to indicate the required sequence.
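If the queries live in numbered SPARQL files, the runner has to sort them numerically rather than alphabetically. A minimal Python sketch, assuming a hypothetical naming scheme in which each file name begins with its sequence number:

```python
import re


def in_sequence(filenames):
    """Order query files by their leading sequence number."""
    def number(name):
        match = re.match(r"(\d+)", name)
        if match is None:
            raise ValueError(f"no sequence number in {name!r}")
        return int(match.group(1))

    return sorted(filenames, key=number)


# Hypothetical file names, for illustration only.
files = ["10-contract-value.rq", "2-expand-scalar-value.rq", "1-expand-plain-value.rq"]
ordered = in_sequence(files)
```

A plain alphabetical sort would put the file numbered 10 before the file numbered 2, which is the mistake this avoids.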
Higher-Level Language: One of the goals of this approach is to provide two "languages" for working with OBI. The first is the low-level language using general relations that are shared with other OBO ontologies, thus promoting interoperability. The second is the high-level language using specialized relations that are specific to OBI, thus making it easier to use OBI for modelling and querying data. Ideally there would be a perfect translation (i.e. an isomorphism) between the two languages, allowing us to "round-trip" between them without losing information.
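The round-trip property can be stated as a simple check: contracting the expansion of a simplified graph should give back the same graph. The following Python sketch tests this for 'has evaluant' only. The function names are hypothetical, triples are plain tuples of labels, and the subclass check on 'assay' is omitted for brevity.

```python
def expand_evaluant(simple):
    """Expand 'has evaluant' shortcuts into the detailed modelling."""
    detailed = set()
    shortcuts = sorted(t for t in simple if t[1] == "has evaluant")
    for n, (assay, _, evaluant) in enumerate(shortcuts, start=1):
        role = f"_:evaluant-role-{n}"  # one anonymous role per shortcut
        detailed.update({
            (assay, "has specified input", evaluant),
            (role, "type", "evaluant role"),
            (role, "inheres in", evaluant),
            (role, "realized in", assay),
        })
    return detailed


def contract_evaluant(detailed):
    """Recover 'has evaluant' shortcuts from the detailed modelling."""
    simple = set()
    roles = {s for s, p, o in detailed if p == "type" and o == "evaluant role"}
    for role in roles:
        bearers = {o for s, p, o in detailed if s == role and p == "inheres in"}
        assays = {o for s, p, o in detailed if s == role and p == "realized in"}
        for assay in assays:
            for evaluant in bearers:
                if (assay, "has specified input", evaluant) in detailed:
                    simple.add((assay, "has evaluant", evaluant))
    return simple


simple = {("analyte assay 1", "has evaluant", "blood specimen 1")}
round_trip = contract_evaluant(expand_evaluant(simple))
```

A check like this could be run for every shortcut relation to confirm that no information is lost in either direction.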
First-Order Logic: We use SPARQL here to do things that cannot be done in OWL. We could probably use first-order logic (FOL) to express everything said in the SPARQL queries. Unfortunately we do not have good tool support for FOL. Still, it might be best to express the meaning of each shortcut relation in FOL first, include it as an OWL annotation, and insist that the SPARQL queries are secondary to the FOL.
Specialized Notation: Rather than writing SPARQL directly, we could develop a specialized notation for expressing expansion/contraction rules, and translate that to SPARQL. Another level of abstraction could add convenience, but also complexity.