John Graybeal edited this page Aug 25, 2014 · 36 revisions

Goal for UDUNITS2 Vocabularies

The creation of UDUNITS vocabularies suitable for MMI's ORR had three functional goals and one reproducibility goal:

  1. Provide a resolvable URI for each unit, name, or alias in UDUNITS.
  2. Provide an easily-viewed list of UDUNITS terms in each of the vocabularies.
  3. Align the resulting concepts with the original XML files' model (that is, don't break anything).
  4. Implement these features in an automated way, so human intervention was not required to update the vocabularies when the on-line XML changed.

The first two goals were primary, as the authors have always wanted this kind of reference, and felt MMI's ORR was a good way to present this information. The last goal is an operational constraint.

Notwithstanding Goal 3, we were not trying to create a units model ontology following UDUNITS. There are several units ontologies already (e.g., QUDT being highly regarded, and SWEET perhaps more broadly known), and we felt no need to pursue that concern.

We also were not hoping to build a new development paradigm around UDUNITS. The existing packages seem perfectly adequate for that purpose.

Vocabularies vs Ontologies

Yes, these two concepts are similar, but the distinction is useful for our modeling approach. The fact that the original UDUNITS presents 5 separate XML files encouraged us to think of our work as capturing separate vocabularies, rather than representing a full unified model. However, the UDUNITS files are not strictly vocabularies, so our model required some thought.

The initial separation of concepts into Unit, UnitName, and Prefix (see Modeling below) came from the desire to be clear and reasonably precise in representing the original material. (The lack of a proper name for one Unit made the strategy more obvious.) At the same time, there are a number of more detailed modeling elements (Singular vs Plural names, alias Symbols, Dimensionless Units, and of course the inclusion of core units in the definition of other units) that were beyond what was needed or easily feasible.

We are confident the results provide a handy reference meeting Goals (1) and (2) above. Possibly they will form a useful starting point for someone to work on a unified ontology of UDUNITS, though this is not a requirement.

We encourage your comments on our approach, either as issues or as email to the authors.

Presentation in ORR

As many vocabularies have instances from both Unit and UnitName concepts, ideally separate tables should be shown for each concept type. Unfortunately the ORR presentation can't yet present this separation, but we hope to add it in the future.

Modeling

We have implemented the classes Unit, UnitName, and Prefix to capture the core concepts in the original XML. Each of these classes, and the relations between them, are described below, with examples following.

Unit class

An instance of the Unit class captures one main unit entry in the XML. The identifier of each Unit instance is a unique sequence of hex characters preceded by an underscore. (See Unit and UnitName identification below.)

A Unit may have zero or one primary name (via property hasName) and zero or more aliases (via hasAlias property).

hasDefinition property

This is a functional property to capture the <def> element from the XML description. This element strictly defines the unit, by combining more fundamental units, numbers, and basic math operations.

hasName property

This is a functional property that indicates the primary name of a given Unit instance. A unit usually has a primary name, but not always.

hasAlias property

Indicates an alternate name for a given Unit instance. A unit can have zero or more aliases associated.

hasSymbol property

Indicates a symbol associated with a given Unit instance. A symbol is a short string of characters used to label the unit. A unit can have zero or more symbols associated. While the original XML files sometimes indicate a symbol is an alias, we do not capture that information.
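
A minimal sketch of that flattening, assuming the XML is read with Python's standard xml.etree.ElementTree. The helper name collect_symbols is hypothetical and is not taken from the actual conversion tool:

```python
import xml.etree.ElementTree as ET

# Liter entry from the UDUNITS2 XML, trimmed to the parts relevant here.
UNIT_XML = """
<unit>
    <def>dm^3</def>
    <name><singular>liter</singular></name>
    <symbol>L</symbol>
    <aliases>
        <name><singular>litre</singular></name>
        <symbol>l</symbol>
    </aliases>
</unit>
"""


def collect_symbols(unit):
    """Return every symbol of a <unit>, whether or not it sits under <aliases>."""
    # iter() walks the whole subtree in document order, so direct symbols
    # and alias symbols end up in one flat list.
    return [s.text for s in unit.iter("symbol")]


symbols = collect_symbols(ET.fromstring(UNIT_XML))
print(symbols)  # ['L', 'l']
```

Both symbols become plain hasSymbol values; the fact that "l" was listed under the aliases is dropped.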

UnitName class

Instances of the UnitName class capture a name or alias associated with an instance of class Unit.

Remarks:

  • The approach clearly separates the concept of unit from any associated specific names/aliases.
  • Those names/aliases will have URIs that include their names (so they can be self-resolvable).

namesUnit property

With UnitName as domain and Unit as range, this functional property indicates the Unit instance associated with the name.

hasCardinality property

With UnitName as domain, this functional property indicates whether the name is "singular" or "plural".

Prefix class

Instances of the Prefix class capture a prefix that can be used in front of any instance of class Unit. The prefix name is used to identify each instance of the class.

hasValue property

This is a functional property to capture the <value> element from the XML description, which defines the mathematical value (multiplier effect) of the prefix.

hasSymbol property

Indicates a symbol associated with a given Prefix instance. A prefix can have one or more symbols associated; the symbols are generally a single ASCII or Unicode character.
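
To summarize the three classes and their properties in one place, here is a rough sketch of the model as Python dataclasses. The Python field names are our own illustration (assumptions, not part of the vocabularies). Functional properties map to single values and the others to lists:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    identifier: str                                     # e.g. "_2a231369"
    definition: str                                     # hasDefinition (functional)
    name: Optional[str] = None                          # hasName (functional, may be absent)
    aliases: List[str] = field(default_factory=list)    # hasAlias (zero or more)
    symbols: List[str] = field(default_factory=list)    # hasSymbol (zero or more)


@dataclass
class UnitName:
    name: str          # also the id part of the instance URI
    names_unit: str    # namesUnit (functional): identifier of the Unit
    cardinality: str   # hasCardinality (functional): "singular" or "plural"


@dataclass
class Prefix:
    name: str          # also the id part of the instance URI
    value: str         # hasValue (functional)
    symbols: List[str] = field(default_factory=list)    # hasSymbol (one or more)


arc_second = Unit(
    identifier="_2a231369",
    definition="'/60",
    name="arc_second",
    aliases=["angular_second", "arcsecond", "arcsec"],
    symbols=['"', "\u2033"],
)
primary = UnitName("arc_second", arc_second.identifier, "singular")
micro = Prefix("micro", "1e-6", ["\u00b5", "\u03bc", "u"])
```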

Unit and UnitName identification

The Unit, UnitName, and Prefix instance URIs will share the same namespace. The id part for each UnitName and Prefix instance will simply be the associated name itself; this is appropriate because these names are both user- and web-friendly.

The Unit identifier needs some extra processing. Although the primary names would be a good candidate for identifiers, some units lack such names. Even if such names were always available, we need to avoid conceptual and syntactic collision with the corresponding UnitName instances, which would also have those names as identifiers unless we put them in a different namespace.

Since all units must have unique <def> strings, we will use these strings as the basis for an identification term. For each Unit instance, the conversion tool will apply a deterministic translation (sha1 hash) to the <def> string and use the first 8 characters of the resulting hash. (Future) If a collision results, the tool can add a '+' to the string and re-hash, repeating until a unique name results.
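
The identifier scheme, including the (future) collision handling, could be sketched as follows. This is an illustration under stated assumptions rather than the conversion tool's actual code: the function name unit_identifier, the UTF-8 encoding of the <def> string, and the use of a set to track identifiers already issued are all our own choices.

```python
import hashlib


def unit_identifier(definition, taken):
    """Derive a Unit id from its <def> string: '_' + first 8 hex chars of SHA-1."""
    candidate = definition
    while True:
        # Assumption: the <def> string is hashed as UTF-8 bytes.
        digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
        identifier = "_" + digest[:8]
        if identifier not in taken:
            taken.add(identifier)
            return identifier
        # Collision: append '+' to the hashed string and try again.
        candidate += "+"


taken = set()
first = unit_identifier("dm^3", taken)
# Asking again with the same <def> forces the collision path.
second = unit_identifier("dm^3", taken)
print(first, second)
```

Because the hash is deterministic, rerunning the conversion on unchanged XML yields the same identifiers, provided the units are processed in the same order whenever a collision occurs.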

Comment and Definition Handling

The UDUNITS2 XML files include free-text comments throughout the file, indicating definitions, context for a particular entry, or context for a group of entries. Because these comments are not coded consistently, they cannot be used to automatically generate additional information for the vocabulary.

We have submitted a request to the UDUNITS support team to enter comments using a different syntax, either as attributes on particular XML elements, or within their own element.

Examples

Example 1

The following entry from the original vocabulary:

        <unit>
            <def>'/60</def>
            <name><singular>arc_second</singular></name>
            <symbol>"</symbol>
            <symbol>&#x2033;</symbol>           <!-- DOUBLE PRIME -->
            <aliases>
                <name><singular>angular_second</singular></name>
                <name><singular>arcsecond</singular></name>
                <name><singular>arcsec</singular></name>
            </aliases>
        </unit>

will result in the RDF representation:

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .
@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .

:_2a231369
      a                      :Unit ;
      prop:hasDefinition     "'/60" ;
      prop:hasName           :arc_second ;
      prop:hasAlias          :arcsec, :angular_second, :arcsecond ;
      prop:hasSymbol         "\"", "″" .

:arc_second
      a                      :UnitName ;
      prop:namesUnit         :_2a231369 ;
      prop:hasCardinality    "singular" .

:arcsec
      a                      :UnitName ;
      prop:namesUnit         :_2a231369 ;
      prop:hasCardinality    "singular" .

:angular_second
      a                      :UnitName ;
      prop:namesUnit         :_2a231369 ;
      prop:hasCardinality    "singular" .

:arcsecond
      a                      :UnitName ;
      prop:namesUnit         :_2a231369 ;
      prop:hasCardinality    "singular" .
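
The mapping in this example can be sketched in Python as a function that turns one <unit> element into (subject, property, object) tuples. This is only an illustration: the function name unit_triples is hypothetical, the Unit identifier is passed in rather than computed, and only <singular> name forms are handled.

```python
import xml.etree.ElementTree as ET

UNIT_XML = """
<unit>
    <def>'/60</def>
    <name><singular>arc_second</singular></name>
    <symbol>"</symbol>
    <symbol>&#x2033;</symbol>
    <aliases>
        <name><singular>angular_second</singular></name>
        <name><singular>arcsecond</singular></name>
        <name><singular>arcsec</singular></name>
    </aliases>
</unit>
"""


def unit_triples(unit, unit_id):
    """Return (subject, property, object) tuples for one <unit> element."""
    triples = [
        (unit_id, "a", "Unit"),
        (unit_id, "hasDefinition", unit.findtext("def")),
    ]
    # Direct <name> children give the primary name; <aliases>/<name> the aliases.
    for prop, path in (("hasName", "name"), ("hasAlias", "aliases/name")):
        for name_element in unit.findall(path):
            # Simplification: only the <singular> form is read in this sketch.
            name = name_element.findtext("singular")
            triples += [
                (unit_id, prop, name),
                (name, "a", "UnitName"),
                (name, "namesUnit", unit_id),
                (name, "hasCardinality", "singular"),
            ]
    # Symbols are collected from the unit and its aliases alike.
    for symbol in unit.iter("symbol"):
        triples.append((unit_id, "hasSymbol", symbol.text))
    return triples


triples = unit_triples(ET.fromstring(UNIT_XML), "_2a231369")
for triple in triples:
    print(triple)
```

The XML parser decodes the character reference &#x2033; itself, so the second symbol arrives as the double prime character.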

Example 2 (with comments, alias symbols)

        <unit>
            <!-- The following is exact.  From 1901 to 1964, however, 1
                 liter was 1.000028 dm^3 -->
            <def>dm^3</def>		        <!-- exact -->
            <name><singular>liter</singular></name>
            <symbol>L</symbol>                  <!-- NIST recommendation -->
            <aliases>
                <name><singular>litre</singular></name>
                <symbol>l</symbol>
            </aliases>
        </unit>

will result in the RDF representation (handling of comments is a Future implementation):

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-accepted/> .
@prefix prop:    <http://mmisw.org/ont/mmitest/udunits2-prop/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .

:_4b023412 
      a                      :Unit ;
      prop:hasDefinition     "dm^3" ;
      prop:hasName           :liter ;
      prop:hasAlias          :litre ;
      prop:hasSymbol         "L", "l" ;   
      rdfs:comment          "The following is exact.  From 1901 to 1964, however, 1 liter was 1.000028 dm^3" ;
      prop:hasSymbolComment  "NIST recommendation" .

:liter
      a                      :UnitName ;
      prop:namesUnit         :_4b023412;
      prop:hasCardinality    "singular".

:litre
      a                      :UnitName ;
      prop:namesUnit         :_4b023412;
      prop:hasCardinality    "singular".

Example 3: Prefixes

    <prefix>
        <value>1e-6</value>
        <name>micro</name>
        <symbol>&#xB5;</symbol>         <!-- MICRO SIGN -->
        <symbol>&#x3BC;</symbol>	<!-- Greek small letter "mu" -->
        <symbol>u</symbol>
    </prefix>

will result in the RDF representation:

@prefix :        <http://mmisw.org/ont/mmitest/udunits2-prefixes/> .
@prefix u2:      <http://mmisw.org/ont/mmitest/udunits2/> .

:micro
  a             u2:Prefix ;
  u2:hasValue   1e-6 ;
  u2:hasSymbol  "µ", "μ", "u" .
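
As a quick check of how the prefix entry is read, the following sketch parses the XML above with Python's standard xml.etree.ElementTree (our choice for illustration; the actual conversion tool may work differently). The parser decodes the character references, so the symbols arrive as ordinary Unicode characters.

```python
import xml.etree.ElementTree as ET

PREFIX_XML = """
<prefix>
    <value>1e-6</value>
    <name>micro</name>
    <symbol>&#xB5;</symbol>
    <symbol>&#x3BC;</symbol>
    <symbol>u</symbol>
</prefix>
"""

prefix = ET.fromstring(PREFIX_XML)
name = prefix.findtext("name")
value = float(prefix.findtext("value"))
symbols = [s.text for s in prefix.findall("symbol")]

# Show the code point of each (single-character) symbol.
print(name, value, [f"U+{ord(c):04X}" for c in symbols])
# micro 1e-06 ['U+00B5', 'U+03BC', 'U+0075']
```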

Appendix

1) Our [initial modeling discussion](Initial-modeling-discussion)