Type system
Everything concerning types in Nemo is still under heavy development and subject to changes that might not be immediately reflected in this wiki page. This page covers the type system in Nemo: the available types, type checks, type inference, and the interplay with values when reading and writing.
First, we describe the behavior that is currently implemented.
We have four PrimitiveTypes in the logical layer of Nemo. (In theory, there is also a Tuple type, but it is not used yet.)
- Any: read as "any RDF literal"; stored as String in the physical layer
- String: a plain string; stored as String in the physical layer
- Integer: a 64-bit integer; stored as i64 in the physical layer
- Float64: a double-precision floating-point number; stored as Double in the physical layer
The physical type of a PrimitiveType is determined by impl From<PrimitiveType> for DataTypeName.
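The mapping listed above can be sketched as follows. This is a minimal illustration and not Nemo's actual source: both enums are hypothetical mirrors of the types named in the text, and the DataTypeName variant names (String, I64, Double) are assumptions derived from the list above.

```rust
/// Logical types as listed above (hypothetical mirror of Nemo's `PrimitiveType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveType {
    Any,
    String,
    Integer,
    Float64,
}

/// Physical types (hypothetical mirror of `DataTypeName`; variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataTypeName {
    String,
    I64,
    Double,
}

impl From<PrimitiveType> for DataTypeName {
    fn from(ty: PrimitiveType) -> Self {
        match ty {
            // Both `Any` and `String` share the physical string representation.
            PrimitiveType::Any | PrimitiveType::String => DataTypeName::String,
            PrimitiveType::Integer => DataTypeName::I64,
            PrimitiveType::Float64 => DataTypeName::Double,
        }
    }
}
```

With such an impl, a call like DataTypeName::from(PrimitiveType::Any) or PrimitiveType::Integer.into() yields the physical type for a logical one.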
Nemo assigns each predicate that occurs in the input program a list of the above types (one for each position).
This can be made explicit by the user, e.g. with @declare P(any, string).
For more information on how these types are determined if not given explicitly, see Type Checks and Inference below.
The types of a predicate determine how a value is handled by Nemo and processed internally. At the moment this also determines how the value is written later.
However, it does not determine how the value is read!
Literals that occur directly in the program in rules or facts are processed by the parser, which stores them in an enum named Constant that reflects the syntax of the value that has been parsed.
Note that literals in the rule file are always put into the Constant data structure.
Values that are read from sources like csv can be handled differently and may skip the Constant data structure in some cases.
Just like predicates, sources have a type for each position. These types can be annotated, as in @source P(any, integer) load-csv(...), and may differ from the types of the predicate.
(If no types are annotated for a source, it falls back to a defined default; for csv, for example, this is string.)
Source types do not (immediately) determine how a value is handled by Nemo; they only specify how a value should be interpreted when reading it.
For example, with the above predicate and source declarations for P, the first column of P is parsed as an RDF literal and also handled as such by Nemo.
The second column is read as integers (throwing an error if the column contains a malformed integer), but the values are then stringified and treated only as strings internally.
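The two-step handling of the second column (interpret according to the source type, then convert to the predicate type) might look roughly like this. It is a simplified sketch restricted to strings and integers; ColumnType, Value, read_cell and convert are hypothetical names, not part of Nemo's API.

```rust
/// Simplified set of types for this sketch (hypothetical, not Nemo's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    String,
    Integer,
}

/// A value after reading (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Text(String),
    Int(i64),
}

/// Step 1: interpret the raw cell according to the *source* type.
pub fn read_cell(raw: &str, source_type: ColumnType) -> Result<Value, String> {
    match source_type {
        ColumnType::String => Ok(Value::Text(raw.to_string())),
        ColumnType::Integer => raw
            .trim()
            .parse::<i64>()
            .map(Value::Int)
            .map_err(|e| format!("malformed integer {raw:?}: {e}")),
    }
}

/// Step 2: convert the value to what the *predicate* type expects.
pub fn convert(value: Value, predicate_type: ColumnType) -> Result<Value, String> {
    match (value, predicate_type) {
        // Read as integer, handled as string: stringify.
        (Value::Int(i), ColumnType::String) => Ok(Value::Text(i.to_string())),
        // Assumption: the reverse direction is not described in the text;
        // this sketch simply attempts to parse.
        (Value::Text(s), ColumnType::Integer) => s
            .parse::<i64>()
            .map(Value::Int)
            .map_err(|e| format!("cannot convert {s:?} to integer: {e}")),
        // Types already agree.
        (v, _) => Ok(v),
    }
}
```

Reading "42" with source type Integer and converting to predicate type String yields the text "42", while a cell like "4x2" is rejected in the first step.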
When all values have been read from the program and the sources (and mapped to the desired logical representation), the logical values are converted into their physical representation.
This conversion is rather straightforward, at least for Integer and Float64.
For the Any type, i.e. RDF literals stored in the Constant data structure, the Constant enum is stringified using the enum variants as prefixes to be able to reverse the mapping later.
For example, a numeric integer literal 3 is stored as the string INTEGER:3 and a string literal "my string" is stored as STRING:my string.
Internally, we want to be able to combine Any and String in their physical representations. Therefore, we also store strings in logical String columns with a STRING: prefix.
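The prefix scheme can be illustrated with a small round trip. This sketch assumes a reduced Constant enum with only two variants; the real enum has more, and the function names to_physical and from_physical are hypothetical.

```rust
/// Reduced stand-in for the parser's `Constant` enum (only two variants, as an assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Constant {
    Integer(i64),
    String(String),
}

/// Stringify a constant, using the variant name as a prefix.
pub fn to_physical(constant: &Constant) -> String {
    match constant {
        Constant::Integer(i) => format!("INTEGER:{i}"),
        Constant::String(s) => format!("STRING:{s}"),
    }
}

/// Reverse the mapping by inspecting the prefix before the first colon.
pub fn from_physical(stored: &str) -> Option<Constant> {
    let (prefix, payload) = stored.split_once(':')?;
    match prefix {
        "INTEGER" => payload.parse::<i64>().ok().map(Constant::Integer),
        "STRING" => Some(Constant::String(payload.to_string())),
        _ => None,
    }
}
```

Because only the first colon separates prefix and payload, string values that themselves contain colons survive the round trip.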
After the reasoning process in the physical layer is finished, the values are mapped back to the logical representations according to their logical type.
There are output iterators that can provide these logical representations directly (for the API) or serialize them to strings (for csv output).
The serialization of a Constant in particular is determined by its Display implementation.
All predicate types are checked for consistency. Before this check happens, types are inferred wherever possible.
Unknown types are set to the default type Any.
The inferences and checks consider type requirements. A TypeRequirement can be Hard, Soft or None.
Hard requirements must be satisfied; conflicting hard requirements cause an error.
Soft requirements give a type hint but can be overridden by type inference.
None requirements are simply unknown and can freely be overridden.
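One way to picture how the three kinds of requirement interact is a merge function over two requirements for the same predicate position. This is only a sketch of the rules stated above: the merge method is hypothetical, and the behaviour for two differing soft hints (the first one is kept here) is an assumption that the text does not specify.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveType {
    Any,
    String,
    Integer,
    Float64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeRequirement {
    Hard(PrimitiveType),
    Soft(PrimitiveType),
    None,
}

impl TypeRequirement {
    /// Combine an existing requirement with a newly discovered one (hypothetical helper).
    pub fn merge(self, other: TypeRequirement) -> Result<TypeRequirement, String> {
        use TypeRequirement::{Hard, Soft};
        match (self, other) {
            // Unknown requirements can freely be overridden.
            (TypeRequirement::None, r) | (r, TypeRequirement::None) => Ok(r),
            // Conflicting hard requirements are an error.
            (Hard(a), Hard(b)) if a != b => {
                Err(format!("conflicting hard requirements: {a:?} vs {b:?}"))
            }
            // A hard requirement overrides any soft hint.
            (Hard(a), _) | (_, Hard(a)) => Ok(Hard(a)),
            // Assumption: for two soft hints, keep the first one.
            (Soft(a), Soft(_)) => Ok(Soft(a)),
        }
    }
}
```

For instance, merging Soft(String) with Hard(Any) gives Hard(Any), whereas merging Hard(Integer) with Hard(Any) is rejected.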
First, all explicit predicate declarations (@declare) are converted into hard type requirements.
Type declarations from @source statements are interpreted as type hints and are therefore converted into soft type requirements if no explicit declaration has been provided.
Literals from rules and facts are also used as type hints by assigning each literal a suitable type, which is again turned into a soft type requirement.
Predicate positions with existential variables are assigned a hard type requirement of Any, since we cannot use nulls otherwise.
If this hard Any requirement clashes with another requirement, an error is thrown.
Based on the type requirements, type information is now propagated from rule bodies to rule heads via shared variables; the propagation aborts on conflicts that occur in the process.
All type requirements that are still None afterwards are set to Any.
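The propagation step and the final defaulting can be sketched for a single rule as follows. The data structures are heavily simplified assumptions: atoms carry only variable names, known types are stored as Option values per predicate position, and the distinction between hard and soft requirements is ignored. Atom, PredicateTypes, propagate and resolve are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrimitiveType {
    Any,
    String,
    Integer,
    Float64,
}

/// A simplified atom: a predicate name and one variable per position (hypothetical).
pub struct Atom {
    pub predicate: String,
    pub variables: Vec<String>,
}

/// Known types per predicate position; `None` means "still unknown".
pub type PredicateTypes = HashMap<String, Vec<Option<PrimitiveType>>>;

/// Propagate types from the body atoms of one rule to its head via shared variables.
pub fn propagate(body: &[Atom], head: &Atom, types: &mut PredicateTypes) -> Result<(), String> {
    // Collect the type each variable receives from the body. The first
    // occurrence wins here; body compatibility is a separate, later check.
    let mut variable_types: HashMap<&str, PrimitiveType> = HashMap::new();
    for atom in body {
        if let Some(positions) = types.get(&atom.predicate) {
            for (variable, ty) in atom.variables.iter().zip(positions) {
                if let Some(ty) = ty {
                    variable_types.entry(variable.as_str()).or_insert(*ty);
                }
            }
        }
    }

    // Fill unknown head positions and abort on conflicts.
    let head_types = types
        .entry(head.predicate.clone())
        .or_insert_with(|| vec![None; head.variables.len()]);
    for (slot, variable) in head_types.iter_mut().zip(&head.variables) {
        match (*slot, variable_types.get(variable.as_str())) {
            (Some(known), Some(&inferred)) if known != inferred => {
                return Err(format!(
                    "conflicting types for {} in {}: {:?} vs {:?}",
                    variable, head.predicate, known, inferred
                ));
            }
            (None, Some(&inferred)) => *slot = Some(inferred),
            _ => {}
        }
    }
    Ok(())
}

/// Positions that are still unknown after propagation default to `Any`.
pub fn resolve(slot: Option<PrimitiveType>) -> PrimitiveType {
    slot.unwrap_or(PrimitiveType::Any)
}
```

For a rule with body p(x, y) and head q(y, z), where p is typed (Integer, String), the first position of q becomes String and the second, being unconstrained, resolves to Any.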
Afterwards, body positions with the same variable are checked for compatibility, and a few additional consistency checks, e.g. for arithmetic operations, are carried out.
Here, we keep track of decisions that have been made regarding the type system. Those shall be implemented in the long run.
TODO