Datatype triple patterns #182

Open
IS4Code opened this issue Apr 12, 2023 · 3 comments

@IS4Code

IS4Code commented Apr 12, 2023

Why?

Presently, SPARQL triple syntax does not offer enough granularity when matching literals ‒ the object of a property can be specified in a triple pattern as (among others) a variable or a fixed literal, but it cannot be something in between. If you want to find triples based on the datatypes, you have to match all of them, and then filter:

?s ?p ?o .
FILTER (DATATYPE(?o) = xsd:integer)

A trivial SPARQL engine implementation would load all triples in existence into a set and then filter them based on the criteria.
It is possible, of course, to optimize this query and retrieve all matching triples as a single step, but not all engines might do that, and still this is something that could be expressed more concisely.
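To make the difference concrete, here is a minimal Python sketch of the two evaluation strategies described above. It is not taken from any real SPARQL engine: the `Literal` type, the toy `TRIPLES` data and all function names are assumptions made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (no language tags here)."""
    lexical: str
    datatype: str


# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:a", "ex:age", Literal("10", XSD + "integer")),
    ("ex:a", "ex:name", Literal("Ann", XSD + "string")),
    ("ex:b", "ex:age", Literal("12", XSD + "integer")),
    ("ex:b", "ex:knows", "ex:a"),  # IRI object, not a literal
]


def scan_and_filter(triples, datatype):
    """Match ?s ?p ?o against everything, then apply the DATATYPE filter."""
    return [
        t for t in triples
        if isinstance(t[2], Literal) and t[2].datatype == datatype
    ]


def build_datatype_index(triples):
    """Group literal-object triples by datatype once, at load time."""
    index = defaultdict(list)
    for t in triples:
        if isinstance(t[2], Literal):
            index[t[2].datatype].append(t)
    return index


def lookup_by_datatype(index, datatype):
    """Answer the same query with a single dictionary lookup."""
    return index.get(datatype, [])


INDEX = build_datatype_index(TRIPLES)
```

Both `scan_and_filter(TRIPLES, XSD + "integer")` and `lookup_by_datatype(INDEX, XSD + "integer")` return the same two triples; only the second avoids touching unrelated triples at query time.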

Proposed solution

I propose an extension of the node syntax like this:

?s ?p ?o^^xsd:integer .

Such a query would only match triples with the xsd:integer datatype, and bind ?o to the lexical value of the literal (as a simple literal/xsd:string).

Of course there are other options for the object:

# any literal with this lexical value, regardless of datatype (?dt is bound to IRI)
?s ?p "10"^^?dt .

# any literal (?o is simple literal, ?dt is IRI)
?s ?p ?o^^?dt .

I think this would add more expressiveness to the language, and open up better optimizations for triple stores that can index by datatype and evaluate such queries efficiently.
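The intended matching behaviour of the three pattern forms above can be sketched in Python as follows. This is only an illustration of the semantics as I read them from the proposal, not a specification: `Literal`, `Var` and `match_typed_pattern` are hypothetical names, bound values are plain Python strings rather than RDF terms, and language-tagged literals are deliberately left out.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for a typed RDF literal."""
    lexical: str
    datatype: str


class Var(NamedTuple):
    """A query variable such as ?o or ?dt (name without the question mark)."""
    name: str


def match_typed_pattern(obj, lexical_part, datatype_part) -> Optional[dict]:
    """Match obj against the pattern `lexical_part^^datatype_part`.

    Each part is either a Var or a fixed string. Returns the variable
    bindings on success, or None when the object does not match.
    """
    if not isinstance(obj, Literal):
        return None  # IRIs and blank nodes never match a literal pattern
    bindings = {}
    for part, actual in ((lexical_part, obj.lexical),
                         (datatype_part, obj.datatype)):
        if isinstance(part, Var):
            # A variable used twice must receive the same value both times.
            if bindings.get(part.name, actual) != actual:
                return None
            bindings[part.name] = actual
        elif part != actual:
            return None
    return bindings
```

For example, `match_typed_pattern(Literal("10", "xsd:integer"), Var("o"), "xsd:integer")` yields `{"o": "10"}`, corresponding to `?o^^xsd:integer`, while the same call on a literal with datatype `xsd:string` yields `None`.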

Language-tagged literals

These are literals as well, but I am unsure whether they should be matched by a pattern like ?o^^?dt. However, if they are allowed, I think the natural solution (if not deemed too convoluted) might be to bind ?dt to a literal with the xsd:language datatype, essentially treating something like "hello world"@en as ""hello world"^^"en"^^xsd:language" (not a proposed syntax), binding ?o to "hello world" and ?dt to "en"^^xsd:language. After all, a language tag is the datatype of a language-tagged literal, at least syntactically.

Other SPARQL functions, such as STRDT, could be modified to allow either IRI or xsd:language as the datatype.

Considerations for backward compatibility

I don't think ^^ in this position could have been valid previously, so all existing valid queries should remain valid and unambiguous.

@namedgraph

Isn't rdf:langString the datatype of language-tagged literals?

@IS4Code
Author

IS4Code commented Apr 12, 2023

Isn't rdf:langString the datatype of language-tagged literals?

Semantically yes, but it is not written in the triple. Such a thing should be produced only from something like "hello world"^^rdf:langString, not from an actual language-tagged literal.

Specifically, I was aiming for ?o^^?dt to produce something where STRDT(?o, ?dt) gives back the original object. With rdf:langString, information about the actual language tag is lost.
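The round-trip property can be illustrated with a small Python sketch. Everything here is an assumption for the sake of the example: `Literal`, `LangTag`, the two `split_*` functions and this extended `strdt` are hypothetical and do not describe SPARQL 1.1's actual STRDT.

```python
from typing import NamedTuple, Optional

RDF_LANGSTRING = "rdf:langString"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal with an optional language tag."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


class LangTag(NamedTuple):
    """Stands in for a literal such as "en"^^xsd:language."""
    tag: str


def split_proposed(lit):
    """?o^^?dt as proposed: the language tag takes the place of the datatype."""
    if lit.lang is not None:
        return lit.lexical, LangTag(lit.lang)
    return lit.lexical, lit.datatype


def split_langstring(lit):
    """Alternative reading: ?dt is always the datatype IRI."""
    return lit.lexical, lit.datatype


def strdt(lexical, dt):
    """Hypothetical extended STRDT that accepts an IRI or a language tag."""
    if isinstance(dt, LangTag):
        return Literal(lexical, RDF_LANGSTRING, dt.tag)
    return Literal(lexical, dt)
```

With `hello = Literal("hello world", RDF_LANGSTRING, "en")`, `strdt(*split_proposed(hello))` gives back `hello`, whereas `strdt(*split_langstring(hello))` produces a literal whose `lang` is `None`, which is the loss of information mentioned above.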

@redmer

redmer commented Apr 17, 2023

This proposal would make working with typed strings better, so I appreciate it. Combined with ?o@?lang to match the language tag, it would match both ways that (typed and language tagged) strings are available in SPARQL and Turtle. Then you'd use STRDT() to reconstitute a string from ?o^^?dt and STRLANG() with ?o@?lang. This was proposed in #17.

Having STRDT(?o, "en"^^xsd:language) return "hello world"@en would be a (breaking?) change from SPARQL 1.1.

Keeping datatype(""@en) = ?dt = rdf:langString the same would give fewer differences from the Turtle parsing rules and SPARQL's datatype(). It would indeed entail that ^^?dt "shadows" the language tag information. But with a parallel literal syntax for matching languages also available, this may not be a true problem.

Looking at other proposals, #34 suggests ?o@* to match all language-tagged (≟ only rdf:langString datatype) strings, or ?o^^* for all datatype-tagged (≟ all non-xsd:string and/or all non-rdf:langString) strings. #112 looks only tangentially related.

I think that would result in, for the following data, the bindings as in the following table:

:s :p "Hello, world!"@en .

| SPARQL | ?o | ?dt | ?lang |
| --- | --- | --- | --- |
| :s :p ?o | "Hello, world!"@en | | |
| :s :p ?o^^?dt | "Hello, world!" | rdf:langString | |
| :s :p ?o^^rdf:langString | "Hello, world!" | | |
| :s :p ?o@?lang | "Hello, world!" | | "en" |
| :s :p ?o@en | "Hello, world!" | | |
| ? :s :p ?o^^?dt , ?o@?lang | "Hello, world!" | rdf:langString | "en" |


3 participants