Aligning with semantics of RDF Datasets #1

Open
sandhawke opened this issue Dec 17, 2018 · 8 comments
Labels
defer Deferring this issue until we have established the N3 standard.

Comments

@sandhawke
Contributor

sandhawke commented Dec 17, 2018

From mailing list thread

The semantics of "named graphs" (part of a formalism called "RDF Datasets") are not exactly what one might expect, or perhaps what one might want. As RDF 1.1 Concepts and Abstract Syntax says:

Despite the use of the word “name” in “named graph”, the graph name is not required to denote the graph. It is merely syntactically paired with the graph. RDF does not place any formal restrictions on what resource the graph name may denote, nor on the relationship between that resource and the graph. A discussion of different RDF dataset semantics can be found in [RDF 1.1: On Semantics of RDF Datasets].

This is going to make it hard to use named graphs for N3. We can't just say that:

{ ...g1... } log:implies { ...g2... }

is represented in an RDF Dataset like this (using TriG with the optional GRAPH keyword for clarity and to match SPARQL):

GRAPH _:g1 { ...g1... }
GRAPH _:g2 { ...g2... }
_:g1 log:implies _:g2 .

because the blank nodes _:g1 and _:g2 aren't actually constrained by any specification to denote the graphs they are paired with.
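To make the gap concrete, here is a minimal Python sketch, assuming plain strings stand in for RDF terms and tuples for triples. The names `paired_graph`, `denotation`, and `name_denotes_its_graph` are illustrative only and come from no RDF library. It shows that the dataset fixes only the syntactic pairing, while a separate mapping decides what each name denotes.

```python
# Illustrative only: strings stand in for RDF terms, tuples for triples.
# An RDF dataset pairs graph names with graphs; that pairing is purely syntactic.
dataset = {
    "default": {("_:g1", "log:implies", "_:g2")},
    "named": {
        "_:g1": frozenset({(":x", "a", ":Man")}),
        "_:g2": frozenset({(":x", "a", ":Mortal")}),
    },
}

def paired_graph(dataset, name):
    """Syntactic lookup: the graph that `name` is paired with."""
    return dataset["named"][name]

# A hypothetical mapping from names to the resources they denote.
# Nothing in RDF 1.1 forces it to agree with the pairing above.
denotation = {
    "_:g1": "some resource that is not a graph",
    "_:g2": "another unrelated resource",
}

def name_denotes_its_graph(dataset, denotation, name):
    """True only if the name is mapped to the graph it is paired with."""
    return denotation.get(name) == paired_graph(dataset, name)

print(name_denotes_its_graph(dataset, denotation, "_:g1"))  # False
```

Under this reading, the `log:implies` triple in the default graph relates whatever the two blank nodes denote, which need not be the two graphs.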

This may look like a foolish oversight in the design of RDF Datasets, but the reality at the time was that people had implemented and were using Named Graph / Quad systems with a variety of different semantics (as enumerated in RDF 1.1: On Semantics of RDF Datasets), and there was no proposal for a single semantics that had anything like consensus.

So, what are our options now? Here's what I see:

Option 1. Override and/or update that decision in the specification of RDF Datasets. This might be possible if there is currently a consensus that did not exist at the time. I think this is unlikely, but if someone wants to pursue this, they should start by building a complete survey of folks using RDF Datasets and the semantics of their usage. If it turns out there's rough consensus, there might be a path forward here.

Option 2. Override and/or update that decision in the specification of RDF Datasets for some specific cases. I'm thinking in particular that when using blank nodes as graph labels in datasets it's much more likely everyone is using the same semantics. But this is still probably too hard. Still, I find this very tempting.

Option 3. Explicitly convey the intended semantics of each RDF Dataset. One example is provided in Section 4, "Declaring the intended semantics", and Ruben's email gives another. It's a bit of work, but not too bad, I think. One challenge is that the "<>" construct doesn't work in N-Quads. (More specifically, I don't think there's any way to convey metadata in N-Quads that doesn't depend on knowing exactly the URL for that N-Quads content. Maybe worth adding a construct like "#self=<...>".)

Option 4. Convey the intended semantics of RDF Datasets more subtly, eg by using certain predicates. For example, consider

_:g1 ex:fetchedFrom <https://example.org/g1>.
_:g1 { ... some triples }

and imagine we define ex:fetchedFrom to include our intended dataset semantics. If we're making up a new predicate, this can't conflict with any current usage, so I'm guessing this would work fine in practice, even if it feels odd, like it's breaking levels. But I think it works, and we'll have the same issue with defining variables.
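A rough Python sketch of how a consumer might apply Option 4, assuming strings for terms and tuples for triples. The helper `names_with_declared_semantics` and the set `SEMANTICS_PREDICATES` are hypothetical, and `ex:fetchedFrom` is the made-up predicate from the example. The consumer treats a graph name as denoting its graph only when the name is the subject of such a predicate.

```python
# Hypothetical: predicates whose definitions are assumed to carry the
# intended dataset semantics (ex:fetchedFrom is the made-up example above).
SEMANTICS_PREDICATES = {"ex:fetchedFrom"}

def names_with_declared_semantics(default_graph, named_graphs):
    """Return graph names that are the subject of a semantics-bearing predicate."""
    return {
        s for (s, p, o) in default_graph
        if p in SEMANTICS_PREDICATES and s in named_graphs
    }

default_graph = {
    ("_:g1", "ex:fetchedFrom", "<https://example.org/g1>"),
    ("_:g2", "ex:other", "<https://example.org/g2>"),
}
named_graphs = {
    "_:g1": {(":s1", ":p1", ":o1")},
    "_:g2": {(":s2", ":p2", ":o2")},
}

print(names_with_declared_semantics(default_graph, named_graphs))  # {'_:g1'}
```

Graph names without such a statement, like `_:g2` here, stay under the unconstrained RDF 1.1 reading.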

Option 5. Use lists instead of RDF Datasets. For example:

( (:s1 :p1 :o1) (:s2 :p2 :o2) ... ) ex2:fetchedFrom <https://example.org/g1>.

Let's call that thing on the left an SPOList, a list of triples, where each triple is a list of subject, predicate, object. This is looking pretty good to me right now. At this point, we can break out some more of LISP and represent formulas rather elegantly: (:forall :x (implies ((:x a :Man)) ((:x a :Mortal))))

Again, this is just a way to represent graphs in RDF triples; the surface syntax can still use ? for variables and { } for graphs.
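As a sketch of what the SPOList encoding could look like once expanded into plain triples, here is a small Python example. It assumes the standard rdf:first / rdf:rest / rdf:nil collection vocabulary and uses strings for terms; the function names and the `_:bN` blank node labels are illustrative, not part of any proposal in this thread.

```python
from itertools import count

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def encode_list(items, triples, fresh):
    """Append collection triples for `items` to `triples`; return the head node."""
    head = RDF_NIL
    for item in reversed(items):
        node = f"_:b{next(fresh)}"  # fresh blank node for each list cell
        triples.append((node, RDF_FIRST, item))
        triples.append((node, RDF_REST, head))
        head = node
    return head

def encode_spo_list(graph):
    """Encode a list of (s, p, o) tuples as a list of three-element lists."""
    triples, fresh = [], count()
    inner_heads = [encode_list(list(t), triples, fresh) for t in graph]
    head = encode_list(inner_heads, triples, fresh)
    return head, triples

def decode_list(head, triples):
    """Walk rdf:first / rdf:rest from `head` and return the items in order."""
    first = {s: o for (s, p, o) in triples if p == RDF_FIRST}
    rest = {s: o for (s, p, o) in triples if p == RDF_REST}
    items = []
    while head != RDF_NIL:
        items.append(first[head])
        head = rest[head]
    return items

graph = [(":s1", ":p1", ":o1"), (":s2", ":p2", ":o2")]
head, triples = encode_spo_list(graph)
decoded = [tuple(decode_list(h, triples)) for h in decode_list(head, triples)]
print(decoded == graph)  # True
print(len(triples))      # 16
```

The round trip recovers the original triples, at the cost of 16 list triples to carry 2 encoded ones in this example.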

Any other options?

@dbooth-boston

Option 5 looks equivalent to RDF reification, except that it uses list predicates to attach SPO together (with a slightly different topology), instead of using rdf:subject, rdf:predicate and rdf:object. That would feel like a step backward to me, since I'd like to deprecate RDF reification in favor of named graphs.

I'm curious what are the different semantics that different implementations are using. Was this collected into a list somewhere?

@sandhawke
Contributor Author

sandhawke commented Dec 17, 2018

@dbooth-boston There are of course several variations within Option 5. That should probably be a separate issue (now raised as #2). This issue is about how to align with RDF Datasets, and I listed Option 5 as a straw proposal for basically keeping this issue out of the critical path, because it's a tar pit. I specifically am not advocating for RDF's standard reification vocabulary, for reasons we could discuss there.

I feel a little offended at, "That would feel like a step backward to me, since I'd like to deprecate RDF reification in favor of named graphs." It's like you go to the doctor, explain in detail how it hurts when you put weight on your left ankle, you've tried everything, it's not getting better, and the doctor says, "I'd like to see you walking on it, so you should probably just do that. Don't worry about the pain." If he's going to say that, he has to first make sure the diagnosis is correct.

I'm not aware of a list of who is using which semantics. You could ask the author, Antoine Zimmermann. I'm pretty sure the variations listed in his document were all discussed in the WG as something someone wanted to do, but I don't know which ones were (let alone still are) actually in use.

But doesn't that only matter for Option 1 and 2, which are both kind of ... unattractive?

@dbooth-boston

Oh gosh, I did not mean to offend! I sincerely apologize for sounding offensive. I certainly agree that Option 5 is a legitimate option, and bears inclusion, even if I am hoping that we won't end up going there.

Understanding the semantics of current implementations bears most directly on Options 1 and 2, yes, but it also seems that it would provide more complete context for the issue as a whole.

@sandhawke
Contributor Author

sandhawke commented Dec 17, 2018

Okay, no worries.

I do wish Named Graphs were a bit better. Like, how do they relate to Property Graphs? And why did Wikidata decide to use a custom reification solution? And what on earth should I do with credibility data?

Maybe option-3 or option-4 gives us an answer.

@gkellogg
Member

While my opinion is that the RDF 1.1 WG missed the boat by not explicitly going with option 1, option 2 is pretty close to how named graphs work in JSON-LD as well. In JSON-LD 1.1, we added graph containers that specifically use an anonymous graph as the value of a property, with the expectation that the blank node name of the graph as a value of the property directly relates to the associated named graph.

In fact, I believe linked data signatures (or at least Verifiable Claims) depend on this interpretation.

@william-vw
Collaborator

Note that @doerthe and myself presented a paper at RuleML about how the different dataset semantics could be represented using N3 (with supporting rules): http://ceur-ws.org/Vol-2438/paper6.pdf

This could be a neat extension to the core N3 standard (will mention it in the current spec regardless) but for now I'd defer this issue to a later point, at least, if nobody disagrees.

@gkellogg
Member

gkellogg commented Aug 5, 2020

I support this effort. My own N3 implementation uses named graphs as the basis for formulae, where a blank node naming a graph and used in the subject/object position of some other triple denotes that named graph. Of course, to some, this is an apocryphal appropriation of blank nodes, but I suggest that we can define that interpretation within the scope of Notation-3.

Otherwise, I suspect the blank node skolemization suggested in your paper might be problematic, but I confess that I didn't look closely enough at it. (A hypothetical JSON-LD-based reasoner would face a similar problem with blank nodes).

@william-vw
Collaborator

> Otherwise, I suspect the blank node skolemization suggested in your paper might be problematic, but I confess that I didn't look closely enough at it. (A hypothetical JSON-LD-based reasoner would face a similar problem with blank nodes).

I believe we talk about skolemization in the context of a particular case (union semantics when assuming that named graphs partition triples), but indeed, this is still a work in progress.

@william-vw added the defer label on Aug 5, 2020
4 participants