Aligning with semantics of RDF Datasets #1
Option 5 looks equivalent to RDF reification, except that it uses list predicates to attach S, P and O together (with a slightly different topology) instead of rdf:subject, rdf:predicate and rdf:object. That would feel like a step backward to me, since I'd like to deprecate RDF reification in favor of named graphs. I'm curious which semantics the different implementations are using. Has this been collected into a list somewhere?
@dbooth-boston There are of course several variations within Option 5. That should probably be a separate issue (now raised as #2). This issue is about how to align with RDF Datasets, and I listed Option 5 as a straw proposal for basically keeping this issue out of the critical path, because it's a tar pit. I specifically am not advocating for RDF's standard reification vocabulary, for reasons we could discuss there. I feel a little offended at, "That would feel like a step backward to me, since I'd like to deprecate RDF reification in favor of named graphs." It's like you go to the doctor, explain in detail how it hurts when you put weight on your left ankle, that you've tried everything and it's not getting better, and the doctor says, "I'd like to see you walking on it, so you should probably just do that. Don't worry about the pain." If he's going to say that, he has to first make sure the diagnosis is correct. I'm not aware of a list of who is using which semantics. You could ask the author, Antoine Zimmermann. I'm pretty sure the variations listed in his document were all discussed in the WG as something someone wanted to do, but I don't know which ones were (let alone still are) actually in use. But doesn't that only matter for Options 1 and 2, which are both kind of ... unattractive?
Oh gosh, I did not mean to offend! I sincerely apologize for sounding offensive. I certainly agree that Option 5 is a legitimate option, and bears inclusion, even if I am hoping that we won't end up going there. Understanding the semantics of current implementations bears most directly on Options 1 and 2, yes, but it also seems that it would provide more complete context for the issue as a whole.
Okay, no worries. I do wish Named Graphs were a bit better. Like, how do they relate to Property Graphs? And why did Wikidata decide to use a custom reification solution? And what on earth should I do with credibility data? Maybe Option 3 or Option 4 gives us an answer.
While my opinion is that the RDF 1.1 WG missed the boat by not explicitly going with Option 1, Option 2 is pretty close to how named graphs work in JSON-LD as well. In JSON-LD 1.1, we added graph containers, which specifically use an anonymous graph as the value of a property, with the expectation that the blank node naming that graph, when it appears as the value of the property, directly relates to the associated named graph. In fact, I believe Linked Data Signatures (or at least Verifiable Claims) depend on this interpretation.
Note that @doerthe and I presented a paper at RuleML about how the different dataset semantics could be represented using N3 (with supporting rules): http://ceur-ws.org/Vol-2438/paper6.pdf. This could be a neat extension to the core N3 standard (I will mention it in the current spec regardless), but for now I'd defer this issue to a later point, at least if nobody disagrees.
I support this effort. My own N3 implementation uses named graphs as the basis for formulae, where a blank node that names a graph and is used in the subject or object position of some other triple denotes that named graph. Of course, to some this is an apocryphal appropriation of blank nodes, but I suggest that we can define that interpretation within the scope of Notation3. Otherwise, I suspect the blank node skolemization suggested in your paper might be problematic, but I confess that I didn't look closely enough at it. (A hypothetical JSON-LD-based reasoner would face a similar problem with blank nodes.)
I believe we talk about skolemization in the context of a particular case (union semantics when assuming that named graphs partition triples), but indeed, this is still a work in progress. |
From mailing list thread
The semantics of "named graphs" (part of a formalism called "RDF Datasets") are not exactly what one might expect, or perhaps what one might want. As RDF 1.1 Concepts and Abstract Syntax says:
This is going to make it hard to use named graphs for N3. We can't just say that:
is represented in an RDF Dataset like this (using TriG with the optional GRAPH keyword for clarity and to match SPARQL):
because the blank nodes _:g1 and _:g2 aren't actually constrained by any specification to denote the graphs they are paired with.
This may look like a foolish oversight in the design of RDF Datasets, but the reality at the time was that people had implemented, and were using, Named Graph / Quad systems with a variety of different semantics (as enumerated in RDF 1.1: On Semantics of RDF Datasets), and there was no proposal for a single semantics that had anything like consensus.
So, what are our options now? Here's what I see:
Option 1. Override and/or update that decision in the specification of RDF Datasets. This might be possible if there is currently a consensus that did not exist at the time. I think this is unlikely, but if someone wants to pursue this, they should start by building a complete survey of folks using RDF Datasets and the semantics of their usage. If it turns out there's rough consensus, there might be a path forward here.
Option 2. Override and/or update that decision in the specification of RDF Datasets for some specific cases. I'm thinking in particular that when using blank nodes as graph labels in datasets it's much more likely everyone is using the same semantics. But this is still probably too hard. Still, I find this very tempting.
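To make the Option 2 reading concrete, here is a minimal Python sketch of what a consumer would do under it, assuming terms are plain strings and blank nodes carry a "_:" prefix. The `Dataset` class, the `denoted_graph` function and the example terms are all hypothetical, not taken from any existing library or spec.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Terms are plain strings in this sketch; blank nodes carry the "_:" prefix.
Triple = tuple[str, str, str]


@dataclass
class Dataset:
    """An RDF Dataset: one default graph plus graphs paired with labels."""
    default: set[Triple] = field(default_factory=set)
    named: dict[str, set[Triple]] = field(default_factory=dict)


def denoted_graph(ds: Dataset, term: str) -> frozenset[Triple] | None:
    """Option 2 reading: a blank node used as a graph label denotes that graph.

    IRI labels are left alone, since RDF 1.1 places no constraint on what
    they denote; in that case (or for unused labels) return None.
    """
    if term.startswith("_:") and term in ds.named:
        return frozenset(ds.named[term])
    return None
```

For example, with `(":alice", ":says", "_:g1")` in the default graph and `_:g1` paired with `{(":bob", "a", ":Liar")}`, the object of Alice's statement resolves to that one-triple graph, while an IRI label such as `:g2` stays uninterpreted.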
Option 3. Explicitly convey the intended semantics of each RDF Dataset. One example is provided in Section 4, "Declaring the intended semantics", and Ruben's email gives another. It's a bit of work, but not too bad, I think. One challenge is that the "<>" construct doesn't work in N-Quads. (More specifically, I don't think there's any way to convey metadata in N-Quads that doesn't depend on knowing exactly the URL for that N-Quads content. It may be worth adding a construct like "#self=<...>".)
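One nice property of a "#self=<...>" line is that N-Quads already treats lines starting with "#" as comments, so existing parsers would skip it. A minimal Python sketch of how a convention-aware reader might pick it up, assuming the directive sits alone on its own line; the directive syntax and the `split_self` function are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical directive "#self=<IRI>" on a line of its own. To a standard
# N-Quads parser this is just a comment.
SELF_DIRECTIVE = re.compile(r"#self=<([^<>\s]*)>")


def split_self(nquads: str) -> tuple[str | None, list[str]]:
    """Return (self IRI or None, remaining statement lines).

    Blank lines and whole-line comments are dropped; statement lines are
    returned unparsed.
    """
    self_iri = None
    statements: list[str] = []
    for line in nquads.splitlines():
        stripped = line.strip()
        match = SELF_DIRECTIVE.fullmatch(stripped)
        if match:
            if self_iri is not None:
                raise ValueError("more than one #self directive")
            self_iri = match.group(1)
        elif stripped and not stripped.startswith("#"):
            statements.append(stripped)
    return self_iri, statements
```

With the self IRI in hand, a reader could treat quads whose subject is that IRI as metadata about the document, for instance a declaration of which dataset semantics is intended, without the content having to know its own URL in advance.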
Option 4. Convey the intended semantics of RDF Datasets more subtly, e.g. by using certain predicates. For example, consider
and imagine we define ex:fetchedFrom to include our intended dataset semantics. If we're making up a new predicate, it can't conflict with any current usage, so I'm guessing this would work fine in practice, even if it feels odd, as if it's breaking levels. But I think it works, and we'll have the same issue with defining variables.
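A minimal Python sketch of how a consumer might apply Option 4, assuming triples of the form `<g> ex:fetchedFrom <url>` sit in the default graph with `<g>` a graph label, and that the predicate's definition says such a label denotes the graph it is paired with. The IRI used for ex:fetchedFrom and the `denoted_graphs` function are hypothetical placeholders.

```python
from __future__ import annotations

# Hypothetical IRI standing in for ex:fetchedFrom; nothing here fixes one.
EX_FETCHED_FROM = "http://example.org/fetchedFrom"

Triple = tuple[str, str, str]


def denoted_graphs(default: set[Triple],
                   named: dict[str, set[Triple]]) -> dict[str, frozenset[Triple]]:
    """Map each graph label used as the subject of ex:fetchedFrom to its graph.

    Only those labels get the "label denotes graph" reading; every other
    label keeps RDF 1.1's unconstrained semantics and is omitted.
    """
    result: dict[str, frozenset[Triple]] = {}
    for subject, predicate, _source in default:
        if predicate == EX_FETCHED_FROM and subject in named:
            result[subject] = frozenset(named[subject])
    return result
```

The point of the design is that the opt-in travels with the data: a dataset that never uses the new predicate is read exactly as it is today.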
Option 5. Use lists instead of RDF Datasets. For example:
Let's call that thing on the left an SPOList, a list of triples, where each triple is a list of subject, predicate, object. This is looking pretty good to me right now. At this point, we can break out some more of LISP and represent formulas rather elegantly: (:forall :x (implies ((:x a :Man)) ((:x a :Mortal))))
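In plain triples, each list becomes an rdf:first/rdf:rest chain ending in rdf:nil. A minimal Python sketch of that encoding, assuming terms are strings (prefixed names are left unexpanded, so "a" is not turned into rdf:type) and blank nodes are minted as `_:b0`, `_:b1`, and so on; `encode` and `to_triples` are hypothetical helpers. Because it recurses on nested lists, the same code handles both an SPOList and the LISP-style formula.

```python
from __future__ import annotations

from itertools import count
from typing import Iterator

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

Triple = tuple[str, str, str]


def encode(term, out: list[Triple], fresh: Iterator[int]) -> str:
    """Return the RDF node for `term`, appending any triples it needs to `out`.

    Strings pass through unchanged; Python lists become RDF collections.
    """
    if not isinstance(term, list):
        return term
    head = RDF_NIL
    # Build the chain back to front so each cell can point at its successor.
    for item in reversed(term):
        cell = f"_:b{next(fresh)}"
        out.append((cell, RDF_FIRST, encode(item, out, fresh)))
        out.append((cell, RDF_REST, head))
        head = cell
    return head


def to_triples(nested: list) -> tuple[str, list[Triple]]:
    """Encode a nested list; return its head node and the encoding triples."""
    out: list[Triple] = []
    head = encode(nested, out, count())
    return head, out


# An SPOList with one triple, and the formula from the text.
spo_head, spo_triples = to_triples([[":x", "a", ":Man"]])
formula = [":forall", ":x",
           ["implies", [[":x", "a", ":Man"]], [[":x", "a", ":Mortal"]]]]
formula_head, formula_triples = to_triples(formula)
```

Here each embedded triple costs three list cells (six triples), whereas standard reification hangs rdf:subject, rdf:predicate and rdf:object off a single node; the list form also keeps all the triples of one graph together in one ordered structure.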
Again, this is just a way to represent graphs in RDF triples; the surface syntax can still use ? for variables and { } for graphs.
Any other options?