A Triple which is not a Quad #144
Comments
Issues in brief:
|
If you have 'triple in some graph', I think you already have a quad there. Any application can implement functions to compare equality only based on
once again, quad means a triple in a graph (named or default)
I understand that sometimes you see need to only consider the In practice, that seems to mean that Turtle parser would emit instances of Triple, while Trig parser would emit instances of Quad, some of them with
You can do it by having two different datasets, each having its own distinct default graph. Since blank node labels stay scoped to a dataset, one has to be careful when merging data from two different datasets. If you have two distinct graphs from the same dataset, at least one of them needs to be a named graph.
@jacoscaz could you please comment on that based on your experience with |
A Quad specifically means a Triple is a member of a named graph (potentially including a single default graph with no URI). I scarcely use named graphs. Imagine I parse two identical Turtle files with a single triple, and a TriG file representing this: a.ttl
b.ttl
dataset.trig
In the Turtle case, Quad#equals returns Since there's no concept of a Triple, our current
Why should parsers for different media types emit the same thing? Nobody ever complained that JSON parsers can't return a DOM because HTML does. The paragraph you reference specifically suggests how to convert the two data models between one another for compatibility, an admission they're not the same thing:
This is not intuitive. The point of having a Dataset is to manage multiple graphs; if you add "... unless you need multiple unnamed graphs, then use multiple Datasets" now I have to decide what to do in the case two datasets have conflicting information about the same named graph. |
@awwright This spec is not about covering everything for every library which does RDF, it's about a common set of interfaces most of us can agree on. That requires making it open enough for custom features which are still spec compliant, and allowing those custom features without (big) performance drawbacks. I think what you want to do is possible in a spec compliant way. Maybe it must be implemented in a not so nice way, but it's possible. e.g. you can make a It looks like there is a consensus that triple will be removed (#124). If there is any chance to convince people of your point, I think code examples would be the best option with a comparison of doing it the spec way vs. your way. |
@bergos That's a fair point, but then the question becomes why Quad? I see evidence that Triple is somewhat simpler, more intuitive, has more uses, and so is more likely to see adoption. Up until this one, every RDF library has exposed Triples, and now that we've had some implementation experience, what evidence do we have so far that Quads produces a more successful API? And consider Quad#equals, which has little need for standardization (I can't use it to compare the atoms of the statement, it's not necessary for interoperability, the few libraries that need can just implement their own, yes?) But that's assuming it's a one-or-the-other proposition. We can have both; it's the natural progression of standards to expand in scope as different implementations realize they're implementing the same features. |
Parsers for JSON-LD, TriG or N-Quads require having a Quad. I don't know what an option/compromise for Triples handling Quads could look like without being a Quad. The other way round, using DefaultGraph for the graph, looks like much less of a compromise. Also @timbl said there is always a 4th property for a statement (graph or why in rdflib.js). |
I think handling clearly #117 would get affected by having another option of
I am considering proposing not to include |
In #153 we have removed #154 show further motivation for it, |
@awwright Quad is the more generic class of Triple. I think it would be easy to support your idea of Triple via subclassing Quad in some impl, but getting something like this into the spec is likely much harder. However, I do think it would be less cumbersome to propose (yet another) term type, e.g., |
@blake-regalia What's the behavior for What about implementations that rely on using |
If you 'find it in a graph' then you have a Quad. As I understand @blake-regalia a BTW personally I don't see need for suggested |
A Quad is an assertion that a given Triple exists in a given Graph. It makes sense to talk about two Quads I find and ask if they're the same Triple! |
function equalTriples(some, other) {
  return some.subject.equals(other.subject) &&
    some.predicate.equals(other.predicate) &&
    some.object.equals(other.object);
}
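As a rough illustration of how this comparison differs from quad-level equality, here is a minimal sketch. The `term` and `quad` helpers are hypothetical hand-rolled stand-ins that only mimic the shape of RDF/JS terms and quads (`termType`, `value`, `equals`); they are not any library's API, and the example.org IRIs are made up.

```javascript
// Hypothetical stand-in for an RDF/JS term: termType, value, equals().
function term(termType, value) {
  return {
    termType,
    value,
    equals(other) {
      return !!other && other.termType === termType && other.value === value;
    },
  };
}

// Hypothetical stand-in for an RDF/JS quad. Its equals() also compares the graph.
function quad(subject, predicate, object, graph) {
  return {
    subject,
    predicate,
    object,
    graph,
    equals(other) {
      return !!other &&
        subject.equals(other.subject) &&
        predicate.equals(other.predicate) &&
        object.equals(other.object) &&
        graph.equals(other.graph);
    },
  };
}

// Same comparison as in the comment above: the graph component is ignored.
function equalTriples(some, other) {
  return some.subject.equals(other.subject) &&
    some.predicate.equals(other.predicate) &&
    some.object.equals(other.object);
}

const s = term('NamedNode', 'http://example.org/s');
const p = term('NamedNode', 'http://example.org/p');
const o = term('NamedNode', 'http://example.org/o');

// The same statement, once in the default graph and once in a named graph.
const inDefault = quad(s, p, o, term('DefaultGraph', ''));
const inNamed = quad(s, p, o, term('NamedNode', 'http://example.org/g'));

const sameQuad = inDefault.equals(inNamed);          // graphs differ
const sameTriple = equalTriples(inDefault, inNamed); // graph ignored
console.log({ sameQuad, sameTriple });
```

Under these assumptions the two objects differ as quads but match as triples, which is the distinction the thread is arguing about.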
@elf-pavlik Sure, but you also have to explain why we instead have a thoroughly useless |
For illustration, here's an example of a problem I just ran into: I've got a Dataset of RDF statements, organized by source file. I aggregate this dataset into a single graph, and use this graph to build a search index, sitemaps, tables of data, and other queries across the whole collection. Doing this in a Dataset is possible. (Dataset/Quad is, indeed, a superset of Graph/Triple.) However, it's clunky: In some cases, I would find the same RDF statement serialized to my Turtle file multiple times. I have to map/reduce the Dataset to another Dataset, changing the graph property to a constant. I have to write an assertion to check that the data I pull out of this aggregate dataset has the correct graph property (to protect against future changes). And having this in a separate Dataset sort of defies the point of a Dataset (which is to store multiple graphs). Some of the statements will be found in multiple graphs, but Quad doesn't have a mechanism to specify more than one graph. So instead I use a Triple. And if (for some reason) I need to determine what graphs the triple is found in, I can query the Dataset's SPOG index. If I get a Quad, I don't know if the graph property is significant or not. If I get a Quad in the default graph, can I assume future Quads will also be in the default graph? Or will I have to add code to handle different graphs? To safely process a Quad I always have to handle all four properties. But the application doesn't always require this, and sometimes the semantics are undefined or under-constrained. The solution here seems to be to throw if the Dataset defines more than one graph. I find this dubious. |
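To make the aggregation described above concrete, here is one possible sketch of collapsing quads into unique triples while remembering which graphs each triple came from. Everything in it is an assumption for illustration: terms are plain `{ termType, value }` objects, the `aggregate`, `termKey`, and `tripleKey` names are hypothetical, the IRIs are made up, and literal datatypes and languages are ignored for brevity.

```javascript
// Plain-object named node (hypothetical helper, not a library API).
const nn = (value) => ({ termType: 'NamedNode', value });

// String key for a single term. Ignores literal datatype/language for brevity.
function termKey(t) {
  return `${t.termType}:${t.value}`;
}

// Key that identifies a statement by subject, predicate, and object only.
function tripleKey(q) {
  return JSON.stringify([q.subject, q.predicate, q.object].map(termKey));
}

// Collapse quads into unique triples, recording every graph each one appears in.
function aggregate(quads) {
  const byTriple = new Map();
  for (const q of quads) {
    const key = tripleKey(q);
    let entry = byTriple.get(key);
    if (!entry) {
      entry = { subject: q.subject, predicate: q.predicate, object: q.object, graphs: [] };
      byTriple.set(key, entry);
    }
    if (!entry.graphs.some((g) => termKey(g) === termKey(q.graph))) {
      entry.graphs.push(q.graph);
    }
  }
  return [...byTriple.values()];
}

const a = nn('http://example.org/a');
const b = nn('http://example.org/b');
const knows = nn('http://example.org/knows');

const quads = [
  { subject: a, predicate: knows, object: b, graph: nn('http://example.org/a.ttl') },
  { subject: a, predicate: knows, object: b, graph: nn('http://example.org/b.ttl') },
  { subject: b, predicate: knows, object: a, graph: nn('http://example.org/b.ttl') },
];

const triples = aggregate(quads);
console.log(triples.length, triples.map((t) => t.graphs.length));
```

The statement found in both source files appears once in the result, with both graph names attached, so graph membership stays available without being part of the statement's identity.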
@awwright similar to my point here: #159 (comment) |
If the point of the RDF/JS spec is to form consensus around the API, why is it going against what established RDF APIs have been doing for 20 years? Is it a case of NIH syndrome? Take RDF4J, Jena, ruby-rdf: every single one of them contains an abstraction for graph and triple. That is because RDF 1.0 only standardized those. Datasets and quads came much later, with SPARQL. Eventually they made it into RDF 1.1. So if a developer is familiar with RDF at all, there is a much bigger chance s/he is familiar with triples and not quads. And this API does not even contain such terms. Why alienate and confuse potential users? |
For the sake of argument, here's a couple considerations:
|
RDF 1.1, TriG, and N-Quads all have a 2014 release date. I think APIs started 20 years ago might not have taken Datasets and named graphs into account. Let's think of this simple experiment, let's serve exactly the same representation for What graph will a parser assign when parsing
served with content type clue
|
@elf-pavlik is that a trick question? The default graph. What does that prove? I think a more relevant experiment is reading such data, then taking it from the default graph and storing it in a named graph, whose name is most likely the URI the data was read from. This just goes against working with graphs as units (and triples as their constituents). That is important because currently Linked Data is graph-based, not quad-based. |
Thinking about the immutability conversation in #81, changing the graph component doesn't sound like the way to go. I think one would either use a transform stream which would create copy of each quad with different |
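One way the copy-based approach could look, sketched with a plain generator instead of a real transform stream to keep the example self-contained. The `withGraph` name, the plain-object terms, and the IRIs are all hypothetical; in a streaming setup the same per-quad mapping would presumably sit inside the stream's transform step.

```javascript
// Plain-object terms (hypothetical helpers, not a library API).
const nn = (value) => Object.freeze({ termType: 'NamedNode', value });
const defaultGraph = Object.freeze({ termType: 'DefaultGraph', value: '' });

// Yield a fresh, frozen copy of each quad with the graph replaced.
// The input quads are never mutated.
function* withGraph(quads, graph) {
  for (const q of quads) {
    yield Object.freeze({
      subject: q.subject,
      predicate: q.predicate,
      object: q.object,
      graph,
    });
  }
}

// Quads as a Turtle parser might emit them: everything in the default graph.
const parsed = [
  Object.freeze({
    subject: nn('http://example.org/s'),
    predicate: nn('http://example.org/p'),
    object: nn('http://example.org/o'),
    graph: defaultGraph,
  }),
];

// Store the copies under a graph named after the (assumed) source document.
const source = nn('http://example.org/a.ttl');
const stored = [...withGraph(parsed, source)];

console.log(parsed[0].graph.termType, stored[0].graph.value);
```

The original quads keep their default graph, so anything still holding a reference to them is unaffected by the relabeling.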
Something I either didn't see or forgot to mention:
This is not true because Quad always makes an assertion that some graph contains some triple; Triple does not do this. Therefore, the two classes have disjoint semantics. |
It came to my attention in #124 that we don't really have a way of talking about triples without implying that they're part of a graph. Since #124 is about a slightly different issue (if triple should be aliased to quad), I'd like to separately raise adding a Triple interface.
I think it's important to have separate Triple and Quad instances, because they're not the same thing. A Triple is an axiomatic statement; a Quad additionally signifies a Triple exists in a single graph. But sometimes I want to be able to talk about an RDF statement without implying membership in a graph.
So far we've supposed the DefaultGraph should be sufficient if graph membership is unimportant—just treat it as extraneous information. Perhaps we add the requirement that RDF sources add configuration options on how to generate graph names. But this is a workaround; it adds additional complexity to many components of an ecosystem that could be dispensed with entirely.
For example, suppose I parse two Turtle documents and want to test if they're isomorphic. What does this mean if I'm returned a Dataset, without any interface-level guarantee all the triples will be in a single graph? Confusing Quad for Triple muddies the semantics of RDF, which does not define interpretations/entailment over anything other than a single graph. RDF uniquely identifies statements by (subject,predicate,object), and this triple is the same triple even if present in multiple graphs. But the current implementation considers them to be different quads; so there is no way to test for triple-equality.
Adding a graph property immediately doubles the memory requirements to have a fully indexed RDF store. For applications that don't need a graph property—such as testing isomorphism or entailment—this can be quite significant.
URIs/IRIs are supposed to be universal, and so this adds a requirement that each component agree on how to name graphs & treat graph names. While this shouldn't be a foreign concept to RDF developers, a fourth dimension of IRI to maintain is not insignificant, and in my experience working with RDF, not typically necessary; as a result, we now have to decide how to configure a parser that should be zero-configuration.
Sometimes I want to be able to hold multiple graphs in memory without naming them. What are the semantics of having two Quad stores with different information for the same graphs? It's probably possible to figure out, but it's not immediately apparent to me.
It appears to me that Quad stores and named graphs were invented for applications that can't store graphs without names; for example SPARQL, where the graph name is an alternative to a file on the filesystem. But we don't have this limitation in ECMAScript, and I don't think we should limit the data interface to things describable over SPARQL.
For some perspective: Presently I'm working on an application that uses and produces RDFa data. (In the future, it'll do the same with JSON-LD and JSON Hyper-schema.) It uses datasets and quads to identify which RDFa document makes which statements. This is done with a library I've maintained, itself derived from webr3's work.
First, I want the application to manage the namespace for the graphs, as opposed to libraries I call out to. I've tried managing the data a few different ways, and I've simply found it's simpler if I work with Triples when I'm dealing with graphs, and Quads in a single case where I'm aggregating all the information together or querying it.
Second, several of the document operations demand the use of Triple, because I have a Graph implementation that provides useful methods that only make sense defined over graphs, things like unions, merges, equality/isomorphism testing, and so on. We're defining an OO interface, and so I would like to define methods that are defined over a graph and not an entire dataset.
Additionally, I've been considering adding these methods to Triple, because a Triple can be considered a singleton Graph; but a Triple is not a Quad, since a Quad implies two pieces of axiomatic information (both a statement and its membership in a single graph), and sometimes these methods are only defined over one or the other, not both.
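For readers who want a picture of graph-level methods, here is a small hypothetical sketch. It is not the Graph implementation referred to above; the class shape, the method names (`union`, `equalsGround`), and the plain-object terms are assumptions made for illustration. It only handles ground triples, so it sidesteps blank nodes, which is where a union differs from a merge and where real isomorphism testing gets hard.

```javascript
// Plain-object named node (hypothetical helper, not a library API).
const nn = (value) => ({ termType: 'NamedNode', value });

// A graph as a set of triples, keyed by subject, predicate, and object only.
class Graph {
  constructor(triples = []) {
    this.index = new Map();
    for (const t of triples) this.add(t);
  }

  static key(t) {
    return JSON.stringify([t.subject, t.predicate, t.object].map((x) => [x.termType, x.value]));
  }

  add(t) {
    this.index.set(Graph.key(t), t);
    return this;
  }

  has(t) {
    return this.index.has(Graph.key(t));
  }

  get size() {
    return this.index.size;
  }

  // Set union: returns a new graph and leaves both operands untouched.
  union(other) {
    return new Graph([...this.index.values(), ...other.index.values()]);
  }

  // Equality for ground graphs only. Blank nodes would need a mapping step.
  equalsGround(other) {
    return this.size === other.size &&
      [...this.index.values()].every((t) => other.has(t));
  }
}

const t1 = { subject: nn('http://example.org/a'), predicate: nn('http://example.org/p'), object: nn('http://example.org/b') };
const t2 = { subject: nn('http://example.org/b'), predicate: nn('http://example.org/p'), object: nn('http://example.org/a') };

const g1 = new Graph([t1]);
const g2 = new Graph([t1, t2]);
const merged = g1.union(g2);

console.log(merged.size, merged.equalsGround(g2), g1.equalsGround(g2));
```

None of these methods needs a graph name to be meaningful, which is the sense in which they are defined over a graph rather than over a dataset.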
I hope this makes a convincing point; I'm happy to answer any questions or consider any feedback. Thanks!