keep triple provenance as named graphs #123

Open
pchampin opened this issue Dec 6, 2023 · 8 comments

@pchampin

pchampin commented Dec 6, 2023

Issue type:

  • ➕ Feature request

Description:

Currently, there is no way to know from which source the link traversal retrieved a given triple.
I would like, for example, to be able to ask the following query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT * {
    <https://champin.net/#pa> foaf:knows ?p.
    GRAPH ?g { ?p foaf:name ?name }
}

to determine whether the name of a person comes from their own profile or another source.

Of course, I would expect the default graph to be, by default, the merge of all named graphs, so that "flat" queries still work as expected.
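To make the expected semantics concrete, here is a minimal in-memory sketch. It is not Comunica code: the `Quad` type, the helper names, and the `example.org` data are all made up for illustration. It assumes each traversed document contributes quads whose graph name is the document URL, and that the default graph is the merge of all named graphs.

```typescript
// Hypothetical model; none of these names come from Comunica.
type Quad = { s: string; p: string; o: string; g: string };
type Row = { p: string; name: string; g: string };

const FOAF = "http://xmlns.com/foaf/0.1/";
const PA = "https://champin.net/#pa";
const ALICE = "https://example.org/alice#me"; // invented person IRI

// Assumption: every triple is stored with the URL of the document it came from.
const traversedQuads: Quad[] = [
  { s: PA, p: FOAF + "knows", o: ALICE, g: "https://champin.net/" },
  { s: ALICE, p: FOAF + "name", o: "Alice", g: "https://example.org/alice" },
  { s: ALICE, p: FOAF + "name", o: "Alice A.", g: "https://champin.net/" },
];

// Default graph as the merge of all named graphs: match while ignoring g.
function matchInUnion(s: string, p: string): Quad[] {
  return traversedQuads.filter(q => q.s === s && q.p === p);
}

// Mirrors the query above: the first pattern is matched against the union,
// the second one inside GRAPH ?g, so the graph name is kept as a binding.
function namesWithSource(): Row[] {
  const out: Row[] = [];
  for (const k of matchInUnion(PA, FOAF + "knows")) {
    for (const n of matchInUnion(k.o, FOAF + "name")) {
      out.push({ p: k.o, name: n.o, g: n.g });
    }
  }
  return out;
}
```

With this toy data, `namesWithSource()` yields two rows for the same person: "Alice" from `https://example.org/alice` (their own profile, in this made-up setup) and "Alice A." from `https://champin.net/` (another source), which is exactly the distinction the query is meant to expose.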

cc @lecoqlibre @FabienGandon

@rubensworks
Member

Hi @pchampin 👋

The functionality you are describing is available in this actor: https://github.com/comunica/comunica-feature-link-traversal/tree/master/packages/actor-rdf-resolve-hypermedia-links-traverse-annotate-source-graph

It's not part of the default configuration, but a separate one, which has a corresponding web client here: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-prov-sources/

We haven't done any experiments with it so far, so we don't know at the moment how much overhead the implementation causes.

There may also be some alternative approaches possible to achieve triple provenance, such as the quoted triples from RDF-star. (this has been on hold for a while, but now that Comunica supports RDF-star, we could theoretically start building such an implementation)

@pchampin
Author

pchampin commented Dec 6, 2023

Great, thanks @rubensworks .

Is there a way to use the command-line tool with this specific configuration file? (I tried the -c flag, but it does not seem to work...)

@rubensworks
Member

> Is there a way to use the command-line tool with this specific configuration file? (I tried the -c flag, but it does not seem to work...)

That should be possible using the dynamic variant of the CLI tool (I suspect comunica-dynamic-sparql-link-traversal-solid in your case) and setting the COMUNICA_CONFIG environment variable.

@pchampin
Author

pchampin commented Dec 6, 2023

Thanks again @rubensworks but I had no luck with the config file. Below is the command line I used:

COMUNICA_CONFIG=config-solid-prov-sources.json \
    my-comunica \
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?g { <https://champin.net/#pa> foaf:knows ?p. GRAPH ?g { ?p foaf:name ?name } }" \
    --lenient \
    -l debug 2>/tmp/comunica-log

Note that my-comunica is an alias to comunica-dynamic-sparql-link-traversal-solid.

I get no results. However, when I remove the GRAPH ?g clause around the second triple pattern, I do get results. So the triples are retrieved, but not put in named graphs as I was expecting...

I tested it with both the version installed from NPM (2.10.1) and the version built from the master branch (bb3fa62).

@rubensworks
Member

@pchampin Could you try again with the flag --unionDefaultGraph?

It seems to be working here with this query: https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/solid-prov-sources/#transientDatasources=https%3A%2F%2Fwww.rubensworks.net%2F&query=SELECT%20DISTINCT%20*%20WHERE%20%7B%0A%20%20%20%20GRAPH%20%3Fsource%20%7B%0A%20%20%20%20%20%20%3Fperson%20foaf%3Aname%20%3Fname.%0A%09%7D%0A%7D
However, it looks like some results have an empty graph binding, so the implementation probably still has some issues (it's quite old, so things may have broken with more recent changes).

@pchampin
Author

pchampin commented Dec 7, 2023

I did try with --unionDefaultGraph already, and yes, it provides results, but for the wrong reason... In fact, even with the default configuration AND the --unionDefaultGraph option, I get exactly the same result (with an empty IRI bound to ?g).

My understanding is that, when --unionDefaultGraph is on, the default graph is a read-only view, so simple triples (as opposed to quads) are added to the graph named <> (empty IRI). If anything, the results we get when turning on this option show that the 'annotate-source-graph' actor fails to add the triples to the right named graph...
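Under that reading, an empty ?g in a result means the underlying triple was never annotated with its source. A small diagnostic sketch of that check follows. The `Binding` type, the function name, and the sample rows are hypothetical, and representing the empty IRI as an empty string is an assumption, not something taken from Comunica's output format.

```typescript
// Hypothetical result rows for `SELECT ?name ?g`; not Comunica's binding type.
type Binding = { name: string; g: string };

// Splits bindings into those carrying a real source graph and those where
// ?g is the empty IRI (assumed here to be the empty string).
function splitBySourceAnnotation(
  bindings: Binding[],
): { annotated: Binding[]; unannotated: Binding[] } {
  const annotated: Binding[] = [];
  const unannotated: Binding[] = [];
  for (const b of bindings) {
    (b.g === "" ? unannotated : annotated).push(b);
  }
  return { annotated, unannotated };
}

// Made-up sample: one row without a source graph, one with.
const sampleBindings: Binding[] = [
  { name: "Alice", g: "" },
  { name: "Bob", g: "https://example.org/bob" },
];
const annotationReport = splitBySourceAnnotation(sampleBindings);
console.log(
  `${annotationReport.annotated.length} annotated, ` +
    `${annotationReport.unannotated.length} unannotated`,
);
```

If every row lands in `unannotated`, with both the default configuration and the provenance one, that would be consistent with the actor not writing triples into source-named graphs at all.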

@rubensworks
Member

Ok, thanks for checking.
So something is definitely going wrong in the 'annotate-source-graph' actor then...

@pchampin
Author

Maybe things have changed in the last two weeks, but I now realize that your example above does provide some named graphs, after a bunch of empty named graphs!

I can't reproduce this on the command line, though :-(
