Slash separated ontologies cause unneeded traversals when used as source #61

Maximvdw · 2022-04-24T13:26:06Z

Issue type:

🐛 Bug

Description:

This issue concerns a performance problem with slash-separated ontologies. Let's say you are using X subjects from the same ontology, e.g. http://example.com/myontology/A and http://example.com/myontology/B. Currently they are treated as individual datasets and are traversed individually (as they should be). In a normal linked data front-end this works fine: only these concepts are fetched, rather than a large dataset that might contain unneeded information.

In some use cases you might be using a lot of concepts from the same ontology, in which case one request to http://example.com/myontology/ would be preferable.

When putting this ontology in sources, I would expect only one request to be made. However, it seems Comunica still tries to fetch the subjects individually, creating a separate request for every subject in http://example.com/myontology/.
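
To make the expectation above concrete, here is a minimal sketch (not Comunica code) that groups subject IRIs by their slash namespace. The helpers `slashNamespace` and `groupByNamespace` are hypothetical names introduced purely for illustration; the sketch only shows that many subjects can collapse onto a single namespace document.

```typescript
// Hypothetical helper: the "slash namespace" of an IRI is taken to be
// everything up to and including the last '/', ignoring any fragment.
function slashNamespace(iri: string): string {
  const hash = iri.indexOf('#');
  const base = hash === -1 ? iri : iri.slice(0, hash);
  return base.slice(0, base.lastIndexOf('/') + 1);
}

// Group IRIs by namespace, to see how many namespace-level requests
// would cover them.
function groupByNamespace(iris: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const iri of iris) {
    const ns = slashNamespace(iri);
    const members = groups.get(ns) ?? [];
    members.push(iri);
    groups.set(ns, members);
  }
  return groups;
}

const groups = groupByNamespace([
  'http://example.com/myontology/A',
  'http://example.com/myontology/B',
]);
// Both subjects fall under http://example.com/myontology/, so one
// namespace-level request could in principle cover them.
console.log(groups.size, Array.from(groups.keys()));
```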

I think it is similar to the 'similarity' prioritisation in #51; however, I am not certain it is the exact issue that appears here.

Try it out
https://comunica.github.io/comunica-feature-link-traversal-web-clients/builds/default/
Use the following source (http):
http://qudt.org/vocab/unit/
Enable the proxy (also tested it without proxy):
https://proxy.linkeddatafragments.org/
Make sure it is HTTPS and not the default HTTP

Test query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qudt: <http://qudt.org/schema/qudt/>
SELECT ?unitName WHERE {
    ?unit a qudt:Unit ;
          rdfs:label ?unitName .
}

For each concept that is available in the source, it will still create a request to the individual page (which, in the case of this ontology, is a lot).

Environment:

I am using the default config along with "@comunica/query-sparql-link-traversal-solid": "0.0.2-alpha.4.0".

github-actions · 2022-04-24T13:26:33Z

Thanks for reporting!

Maximvdw · 2022-04-24T13:47:43Z

Hmm I am not sure if it is intended that actor-rdf-resolve-hypermedia-links-traverse-prune-shapetrees would solve this issue with a fictional new ShapeTree(undefined, undefined, 'http://qudt.org/unit/{id}')?

rubensworks · 2022-04-25T13:30:29Z

Thanks for the issue. This is a very interesting case, which I hadn't considered before.

So the problem here is that too many requests are being made for this query. The link traversal algorithm will do lookups for each separate unit document, even though all required information is actually already present in the initial source.
So we need a mechanism to indicate this fact somehow.

Shapetrees may indeed be a possible solution for this (perhaps using some trickery with cardinalities in shapes), but I'm not sure. In any case, the current shapetrees implementation is incomplete, so it definitely cannot be used as-is. I'll report here once I've made some progress on the shapetrees implementation, and when I think it might be helpful here.

In the meantime, content policies may also do the trick, as they should be able to indicate specifically which links can be followed. But this is also still very experimental.

rubensworks · 2022-10-31T09:19:25Z

This problem was also mentioned by @jeswr in #84 for the FOAF vocabulary.

jeswr · 2022-11-26T06:55:09Z

"even though all required information is actually already present in the initial source. So we need a mechanism to indicate this fact somehow." (quoting @rubensworks)

One way of doing this is to make use of rdfs:isDefinedBy. In particular, during link traversal, all incoming triples matching the pattern ?s rdfs:isDefinedBy ?o should be stored in a lookup table or in-memory store. Before a link is added to the traversal queue, we can then first check whether it is in the isDefinedBy lookup table and whether the document that it isDefinedBy has already been dereferenced.

This would indeed solve the QUDT case above, which has terms defined as follows:

<http://qudt.org/vocab/unit/AMD>
  a <http://qudt.org/schema/qudt/CurrencyUnit> ;
  a <http://qudt.org/schema/qudt/Unit> ;
  <http://purl.org/dc/terms/description> "Armenia"^^rdf:HTML ;
  <http://qudt.org/schema/qudt/currencyExponent> 0 ;
  <http://qudt.org/schema/qudt/dbpediaMatch> "http://dbpedia.org/resource/Armenian_dram"^^xsd:anyURI ;
  <http://qudt.org/schema/qudt/hasDimensionVector> <http://qudt.org/vocab/dimensionvector/A0E0L0I0M0H0T0D1> ;
  <http://qudt.org/schema/qudt/hasQuantityKind> <http://qudt.org/vocab/quantitykind/Currency> ;
  <http://qudt.org/schema/qudt/informativeReference> "http://en.wikipedia.org/wiki/Armenian_dram?oldid=492709723"^^xsd:anyURI ;
  rdfs:isDefinedBy <http://qudt.org/2.1/vocab/unit> ;
  rdfs:isDefinedBy <http://qudt.org/vocab/unit> ;
  rdfs:label "Armenian Dram"@en ;
.
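
The lookup-table idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Comunica's actual implementation: the `Triple` shape, the `IsDefinedByFilter` class, and its method names are all hypothetical, and documents are compared by exact URL string.

```typescript
// Hypothetical, simplified triple shape (IRIs as plain strings).
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const RDFS_IS_DEFINED_BY = 'http://www.w3.org/2000/01/rdf-schema#isDefinedBy';

// Hypothetical filter consulted before a link is pushed onto the traversal queue.
class IsDefinedByFilter {
  // subject IRI -> documents that declare themselves as defining it
  private readonly definedBy = new Map<string, Set<string>>();
  // documents that have already been dereferenced
  private readonly dereferenced = new Set<string>();

  markDereferenced(url: string): void {
    this.dereferenced.add(url);
  }

  // Record every incoming triple of the form ?s rdfs:isDefinedBy ?o.
  observe(triple: Triple): void {
    if (triple.predicate !== RDFS_IS_DEFINED_BY) {
      return;
    }
    let docs = this.definedBy.get(triple.subject);
    if (!docs) {
      docs = new Set<string>();
      this.definedBy.set(triple.subject, docs);
    }
    docs.add(triple.object);
  }

  // A link is skipped if some document that defines it was already dereferenced.
  shouldQueue(link: string): boolean {
    const docs = this.definedBy.get(link);
    if (!docs) {
      return true;
    }
    return !Array.from(docs).some(doc => this.dereferenced.has(doc));
  }
}

const filter = new IsDefinedByFilter();
filter.markDereferenced('http://qudt.org/vocab/unit');
filter.observe({
  subject: 'http://qudt.org/vocab/unit/AMD',
  predicate: RDFS_IS_DEFINED_BY,
  object: 'http://qudt.org/vocab/unit',
});
console.log(filter.shouldQueue('http://qudt.org/vocab/unit/AMD'));
```

Because the comparison is on exact strings, a source given as http://qudt.org/vocab/unit/ (with a trailing slash) would not match the isDefinedBy target http://qudt.org/vocab/unit in this sketch; URL variants and redirects need extra bookkeeping.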

Note that for this to work properly, the responseURL of every dereferenced link should also be added to the set of already dereferenced documents (though maybe this is the job of the http cache?), and ideally one would also trackRedirects if using a library with an API like follow-redirects to further optimise this process.
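
As a rough sketch of that bookkeeping: the `DereferenceResult` shape and the `recordDereference` function below are hypothetical, and how the response URL and redirect chain are obtained depends on the HTTP library in use. The example URLs are illustrative only.

```typescript
// Hypothetical summary of one HTTP dereference, however the HTTP layer reports it.
interface DereferenceResult {
  requestedUrl: string;    // the link that was followed
  responseUrl: string;     // the final URL after any redirects
  redirectUrls?: string[]; // intermediate URLs, if the HTTP library tracks them
}

// Add every URL involved in the dereference to the set of known documents,
// so later links to any of them can be recognised as already fetched.
function recordDereference(seen: Set<string>, result: DereferenceResult): void {
  seen.add(result.requestedUrl);
  seen.add(result.responseUrl);
  for (const url of result.redirectUrls ?? []) {
    seen.add(url);
  }
}

const seen = new Set<string>();
recordDereference(seen, {
  requestedUrl: 'http://example.com/myontology/A',
  responseUrl: 'http://example.com/myontology',
  redirectUrls: ['http://example.com/myontology/'],
});
console.log(seen.has('http://example.com/myontology'));
```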

cc @pmcb55

rubensworks added the bug 🐛 label Apr 24, 2022

Maximvdw changed the title ~~Slash separated ontologies cause unneeded traversals~~ Slash separated ontologies cause unneeded traversals when used as source Apr 24, 2022

rubensworks added investigate performance 🐌 labels Apr 25, 2022

rubensworks mentioned this issue Oct 31, 2022

Excessive re-request of same resource when slash (/) uris redirect to the same page #84

Closed

jeswr mentioned this issue Jun 17, 2024

Use rdfs:isDefinedBy to prevent unecessary requests on vocabularies defined using / semantics #141

Closed

rubensworks removed the bug 🐛 label Aug 23, 2024
