Custom data indexes #9

Open
blake-regalia opened this issue Nov 3, 2021 · 1 comment

@blake-regalia

Something to keep in mind when developing a filterable source: SPARQL 1.1 cannot readily express every type of query, and things get complicated when graph pattern matching has to be reconciled with arbitrary data indexes. Take, for example, the following exploratory work done on graphy, which uses indexes that are baked into the HDT file being queried:

```javascript
k_store.pattern({
  '?place': {
    a: 'dbo:Place && (dbo:City || dbo:National_Park)',

    // use built-in data index "its:number" to perform a numeric range filter
    dbo_population: '{its:number is > 100e3}',

    // use built-in text index "its:text" to perform string matching
    dbo_abstract: '{its:text contains /central/i}',

    // use a registered custom data comparison algorithm to match terms whose contents are locale-dependent
    dbo_annual_cost: '{currency:worth is > $20m and <= $40m}',

    // use a registered spatial index to solve a topological query
    ago_footprint: '{ago:geometry is within ?state and contains ?park}',

    // use a registered k-nearest-neighbors (kNN) index to find neighbors
    ago_centroid: '{ago:geometry closest 10 ?park}',
  },
});
```
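To make the idea of a data index concrete, here is a minimal sketch of how a range condition like `{its:number is > 100e3}` could be answered by a binary search over pre-sorted values instead of scanning and filtering every triple. `NumericIndex` is a hypothetical class (not part of graphy or any library), and the subjects and values are made-up sample data.

```javascript
// Hypothetical sketch: a sorted numeric index over literal values, so a
// condition like '{its:number is > 100e3}' becomes a binary search
// instead of a scan over every triple.
class NumericIndex {
  // entries: array of [subject, numericValue] pairs; sorted once by value
  constructor(entries) {
    this.entries = [...entries].sort((a, b) => a[1] - b[1]);
  }

  // Position of the first entry whose value is strictly greater than x
  upperBound(x) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid][1] <= x) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Subjects whose value v satisfies min < v <= max
  range(min, max) {
    return this.entries
      .slice(this.upperBound(min), this.upperBound(max))
      .map(([subject]) => subject);
  }

  // Subjects whose value v satisfies v > min
  greaterThan(min) {
    return this.range(min, Infinity);
  }
}

// Made-up sample data for illustration only
const population = new NumericIndex([
  ['dbr:Springfield', 167000],
  ['dbr:Smallville', 4500],
  ['dbr:Metropolis', 11000000],
]);
console.log(population.greaterThan(100e3)); // → [ 'dbr:Springfield', 'dbr:Metropolis' ]
```

The same shape would cover the `> $20m and <= $40m` case via `range(20e6, 40e6)`, assuming the currency values were already normalized to a single unit, which is the locale-dependent work the custom comparison in the example would have to do.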

Notice that the spatial queries cannot be solved efficiently using a filter: for optimal performance, the query engine must be able to decide the order in which to evaluate the joins among the graph patterns and each of the topological queries.
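The join-ordering point can be illustrated with a small greedy planner. This is a sketch under stated assumptions: `planJoinOrder`, the operator shape (`name`, `vars`, `estimate`), and the cardinalities are all hypothetical, and real engines use richer cost models. The idea shown is that an index-backed spatial operator is just another join candidate whose cost drops once its inputs are bound, so it can be scheduled early rather than applied as a filter after all joins.

```javascript
// Hypothetical sketch: greedy join ordering in which index-backed operators
// (e.g. a spatial "within" lookup) compete with triple patterns instead of
// being applied as a post-hoc filter. Each operator lists the variables it
// touches and estimates its cardinality given the already-bound variables.
function planJoinOrder(operators) {
  const remaining = [...operators];
  const bound = new Set();
  const order = [];
  while (remaining.length > 0) {
    let bestIndex = 0;
    let bestCost = Infinity;
    remaining.forEach((op, i) => {
      // Heavily penalize cross products: prefer operators sharing a bound variable
      const connected = bound.size === 0 || op.vars.some((v) => bound.has(v));
      const cost = (connected ? 1 : 1e6) * op.estimate(bound);
      if (cost < bestCost) {
        bestCost = cost;
        bestIndex = i;
      }
    });
    const [chosen] = remaining.splice(bestIndex, 1);
    chosen.vars.forEach((v) => bound.add(v));
    order.push(chosen.name);
  }
  return order;
}

// Made-up operators and cardinalities for illustration only
const operators = [
  { name: '?place a dbo:Place', vars: ['?place'], estimate: () => 50000 },
  { name: '?state a dbo:State', vars: ['?state'], estimate: () => 50 },
  {
    name: '?place within ?state (spatial index)',
    vars: ['?place', '?state'],
    // cheap once ?state is bound: the index returns only places inside it
    estimate: (bound) => (bound.has('?state') ? 200 : 1e9),
  },
];
console.log(planJoinOrder(operators));
// → [ '?state a dbo:State', '?place within ?state (spatial index)', '?place a dbo:Place' ]
```

Here the spatial lookup runs second, because binding `?state` first makes it cheap. Treating it as a filter would instead force the engine to enumerate every `?place` before testing containment.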

@rubensworks
Member

If I understand correctly, the need is to be able to push down filters into sources, where those filters may span multiple quad patterns (or even higher-level operations).

Currently, the FilterableSource interface provides a method to do something like source.matchExpression(s, p, o, g, filter).
Instead (or additionally), we need something like the following:

```javascript
source.matchOperation(
  bgp(
    pattern(s1, p1, o1, g1),
    pattern(s2, p2, o2, g2),
  ),
  filter,
);
```

which would let the index resolve the operation (a BGP in this case) together with the filter expression.

Since a solution to a BGP spans several quads, this matchOperation method could no longer live purely in the realm of RDF quads, so we'll have to return a bindings stream here instead of a quad stream.
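A minimal in-memory sketch of the proposed `matchOperation` shape, assuming quads and patterns are plain `{ s, p, o, g }` objects with `?`-prefixed strings as variables and numbers standing in for numeric literals (the proposal itself uses algebra objects built with `bgp(...)` and `pattern(...)`). It joins the patterns with nested loops and applies the filter to complete solutions, which shows why the output is a (lazy, generator-based) stream of bindings rather than quads. A real index-backed source would use the filter to prune candidates early instead of checking it at the end.

```javascript
// Hypothetical sketch: terms starting with '?' are variables
const isVar = (t) => typeof t === 'string' && t.startsWith('?');

// Try to extend `bindings` so that `pattern` matches `quad`; null on failure
function matchPattern(pattern, quad, bindings) {
  const next = { ...bindings };
  for (const key of ['s', 'p', 'o', 'g']) {
    const term = pattern[key];
    const value = quad[key];
    if (isVar(term)) {
      if (term in next && next[term] !== value) return null; // conflicting binding
      next[term] = value;
    } else if (term !== value) {
      return null; // constant mismatch
    }
  }
  return next;
}

// Evaluate a BGP (array of patterns) by nested-loop join, then apply the filter
// to each complete solution; yields binding objects, not quads
function* matchOperation(quads, bgp, filter) {
  function* join(i, bindings) {
    if (i === bgp.length) {
      if (filter(bindings)) yield bindings;
      return;
    }
    for (const quad of quads) {
      const extended = matchPattern(bgp[i], quad, bindings);
      if (extended !== null) yield* join(i + 1, extended);
    }
  }
  yield* join(0, {});
}

// Made-up sample data for illustration only
const g = 'urn:default';
const quads = [
  { s: 'ex:a', p: 'dbo:population', o: 250000, g },
  { s: 'ex:a', p: 'rdf:type', o: 'dbo:City', g },
  { s: 'ex:b', p: 'dbo:population', o: 900, g },
  { s: 'ex:b', p: 'rdf:type', o: 'dbo:City', g },
];
const bgp = [
  { s: '?place', p: 'rdf:type', o: 'dbo:City', g },
  { s: '?place', p: 'dbo:population', o: '?pop', g },
];
const results = [...matchOperation(quads, bgp, (b) => b['?pop'] > 100e3)];
console.log(results); // → [ { '?place': 'ex:a', '?pop': 250000 } ]
```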

This would make it quite similar to the current QueryableAlgebra interface from #7, so I'm wondering whether that already meets these needs.
