
Source filtering use cases #1

Closed · rubensworks opened this issue Feb 5, 2020 · 6 comments

@rubensworks (Member)

As discussed in rdfjs/stream-spec#16, this issue serves as a place to collect use cases for why additional filtering capabilities are needed in the Source interface.

@jacoscaz (Contributor) commented Feb 6, 2020

In our case, we'd like additional filtering capabilities that stores can optimize for at the persistence level, reducing the amount of in-memory filtering we would otherwise have to do to satisfy a given query.

This would bring RDF/JS a little closer to the most common specs dealing with data management at the persistence level (SQL, SPARQL, ...). Practical use cases are near endless, and I have a hard time thinking of one that would not benefit from such a feature. In our specific case, we often work with large numbers of IoT devices, and a simple query such as "give me all sensors whose latest datapoint we have received within the last 24 hours" often results in having to filter out thousands of records in memory. These effects compound when filtering queries depend on other filtering queries.
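As a rough sketch of what that in-memory filtering looks like with the current interface (ns.ex.lastSeen, ns.xsd.dateTime and the surrounding setup are hypothetical, invented for illustration):

const oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString()
const cutoff = rdf.literal(oneDayAgo, ns.xsd.dateTime) // hypothetical xsd:dateTime literal

// match() can only select on the predicate, so every datapoint quad is
// materialized and compared in JS, mostly just to be discarded.
const recent = []
const stream = source.match(null, ns.ex.lastSeen, null)
stream.on('data', quad => {
  // ISO 8601 strings compare correctly as plain strings
  if (quad.object.value >= cutoff.value) recent.push(quad.subject)
})
stream.on('end', () => {
  // recent now holds the sensors seen within the last 24 hours
})

A filter that the store could evaluate against its own indexes would avoid materializing all of those quads in the first place.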

@rubensworks (Member, Author)

Our use case concerns the optimization of query engines by pushing filters in the query plan down to the storage level. As such, it matches the use case of @jacoscaz very well.
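For instance (a hypothetical sketch: matchFilter and filter.gt anticipate the proposal sketched further below, and ns.ex.temperature and ns.xsd.decimal are invented for illustration), a SPARQL FILTER over a single triple pattern could be handed to the source instead of being evaluated by the engine:

// SPARQL: SELECT ?s WHERE { ?s ex:temperature ?t . FILTER(?t > 20) }

// Status quo: the engine fetches all bindings and applies the FILTER itself.
const all = source.match(null, ns.ex.temperature, null)

// With pushdown: the store evaluates the range against its own indexes.
const filtered = source.matchFilter(
  null,
  ns.ex.temperature,
  filter.gt(rdf.literal('20', ns.xsd.decimal))
)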

@bergos (Member) commented Feb 23, 2020

The current interface allows accessing time series only if the interval is known and the timestamps are aligned. Then it's possible to do something like this:

// Pseudocode: match() returns a Stream; the array access stands in for
// reading the first quad off that stream.
const subject = source.match(null, null, rdf.literal('2020-01-01T01:00'))[0].subject
const observation = source.match(subject)

I would expect a filter interface to be an evolution of the match method that allows finding items in a range without a query engine. Different people may have different opinions on what counts as a query engine. In the RDF context, I would use the term for a piece of software that combines the results of different triple patterns. Using an index to solve a single triple pattern is not a query engine.

// Pseudocode, as above: select the subject whose timestamp falls in the
// (00:55, 01:00] range, then fetch the full observation.
const subject = source.matchFilter(null, null, filter.and(
  filter.gt(rdf.literal('2020-01-01T00:55:00.000')),
  filter.lte(rdf.literal('2020-01-01T01:00:00.000'))
))[0].subject
const observation = source.match(subject)

@bergos (Member) commented Feb 23, 2020

It should be possible to define custom filters. Below is an example of a text search filter.

source.matchFilter(null, ns.rdfs.label, filterThisText)

Where filterThisText could be created by a factory like this:

function textSearchFilter (text) {
  // case-insensitive substring match on the term's lexical value
  const test = term => {
    return term.value.toLowerCase().includes(text.toLowerCase())
  }

  return {
    termType: 'Filter',
    type: 'CUSTOM_TEXT_SEARCH',
    args: text,
    test
  }
}

const filterThisText = textSearchFilter('this text') // example search string
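One possible benefit of carrying a test function alongside the filter descriptor (an assumption on my part, not something defined anywhere yet): a store without native support for CUSTOM_TEXT_SEARCH could still honor matchFilter by falling back to in-memory evaluation.

// Hypothetical fallback: evaluate any custom filter in memory when the
// store has no native support for its type.
function matchFilterFallback (source, subject, predicate, objectFilter) {
  const matches = []
  const stream = source.match(subject, predicate, null)
  stream.on('data', quad => {
    if (objectFilter.test(quad.object)) matches.push(quad)
  })
  return new Promise((resolve, reject) => {
    stream.on('end', () => resolve(matches))
    stream.on('error', reject)
  })
}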

rubensworks transferred this issue from rdfjs/stream-spec on Aug 18, 2020
@jacoscaz (Contributor) commented Aug 18, 2020

Following from rdfjs/data-model-spec#167 (comment) and rdfjs/data-model-spec#167 (comment), I'd like to address the following concern from @rubensworks:

A pipeline-based architecture is interesting, I hadn't thought of that before. I'm just wondering if it's expressive enough for all types of queries. I would suspect recursive filter definitions may be a bit more expressive, which is what SPARQL algebra does.

I think you are definitely right - recursive definitions are more expressive. However, they also seem to be significantly harder to deal with from a development perspective, and for quadstore I've intentionally opted for a compromise between the two that allows me to keep the optimization part of the codebase relatively straightforward. In particular, I have found that re-ordering queries by their approximate counts (something I am still working on) becomes a mess really quickly with the recursive approach. Granted, this could be due to a deficiency on my side rather than an objective difference in handling complexity but, be that as it may, I find myself having a much easier time with the pipeline approach.

I am not at all opposed to recursiveness, though, and I will get a chance to explore it further once I switch from sparqljs to sparqlalgebrajs in version 8.
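To make the contrast concrete, here are the two shapes side by side (structures invented for illustration; they match neither quadstore's internals nor the actual SPARQL algebra):

// Pipeline: a flat list of stages. Re-ordering by approximate counts
// is just sorting the list.
const pipeline = [
  { type: 'pattern', subject: null, predicate: ns.ex.lastSeen, object: null },
  { type: 'gt', arg: rdf.literal('2020-01-01T00:00:00.000') },
  { type: 'lte', arg: rdf.literal('2020-01-02T00:00:00.000') }
]

// Recursive: a nested expression tree. Arbitrary nesting of and/or/not
// makes it more expressive, but re-ordering means rewriting the tree.
const expression = {
  type: 'and',
  args: [
    { type: 'gt', arg: rdf.literal('2020-01-01T00:00:00.000') },
    { type: 'lte', arg: rdf.literal('2020-01-02T00:00:00.000') }
  ]
}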

@rubensworks (Member, Author)

Done in #4.
