[FilterableSource] Add order parameter? #8

rubensworks · 2021-10-29T12:29:05Z

An important aspect during query execution is the ability to exploit sort orders of result streams, as this can significantly improve performance during join processing.

To enable query engines to ask for certain orders, we may consider adding an additional parameter to the FilterableSource interface, which could look something like this:

interface FilterableSource {
  FilterResult matchExpression (
    optional Term? subject,
    optional Term? predicate,
    optional Term? obj,
    optional Term? graph,
    optional Expression? expression,
    optional { attribute QuadOrder? order } options,
  );
};
type QuadOrder = { term: string; order: 'asc' | 'desc' }[];

Furthermore, to help query engines out during query optimization, we could also add all available orders to the result metadata:

interface QueryResultMetadata {
  attribute QueryResultMetadataCount? count;
  attribute QuadOrder? order;
  attribute QuadOrder[]? alternativeOrders;
};
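
As a rough illustration of how an engine might consume this metadata, the sketch below checks whether any advertised order can serve a required one. The helpers `satisfiesOrder` and `findUsableOrder` are hypothetical and not part of the proposal; the sketch also assumes that a stream sorted on (predicate, object) counts as sorted on predicate, i.e. that prefix matching is enough.

```typescript
// Shape taken from the proposal above.
type QuadOrder = { term: string; order: 'asc' | 'desc' }[];

// Only the order-related metadata fields from the proposal.
interface OrderMetadata {
  order?: QuadOrder;
  alternativeOrders?: QuadOrder[];
}

// Hypothetical helper: true if `available` starts with every entry of `required`.
function satisfiesOrder(available: QuadOrder, required: QuadOrder): boolean {
  if (required.length > available.length) {
    return false;
  }
  return required.every(
    (entry, i) => available[i].term === entry.term && available[i].order === entry.order,
  );
}

// Hypothetical helper: returns the first advertised order that satisfies `required`,
// checking the current order before the alternatives.
function findUsableOrder(metadata: OrderMetadata, required: QuadOrder): QuadOrder | undefined {
  const candidates: QuadOrder[] = [
    ...(metadata.order ? [metadata.order] : []),
    ...(metadata.alternativeOrders ?? []),
  ];
  return candidates.find((candidate) => satisfiesOrder(candidate, required));
}
```

An engine planning a sort-merge join could call `findUsableOrder` on both inputs and fall back to another join algorithm when either call returns `undefined`.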

cc @jacoscaz


jacoscaz · 2021-10-29T20:26:16Z

I'm very much +1 on this. A thought: given that an engine might want to get available orderings before asking for a specific one, wouldn't the optional order param be more appropriate for the FilterResults#quads() method?

That would allow for something like the following:

const results = source.matchExpression(/* ... */);
const { availableOrders } = await results.metadata();

/* check available orders */

const stream = results.quads({ order: someSpecificOrder });
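
To make the two-step flow concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `createMockFilterResult`, `MockFilterResult` and `SimpleQuad` are hypothetical names, quads are plain objects rather than RDF/JS terms, `quads()` returns an array instead of a stream, and each available order is modelled as a pre-sorted index.

```typescript
type QuadOrder = { term: string; order: 'asc' | 'desc' }[];
type SimpleQuad = { subject: string; predicate: string; object: string; graph: string };

// Hypothetical stand-in for a FilterResult; not the interface from any spec.
interface MockFilterResult {
  metadata(): Promise<{ availableOrders: QuadOrder[] }>;
  quads(options?: { order?: QuadOrder }): SimpleQuad[];
}

// Each map key is a JSON-serialized QuadOrder; each value holds the quads
// already sorted in that order, mimicking an index.
function createMockFilterResult(indexes: Map<string, SimpleQuad[]>): MockFilterResult {
  return {
    async metadata() {
      const availableOrders = Array.from(indexes.keys()).map(
        (key) => JSON.parse(key) as QuadOrder,
      );
      return { availableOrders };
    },
    quads(options: { order?: QuadOrder } = {}) {
      if (!options.order) {
        // No order requested: serve whichever index comes first.
        return Array.from(indexes.values())[0] ?? [];
      }
      const sorted = indexes.get(JSON.stringify(options.order));
      if (!sorted) {
        throw new Error('Requested order is not available');
      }
      return sorted;
    },
  };
}
```

With this shape, the engine reads `availableOrders` once and then passes one of the advertised orders straight back to `quads()`, without a second call to `matchExpression`.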

rubensworks · 2021-10-30T07:46:17Z

> A thought: given that an engine might want to get available orderings before asking for a specific one, wouldn't the optional order param be more appropriate for the FilterResults#quads() method?

In general, that should work indeed.

However, I'm just wondering if there are cases where metadata would be dependent on the quads order. E.g., to indicate some cost, as certain orders may be more expensive to obtain (due to sorting or more expensive indexes).
In such cases, a new call to matchExpression might be beneficial.

jacoscaz · 2021-10-30T10:18:49Z

> However, I'm just wondering if there are cases where metadata would be dependent on the quads order.

Good point. In quadstore's case, for example, orderings that do not match any of the configured indexes require in-memory sorting. The spec should provide the means for quadstore to relay this information to engines downstream.

However, I think there's a way to avoid calling matchExpression() repeatedly for the same expression while still passing cost-related information from source to engine and letting the engine pick an index:

interface QuadOrder {
  cost: any; // additional cost, type TBD
  terms: { term: string, direction: 'asc' | 'desc' }[];
}

interface QueryResultMetadata {
  /* ... */
  cost: any; // basic cost, type TBD
  availableOrders?: QuadOrder[];
}

I think decoupling the basic query cost (query setup, whether quads are filtered using an index or in-memory, ...) from the additional cost brought by sorting would make for a more expressive representation.
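
A small sketch of how an engine could use the decoupled costs to pick an order. It rests on assumptions that the thread leaves open: costs are modelled as plain numbers (the type is still TBD), `availableOrders` is treated as a list of orders, and the helper `cheapestOrder` is hypothetical.

```typescript
interface QuadOrder {
  cost: number; // assumption: additional sorting cost as a number (type TBD in the thread)
  terms: { term: string; direction: 'asc' | 'desc' }[];
}

interface QueryResultMetadata {
  cost: number; // assumption: basic query cost as a number (type TBD in the thread)
  availableOrders?: QuadOrder[];
}

// Hypothetical helper: among the orders accepted by `isAcceptable`, returns the one
// with the lowest total cost (basic cost + additional sorting cost).
function cheapestOrder(
  metadata: QueryResultMetadata,
  isAcceptable: (order: QuadOrder) => boolean,
): { order: QuadOrder; totalCost: number } | undefined {
  let best: { order: QuadOrder; totalCost: number } | undefined;
  for (const order of metadata.availableOrders ?? []) {
    if (!isAcceptable(order)) {
      continue;
    }
    const totalCost = metadata.cost + order.cost;
    if (!best || totalCost < best.totalCost) {
      best = { order, totalCost };
    }
  }
  return best;
}
```

In a store like quadstore, an order backed by a configured index could report an additional cost of zero, while one that needs in-memory sorting would report a higher value, so the same comparison covers both cases.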

rubensworks · 2021-11-02T07:34:55Z

Good idea, we can take that approach indeed!
This would even allow us to keep the matchExpression interface unchanged.

Regarding this cost metadata, I assume this is something we want to include in the spec already?

In Comunica, we use the following fields to indicate costs:

    /**
     * An estimation of how many iterations over items are executed.
     * This is used to determine the CPU cost.
     */
    iterations: number;
    /**
     * An estimation of how many items are stored in memory.
     * This is used to determine the memory cost.
     */
    persistedItems: number;
    /**
     * An estimation of how many items block the stream.
     * This is used to determine the time the stream is not progressing anymore.
     */
    blockingItems: number;
    /**
     * An estimation of the time to request items from sources.
     * This estimation can be based on the `cardinality`, `pageSize`, and `requestTime` metadata entries.
     * This is used to determine the I/O cost.
     */
    requestTime: number;

All of these may also apply here, as they can help explain index access and sorting costs.

ericprud · 2021-11-02T09:29:31Z

Is there some intention that the ordering be combined with slicing to provide quasi-stable paging?

ORDER BY ?x DESC(?y)
OFFSET 150 LIMIT 50

rubensworks · 2021-11-02T09:37:38Z

> Is there some intention that the ordering be combined with slicing to provide quasi-stable paging?

Ah yes, good point.

Stores like HDT (perhaps quadstore as well?) can do very efficient offsets (and limits), so it would definitely make sense to also include this next to the sort order.

So we could have something like results.quads({ order: someSpecificOrder, offset: 150, limit: 50 });
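
For sources without native support, the combined semantics could be emulated in memory roughly as follows. This is only a sketch under assumptions: `sliceOrdered` and `compareByOrder` are hypothetical helpers, quads are plain objects with string-valued terms, and terms are compared as plain strings rather than with SPARQL ordering rules.

```typescript
type TermName = 'subject' | 'predicate' | 'object' | 'graph';
type QuadOrder = { term: TermName; order: 'asc' | 'desc' }[];
type SimpleQuad = Record<TermName, string>;

// Builds a comparator that compares quads term by term, in the given order.
function compareByOrder(order: QuadOrder): (a: SimpleQuad, b: SimpleQuad) => number {
  return (a, b) => {
    for (const { term, order: direction } of order) {
      if (a[term] < b[term]) {
        return direction === 'asc' ? -1 : 1;
      }
      if (a[term] > b[term]) {
        return direction === 'asc' ? 1 : -1;
      }
    }
    return 0;
  };
}

// Sorts a copy of the quads (if an order is given), then applies offset and limit.
function sliceOrdered(
  quads: SimpleQuad[],
  options: { order?: QuadOrder; offset?: number; limit?: number } = {},
): SimpleQuad[] {
  const sorted = quads.slice();
  if (options.order) {
    sorted.sort(compareByOrder(options.order));
  }
  const start = options.offset ?? 0;
  const end = options.limit === undefined ? undefined : start + options.limit;
  return sorted.slice(start, end);
}
```

Sorting before slicing is what makes the paging quasi-stable: without an order, the same offset and limit may select different quads on each call.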

jacoscaz · 2021-11-02T10:47:15Z

I'm also very +1 on supporting offset and limit. However, these might be something to account for in the computation of costs (e.g. iterations would likely be influenced by limit). Perhaps these would be better off in the options argument to matchExpression?

jacoscaz · 2022-01-31T09:43:47Z

We've recently merged #7, which includes the start and length parameters in the Filterable interface. I think we can close this one.

rubensworks mentioned this issue Oct 29, 2021

Implement sort-merge join comunica/comunica#877

Open

rubensworks closed this as completed Jan 31, 2022
