-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FilterableSource] Add order parameter? #8
Comments
I'm very much +1 on this. A thought: given that an engine might want to get available orderings before asking for a specific one, wouldn't the optional That would allow for something like the following: const results = source.matchExpression(/* ... */);
const { availableOrders } = await results.metadata();
/* check available orders */
const stream = results.quad({ order: someSpecificOrder }); |
In general, that should work indeed. However, I'm just wondering if there are cases where metadata would be dependent on the quads order. E.g., to indicate some cost, as certain orders may be more expensive to obtain (due to sorting or more expensive indexes). |
Good point. In quadstore's case, for example, orderings that do not match any of the configured indexes require in-memory sorting. The spec should provide the means for quadstore to relay this information to engines downstream. However, I think there's a way to reconcile not having to call interface QuadOrder {
cost: any; // additional cost, type TBD
terms: { term: string, direction: 'asc' | 'desc' }[];
}
interface QueryResultMetadata {
/* ... */
cost: any; // basic cost, type TBD
availableOrders?: QuadOrder;
} I think decoupling the basic query cost (query setup, whether quads are filtered using an index or in-memory, ...) from the additional cost brought by sorting would make for a more expressive representation. |
Good idea, we can take that approach indeed! Regarding this cost metadata, is assume this is something we want to include in the spec already? In Comunica, we use the following fields to indicate costs: /**
* An estimation of how many iterations over items are executed.
* This is used to determine the CPU cost.
*/
iterations: number;
/**
* An estimation of how many items are stored in memory.
* This is used to determine the memory cost.
*/
persistedItems: number;
/**
* An estimation of how many items block the stream.
* This is used to determine the time the stream is not progressing anymore.
*/
blockingItems: number;
/**
* An estimation of the time to request items from sources.
* This estimation can be based on the `cardinality`, `pageSize`, and `requestTime` metadata entries.
* This is used to determine the I/O cost.
*/
requestTime: number; All of these may also apply here, as they can help explain index access and sorting costs. |
Is there some intention that the ordering be combined with slicing to provide quasi-stable paging? ORDER BY ?x DESC(?y)
OFFSET 150 LIMIT 50 |
Ah yes, good point. Stores like HDT (perhaps quadstore as well?) can do very efficient offsets (and limits), so it would definitely make sense to also include this next to the sort order. So we could have something like |
I'm also very +1 on supporting |
We've recently merged #7, which includes the |
An important aspect during query execution is the ability to exploit sort orders of result streams, as this can significantly improve performance during join processing.
To enable query engines to ask for certain orders, we may consider adding an additional parameter to the
FilterableSource
interface, which could look something like this:Furthermore, to help query engines out during query optimization, we could also add all available orders to the result metadata:
\cc @jacoscaz
The text was updated successfully, but these errors were encountered: