New article on how to force match ordering #180
base: master
Like SQL, Cypher is a declarative query language. This is most evident with Match patterns, which describe what you want to find in the graph.
You do not dictate to it how to find these patterns, the way you might in an imperative programming language.

Not only does the query planner decide how it will fulfill a single Match pattern, it has the ability to consider the entirety of patterns connected by common variables across multiple Match clauses in your query.
Personally I like the `MATCH` and `WITH` and `WHERE` in backticks more, as with and where are common English terms and it makes it more obvious than just capitalization that you mean the keywords.
Good call, I'll make the change.
There is no Match operator.
Instead, to fulfill anchoring and expansion, there are several kinds of lookup operators (AllNodesScan, NodeByLabelScan, NodeIndexSeek, and others), and several kinds of expand operators.

As such, this particular aspect of query planning involves the analysis of the Match patterns in a query (not Match pattern by Match pattern, but connected patterns across multiple Matches),
Try to keep to one sentence per line, to make it easier to track edits later.
I'll make the changes, thanks!
As such, this particular aspect of query planning involves the analysis of the Match patterns in a query (not Match pattern by Match pattern, but connected patterns across multiple Matches),
and breaking those down into these various smaller operations.
The ordering of those smaller operations is not constrained by the ordering of the Match patterns that these operations fulfill.
Instead, the planner will attempt to use what it knows of the graph via metadata and counts data in order to select an optimal plan.
via schema and index metadata
Added.
In some cases, this works well, as the planner is able to see the larger pattern across Matches and consider other options that may be better for lookup and anchoring.

In other cases, the metadata available to the planner may not give it sufficient knowledge to choose an optimal plan, and a suboptimal one may result instead.
It is also possible for a plan to be mostly optimal across most of the graph data it can match against, but for there to be exceptional cases, such as supernodes with dense relationships, that end up being severely suboptimal when the query executes across these exceptional areas of the graph.
with dense -> with many
I think the last part of the sentence is redundant.
Changes made.
We also know that tv shows are supernodes, as there are many users who like the same show, so anchoring and expanding from tv show nodes is expensive.

If the metadata available to the planner is not enough to guide it to an efficient plan, and it chooses either the tv show, country, or both for anchoring, then we may be looking for a way to force the planner to choose a more optimal plan.
It would probably be good to show such a suboptimal plan?
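One way such a plan could be captured for the article is to prefix the example query with `PROFILE` (or `EXPLAIN`, which plans the query without executing it). The sketch below assumes a hypothetical model with `User`, `TvShow`, and `Country` labels, `LIKES` and `LIVES_IN` relationship types, and `name` and `title` properties; none of these names are confirmed by the excerpt.

```cypher
// Hypothetical model (assumed, not taken from the article):
//   (:User)-[:LIKES]->(:TvShow), (:User)-[:LIVES_IN]->(:Country)
// PROFILE executes the query and reports the operators the planner chose,
// along with rows and db hits per operator.
PROFILE
MATCH (show:TvShow {title: 'Example Show'})<-[:LIKES]-(u:User {name: 'Alice'})-[:LIVES_IN]->(c:Country {name: 'Sweden'})
RETURN u, show, c
```

In the resulting plan, the leaf operators (for example a NodeIndexSeek on one of the three labels) indicate which node or nodes the planner anchored on, which is the detail a suboptimal-plan illustration would need to show.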
We can, however, use some Cypher techniques to introduce barriers in a query that the planner cannot cross when planning how to solve a Match clause.

Consider a graph of users, the tv shows they like, and the country each lives in.
A query for one such path, with all 3 nodes of the path strictly defined, might be this:
Should you mention that this only happens if there is an index/constraint on all three? Otherwise the one with the index will be picked as the starting point.
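To make this condition concrete: the planner only has a real choice of anchor when each of the three nodes can be found through an index lookup. A minimal sketch of that setup follows, assuming hypothetical labels `User`, `TvShow`, and `Country` with `name` and `title` properties; the index names are illustrative only.

```cypher
// Assumed labels and properties; index names are made up for illustration.
// Syntax shown is for Neo4j 4.x and later.
// Neo4j 3.x used the older form, e.g.: CREATE INDEX ON :User(name)
CREATE INDEX user_name FOR (u:User) ON (u.name);
CREATE INDEX tvshow_title FOR (s:TvShow) ON (s.title);
CREATE INDEX country_name FOR (c:Country) ON (c.name);
```

With only one of these indexes present, the indexed node would normally be the obvious starting point, and the ordering problem discussed in the article would be much less likely to arise.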
However, the With clause alone does not apply a barrier to reordering.
This may actually produce the exact same plan.

However, if we introduce a new variable in the With clause, then that DOES introduce a barrier across which the planner cannot consider or reorder:
evil hack :)
in this case I'd have used an index hint on user instead
or using join on country + using join on TvShow
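For comparison, the two approaches discussed in this thread could be sketched as follows. The labels, relationship types, property names, and literal values are assumptions, since the article's own queries are not part of this excerpt, and the index hint additionally assumes that an index on `:User(name)` exists (the query fails at planning time if it does not).

```cypher
// (a) The technique from the article: a WITH that introduces a new variable.
// `ignored` is a throwaway variable whose only purpose is to add the barrier.
MATCH (u:User {name: 'Alice'})
WITH u, 1 AS ignored
MATCH (show:TvShow {title: 'Example Show'})<-[:LIKES]-(u)-[:LIVES_IN]->(c:Country {name: 'Sweden'})
RETURN u, show, c;

// (b) The reviewer's alternative: an index hint on the user node.
// Requires an existing index on :User(name).
MATCH (show:TvShow {title: 'Example Show'})<-[:LIKES]-(u:User {name: 'Alice'})-[:LIVES_IN]->(c:Country {name: 'Sweden'})
USING INDEX u:User(name)
RETURN u, show, c;
```

The join hints mentioned above use the `USING JOIN ON <variable>` form; they are left out of the sketch because whether the planner can satisfy them for this pattern has not been checked here. Running either query with `EXPLAIN` would show whether the plan actually anchors on the user node.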
I like it but it will be a bit too abstract for most readers.
Better to add the graph model + some plans?
Perhaps even one supernode would be enough?
Also mention that this happens only if all 3 starting points are equal in their "index selectivity"
Not sure if you want to point to other KB/Manual with index / join hints?
No description provided.