New article on how to force match ordering #180

Open: 1 commit to be merged into base branch `master`
107 changes: 107 additions & 0 deletions articles/modules/ROOT/pages/how-to-force-match-ordering.adoc
@@ -0,0 +1,107 @@
= How to Force Match Ordering
:slug: how-to-force-match-ordering
:author: Andrew Bowman
:neo4j-versions: 3.5, 4.0, 4.1, 4.2, 4.3, 4.4, 5.x
:tags: cypher
:category: cypher

Like SQL, Cypher is a declarative query language. This is most evident with Match patterns, which describe what you want to find in the graph.
You do not dictate to it how to find these patterns, the way you might in an imperative programming language.

Not only does the query planner decide how it will fulfill a single Match pattern, it has the ability to consider the entirety of patterns connected by common variables across multiple Match clauses in your query.

Review comment (Contributor): Personally I like the MATCH and WITH and WHERE in backticks more as with and where are common English terms and it makes it more obvious than just capitalization that you mean the keywords.

Reply (Collaborator Author): Good call, I'll make the change.

This means that the planner does not have to fulfill Matches in the order that they occur in the query.
This can sometimes be surprising, especially when the ordering of operations the planner chooses to fulfill these Matches is suboptimal.

Keep in mind also that Where clauses are not independent: each is bound to the preceding Match or With clause.
Since the planner evaluates multiple Match clauses together during planning, their Where clauses are likewise available for consideration.
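
As a small illustration, here is a sketch that borrows the users, TV shows, and countries model used later in this article; the property values are placeholders.
Each Where belongs to the Match directly above it, yet because both Match clauses share the `user` variable, the planner is free to weigh both predicates together when it picks a starting point.

[source,cypher]
----
// Illustrative only: labels and properties follow the example model used later in this article.
MATCH (user:User)-[:LIVES_IN]->(country:Country)
WHERE country.name = 'United States'   // bound to the first MATCH
MATCH (user)-[:LIKES]->(show:TvShow)
WHERE show.name = 'Game of Thrones'    // bound to the second MATCH
RETURN user.id
----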

It is also important to understand that a With clause alone cannot prevent the planner from considering the Match patterns that follow it.

This article discusses the reasons for this behavior, offers examples of when this behavior can become problematic, and provides techniques you can use to force the planner to fulfill separate Match clauses in the order they occur in your query.

== In a query plan there is no Match, there is only Expand

While the Cypher query describes "what" you want to find, the execution plan operators that make up a query plan are "how" Neo4j will fulfill the query.

If you review https://neo4j.com/docs/cypher-manual/current/execution-plans/operator-summary/[query plan operators in the docs], you may note that they do not correspond one-to-one with Cypher clauses.
There is no Match operator.
Instead, to fulfill anchoring and expansion, there are several kinds of lookup operators (AllNodesScan, NodeByLabelScan, NodeIndexSeek, and others), and several kinds of expand operators.
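
One way to see this for yourself is to prefix a query with `EXPLAIN`, which returns the plan without running the query.
The sketch below uses a hypothetical `Person` label and `KNOWS` relationship type; the operators that appear will depend on your Neo4j version, indexes, and data, but none of them will be named Match.

[source,cypher]
----
// EXPLAIN shows the planned operators without executing the query.
// :Person, :KNOWS, and the name value are placeholders for illustration.
EXPLAIN
MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend)
RETURN friend
----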

As such, this particular aspect of query planning involves the analysis of the Match patterns in a query (not Match pattern by Match pattern, but connected patterns across multiple Matches),

Review comment (Contributor): try to keep the one sentence per line, to make it easier later to track edits.

Reply (Collaborator Author): I'll make the changes, thanks!

and breaking those down into these various smaller operations.
The ordering of those smaller operations is not constrained by the ordering of the Match patterns that these operations fulfill.
Instead, the planner will attempt to use what it knows of the graph, via schema and index metadata and counts data, to select an optimal plan.

Review comment (Contributor): via schema and index metadata

Reply (Collaborator Author): Added.

In some cases, this works well, as the planner is able to see the larger pattern across Matches and consider other options that may be better for lookup and anchoring.

In other cases, the metadata available to the planner may not give it sufficient knowledge to choose an optimal plan, and a suboptimal one may result instead.
It is also possible for a plan to be near-optimal across most of the graph data it can match against, but severely suboptimal for exceptional cases, such as supernodes with many relationships.

Review comment (Contributor): with dense -> with many

Review comment (Contributor): I think the last part of the sentence is redundant.

Reply (Collaborator Author): Changes made.

== Barriers to entry

We cannot directly dictate the ordering of how Matches are evaluated.
We can, however, use some Cypher techniques to introduce barriers in a query that the planner cannot cross when planning how to solve a Match clause.

Consider a graph of users, the TV shows they like, and the country each lives in.
A query for one such path, with all three nodes of the path strictly defined, might be this:

Review comment (Contributor): should you mention that only happens if there is an index/constraint on all three? otherwise the one with the index will be picked as starting point.

[source,cypher]
----
MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
----

We know that countries are supernodes, as there are many users who live in the same country, so anchoring and expanding from country nodes will be expensive.
We also know that TV shows are supernodes, as there are many users who like the same show, so anchoring and expanding from TV show nodes will also be expensive.

If the metadata available to the planner is not enough to guide it to an efficient plan, and it chooses the TV show, the country, or both for anchoring, then we may want a way to force the planner to choose a better plan.
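
To check which node the planner actually anchors on, one option is to run the query with `PROFILE`, which executes it and reports the rows and database hits for each operator.
This is a sketch rather than a prescribed workflow; the `RETURN` clause is added here only so that the pattern forms a complete query.

[source,cypher]
----
// PROFILE runs the query and annotates each plan operator with rows and db hits.
// Look at the leaf operators to see whether the plan starts from the user, the TV show, or the country.
PROFILE
MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
RETURN got, user, country
----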

Review comment (Contributor, @jexp, Apr 21, 2023): would be probably good to show such a suboptimal plan?

The sections below present two of the best means to influence clause ordering in the query.

=== With clause as a barrier

One attempt to do so might be to break up the single Match into two, with a With clause in between, in the hope that it enforces ordering.

[source,cypher]
----
MATCH (user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
WITH user, country
MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user)
----

However, the With clause alone does not apply a barrier to reordering.
This may actually produce the exact same plan.

If we introduce a new variable in the With clause, though, that does introduce a barrier that the planner cannot reorder operations across:

Review comment (Contributor): evil hack :)
in this case I'd have used an index hint on user instead
or using join on country + using join on TvShow

[source,cypher]
----
MATCH (user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
WITH user, country, 1 as ignored
MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user)
----

The newly introduced variable, `1 as ignored`, is the key here.
This variable is not a part of the pattern in the Match nor derived from any part of that pattern.
The planner is forced to plan fulfilment of the first Match clause first, as it does not know if the variable introduced will influence subsequent operations.
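
Where an index or uniqueness constraint exists on the `id` property of `:User` nodes, a planner hint is another way to steer anchoring without splitting the Match.
The sketch below assumes such an index is in place; that index is an assumption and is not part of the example graph as described above.

[source,cypher]
----
// Assumes an index (or uniqueness constraint) on :User(id).
// USING INDEX asks the planner to start from the user node via that index.
MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
USING INDEX user:User(id)
RETURN got, user, country
----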


=== Subquery as a barrier (since Neo4j 4.1.x)

Cypher subqueries that follow the `CALL {}` syntax are like per-row foreach operations.
That is, per incoming row to the subquery, the entirety of the subquery's logic will execute for that input row.

That forces a barrier to reordering, as the clauses prior to the subquery must all execute for a row before the subquery starts for that row.
As such, any Match clause prior to the subquery will be planned without consideration of the Match clauses within, or after, the subquery.

Here is one example:

[source,cypher]
----
MATCH (user:User {id:12345})-[:LIVES_IN]->(country:Country {name:'United States'})
CALL {
  WITH user, country
  MATCH (got:TvShow {name:'Game of Thrones'})<-[:LIKES]-(user)
  RETURN got
}
...
----