New article on how to force match ordering #180

Open
InverseFalcon wants to merge 1 commit into master

InverseFalcon
Collaborator

No description provided.

Like SQL, Cypher is a declarative query language.
This is most evident in `MATCH` patterns, which describe what you want to find in the graph.
You do not tell it how to find these patterns, as you might in an imperative programming language.

Not only does the query planner decide how to fulfill a single `MATCH` pattern, but it can also consider all of the patterns connected by shared variables across multiple `MATCH` clauses in your query.

Contributor

Personally I prefer MATCH, WITH, and WHERE in backticks, since "with" and "where" are common English words, and backticks make it more obvious than capitalization alone that you mean the keywords.

Collaborator Author

Good call, I'll make the change.

There is no Match operator in a query plan.
Instead, anchoring and expansion are fulfilled by several kinds of lookup operators (AllNodesScan, NodeByLabelScan, NodeIndexSeek, and others) and several kinds of expand operators (such as Expand(All) and Expand(Into)).

This aspect of query planning therefore involves analyzing the `MATCH` patterns in a query, not pattern by pattern but as connected patterns spanning multiple `MATCH` clauses.

Contributor

Try to keep one sentence per line, to make edits easier to track later.

Collaborator Author

I'll make the changes, thanks!

The planner then breaks those connected patterns down into these smaller operations.
The order of those operations is not constrained by the order of the `MATCH` patterns they fulfill.
Instead, the planner uses what it knows about the graph, from schema and index metadata and from counts data, to select the plan it estimates to be optimal.

Contributor

via schema and index metadata

Collaborator Author

Added.
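To make the counts-driven choice of anchor more concrete, here is a toy model in Python. It is a hypothetical sketch, not Neo4j's cost model: the names `Candidate`, `choose_anchor`, `label_scan_estimate`, and `index_seek_estimate`, and all of the node counts, are invented for illustration, and the only rule modeled is "anchor on the lookup with the fewest estimated rows".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A node variable the (toy, hypothetical) planner could anchor on."""
    variable: str
    estimated_rows: float  # rows the lookup is estimated to produce


def label_scan_estimate(label_count: int) -> float:
    # Without a usable index, every node with the label is scanned.
    return float(label_count)


def index_seek_estimate(label_count: int, distinct_values: int) -> float:
    # With an index on the property, an equality seek is estimated as
    # label_count / distinct_values (1.0 for a unique property).
    return label_count / distinct_values


def choose_anchor(candidates: list[Candidate]) -> Candidate:
    # Toy rule: anchor on the cheapest lookup. min() returns the first
    # minimal item, so ties go to list order; real tie-breaking is not modeled.
    return min(candidates, key=lambda c: c.estimated_rows)


# Hypothetical counts for the users / TV shows / countries example.
counts = {"User": 1_000_000, "TvShow": 10_000, "Country": 200}

# In this scenario only the User lookup property is indexed.
candidates = [
    Candidate("user", index_seek_estimate(counts["User"], counts["User"])),
    Candidate("show", label_scan_estimate(counts["TvShow"])),
    Candidate("country", label_scan_estimate(counts["Country"])),
]
print(choose_anchor(candidates).variable)  # user
```

If all three lookup properties were indexed and unique, every estimate in this toy would be 1.0 and the anchor would come down to tie-breaking, which is the situation the review comments below single out as the one where ordering becomes a problem.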

In some cases this works well: the planner can see the larger pattern across `MATCH` clauses and consider other options that may be better for lookup and anchoring.

In other cases, the metadata available to the planner may not give it enough information to choose an optimal plan, and a suboptimal one may result instead.
A plan can also be optimal across most of the graph data it matches against, yet severely suboptimal in exceptional cases, such as supernodes with many relationships.

Contributor

with dense -> with many

Contributor

I think the last part of the sentence is redundant.

Collaborator Author

Changes made.
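The supernode problem can be illustrated with simple arithmetic. Counts data records totals, so an estimate of how many rows an expand will produce behaves roughly like an average degree and cannot see individual nodes. The Python sketch below uses invented LIKES counts for five TV shows, one of them a supernode; `expand_estimate` is a hypothetical helper, not a Neo4j function.

```python
from statistics import median


def expand_estimate(likes_per_show: dict[str, int]) -> float:
    # Toy counts-based estimate: total LIKES relationships divided by the
    # number of TV shows, i.e. the average degree. Individual shows are
    # invisible to it.
    return sum(likes_per_show.values()) / len(likes_per_show)


# Hypothetical data: four ordinary shows and one supernode.
likes_per_show = {"show_a": 10, "show_b": 12, "show_c": 8, "show_d": 15, "hit_show": 9955}

estimate = expand_estimate(likes_per_show)  # 2000.0 rows per show
typical = median(likes_per_show.values())   # 12 rows for a typical show

print(f"typical (median) show: {typical} likes")
for show, actual in likes_per_show.items():
    # Compare the single estimate with the real degree of each show.
    print(f"{show}: estimated {estimate:.0f}, actual {actual}")
```

The one estimate overstates the work for ordinary shows by more than a hundredfold and understates it for the supernode by about a factor of five, so a plan that looks reasonable on average can still be very expensive when execution runs through the supernode.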

We also know that TV shows are supernodes (many users like the same show), so anchoring on and expanding from TV show nodes is expensive.

If the metadata available to the planner is not enough to guide it to an efficient plan, and it anchors on the TV show, the country, or both, then we need a way to force the planner to choose a better plan.

Contributor

jexp, Apr 21, 2023

would probably be good to show such a suboptimal plan?

We can, however, use some Cypher techniques to introduce barriers in a query that the planner cannot cross when planning how to fulfill a `MATCH` clause.

Consider a graph of users, the TV shows they like, and the country each user lives in.
A query for one such path, with all three nodes of the path strictly defined, might be this:

Contributor

Should you mention that this only happens if there is an index/constraint on all three? Otherwise the one with the index will be picked as the starting point.

However, a `WITH` clause alone does not create a barrier to reordering.
This may in fact produce exactly the same plan.

If we instead introduce a new variable in the `WITH` clause, that does introduce a barrier, which the planner cannot plan or reorder across:

Contributor

evil hack :)

In this case I'd have used an index hint on user instead,
or USING JOIN ON country + USING JOIN ON TvShow.

Contributor

jexp left a comment

I like it but it will be a bit too abstract for most readers.
Better to add the graph model + some plans?
Perhaps even one supernode would be enough?

Also mention that this happens only if all 3 starting points are equal in their "index selectivity"

Not sure if you want to point to other KB/Manual with index / join hints?

2 participants