New article on forcing direction of an expansion #179

InverseFalcon · 2023-02-23T18:59:18Z

No description provided.

jexp · 2023-04-21T18:01:32Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+If only one of these nodes is chosen as an anchor node, the expansion will start there, expand the relationship to the other node, and then filter the other node's label and properties to find the matches.
+
+If both nodes are looked up via index, still only one will be the anchor node that we expand from, so the expansion process should remain about the same, but filtering will be more efficient, only having to filter on the node's internal graph id to see if the other node is the same one we matched earlier.


I would probably rewrite this to something along the lines of:

If both nodes are looked up from an index, an "expand-into" operation is used.
The expansion will be more efficient if it starts from the node with the smaller degree.

(At runtime, the degree of both nodes is taken into account.)

I guess the trickier case is when it only picks one index, and on the wrong side (the dense node).
Then you need to force it with `USING INDEX` or `USING JOIN ON` (the other node).

jexp · 2023-04-21T18:02:18Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+If both nodes are looked up via index, still only one will be the anchor node that we expand from, so the expansion process should remain about the same, but filtering will be more efficient, only having to filter on the node's internal graph id to see if the other node is the same one we matched earlier.
+
+It should be clear that there are two possible ways to expand the pattern in this case, and that one is going to be far more efficient than the other.


Actually, there are four:

- left + expand
- right + expand
- both + join left
- both + join right
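
A rough sketch of one of the two join variants, assuming indexes exist on `:User(id)` and `:Country(name)`. The labels and values come from the article's example query; the hints and the `RETURN` clause are illustrative guesses at how to provoke this plan, not something taken from the article:

```cypher
// Both nodes are looked up via index; the join hint asks the planner
// to combine the two sides with a hash join on the country node.
MATCH (user:User {id: 12345})-[:LIVES_IN]->(country:Country {name: 'United States'})
USING INDEX user:User(id)
USING INDEX country:Country(name)
USING JOIN ON country
RETURN user, country
```

Swapping the last hint to `USING JOIN ON user` should give the opposite variant, where the expansion starts from the country. Worth confirming with `EXPLAIN`, since the planner is still free to choose anything the hints do not pin down.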

jexp · 2023-04-21T18:02:52Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+=== The costly direction
+
+If we anchored on the country node, we would have to expand to every user in the United States, then filter all of them to find the ones that match the pattern.
+This would result in a ton of expanded rows, and a lot of filtering work where we would likely be throwing out every single row, except one.


"a ton" -> "a lot" or "many"

jexp · 2023-04-21T18:04:15Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+If we anchored on the country node, we would have to expand to every user in the United States, then filter all of them to find the ones that match the pattern.
+This would result in a ton of expanded rows, and a lot of filtering work where we would likely be throwing out every single row, except one.
+
+Even if we find that user early in the filtering, nothing in the query tells it to stop looking (no `LIMIT 1` present), and of course without a unique constraint nothing prevents a node with the same properties from being in the graph (or multiple relationships going to the same node), so it will keep on matching and filtering and throwing out all other non-matching results.


This is a really important aspect. Even with a unique constraint, if the constraint isn't used by the plan, it will not be taken into account, and all the remaining rows will be filtered too.

Probably worth highlighting with a NOTE or similar.
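
To illustrate the `LIMIT 1` point from the quoted paragraph, a minimal sketch. The query shape is taken from the article's example, the `RETURN` clause is assumed, and it only makes sense when at most one match is expected:

```cypher
// LIMIT 1 lets execution stop pulling rows once the first match is found,
// instead of expanding and filtering every remaining candidate.
MATCH (user:User {id: 12345})-[:LIVES_IN]->(country:Country {name: 'United States'})
RETURN user, country
LIMIT 1
```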

jexp · 2023-04-21T18:06:19Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+There are other ways the planner can decide to plan a query, some of them being cheaper with a small number of nodes and relationships, and some far more costly.
+
+A NodeHashJoin, for example (when we expand from two different anchor nodes to a common node in the middle of the pattern), might be very quick when the number of anchor nodes matched is low, and when we are expanding only a few relationships from the anchor nodes.


Probably simplify the sentence: start from the two anchor nodes, then say that, depending on the side, the expanded node is checked against a set of already-found nodes (that's why it's a hash join).

Not sure if you want to put joins into a separate KB article and link to it?

jexp · 2023-04-21T18:06:43Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+There are other ways the planner can decide to plan a query, some of them being cheaper with a small number of nodes and relationships, and some far more costly.
+
+A NodeHashJoin, for example (when we expand from two different anchor nodes to a common node in the middle of the pattern), might be very quick when the number of anchor nodes matched is low, and when we are expanding only a few relationships from the anchor nodes.
+But this can be very expensive if we're traversing a ton of relationships, becoming a hindrance to query execution.


That's why you can choose which side to join on with the index hint.

jexp · 2023-04-21T18:06:59Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+=== The cheap direction
+
+If we anchored to the user node, if we assume that each user only has one `:LIVES_IN` relationship, we would only have to expand on that single relationship and filter on that one connected node to see if the user really does live in the United States.


you didn't say how to force the "cheap" direction :)
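
A minimal sketch of what that could look like, assuming an index on `:User(id)` exists. The labels and values are taken from the article's example query; the `RETURN` clause is made up for illustration:

```cypher
// Ask the planner to find the user through its index, so the single
// :LIVES_IN relationship can be expanded from the user side.
MATCH (user:User {id: 12345})-[:LIVES_IN]->(country:Country {name: 'United States'})
USING INDEX user:User(id)
RETURN user, country
```

The hint only guarantees the index lookup on `user`. It does not by itself rule out a lookup on `country` as well, so the resulting plan is worth checking with `EXPLAIN`.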

jexp · 2023-04-21T18:07:57Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+A NodeHashJoin, for example (when we expand from two different anchor nodes to a common node in the middle of the pattern), might be very quick when the number of anchor nodes matched is low, and when we are expanding only a few relationships from the anchor nodes.
+But this can be very expensive if we're traversing a ton of relationships, becoming a hindrance to query execution.
+
+In the case of more complicated queries, there may be quite a few different nodes that could potentially be used as anchor nodes, with many possibilities on which one or which combination to anchor on, and how to expand to fulfill the desired patterns.


Point out that readers should build the query step by step and check the plan for the number of rows produced by an expansion (from either side) to select the better one (together with knowledge of the model, of course).
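
A sketch of that step-by-step check, using the article's `User`/`Country` example. The two statements are meant to be run separately, and the variable names `c` and `u` are placeholders:

```cypher
// Step 1: expand from the user side only, then look at the Rows
// column of the Expand operator in the profiled plan.
PROFILE
MATCH (user:User {id: 12345})-[:LIVES_IN]->(c)
RETURN count(*);

// Step 2: expand from the country side only and compare the row counts.
PROFILE
MATCH (country:Country {name: 'United States'})<-[:LIVES_IN]-(u)
RETURN count(*);
```

Whichever side produces fewer rows from its expansion is the better candidate to anchor on.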

jexp · 2023-04-21T18:08:46Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+In the case of more complicated queries, there may be quite a few different nodes that could potentially be used as anchor nodes, with many possibilities on which one or which combination to anchor on, and how to expand to fulfill the desired patterns.
+
+In any case, it is possible for the planner to make a bad choice, either because the approach isn't universally efficient across all data in your graph (some nodes may be supernodes, and cause the query over them to choke) or the metadata available to the planner is insufficient to warn it away from these more expensive expansions.


Probably add a separate highlighted statement that says: "Don't query across or from supernodes with many relationships, only against them, i.e. from the other side."

jexp · 2023-04-21T18:09:09Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+In any case, it is possible for the planner to make a bad choice, either because the approach isn't universally efficient across all data in your graph (some nodes may be supernodes, and cause the query over them to choke) or the metadata available to the planner is insufficient to warn it away from these more expensive expansions.
+
+In these cases, what we as humans know about the general shape of the graph may be greater than what can be inferred via metadata. Remember that walking the actual graph data is not possible here, since we're talking about query planning, which precedes execution.


Really a shame that we have no histograms.

Actually, ExpandInto does a runtime check of the degrees and picks the non-dense or smaller side to expand from :)

jexp · 2023-04-21T18:12:07Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+=== Using hash joins to force expansion to a supernode
+
+Remember that supernodes are really only problematic when expanding through, or expanding away from, but depending on your graph data it may be just fine if you are only expanding to them.


highlight this with a NOTE:

jexp · 2023-04-21T18:12:37Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+Remember that supernodes are really only problematic when expanding through, or expanding away from, but depending on your graph data it may be just fine if you are only expanding to them.
+
+
+In the case where you have multiple efficient anchor nodes, and a known or potential supernode in the middle of the pattern, and you know the expansion TO the supernode from both sides is cheap, you can use a join hint to force expanding to the super node.


I would still show an example.
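
For instance, a hedged sketch along these lines, where two users are cheap anchors and the country in the middle is the potential supernode. The second user id `67890` and the `RETURN` clause are hypothetical:

```cypher
// Expand TO the country from both users, then join the two sides on it,
// rather than expanding through or away from the potential supernode.
MATCH (a:User {id: 12345})-[:LIVES_IN]->(c:Country)<-[:LIVES_IN]-(b:User {id: 67890})
USING JOIN ON c
RETURN a, b, c
```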

jexp · 2023-04-21T18:13:29Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+That is, the planner prefers index lookups to find anchor nodes, though it may still use label scans to find anchor nodes (especially when there are no opportunities for index lookups in the query).
+
+Neither index lookups nor label scans are possible when the label isn't present in the pattern (or otherwise described in a Where clause, like `WHERE c:Country`), and these two means of lookup are the two most common for finding anchor nodes in the graph.
+Of course, you may need to have that label in the query for the sake of correctness, otherwise the wrong nodes and patterns might be matched.


But that can be accommodated with more specific relationship types too.

jexp · 2023-04-21T18:14:21Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+[source,cypher]
+----
+MATCH (user:User {id:12345})-[:LIVES_IN]->(country {name:'United States'})


This is really ouch; not sure you want to show this.
In this case, not checking the label should be fine, as the rel-type is specific enough.

jexp · 2023-04-21T18:15:54Z

articles/modules/ROOT/pages/how-to-force-direction-of-expansion.adoc

+
+Even though we have matched to both end nodes, and they are both potential anchors, in the second Match it is clear that the `user` node is the only one we can expand from; the planner is only aware that `c` is an unlabeled node and not a candidate for an anchor node.
+
+The filtering that the `c` node we expand to must be the same as the `country` anchor node is something the planner can only consider after the expansion is finished, so there is no opportunity for it to use `country` as an anchor for expansion.


The planner usually does a hash join here, checking the expanded end node against the set of "country" nodes.

jexp

Harsh but true

Create how-to-force-direction-of-expansion.adoc

6156066

jexp reviewed Apr 21, 2023

jexp approved these changes Apr 21, 2023
