Create using-subqueries-to-avoid-the-eager.adoc #178

Open · wants to merge 1 commit into base: master

Conversation

InverseFalcon (Collaborator): No description provided.

:tags: cypher, performance, load-csv
:category: cypher

Eager operators in a query plan can be disruptive, especially when performing writes involving large amounts of data, or batch loading.
jexp (Contributor): Note that Eager aggregation is something different than the write-horizon Eager.



If you've used `USING PERIODIC COMMIT LOAD CSV` to import data into Neo4j, it's likely that at some point you've been bitten by the Eager:
jexp (Contributor): USING PERIODIC COMMIT is no longer supported in 5.x afaik, you might want to mention that.
Btw, you also get Eager with CALL ... IN TRANSACTIONS.
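
To make that concrete, here is a minimal sketch of the `CALL { } IN TRANSACTIONS` form that replaces `USING PERIODIC COMMIT` in Neo4j 5.x. The file URL, label, and property names are invented for illustration, and the statement must run as an implicit (auto-commit) transaction.

----
// Illustrative names only. In Neo4j 5.x, batch commits use
// CALL { } IN TRANSACTIONS instead of USING PERIODIC COMMIT.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CALL {
  WITH row
  MERGE (p:Person {id: row.id})
} IN TRANSACTIONS OF 1000 ROWS
----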

Some operations require eagerly pulling in interim results for all rows, which effectively disables the `PERIODIC COMMIT` behavior, possibly causing you to go out of memory when running on a large input dataset.
jexp (Contributor): Explain why it's necessary (to separate the read and write horizons, i.e. do all writes first or do all reads first).
That's why it has to pull all data from the top through the operation, because that effectively creates a "materialized" horizon.
(You don't need to mention that Neo4j doesn't support MVCC.)


The culprit, in an EXPLAIN query plan, is usually the Eager operator, with a dark blue header.
These are not just "monkeywrench" operators meant to disrupt your query; there are valid reasons they exist: they preserve Cypher semantics, which aim to minimize the effect of row order on processing and results.
jexp (Contributor): Explanation is a bit too vague.

In most cases, the Eager operator cannot be removed from the query plan entirely, but with subqueries its effects can be scoped such that they are no longer disruptive and no longer put pressure on the heap.
This article provides some ways to minimize eager behavior by scoping its effect to local per-row executions with subqueries.

NOTE: We won't be talking about EagerAggregations here, which result from aggregation functions like count() and collect().
https://support.neo4j.com/hc/en-us/articles/4403024564243-Using-Subqueries-to-Control-the-Scope-of-Aggregations[We have a separate article for those here], but they can be similarly scoped via subqueries to avoid adding pressure to heap memory.

jexp (Contributor): I would pull that to the front.

jexp (Contributor): Change the link target text to something that Google SEO can use.

== Understanding Eager, and why it exists

What does the Eager operator mean? From the perspective of the query author, it means your query will be processed differently than you might expect: operation-by-operation, with each operation executed for all rows before the next one begins.
As a side effect this may be memory-intensive, as all input rows and intermediate rows are processed at once; holding onto massive sets of interim results can cause long GC pauses, maybe even out-of-memory errors, and definitely prevents you from batch committing as you intended when `USING PERIODIC COMMIT`.
jexp (Contributor): Don't refer to USING PERIODIC COMMIT, just say "batch processing" or similar; that's more version-independent.


Batch commits are not compatible with Eager behavior, as they require lazy row-by-row semantics for correct operation.
While Cypher does not stop you from attempting batch commit operations when there is an Eager in the plan, they will not commit in batches, and you may encounter the above-mentioned issues around GC pauses and heap problems.
jexp (Contributor): I think there should actually be a warning in Cypher that points that out. Not sure if there is; I think it used to be with the old planner.

Why does this happen?

Cypher semantics demand that to the greatest extent possible, operations from later in the query should not influence the results of operations from earlier in the query.
For example, a MERGE that appears later in the query should not influence a MATCH that shows up earlier in the query (on nodes of the same label).
jexp (Contributor): Change MERGE to CREATE.
"Otherwise you could end up with infinite loops where the newly created data is matched again and will lead to more data being created."

Cypher planning would ordinarily try for row-by-row processing, so the entire remaining query would execute for each input row.
Because of that, a MERGE from later in the query, applied while processing an earlier input row, would happen before a later input row has executed its MATCH operation.

If that MERGE from later in the query could affect the results of a MATCH earlier in the query, that would violate the above-mentioned Cypher semantics, so an Eager is planned to preserve them.
jexp (Contributor): And then it could produce infinite loops, or at least affect "perceived already executed" operations.

This causes the change in execution behavior, so instead of lazy row-by-row processing, all rows are processed operation-by-operation.

If the input is too large, either from the very start or from building up over the course of execution, all interim results must be held in memory at once for all rows; this can easily exceed the bounds of the heap and cause out-of-memory errors. That is the problem with Eager.
jexp (Contributor): Probably mention: if your Neo4j instance is configured with transaction memory limits, then the query will be aborted. If that's not the case, the server might run into memory allocation errors.


We have a blog entry by Jennifer Reif discussing Eager and its effects in more detail here:

https://community.neo4j.com/t5/general-discussions/cypher-sleuthing-the-eager-operator/m-p/50596
jexp (Contributor): That link is broken, better to use the original article:
https://medium.com/neo4j/cypher-sleuthing-the-eager-operator-84a64d91a452

Common cases where the planner adds an Eager include:

* MATCH (regular or OPTIONAL) and CREATE clauses (in any ordering) on the same node labels
* MATCH (regular or OPTIONAL) and MERGE clauses (in any ordering) on the same node labels
* CREATE and MERGE clauses (in any ordering) on the same node labels
* Multiple MERGE clauses on the same labels
jexp (Contributor): Which can happen if you load a monopartite graph like (:User)-[:FOLLOWS]->(:User).
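
For example, loading such a monopartite graph might look like the following sketch (file and column names invented); because multiple MERGE clauses touch the same `:User` label, the planner adds an Eager:

----
// Invented file/column names. Multiple MERGE clauses on the same
// :User label cause the planner to add an Eager to this plan.
LOAD CSV WITH HEADERS FROM 'file:///follows.csv' AS row
MERGE (a:User {id: row.follower})
MERGE (b:User {id: row.followed})
MERGE (a)-[:FOLLOWS]->(b)
----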

jexp (Contributor): Perhaps mention somewhere at the beginning that the Cypher planner is sometimes over-eager to insert Eager operations (pun intended), as it would rather be safe than sorry.
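
The example this passage refers to appears to have been lost from this excerpt; judging from the subquery version shown later in the thread, it was presumably along these lines:

----
// Presumed shape of the original example: the UNWIND produces several
// rows, so the MERGE on one row could affect the MATCH on later rows,
// and the planner inserts an Eager between the MATCH and the MERGE.
UNWIND range(1, 5) AS id
MATCH (n:Node {id: id})
MERGE (x:Node {id: id + 1})
----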


Remember that in Cypher, operators produce rows, and execute per row.
That's why the UNWIND is important for the Eager to show up: the planner infers that there are multiple rows for which the MATCH will be called (not just once).
The execution of the MATCH when processing a later row could therefore be influenced by the MERGE performed when processing an earlier row.
jexp (Contributor): Sentence per line.


The same thing would happen if we derived the id from a LOAD CSV, with the difference that we might be ingesting from a massive file, in which case the Eager behavior would be much more impactful on memory.
jexp (Contributor): Or apoc.load procedures, or large matches, e.g. in graph refactorings.


We can see the Eager operator in the resulting query plan here:

image::https://i.imgur.com/7cCwf9x.jpeg[]
jexp (Contributor): Probably better to upload to the CDN?

== Subqueries enforce per-row processing

To review: for each input row, the subquery will execute in full.
The planner has no ability to insert an Eager between separate per-row executions of a single subquery.
jexp (Contributor): But there might be an Eager introduced before the subquery.

jexp (Contributor): See the screenshot I sent you for:

```
MATCH (n)-[r]->()
CALL {
  WITH r
  DELETE r
} IN TRANSACTIONS OF 1 ROWS
```

----
// Reconstructed leading lines (UNWIND and CALL) based on the discussion below.
UNWIND range(1, 5) AS id
CALL {
  WITH id
  MATCH (n:Node {id: id})
  MERGE (x:Node {id: id + 1})
  RETURN true as done
}
RETURN done
----
jexp (Contributor): Now you can apply batches here too.
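
Taking up that suggestion, a sketch of the same subquery with batching applied (the batch size is arbitrary, the RETURN is dropped since a unit subquery suffices, and `CALL { } IN TRANSACTIONS` must run in an implicit, auto-commit transaction):

----
// Same per-row subquery as above, now committing in batches.
UNWIND range(1, 5) AS id
CALL {
  WITH id
  MATCH (n:Node {id: id})
  MERGE (x:Node {id: id + 1})
} IN TRANSACTIONS OF 1000 ROWS
----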

* The subsequent rows from the UNWIND will execute through the subquery in the same manner.
* As a result of the subquery scoping the Eager and enforcing per-row execution here, a single run of this query will produce 5 new nodes, for a total of 6, with ids of 1 through 6.

While this has changed behavior such that it won't pressure the heap, and will again allow sane batch processing, it is important to note that the query results changed!
jexp (Contributor): Statement results.

The difference is in where the Eager occurs in the plan.
In this case, the Eager is on the right-hand side of an Apply operator.

The Apply operator means: for each input row from the left side, do all the stuff on the right side of the operator.
jexp (Contributor): "Do all the operations on the right-hand side of Apply."


Subqueries generate Apply operators, so this plan just confirms that the Eager is scoped to an individual subquery execution, and won't alter behavior outside of the subquery.

When managing eager behavior, this kind of plan is what you're looking for, to confirm that the Eager is scoped behind an Apply.
jexp (Contributor): Sentence per line.


== Nested subqueries for additional scoping

For a more complex query, a single subquery may not be enough to properly rein in the eager behavior.
jexp (Contributor): This feels like the hacks people do with cache-line padding, where they create one subclass with one pad to avoid the CPU/JVM reordering fields.


That is, when the Eager is scoped behind a subquery, each individual subquery execution behaves eagerly, and that's usually enough to make the impact minimal.
But when an individual subquery execution can generate a ton of rows (such as from additional MATCHes), the Eager can still have a negative impact; in that case, nesting another subquery can scope the Eager further.
jexp (Contributor): a ton -> a lot
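
A sketch of what such nesting might look like, with invented labels and properties: the inner subquery scopes the Eager to each matched order, rather than to every row produced by the outer MATCH.

----
// Invented schema, for illustration only. The outer subquery runs once
// per CSV row; the inner subquery scopes the Eager to each :Order row
// instead of the full set of rows from the MATCH expansion.
LOAD CSV WITH HEADERS FROM 'file:///customers.csv' AS row
CALL {
  WITH row
  MATCH (c:Customer {id: row.id})-[:PLACED]->(o:Order)
  CALL {
    WITH o
    MERGE (s:Shipment {orderId: o.id})
    RETURN s
  }
  RETURN count(s) AS shipments
}
RETURN sum(shipments) AS totalShipments
----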


More on usage of aggregations and subqueries can be found here:

https://support.neo4j.com/hc/en-us/articles/4403024564243-Using-Subqueries-to-Control-the-Scope-of-Aggregations
jexp (Contributor): Perhaps point to the public KB instead?


== Using APOC procs as subqueries

If you aren't running Neo4j 4.1 or higher, you can make use of some procs in APOC to act as subqueries for a similar effect.
jexp (Contributor): procs -> procedures

MERGE (e)-[r:DEDICATED_TO]->(c)
----

In this one, we conditionally add the :Customer label to the :Employee node.
jexp (Contributor): Backticks for the labels.
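
The exact query isn't visible in this excerpt, but as a hedged sketch of the kind of conditional the article describes, `apoc.do.when` can run the write as a dynamically executed statement (the `isCustomer` flag and `$id` parameter are assumptions):

----
// Assumed shape of the conditional label update described above.
// apoc.do.when runs the first statement string when the condition is
// true, otherwise the second; both receive the params map.
MATCH (e:Employee {id: $id})
CALL apoc.do.when(
  e.isCustomer = true,
  'SET e:Customer RETURN e',
  'RETURN e',
  {e: e}
) YIELD value
RETURN value.e AS employee
----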


Just like before, isolating the scope with a subquery prevents the planner from adding the Eager; it vanishes from the query plan.

Be aware, however, that APOC procedures that execute a dynamic query like this require overhead to parse, compile, and execute the query, a cost you do not pay when using native Cypher subqueries.
jexp (Contributor): And there can be a potential for Cypher injection when executing subqueries as strings.

@jexp (Contributor) left a review: see my comments
