Replies: 2 comments · 1 reply
-
Is it peak usage or an accumulated amount? The agent traverses the Spark logical plan tree and builds its own lineage data structure, which naturally requires some additional memory. Whether 2 GB is a lot in the given use case, I cannot tell. There may well be room for optimization, but we would need a reproducible scenario at hand to profile the agent.
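For intuition, here is a minimal sketch of that kind of traversal. `LineageSketch`, `LineageNode`, and `toLineage` are hypothetical names, not Spline's actual model; the point is only that the Spark plan tree gets mirrored into a second in-memory structure:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object LineageSketch {
  // Hypothetical lineage node: one entry per logical-plan operator.
  // Spline's real model is richer; this only shows where extra memory goes.
  case class LineageNode(
      operator: String,
      outputAttrs: Seq[String],
      children: Seq[LineageNode])

  // Recursively mirror the Spark logical plan into a second in-memory tree.
  // Every operator name and every output attribute is copied, so the footprint
  // grows roughly with (number of operations) x (attributes per operation).
  def toLineage(plan: LogicalPlan): LineageNode =
    LineageNode(
      operator    = plan.nodeName,
      outputAttrs = plan.output.map(_.name),
      children    = plan.children.map(toLineage))
}
```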
1 reply
-
I've created an issue for this; we can continue the discussion there: #537
-
Hi there,
Some of our data processing pipelines have extremely complex transformation flows and operate on datasets with a huge number of input/output attributes.
For example: 10+ input files with 100+ input columns per dataset, 500+ data transformation operations, and 500+ attributes in the resulting dataset.
For this type of pipeline we observe that running with lineage enabled requires an additional 2 GB of memory compared to running exactly the same pipeline with lineage disabled.
Our dispatcher uses the method
def send(executionPlan: ExecutionPlan): Unit
to serialize the data and persist it in the store. I'm wondering whether the Spline agent's memory usage can be optimised under the circumstances described above.
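For reference, a simplified sketch of what such a dispatcher might look like. `LoggingDispatcher` and its method bodies are illustrative, not our production code; the trait and package names follow the Spline agent sources and may differ between agent versions:

```scala
import za.co.absa.spline.harvester.dispatcher.LineageDispatcher
import za.co.absa.spline.producer.model.{ExecutionEvent, ExecutionPlan}

// Illustrative only. The LineageDispatcher trait and the producer model
// package are taken from the Spline agent sources and may vary by release.
class LoggingDispatcher extends LineageDispatcher {

  // Called once per write action with the fully built execution plan.
  // By this point the agent already holds the complete lineage structure
  // (all operations and attributes) in memory.
  override def send(plan: ExecutionPlan): Unit = {
    // serialize `plan` and persist it in the store (details omitted)
  }

  // Called with the corresponding execution event (status, timestamps).
  override def send(event: ExecutionEvent): Unit = {
    // persist the event (details omitted)
  }
}
```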
Thank you in advance,
Alexander