Replies: 2 comments · 1 reply
-
Is it peak usage or an accumulated amount? The agent traverses the Spark logical plan tree and builds its own lineage data structure, which naturally requires some additional memory. Whether 2 GB is a lot in the given use case, I cannot tell. There may well be room for optimization, but we would need a reproducible scenario at hand to profile the agent.
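For intuition, here is a minimal sketch of that kind of traversal. `LineageSketch`, `LineageNode`, and `toLineage` are hypothetical names, not Spline's actual model; the point is only that the Spark plan tree gets mirrored into a second in-memory structure:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object LineageSketch {
  // Hypothetical lineage node: one entry per logical-plan operator.
  // Spline's real model is richer; this only shows where extra memory goes.
  case class LineageNode(
      operator: String,
      outputAttrs: Seq[String],
      children: Seq[LineageNode])

  // Recursively mirror the Spark logical plan into a second in-memory tree.
  // Every operator name and every output attribute is copied, so the footprint
  // grows roughly with (number of operations) x (attributes per operation).
  def toLineage(plan: LogicalPlan): LineageNode =
    LineageNode(
      operator    = plan.nodeName,
      outputAttrs = plan.output.map(_.name),
      children    = plan.children.map(toLineage))
}
```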
1 reply
-
I've created an issue for this; we can continue the discussion there: #537
-
Hi there,
Some of our data processing pipelines have extremely complex transformation flows and operate on datasets with a huge number of input/output attributes.
For example: 10+ input files with 100+ input columns per dataset, 500+ data transformation operations, and 500+ attributes in the resulting dataset.
For this type of pipeline we observe that running with lineage enabled requires an additional 2 GB of memory compared to running exactly the same pipeline with lineage disabled.
Our dispatcher uses the method
def send(executionPlan: ExecutionPlan): Unit
to serialize the data and persist it in the store. I'm wondering whether the Spline agent's memory usage can be optimised under the circumstances described above.
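For reference, a simplified sketch of what such a dispatcher might look like. `LoggingDispatcher` and its method bodies are illustrative, not our production code; the trait and package names follow the Spline agent sources and may differ between agent versions:

```scala
import za.co.absa.spline.harvester.dispatcher.LineageDispatcher
import za.co.absa.spline.producer.model.{ExecutionEvent, ExecutionPlan}

// Illustrative only. The LineageDispatcher trait and the producer model
// package are taken from the Spline agent sources and may vary by release.
class LoggingDispatcher extends LineageDispatcher {

  // Called once per write action with the fully built execution plan.
  // By this point the agent already holds the complete lineage structure
  // (all operations and attributes) in memory.
  override def send(plan: ExecutionPlan): Unit = {
    // serialize `plan` and persist it in the store (details omitted)
  }

  // Called with the corresponding execution event (status, timestamps).
  override def send(event: ExecutionEvent): Unit = {
    // persist the event (details omitted)
  }
}
```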
Thank you in advance,
Alexander