Render large JSON files in a memory-efficient way #7

Open
bentsherman opened this issue Sep 28, 2023 · 1 comment


@bentsherman
Member

The usual pattern to render a JSON file is to create the equivalent data structure in Groovy code, render it to a JSON string, and write the entire string to a file. For large runs with thousands of tasks, the JSON string could be quite large and cause Nextflow to run out of memory.
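
A minimal Java sketch of this in-memory pattern follows. It uses Java rather than Groovy so that it is self-contained, but the JDK calls work the same way from Groovy. The `Task` record and the `{"tasks": [...]}` layout are hypothetical placeholders, not the actual report schema. The escaping is a hand-rolled minimum rather than a JSON library. The key point is that `render` returns one `String` holding the whole report, so its size grows with the number of tasks before anything reaches disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class InMemoryReport {

    // Hypothetical task record, for illustration only (not the real report schema).
    record Task(String name, List<String> outputs) {}

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Builds the entire report as one String, so memory grows with the number of tasks.
    static String render(List<Task> tasks) {
        StringBuilder sb = new StringBuilder("{\"tasks\":[");
        for (int i = 0; i < tasks.size(); i++) {
            Task t = tasks.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"name\":").append(quote(t.name())).append(",\"outputs\":[");
            for (int j = 0; j < t.outputs().size(); j++) {
                if (j > 0) sb.append(',');
                sb.append(quote(t.outputs().get(j)));
            }
            sb.append("]}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) throws IOException {
        List<Task> tasks = List.of(new Task("FASTQC", List.of("a.html")));
        Path path = Files.createTempFile("prov", ".json");
        // The whole string exists in memory before a single byte is written.
        Files.writeString(path, render(tasks));
        System.out.println(Files.readString(path));
    }
}
```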

First we need to evaluate whether this is a real problem in practice. Do some large runs and check how large the resulting prov reports are. If they reach the 100 MB to 1 GB range, we should probably optimize the rendering code.

The memory-efficient approach is to write the JSON output directly to the file in pieces. That way the entire report is never allocated in memory, and memory usage does not grow with the number of tasks or outputs.
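
A minimal Java sketch of the streaming approach follows, under the same assumptions: a hypothetical `Task` record and report layout, plus hand-rolled escaping. `writeReport` consumes an `Iterator` and writes each task to a buffered `Writer` as soon as it is produced, so only the current task needs to be in memory. In a real plugin the iterator would be fed from the run's task records. The in-memory `List` in `main` is only there to make the demo runnable.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

public class StreamingReport {

    // Hypothetical task record, for illustration only (not the real report schema).
    record Task(String name, List<String> outputs) {}

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Writes {"tasks":[...]} piece by piece; nothing accumulates beyond the current task.
    static void writeReport(Iterator<Task> tasks, Writer out) throws IOException {
        out.write("{\"tasks\":[");
        boolean first = true;
        while (tasks.hasNext()) {
            Task t = tasks.next();
            if (!first) out.write(',');
            first = false;
            out.write("{\"name\":");
            out.write(quote(t.name()));
            out.write(",\"outputs\":[");
            for (int i = 0; i < t.outputs().size(); i++) {
                if (i > 0) out.write(',');
                out.write(quote(t.outputs().get(i)));
            }
            out.write("]}");
        }
        out.write("]}");
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("prov", ".json");
        // Demo only: a real source would yield tasks lazily instead of from a List.
        List<Task> tasks = List.of(
            new Task("FASTQC", List.of("a.html")),
            new Task("SALMON", List.of("quant.sf")));
        try (BufferedWriter w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            writeReport(tasks.iterator(), w);
        }
        System.out.println(Files.readString(path));
    }
}
```

A JSON library with a streaming generator would handle escaping more robustly. The hand-written version only keeps the example dependency-free.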

@bentsherman
Member Author

bentsherman commented Sep 28, 2023

rnaseq-nf BCO is ~7 KB

nf-core/rnaseq (-profile test) BCO is ~423 KB
