CSV.write() with append=true allocating a lot of memory #1138

Open
PedroPizarro opened this issue Jul 20, 2024 · 0 comments

PedroPizarro commented Jul 20, 2024

I'm trying to convert some JSON database files into a single data.csv file containing all the data features.

Currently, I read the data and push it (push!()) into a DataFrame. After all the pushes, I write the DataFrame to a CSV file. However, I'm studying the possibility of writing the data directly to the .csv file using CSV.write() with append=true.

In my tests, when using this option with CSV.write(), the total allocated memory increases from 90.204 MiB to 24.205 GiB, while the number of allocations only rises from 1.32 M to 1.53 M.

When I run the code with julia --track-allocation=user, it shows that the allocation comes from the CSV.write(data, append=true) function call. Does append load the entire file content into RAM? Could that be the cause?

@time result with push!() using DataFrames, appending the data inside a for loop through the JSON file list:
1.716448 seconds (1.32 M allocations: 90.204 MiB, 0.84% gc time)

@time result with CSV.write(), appending the data inside a for loop through the JSON file list:
2.116444 seconds (1.53 M allocations: 24.205 GiB, 9.48% gc time)

@time result with Base.write(), appending the data inside a for loop through the JSON file list:
1.976700 seconds (1.36 M allocations: 93.373 MiB, 0.67% gc time)

The goal of the change was to reduce memory allocation. Instead, allocation increased, and I don't understand why.
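
For reference, here is a minimal sketch of the two patterns compared above. It is not the original code: the `batches` vector is a hypothetical stand-in for the parsed JSON files, and the column names `id` and `value` are made up for illustration. The first pattern accumulates rows in one DataFrame and writes once; the second calls CSV.write() once per batch with append=true.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the parsed JSON files: one small table per file.
batches = [DataFrame(id = i:i+2, value = rand(3)) for i in 1:3:9]

# Pattern 1: accumulate everything in one DataFrame, then write once.
out_once = tempname() * ".csv"
all_rows = DataFrame(id = Int[], value = Float64[])
for b in batches
    append!(all_rows, b)
end
CSV.write(out_once, all_rows)

# Pattern 2: one CSV.write call per batch.
# The first call creates the file and writes the header;
# later calls append rows only.
out_append = tempname() * ".csv"
for (i, b) in enumerate(batches)
    CSV.write(out_append, b; append = i > 1)
end
```

Both files end up with the same contents; the difference is that the second pattern pays any per-call cost of CSV.write() once per batch. CSV.write() has a `bufsize` keyword (documented default 2^22 bytes), so if each call allocates its own write buffer, total allocation would grow with the number of calls rather than with the amount of data. That is an unverified guess here, and timing the loop with a smaller `bufsize` would be one way to check it.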

Code versioning:
DataFrames version: [a93c6f00] DataFrames v1.6.1
CSV version: [336ed68f] CSV v0.10.14
Julia version: Julia Version 1.10.4 Commit 48d4fd48430 (2024-06-04 10:41 UTC)
