I'm trying to convert some JSON database files into a data.csv file with all data features.
Currently, I'm reading the data and pushing it (push!()) into a DataFrame; after all the pushes, I write the DataFrame to a CSV file. However, I'm studying the possibility of writing the data directly into the .csv file with CSV.write() and the append=true keyword argument.
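For clarity, here is a minimal sketch of the two variants I'm comparing (the file list, column schema, and JSON parsing via JSON.jl are simplified placeholders, not my actual code):

```julia
using CSV, DataFrames, JSON

json_files = readdir("json_db"; join = true)    # placeholder list of JSON files

# Variant 1: push! every record into an in-memory DataFrame, write the CSV once at the end.
df = DataFrame(a = Int[], b = String[])         # placeholder schema
for f in json_files
    rec = JSON.parsefile(f)                     # assumes JSON.jl; returns a Dict
    push!(df, (a = rec["a"], b = rec["b"]))
end
CSV.write("data.csv", df)

# Variant 2: write each record straight to the CSV file with append=true inside the loop.
for (i, f) in enumerate(json_files)
    rec = JSON.parsefile(f)
    row = DataFrame(a = [rec["a"]], b = [rec["b"]])
    CSV.write("data.csv", row; append = i > 1)  # header is written only on the first iteration
end
```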
In my tests, when using this option with CSV.write(), the number of allocations increases from 90.204 MiB to 24.205 GiB.
When I run the code with julia --track-allocation=user, it shows that the allocations come from the CSV.write(data, append=true) function call. Does append=true load the whole file content into RAM, and could that be the cause?
@time results, in each case accumulating the data inside a for loop over the JSON file list:
- push!() into a DataFrame: 1.716448 seconds (1.32 M allocations: 90.204 MiB, 0.84% gc time)
- CSV.write() with append=true: 2.116444 seconds (1.53 M allocations: 24.205 GiB, 9.48% gc time)
- Base.write(): 1.976700 seconds (1.36 M allocations: 93.373 MiB, 0.67% gc time)
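The Base.write() variant used for comparison is roughly like this (again a simplified sketch with placeholder fields, not my exact project code; it does no CSV quoting or escaping):

```julia
using JSON

json_files = readdir("json_db"; join = true)    # same placeholder list as above

open("data.csv", "a") do io
    for f in json_files
        rec = JSON.parsefile(f)
        # Format one CSV row per record by hand and append it to the open stream.
        write(io, string(rec["a"]), ',', string(rec["b"]), '\n')
    end
end
```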
The objective of the change was to reduce the number of allocations; instead, they increased, and I don't understand why.
Versions:
DataFrames version: [a93c6f00] DataFrames v1.6.1
CSV version: [336ed68f] CSV v0.10.14
Julia version: Julia Version 1.10.4 Commit 48d4fd48430 (2024-06-04 10:41 UTC)