
add default compression to write_dataframe function to compress dl2 #1165

Merged: vuillaut merged 10 commits into main from compress_dl2 on Oct 5, 2023

Conversation

vuillaut (Member)

fixes #1163
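For context, the fix amounts to passing compression filters to the PyTables writer by default. A minimal sketch of what such a `write_dataframe` could look like (signature, defaults, and structure here are illustrative, not the exact PR diff):

```python
import tables


def write_dataframe(dataframe, outfile, table_path, mode="a", index=False,
                    complib="blosc:zstd", complevel=1):
    """Write a pandas DataFrame to HDF5 with compression enabled by default.

    Sketch only: the actual lstchain signature may differ.
    """
    filters = tables.Filters(complevel=complevel, complib=complib)
    group, table_name = table_path.rsplit("/", maxsplit=1)
    with tables.open_file(outfile, mode=mode) as f:
        # Passing a structured array as the description both defines the
        # schema and injects the data into the new table.
        f.create_table(
            group or "/",
            table_name,
            dataframe.to_records(index=index),
            createparents=True,
            filters=filters,
        )
```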

codecov bot commented Sep 18, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (bb39662) 73.97% compared to head (c3cc978) 73.97%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1165   +/-   ##
=======================================
  Coverage   73.97%   73.97%           
=======================================
  Files         124      124           
  Lines       12647    12647           
=======================================
  Hits         9356     9356           
  Misses       3291     3291           
| File | Coverage Δ |
|------|-----------|
| lstchain/io/io.py | 77.82% <100.00%> (ø) |


moralejo (Collaborator)

Well spotted... indeed DL2 files are much bulkier than the corresponding DL1b ones; funny we have not noticed.
Do you know (or can you test) how much the default compression level impacts reading speed?

vuillaut (Member, Author) commented Sep 20, 2023

> Well spotted... indeed DL2 files are much bulkier than the corresponding DL1b ones; funny we have not noticed. Do you know (or can you test) how much the default compression level impacts reading speed?

Hi @moralejo

I just ran the test and here are the results.
Based on these, I would actually advocate for a default compression level of 1: beyond that, the further gains in file size and reading time are marginal, while the writing time keeps increasing.
What do you think?

| Compression level | File size (MB) | Write time (s) | Read time (s) |
|------------------:|---------------:|---------------:|--------------:|
| 0 | 406.6 | 4.6 | 1.2 |
| 1 | 255.7 | 11.0 | 3.8 |
| 2 | 254.8 | 11.3 | 3.9 |
| 3 | 254.1 | 12.2 | 3.9 |
| 4 | 252.9 | 13.1 | 3.9 |
| 5 | 252.5 | 14.8 | 4.0 |
| 6 | 251.9 | 17.6 | 4.3 |
| 7 | 251.8 | 20.5 | 4.2 |
| 8 | 251.7 | 33.1 | 4.5 |
| 9 | 251.7 | 41.8 | 4.4 |
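The benchmark script itself isn't shown in the thread; a minimal sketch of how such a measurement could be run (the input file path is a placeholder, the input table is assumed pandas-readable, and the complib for this first test is assumed to be PyTables' default zlib):

```python
import os
import time

import pandas as pd
import tables

# Placeholder input: any DL2 parameters table written by lstchain.
df = pd.read_hdf("dl2_test_file.h5", key="/dl2/event/telescope/parameters/LST_LSTCam")

for complevel in range(10):
    outfile = f"dl2_complevel{complevel}.h5"
    filters = tables.Filters(complevel=complevel, complib="zlib")

    # Write the table with the given compression level and time it.
    t0 = time.perf_counter()
    with tables.open_file(outfile, mode="w") as f:
        f.create_table("/", "params", df.to_records(index=False), filters=filters)
    write_time = time.perf_counter() - t0

    # Read the full table back and time it.
    t0 = time.perf_counter()
    with tables.open_file(outfile) as f:
        _ = f.root.params[:]
    read_time = time.perf_counter() - t0

    size_mb = os.path.getsize(outfile) / 1e6
    print(f"level {complevel}: {size_mb:6.1f} MB  write {write_time:5.1f} s  read {read_time:4.1f} s")
```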

maxnoe (Member) commented Sep 20, 2023

@vuillaut Did you test that locally or on the cluster? Because I'd suspect that write and read speeds would actually go up when writing much less data to the slowish network file system on the cluster.

vuillaut (Member, Author)

> @vuillaut Did you test that locally or on the cluster? Because I'd suspect that write and read speeds would actually go up when writing much less data to the slowish network file system on the cluster.

Indeed, I tested locally on my laptop; let me run this on the cluster.

vuillaut (Member, Author) commented Sep 20, 2023

I updated the table in my previous comment with numbers from the test at La Palma.

morcuended (Member)

+1 to setting the compression level to 1 by default

vuillaut (Member, Author)

I have set the default complevel=1 and redone the test with blosc:zstd as complib.

| Compression level | File size (MB) | Write time (s) | Read time (s) |
|------------------:|---------------:|---------------:|--------------:|
| 0 | 406.70 | 4.27 | 1.38 |
| 1 | 268.55 | 4.27 | 1.90 |
| 2 | 263.42 | 6.13 | 1.70 |
| 3 | 262.20 | 7.56 | 1.73 |
| 4 | 260.82 | 9.55 | 1.70 |
| 5 | 260.27 | 12.03 | 1.67 |
| 6 | 260.08 | 25.49 | 1.81 |
| 7 | 259.62 | 38.49 | 1.72 |
| 8 | 257.56 | 51.92 | 1.71 |
| 9 | 257.47 | 90.88 | 1.77 |

(I don't think the read times are very meaningful; they probably fluctuate too much with cluster usage.)
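For reference, selecting the Blosc/Zstandard codec in PyTables is just a different Filters specification; a minimal sketch (how exactly this is wired into lstchain's writers is in the PR diff):

```python
import tables

# Blosc wrapping the Zstandard codec; complevel=1 gives most of the size
# reduction at almost no write-time cost (see the table above).
zstd_filters = tables.Filters(complevel=1, complib="blosc:zstd")
```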

I think this is ready for review.

morcuended (Member) left a comment

thanks. I left some comments

(two review threads on lstchain/io/io.py, now resolved)
morcuended previously approved these changes Oct 3, 2023
morcuended (Member) left a comment

looks good to me

vuillaut (Member, Author) commented Oct 3, 2023

Hey @morcuended
Sorry, I did not see your review!
Your comments made me think that maybe we never tried the different compression levels for images, so I ran the test for those as well (the impact for other data is less important, IMO).

| Compression level | File size (MB) | Write time (s) | Read time (s) |
|------------------:|---------------:|---------------:|--------------:|
| 0 | 939.11 | 11.30 | 0.96 |
| 1 | 483.43 | 4.58 | 1.60 |
| 2 | 482.69 | 5.69 | 1.65 |
| 3 | 482.31 | 8.29 | 1.64 |
| 4 | 481.49 | 8.29 | 1.59 |
| 5 | 481.54 | 7.82 | 1.52 |
| 6 | 481.11 | 14.99 | 1.69 |
| 7 | 481.29 | 25.21 | 1.51 |
| 8 | 481.85 | 40.10 | 1.58 |
| 9 | 481.06 | 216.87 | 1.72 |

I'd say the conclusion is the same as for the parameters, so I changed the default complevel to 1 for everything.

I also simplified the write_dataframe function to use ctapipe's write_table.

Could you review again, please?
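For readers unfamiliar with ctapipe, the simplification presumably converts the DataFrame to an astropy Table and delegates writing to ctapipe.io.write_table, which applies its own default compression filters; a minimal sketch under that assumption (the exact keywords in the merged code may differ):

```python
from astropy.table import Table
from ctapipe.io import write_table


def write_dataframe(dataframe, outfile, table_path, index=False):
    """Sketch: write a pandas DataFrame via ctapipe's write_table helper."""
    table = Table.from_pandas(dataframe, index=index)
    # ctapipe handles HDF5 node creation and compression internally.
    write_table(table, outfile, table_path, overwrite=True)
```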

(review thread on lstchain/io/io.py, resolved)
vuillaut (Member, Author) commented Oct 5, 2023

Thanks Daniel!

vuillaut merged commit 675b245 into main on Oct 5, 2023
vuillaut deleted the compress_dl2 branch on October 5, 2023 at 14:11
moralejo (Collaborator)

Apologies for the missed review, and thanks for this, @vuillaut!

Labels: none
Projects: none
Closes: compress dl2 data (#1163)
4 participants