Add ZstdFrameCompressor and ZstdFrameDecompressor for non-streaming API #46
Also add a ZstdError exception type for more descriptive errors. ZstdError is only implemented for the new compressors; it will be applied to the streaming compressors in a breaking version bump.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
+ Coverage   33.01%   42.85%   +9.84%
==========================================
  Files           5        7       +2
  Lines         524      637     +113
==========================================
+ Hits          173      273     +100
- Misses        351      364      +13
This is meant to support JuliaIO/JLD2.jl#560, which requires the non-streaming compression API, where blocks know their decompressed size.
Could you try to dev this with your JLD2.jl changes, @milankl?
Could the
Also, not directly related to this PR, but maybe there should be a new package supporting non-streaming codecs with a simpler API, like https://numcodecs.readthedocs.io/en/stable/abc.html
After installing your branch here and my branch from JuliaIO/JLD2.jl#560, I get this:
julia> using JLD2, CodecZstd, HDF5, H5Zzstd
│ Package H5Zzstd not found, but a package named H5Zzstd is available from a
│ registry.
│ Install package?
│ (@v1.10) pkg> add H5Zzstd
└ (y/n/o) [y]: y
Resolving package versions...
ERROR: Unsatisfiable requirements detected for package H5Zzstd [f6f2d980]:
H5Zzstd [f6f2d980] log:
├─possible versions are: 0.1.0-0.1.1 or uninstalled
├─restricted to versions * by an explicit requirement, leaving only versions: 0.1.0-0.1.1
└─restricted by compatibility requirements with CodecZstd [6b39b394] to versions: uninstalled — no versions left
└─CodecZstd [6b39b394] log:
├─possible versions are: 0.8.2 or uninstalled
└─CodecZstd [6b39b394] is fixed to version 0.8.2
Not sure I can make sense of this?
EDIT: This is because H5Zzstd is installed:
(@v1.10) pkg> st H5Zzstd
Status `~/.julia/environments/v1.10/Project.toml`
[f6f2d980] H5Zzstd v0.1.1
and the CodecZstd version is held back by H5Zzstd:
(@v1.10) pkg> status --outdated
Status `~/.julia/environments/v1.10/Project.toml`
⌅ [6b39b394] CodecZstd v0.7.2 (<v0.8.2): H5Zzstd
I can test this, @nhz2. EDIT: Yes, it works as before, so it doesn't matter whether setting
julia> using JLD2, CodecZstd
julia> A = zeros(1000,1000);
julia> A[1] = rand()
0.6101634539467033
julia> save("test_with_zstd_compression.jld2", "A", A, compress=ZstdFrameCompressor())
julia> A == load("test_with_zstd_compression.jld2", "A")
true
@mkitti, this works:
julia> using JLD2, CodecZstd
julia> A = zeros(1000,1000);
julia> A[1] = rand()
0.1653492903631878
julia> save("test_with_zstd_compression.jld2", "A", A, compress=ZstdFrameCompressor())
julia> A == load("test_with_zstd_compression.jld2", "A")
true
and reading the same file with HDF5 and H5Zzstd:
julia> using HDF5, H5Zzstd
julia> h5open("test_with_zstd_compression.jld2") do h5f
h5f["A"][]
end
1000×1000 Matrix{Float64}:
0.165349 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋮ ⋱ ⋮
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 |
At the end of the day, I think the public API for transcoding streams is actually simpler and more flexible. I'm actually involved over in numcodecs as well. Transcoding streams is simpler in that there is only
The implementation here is trickier initially, but ultimately it can be adapted to a lot of circumstances. You are correct that the non-streaming API is basically just a special case of the streaming API. The non-streaming "frame" API assumes
An important consequence of the above is that we can do a single allocation for the result, whereas for streaming we allow multiple allocations to extend the buffers as needed.
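The allocation difference can be sketched in plain Julia without CodecZstd. This is a minimal illustration under stated assumptions: the frame layout (an 8-byte little-endian decompressed size followed by the raw payload, with no actual compression) and the functions encode_frame, decode_frame, and decode_streaming are hypothetical stand-ins, not the real Zstd frame format or the CodecZstd/TranscodingStreams API. When the size is stored up front, the decoder allocates the result once; when it is not, the buffer grows as chunks arrive.

```julia
# Hypothetical frame layout, NOT the real Zstd frame format:
# an 8-byte little-endian decompressed size followed by the payload.
# No compression happens here; only the allocation pattern matters.
function encode_frame(data::Vector{UInt8})
    header = collect(reinterpret(UInt8, [htol(UInt64(length(data)))]))
    return vcat(header, data)
end

# One-shot ("frame") decoding: the decompressed size is read from the
# header, so the result is allocated exactly once.
function decode_frame(frame::Vector{UInt8})
    n = Int(ltoh(reinterpret(UInt64, frame[1:8])[1]))
    out = Vector{UInt8}(undef, n)
    copyto!(out, 1, frame, 9, n)
    return out
end

# Streaming decoding: the total size is unknown, so the output buffer is
# doubled whenever the next chunk does not fit. Returns the data and the
# number of times the buffer had to grow.
function decode_streaming(chunks)
    out = Vector{UInt8}(undef, 4)
    len = 0
    grows = 0
    for chunk in chunks
        while len + length(chunk) > length(out)
            resize!(out, 2 * length(out))
            grows += 1
        end
        copyto!(out, len + 1, chunk, 1, length(chunk))
        len += length(chunk)
    end
    resize!(out, len)  # trim unused capacity
    return out, grows
end
```

For example, the six bytes UInt8[1, 2, 3, 4, 5, 6] delivered as two 3-byte chunks make decode_streaming grow its buffer once, while decode_frame reads the size from the header and allocates the 6-byte result directly.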
I don't have any opinion on the streaming vs. non-streaming API... I would just like to get this merged so that we can merge JuliaIO/JLD2.jl#560 and have Zstd compression in JLD2 😉
Can this be implemented as a new
        error[] = ZstdError(code)
        return 0, 0, :error
    else
        return Int(input.size), Int(code), :end
I don't think process can return :end here. Generally, when compressing, :end is only returned if input is empty and no extra output needs to be written for the current frame.
We are either going to return :end or :ok here, as in compression.jl:
CodecZstd.jl/src/compression.jl, line 96 in 4ba1be6:
return Δin, Δout, input.size == 0 && code == 0 ? :end : :ok
Here, unlike the streaming API, we have consumed the entire input buffer and written the entire output by invoking ZSTD_compress2. There is no continuation; there is no more input to process.
By the completion of ZSTD_compress2, there are two possible outcomes: either we have successfully compressed the data into the output buffer, or an error occurred. How else would you describe the state after ZSTD_compress2 runs?
The caller of process is allowed to break up the input data because process is a streaming API. Those small inputs could be appended to an internal buffer in the codec, with ZSTD_compress2 called only after input.size is zero, signaling that there are no more bytes in the frame.
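The buffering idea can be sketched as follows. BufferedFrameCodec, process_chunk!, and toy_compress are hypothetical names, not the TranscodingStreams process method or the CodecZstd API, and toy_compress is a trivial run-length encoder standing in for a one-shot call such as ZSTD_compress2. Non-empty chunks are only buffered (status :ok); the whole frame is compressed once the caller passes an empty chunk, and only then is :end returned.

```julia
# Hypothetical one-shot compressor standing in for ZSTD_compress2:
# a simple run-length encoding as (count, byte) pairs, count <= 255.
function toy_compress(data::Vector{UInt8})
    out = UInt8[]
    i = 1
    while i <= length(data)
        b = data[i]
        n = 1
        while i + n <= length(data) && data[i+n] == b && n < 255
            n += 1
        end
        push!(out, UInt8(n), b)
        i += n
    end
    return out
end

# Hypothetical codec: buffers every chunk of the current frame.
mutable struct BufferedFrameCodec
    pending::Vector{UInt8}   # input bytes received so far for this frame
end
BufferedFrameCodec() = BufferedFrameCodec(UInt8[])

# Returns (bytes_consumed, output_bytes, status).
# A non-empty chunk is only appended to the internal buffer; an empty
# chunk marks the end of the frame and triggers one-shot compression.
function process_chunk!(codec::BufferedFrameCodec, input::Vector{UInt8})
    if !isempty(input)
        append!(codec.pending, input)
        return length(input), UInt8[], :ok
    end
    out = toy_compress(codec.pending)
    empty!(codec.pending)   # ready for the next frame
    return 0, out, :end
end
```

Feeding UInt8[1, 1, 1] and then UInt8[1, 2] returns :ok twice with no output; the following empty chunk compresses the buffered bytes UInt8[1, 1, 1, 1, 2] in one shot and returns :end. The design trades extra memory for the buffer against never calling the one-shot compressor on a partial frame.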
What would be the best way to detect additional bytes added and throw an error?
I think you have to assume there will be additional bytes in the frame until process is called with an input size of zero.
There should be a new process2, similar to ZSTD_compressStream2, with an additional endOp argument to allow one-shot compression of a frame.
If we pursue this path, we would also need to figure out how to modify JLD2.jl to work with it.
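A process2 with an endOp argument could look roughly like the sketch below. Everything here is hypothetical: process2, OneShotCodec, oneshot_compress, and the EndOp enum are illustrative names, not existing TranscodingStreams or CodecZstd functions; EndOp only loosely mirrors zstd's ZSTD_EndDirective values ZSTD_e_continue and ZSTD_e_end, and oneshot_compress just copies bytes in place of a real one-shot compressor. The point is that the final chunk and the end-of-frame signal arrive in the same call.

```julia
# Hypothetical sketch of the proposed process2; NOT an existing API.
# EndOp loosely mirrors zstd's ZSTD_EndDirective (ZSTD_e_continue / ZSTD_e_end).
@enum EndOp e_continue e_end

mutable struct OneShotCodec
    pending::Vector{UInt8}   # bytes buffered for the current frame
    frames::Int              # number of one-shot compressions performed
end
OneShotCodec() = OneShotCodec(UInt8[], 0)

# Stand-in for a one-shot compressor such as ZSTD_compress2; it only
# copies the bytes so the example stays dependency-free.
oneshot_compress(data::Vector{UInt8}) = copy(data)

# With e_continue the chunk is only buffered. With e_end the final chunk is
# appended and the whole frame is compressed in the same call, so no
# trailing empty call is needed to signal the end of the frame.
function process2(codec::OneShotCodec, input::Vector{UInt8}, endop::EndOp)
    append!(codec.pending, input)
    if endop == e_continue
        return length(input), UInt8[], :ok
    end
    out = oneshot_compress(codec.pending)
    codec.frames += 1
    empty!(codec.pending)
    return length(input), out, :end
end
```

A caller that already holds the whole frame, as JLD2 does for a chunk, can pass it with e_end in a single call; a caller that splits the data passes e_continue until the last piece.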
Okay, I took a look at JuliaIO/JLD2.jl#560 and why it would be nicer to have a special
I'm starting to favor the approach in #49, where we might be able to get the streaming API to save the encoded size in the frame by providing
We may end up using an API similar to the one that exists here. For example,
Alternatively, we could try the approach in JuliaIO/TranscodingStreams.jl#215.
Closing in favor of #52.