Re-haul and implement a more generic read #652
* I actually think the current behavior is more correct than what was done previously
Looks good to me!
👍
I don't have a strong opinion whether it should be a string or a character, but I tend to agree with you. Perhaps there is some efficiency lost by it not being a character, when
src/HDF5.jl
Outdated
@@ -1952,6 +1820,15 @@ function hdf5_to_julia_eltype(objtype)
return T
end

function get_jl_type(objtype)
Not sure I understand the point of this method. The `hdf5_to_julia_eltype` method already has `class_id == H5T_OPAQUE` defined (line 1808 in ece0dcd: `elseif class_id == H5T_OPAQUE`), so it seems that without this method things should just work already.
Edit: NVM, that's for `hdf5_to_julia_eltype`.
Yeah, this mostly duplicates `hdf5_to_julia_eltype` but changes the string handling part so there are no special cases for the length=1 case. I wrote a new function instead of changing the original because I wanted to only touch the read part of the library. There is probably some consolidation/cleanup possible.
src/HDF5.jl
Outdated
@@ -1952,6 +1820,15 @@ function hdf5_to_julia_eltype(objtype)
return T
end

function get_jl_type(objtype)
Maybe add a comment that this is temporarily special-cased for `H5T_OPAQUE`.
Massive improvement in simplicity and readability.
Feel free to merge whenever.
What should we do with the
Cool, if you are overall happy with this approach I can clean things up a bit and write some documentation. I left some stuff that can probably be removed like the
I was actually wrong about the string reading behavior. Fixed-length strings with length 1 are already read as strings, not characters. The only place where the character types are actually used to change behavior is for reading
I think right now, ahead of the upcoming breaking release, is the appropriate time to make such changes. It would certainly be good to clean up as much of the old cruft and as many of the ugly parts of the code as we can, so I'm in support.
@kleinhenz: I have switched to the regular compound reading with named tuples: MagneticResonanceImaging/MRIReco.jl@9b1a219 works fine, although there was some drop in performance.
* requires Compat for julia 1.3
@kleinhenz LMK if this is g2g, I'm happy with the changes here.
Yeah, I think this is good to go. There is some more cleanup of internals possible, but it is only sort of related to the changes here and I don't want to bloat this pull request too much.
* Add deprecated bindings for backwards compatibility with ecosystem packages
  Namely, get the tests of JLD.jl and MAT.jl to pass again.
* Deprecations for #632 - keywords instead of property lists
* Deprecations for #652 - generic read
* Deprecate {d,a}_{create,write} methods with property lists
  The calls should use keyword properties instead.
HDF5 has the concept of NULL arrays which are distinct from 0-size arrays. As shown in JuliaIO#705, writing any 0-size array was being translated to the NULL object during writing, but having 0-length axes is also perfectly valid. This PR preserves 0-sizes when writing arrays. To represent the HDF5 NULL object, there was already the `HDF5.EmptyArray` type defined, but prior to JuliaIO#652 it appears to have been used only for internal dispatch on the generic read (which returned a 0-length vector for NULL objects). Since the generic read overhaul, it has been unused. This PR reuses `HDF5.EmptyArray` and elevates it to a full subtype of `AbstractArray`, which then becomes the returned object for any NULL datasets/attributes which are read in. This array type carries an element type but has no contents. This is distinct from Julia's `Array{T,0}` 0-dimensional arrays because the latter _does_ have a single element. (I tried just doing duck typing, but you end up playing whack-a-mole with having to define enough new method definitions that you might as well just implement the `AbstractArray` interface and take advantage of its fallbacks once the few methods are defined.) (One oddity is that Julia's 0-dimensional `Arrays` are turned into HDF5 scalars and will not be read back in as a 0-dimensional array, but (a) that is the existing behavior, and (b) I haven't thought of any other way to handle the case.)
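The distinction drawn above between 0-size arrays and Julia's 0-dimensional arrays can be checked in plain Julia. The snippet below is only an illustration of the Base behavior the commit message relies on; it makes no HDF5.jl calls and does not show how `HDF5.EmptyArray` itself is implemented.

```julia
# Base Julia only; no HDF5.jl calls are made here.
zero_size = zeros(Float64, 0, 3)   # 0-size array: has a shape, but no elements
zero_dim  = fill(1.0)              # Array{Float64,0}: no axes, exactly one element

@show size(zero_size) length(zero_size)
@show size(zero_dim) length(zero_dim)
@show zero_dim[]                   # the single element of the 0-dimensional array
```

A NULL object fits neither case, since it has an element type but no shape and no contents, which is presumably why a dedicated array type is used for it.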
@@ -1991,19 +1874,22 @@ function get_mem_compatible_jl_type(objtype)
finally
h5t_close(super_type)
end
elseif class_id == H5T_REFERENCE
# TODO update to use version 1.12 reference functions/types
This adds a new generic (not type constrained) read method which reads a hyperslab of a dataset or an attribute into an array of type `T`. It also adds convenience methods which can be used like `v_read = read(h5f, "data"=>Simple)` or `v_read = h5read(fn, "data"=>Simple)` to read `data` into an array with element type `Simple` (see `test/custom.jl`). This provides the functionality of #559 in a way that is more integrated with the rest of the library than adding a new separate `h5read_compound` method and closes #169.

Although the function does not have a type constraint, I do check that the type is the correct size, so this should still prevent you from shooting yourself in the foot too easily. The behavior of the generic read can be customized by defining `do_reclaim`, `do_normalize` and `normalize_types` on your custom type. If we decide to do this, I will write up some documentation for customization.

Internally, I was able to replace a bunch of separate read methods for different types with this one generic method. This works by separating reading from hdf5 into a memory compatible datatype from mapping that data into a native julia type. The first part can be handled generically (`HDF5Scalar` types are not special) and the second part can be handled by the `normalize_types` methods which we have been using with compound types and can be extended. I think this is a win in terms of simplification and reducing the number of lines of code. The only type that requires special handling is now `HDF5Opaque`.

As a side-effect of this, hyperslab indexing works generically for everything besides `HDF5Opaque`. Closes #625.

As another side-effect, vlen strings are now handled uniformly via `unsafe_string`, so this should hopefully close #627.

The rules for mapping from hdf5 to julia types now uniformly follow the conventions we have been using for reading compound types, which shouldn't be too breaking but does cause a few changes in behavior. In particular the data is now read as `salut_vlenr = [["H", "i"], ["t", "h", "e", "r", "e"]]` instead of `["Hi", "There"]`. However, I think the new behavior is actually a more correct interpretation of the datatype since `STRSIZE=1`. Also, hdf5 strings are now read into julia strings even if they are length 1, whereas previously these would have been read into character types. This could be changed if people think that it is important. Other than the `salut_vlenr` example, all other tests pass without modification.