Add h5read_compound to read compound structures efficiently #559
I'd like to add a test but I still did not manage to use HDF5.jl to write a compound structure. The appveyor complains about Edit: I just realised there is a
I am a bit puzzled by this error on Travis:
Why is it running in a local scope? The test suite works fine on my desktop...
OK, got it: it has to be outside of the test suite block.
Can this parse arrays of compounds? In particular, the challenge is parsing the following file format: I made a custom parser, but of course it would be great if HDF5.jl supported this directly.
Not sure if I understand what you mean. In the example above it does read an array of compounds. I was not able to read the HDF5 file you provided. Can you provide a h5read_compound("simple_gre.h5", "/dataset/data", TheStruct);
Hmm, I think I got it. I see that each entry in your file is an array of compounds... Not sure how to deal with that though. At least I successfully used
I don't want to hold up this PR, but I'll just mention that in the end it would also be cool to read such files. h5py is able to do that. It basically constructs the structs in a dynamic fashion. Your approach has the advantage that everything is type stable.
That sounds very interesting. Note that I wrote a custom parser for the data: https://github.com/MagneticResonanceImaging/MRIReco.jl/blob/master/src/IO/ISMRMRD.jl#L14 Of course I am relying on the current implementation. https://github.com/MagneticResonanceImaging/MRIReco.jl/blob/master/src/IO/ISMRMRD.jl#L70 Is your implementation handling that well? Other question: In my library I have already defined a struct that is comparable to the compound stored, see https://github.com/MagneticResonanceImaging/MRIReco.jl/blob/master/src/Datatypes/RawAcqData.jl#L6 Ideally I would be able to convert the named tuple into that.
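The named-tuple-to-struct conversion asked about here can be sketched in plain Julia. This is a minimal illustration, not part of either PR: the `AcqHeader` struct and the `from_namedtuple` helper are hypothetical names, and the sketch assumes the named tuple carries one entry per struct field.

```julia
# Hypothetical user-defined struct mirroring a stored compound (assumption).
struct AcqHeader
    version::UInt16
    flags::UInt64
end

# Hypothetical helper: build a T from a NamedTuple by matching field names,
# so the order of the entries in the named tuple does not matter.
function from_namedtuple(::Type{T}, nt::NamedTuple) where {T}
    return T((getfield(nt, name) for name in fieldnames(T))...)
end

# A named tuple as a dynamic reader might return it (made-up values).
nt = (flags = UInt64(42), version = UInt16(1))
header = from_namedtuple(AcqHeader, nt)
```

Matching by name rather than by position keeps the conversion robust if the on-disk member order differs from the struct's field order.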
Initial testing looks very good! Nice job.
I am not against deriving the types dynamically, it's actually a nice feature to have, but there are also other use cases where you would like to map an existing I made some preliminary checks and it also seems that the static approach is much faster than the dynamic one. On the other hand, both approaches are orthogonal to each other, so I think they could both be merged. What's the status of the PR? Any chance of accepting it or should I close it?
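For contrast with the static approach, here is a minimal sketch of how a type could be derived dynamically from a runtime description of the compound members. It is an assumption about the general idea only and is not taken from #592 or from this PR; the member names and types are made up.

```julia
# Member names and types as they might be discovered at runtime (made-up values).
member_names = [:id, :energy]
member_types = [Int32, Float64]

# Assemble a concrete NamedTuple type from the runtime description.
RowType = NamedTuple{Tuple(member_names), Tuple{member_types...}}

# Construct one row from a tuple of values.
row = RowType((Int32(7), 1.5))
```

Because `RowType` is only known at runtime, code that uses it is type stable only behind a function barrier, which is one plausible reason why mapping onto a predefined struct can be attractive.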
Any chance we can reach some convergence on this PR and #592? Some sort of unanimous decision would be great, so we can make progress.
I agree that the two approaches are mostly orthogonal so it wouldn't hurt to merge both. I think my approach is more in line with how things work in @tamasgal I'm curious where my approach was significantly slower? The actual HDF5 I/O routines called in both cases should be identical. On this benchmark I get basically identical performance. In the case where there are strings or other vlen members, I have some overhead from copying the data over into the native Julia type, but this is not really avoidable as far as I can tell.
@kleinhenz I have to check carefully, but I have had no time yet. I'll try out your example and compare that!
filetype = datatype(dset) # packed layout on disk
memtype_id = h5t_get_native_type(filetype.id) # padded layout in memory
@assert sizeof(T) == h5t_get_size(memtype_id) "Type sizes mismatch!"
out = Vector{T}(undef, length(dset))
Why not out = Array{T}(undef, size(dset)) to preserve all the size information?
Sure!
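The two points in this review thread (the size check against the padded in-memory layout, and keeping the dataset shape) can be illustrated without touching a file. This is a minimal sketch under stated assumptions: the `Hit` struct is hypothetical and `dims` is a made-up stand-in for `size(dset)`.

```julia
# Hypothetical compound type; not taken from the PR.
struct Hit
    flag::UInt8      # 1 byte, followed by 7 bytes of padding
    energy::Float64  # 8 bytes, aligned to offset 8
end

# Packed size (sum of the member sizes) vs. padded in-memory size.
packed_size = sum(sizeof, fieldtypes(Hit))
padded_size = sizeof(Hit)

# Stand-in for size(dset) of a two-dimensional dataset (assumption).
dims = (3, 4)

flat   = Vector{Hit}(undef, prod(dims))  # one-dimensional, shape is lost
shaped = Array{Hit}(undef, dims)         # keeps the 3×4 shape
```

The gap between `packed_size` and `padded_size` is why the assertion compares `sizeof(T)` with the size of the native memory type rather than with the packed on-disk type.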
I think it would be nice to integrate this a little more with the existing api infrastructure before it is exported. For example currently this doesn't allow reading hyperslabs. It's interesting to note that the current
would already almost work for this if it weren't for the
Yes, good catch and good idea! I'd appreciate it if you could take over; I am quite overcommitted these days (well, who is not in this community 🙈).
@kleinhenz: I repeat my request that this API be capable of handling the ISMRMRD data format. #592 will break my code in MRIReco.jl and I hope that the present PR will rescue me.
What's the status here? Should we proceed or do something else?
#652 is my idea of how to do this.
The `h5read_compound` function with its three methods is already in use in all of my projects where I have to read `HDF5Compound` and is roughly two orders of magnitude faster than using `read` or `h5read` and then parsing the data manually.
I carry over these functions all over the place and I would really like them to be included in `HDF5.jl`.
Credits go to @damiendr who provided the base code snippet back in 2017: #408
For demonstration purposes I created a file `compound.h5` using Python+PyTables with a compound structure using this Python script:
Here are the results of a simple benchmark: