Document gotchas when interoperating with Python #804

Open
wants to merge 5 commits into
base: master
1 change: 1 addition & 0 deletions docs/make.jl
@@ -24,6 +24,7 @@ makedocs(;
pages=[
"Home" => "index.md",
"Low-level library bindings" => "api_bindings.md",
"Python interoperability" => "h5py.md"
],
)

105 changes: 105 additions & 0 deletions docs/src/h5py.md
@@ -0,0 +1,105 @@
# Python interoperability

When an HDF5 file created with Python (h5py) is loaded from Julia, the dimensions of its arrays appear reversed.
The reason is that NumPy arrays use C (row-major) memory layout by default, while Julia arrays use Fortran (column-major) layout.
Here is an example:
```python
import h5py
import numpy as np
path = "created_by_h5py.h5"
file = h5py.File(path, "w")
arr1d = np.array([1,2,3])
arr2d = np.array([[1,2,3], [4,5,6]])
arr3d = np.array([[[1,2,3], [4,5,6]]])
assert arr1d.shape == (3,)
assert arr2d.shape == (2,3)
assert arr3d.shape == (1,2,3)
file["1d"] = arr1d
file["2d"] = arr2d
file["3d"] = arr3d
file.close()
```
When we load the file from Julia, the dimensions are reversed:
```julia
using HDF5
path = "created_by_h5py.h5"
h5open(path, "r") do file
    arr1d = read(file["1d"])
    arr2d = read(file["2d"])
    arr3d = read(file["3d"])
    # Sizes are the reverse of the NumPy shapes (3,), (2,3) and (1,2,3).
    @assert size(arr1d) == (3,)
    @assert size(arr2d) == (3,2)
    @assert size(arr3d) == (3,2,1)
end
```
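
To see why the sizes come out reversed, the following is a minimal sketch in plain Python. The helper names `row_major_offset` and `column_major_offset` are hypothetical and exist only for this illustration; they are not part of h5py or HDF5.jl. The sketch assumes 0-based indices throughout (Julia itself is 1-based) and shows that one flat buffer, indexed row-major with shape `(2, 3)`, holds the same elements as when it is indexed column-major with shape `(3, 2)` and reversed indices.

```python
def row_major_offset(index, shape):
    """Flat offset in C (row-major) order: the last index varies fastest."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def column_major_offset(index, shape):
    """Flat offset in Fortran (column-major) order: the first index varies fastest."""
    offset = 0
    for i, n in zip(reversed(index), reversed(shape)):
        offset = offset * n + i
    return offset


# Flat memory of np.array([[1, 2, 3], [4, 5, 6]]) in C order (illustrative data).
buffer = [1, 2, 3, 4, 5, 6]
c_shape = (2, 3)                    # shape as seen from NumPy
f_shape = tuple(reversed(c_shape))  # shape as seen by a column-major reader

for i in range(c_shape[0]):
    for j in range(c_shape[1]):
        # Element (i, j) in row-major order is element (j, i) in column-major order.
        assert row_major_offset((i, j), c_shape) == column_major_offset((j, i), f_shape)
```

Reading the same buffer without copying therefore only requires reversing the shape, which is why Julia reports the reversed sizes shown above.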
To fix this, we can simply reverse the dimensions again:

```julia
using HDF5
# Return a copy of `arr` with the order of its dimensions reversed.
function reversedims(arr)
    dims = ntuple(identity, Val(ndims(arr)))
    return permutedims(arr, reverse(dims))
end

path = "created_by_h5py.h5"
h5open(path, "r") do file
    arr1d = reversedims(read(file["1d"]))
    arr2d = reversedims(read(file["2d"]))
    arr3d = reversedims(read(file["3d"]))
    @assert arr1d == [1,2,3]
    @assert arr2d == [1 2 3; 4 5 6]
    @assert arr3d == reshape(arr2d, (1,2,3))
end
```
Similarly, `reversedims` can be applied before saving arrays intended for use from Python.
Note that `reversedims` copies the data. If copying is undesirable, other options are:
* using Fortran memory layout on the Python side
* using C memory layout on the Julia side (e.g. a lazy variant of `reversedims`, such as one built on `PermutedDimsArray`)
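
As a rough sketch of the first option, the NumPy-only snippet below keeps an array in Fortran order on the Python side and takes its transpose, which is a C-contiguous view of the same memory. It assumes the goal is for Julia to see size `(2, 3)`; the variable names are illustrative, and the file-writing step is left as a comment because it depends on the h5py setup in use.

```python
import numpy as np

# Array kept in Fortran (column-major) order on the Python side.
arr = np.asfortranarray(np.array([[1, 2, 3], [4, 5, 6]]))
assert arr.shape == (2, 3)

# The transpose of a Fortran-ordered array is a C-contiguous view,
# so no data is copied on the NumPy side.
to_write = arr.T
assert to_write.shape == (3, 2)
assert to_write.flags["C_CONTIGUOUS"]
assert np.shares_memory(arr, to_write)

# Assumption: storing `to_write` (e.g. file["2d"] = to_write) records shape (3, 2),
# which a Julia reader would then report as size (2, 3), matching `arr.shape`.
```

Whether this is preferable to calling `reversedims` in Julia depends on which side of the exchange can more easily adopt the other side's layout.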

The whole example as an executable Julia script:
```julia
using PyCall
# Review thread on `using PyCall`:
# jw3126 (Author, Jan 14, 2021): We could add PyCall as a test dependency and turn this
#   into a doctest. However, it might be fragile since we need to install h5py.
# Member: It is possible to use Conda.jl to install the hdf5 library in order to turn these
#   into doctests. I'm not opposed to adding a PyCall test dependency. In general, it does
#   seem to be a good idea to ensure inter-op.
# Author: We could, but this introduces further headaches. If you run tests locally using
#   e.g. system Python, things may fail.
# Author: I added this anyway.

py"""
import h5py
import numpy as np
path = "created_by_h5py.h5"
file = h5py.File(path, "w")
arr1d = np.array([1,2,3])
arr2d = np.array([[1,2,3], [4,5,6]])
arr3d = np.array([[[1,2,3], [4,5,6]]])
assert arr1d.shape == (3,)
assert arr2d.shape == (2,3)
assert arr3d.shape == (1,2,3)
file["1d"] = arr1d
file["2d"] = arr2d
file["3d"] = arr3d
file.close()
"""

using HDF5
path = "created_by_h5py.h5"

# Reading directly: sizes are the reverse of the NumPy shapes.
h5open(path, "r") do file
    arr1d = read(file["1d"])
    arr2d = read(file["2d"])
    arr3d = read(file["3d"])
    @assert size(arr1d) == (3,)
    @assert size(arr2d) == (3,2)
    @assert size(arr3d) == (3,2,1)
end

# Return a copy of `arr` with the order of its dimensions reversed.
function reversedims(arr)
    dims = ntuple(identity, Val(ndims(arr)))
    return permutedims(arr, reverse(dims))
end

# Reading with `reversedims`: the arrays match the NumPy originals.
h5open(path, "r") do file
    arr1d = reversedims(read(file["1d"]))
    arr2d = reversedims(read(file["2d"]))
    arr3d = reversedims(read(file["3d"]))
    @assert arr1d == [1,2,3]
    @assert arr2d == [1 2 3; 4 5 6]
    @assert arr3d == reshape(arr2d, (1,2,3))
end
```