For SeuratDisk Team,
After some exploration, I believe categorical data in the "obs" group of an H5AD file is stored as zero-based integer codes in a 1D H5D dataset. That dataset carries an attribute holding an H5 reference that points to another location in the same H5AD file, 'obs/__categories/variableName', where the factor's levels are saved. In the hdf5r interface, that reference is presented as an object of class "H5R_OBJECT". I haven't yet found a clean way to create one, but I can hack it by modifying the source code of the H5D class's create_reference() method (the hack function is included in the instructions for users further down).
In brief: you create an H5D (call it a) for a factor from a data.frame by writing its integer codes, create another H5D (call it b) in "obs/__categories" for its levels, create a reference object with ref <- .H5.create_reference(b) (the hack function, shown further down as H5.create_reference), and then call a$create_attr(attr_name = "categories", robj = ref, space = Scalar(), dtype = GuessDType(ref)). This works for me: the resulting H5AD file loads in Python with the text categorical annotations shown properly. However, the call to .Call() triggers NOTEs in the R CMD check of my package.
It would be great if you could include this in a future update, or come up with a cleaner way that avoids the check NOTEs!
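To make the storage layout concrete, here is a minimal base-R sketch of how a factor could be split into the two pieces described above: zero-based integer codes and a separate vector of levels. The example values and variable names are hypothetical, and nothing is written to an H5AD file here.

```r
# Hypothetical annotation column; the values are made up for illustration
f <- factor(c("b", "a", "b", "c"))

# R factors are one-based, so subtract 1 to get zero-based codes
codes <- as.integer(f) - 1L  # what would be written to 'obs/<variable>'
lv <- levels(f)              # what would be written to 'obs/__categories/<variable>'

print(codes)  # [1] 1 0 1 2
print(lv)     # [1] "a" "b" "c"

# Reading back: a zero-based code indexes into the levels after adding 1 again
restored <- lv[codes + 1L]
stopifnot(identical(restored, as.character(f)))
```

Without the reference attribute, a reader of the file only sees `codes`, which matches the integer values that show up in Python before the fix.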
For users,
I'll go from the tutorial. At this point you should see the file pbmc3k.h5ad on disk. It can be loaded in Python, but categorical columns such as "orig.ident" and "seurat_annotations" show integer values instead of text labels.
Go back to your R session and do:
# Load the library you'll need
library(hdf5r)

# My hack function: a copy of the H5D create_reference() method,
# changed to build an H5R_OBJECT reference
H5.create_reference <- function(self, ...) {
  space <- self$get_space()
  # Kept from the original method: applies any selection passed via `...`
  do.call("[", c(list(space), list(...)))
  ref_type <- hdf5r::h5const$H5R_OBJECT
  ref_obj <- hdf5r::H5R_OBJECT$new(1, self)
  # Direct call into hdf5r's C code; this is what triggers the R CMD check NOTE
  res <- .Call("R_H5Rcreate", ref_obj$ref, self$id, ".", ref_type,
               space$id, FALSE, PACKAGE = "hdf5r")
  if (res$return_val < 0) {
    stop("Error creating object reference")
  }
  ref_obj$ref <- res$ref
  return(ref_obj)
}

# Open the H5AD file, which is really an H5 file, in "r+" mode for read-and-write access
h5ad <- H5File$new("pbmc3k.h5ad", "r+")

# Fix for `orig.ident`: reference its levels, then attach the reference as an attribute
ref.orig.ident <- H5.create_reference(h5ad[['obs/__categories/orig.ident']])
h5ad[['obs/orig.ident']]$create_attr(
  attr_name = "categories",
  robj = ref.orig.ident,
  space = H5S$new(type = "scalar")
)
# The call returns an object of class H5A. That is expected; don't worry about it.

# Manually do the same for the other categorical variables...

# Finally, remember to close the H5AD file, which is still open with write access
h5ad$close_all()
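Repeating the attribute step by hand for every categorical variable gets tedious, so the same calls can be wrapped in a loop. The helper below is only a sketch and is not part of hdf5r or SeuratDisk: `link_categories` and its arguments are hypothetical names, it assumes each listed variable has a dataset at both 'obs/<variable>' and 'obs/__categories/<variable>', and it assumes the "categories" attribute does not exist yet on those datasets. It opens and closes the file itself, so use it only when no other write connection to the file is open.

```r
# Hypothetical helper: attach a "categories" reference to each listed obs column.
# `make_ref` is the reference-creating function, e.g. H5.create_reference above.
link_categories <- function(path, vars, make_ref) {
  h5ad <- hdf5r::H5File$new(path, "r+")
  # Close the file even if one of the steps fails
  on.exit(h5ad$close_all(), add = TRUE)
  for (v in vars) {
    # Reference to the dataset holding the levels of this variable
    ref <- make_ref(h5ad[[paste0("obs/__categories/", v)]])
    # Same create_attr() call as in the manual fix for `orig.ident`
    h5ad[[paste0("obs/", v)]]$create_attr(
      attr_name = "categories",
      robj = ref,
      space = hdf5r::H5S$new(type = "scalar")
    )
  }
  invisible(vars)
}
```

With the hack function defined, a call such as `link_categories("pbmc3k.h5ad", "seurat_annotations", H5.create_reference)` would cover a variable that was not fixed manually.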
Then you can reload the AnnData in Python and see the changes 😉
Exactly related to #138
Best,
Yichen