Possible solution for converting categorical variable (factor) to proper text categoricals in H5AD #183

mvfki · 2024-06-25T18:22:20Z

Directly related to #138

For SeuratDisk Team,

After some exploration, I believe categorical data in the "obs" of an H5AD file is stored as zero-based integer codes in a 1D H5D dataset, with an attribute on that dataset holding an H5 reference to another location in the same H5AD file, 'obs/__categories/variableName', where the factor's levels are saved. In the hdf5r interface, that reference is presented as the "H5R_OBJECT" class. I haven't yet found a clean way to create one, but I can hack it by modifying the source code of the H5D class's create_reference() method:

# `self` is an H5D object
.H5.create_reference <- function(self, ...) {
    space <- self$get_space()
    do.call("[", c(list(space), list(...)))
    ref_type <- hdf5r::h5const$H5R_OBJECT
    ref_obj <- hdf5r::H5R_OBJECT$new(1, self)
    res <- .Call("R_H5Rcreate", ref_obj$ref, self$id, ".", ref_type,
                 space$id, FALSE, PACKAGE = "hdf5r")
    if (res$return_val < 0) {
        stop("Error creating object reference")
    }
    ref_obj$ref <- res$ref
    return(ref_obj)
}

Briefly: for a factor from a data.frame, you create an H5D (call it a) holding its integer codes, and another H5D (call it b) in "obs/__categories" holding its levels. Then create a reference object with ref <- .H5.create_reference(b), and attach it with a$create_attr(attr_name = "categories", robj = ref, space = Scalar(), dtype = GuessDType(ref)). This works for me to make an H5AD file loadable in Python with text categorical annotations shown properly. However, the .Call() triggers a NOTE in the R CMD check of my package.
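To make the "integer codes plus levels" split concrete, here is a minimal base R sketch. The example factor is made up for illustration and is not taken from any dataset; it only shows how one factor maps onto the two arrays described above and back again.

```r
# A made-up factor, standing in for a metadata column
f <- factor(c("B cell", "T cell", "B cell", "NK"))

# What would go into dataset `a`: zero-based integer codes
codes <- as.integer(f) - 1L      # 0 2 0 1

# What would go into dataset `b`: the factor's levels
categories <- levels(f)          # "B cell" "NK" "T cell"

# A reader that follows the reference can rebuild the labels like this
decoded <- categories[codes + 1L]
stopifnot(identical(decoded, as.character(f)))
```

R factors are one-based, so the only conversion needed is the shift by one in each direction.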

It would be nice if you could include this in a future update, or come up with a cleaner way that avoids the check notes!

Best,
Yichen

For users,

I'll start from the tutorial:

library(Seurat)
library(SeuratData)
library(SeuratDisk)
InstallData("pbmc3k")
data("pbmc3k.final")
SaveH5Seurat(pbmc3k.final, filename = "pbmc3k.h5Seurat")
Convert("pbmc3k.h5Seurat", dest = "h5ad")

At this point you should see the file pbmc3k.h5ad created on disk. It can be loaded in Python, but "orig.ident", "seurat_annotations", etc. show up as integer values.
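If you want to confirm the layout from R before changing anything, the following read-only sketch decodes one column by hand. The helper name read_obs_labels is hypothetical (it is not part of SeuratDisk or hdf5r), and it assumes the file uses the 'obs/variableName' and 'obs/__categories/variableName' layout described above.

```r
library(hdf5r)

# Hypothetical helper: decode one obs column of a converted H5AD file
# into a character vector, assuming the layout described in this issue.
read_obs_labels <- function(path, variable) {
    h5ad <- H5File$new(path, "r")        # read-only, nothing is modified
    on.exit(h5ad$close_all())
    codes <- h5ad[[paste0("obs/", variable)]]$read()
    categories <- h5ad[[paste0("obs/__categories/", variable)]]$read()
    # Codes are zero-based; treat negative codes as missing values
    idx <- codes + 1L
    idx[idx < 1L] <- NA
    categories[idx]
}

# Usage (assumes pbmc3k.h5ad from the steps above is in the working directory):
# head(read_obs_labels("pbmc3k.h5ad", "orig.ident"))
```

If this returns the expected labels, the levels are present in the file and only the link between the two datasets is missing.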

Go back to your R session and do:

# Load utilities you'll need
# The library
library(hdf5r)
# My hack function
H5.create_reference <- function(self, ...) {
    space <- self$get_space()
    do.call("[", c(list(space), list(...)))
    ref_type <- hdf5r::h5const$H5R_OBJECT
    ref_obj <- hdf5r::H5R_OBJECT$new(1, self)
    res <- .Call("R_H5Rcreate", ref_obj$ref, self$id, ".", ref_type,
                 space$id, FALSE, PACKAGE = "hdf5r")
    if (res$return_val < 0) {
        stop("Error creating object reference")
    }
    ref_obj$ref <- res$ref
    return(ref_obj)
}
# Open the H5AD file (which is an ordinary HDF5 file) in "r+" mode for read-and-write access
h5ad <- H5File$new("pbmc3k.h5ad", "r+")
# Fix for `orig.ident`
ref.orig.ident <- H5.create_reference(h5ad[['obs/__categories/orig.ident']])
h5ad[['obs/orig.ident']]$create_attr(
    attr_name = "categories", 
    robj = ref.orig.ident, 
    space = H5S$new(type = "scalar")
)
# create_attr() returns an object of class H5A; you can ignore it.
# Manually repeat the same steps for the other categorical variables...
# Finally, remember to close the H5AD file connection, which holds write access
h5ad$close_all()
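Repeating the fix by hand gets tedious with many variables, so here is a sketch that loops over everything found under 'obs/__categories'. The function fix_all_categoricals and its make_ref argument are hypothetical names introduced for illustration; make_ref is meant to be the H5.create_reference hack defined above. It assumes every entry under 'obs/__categories' has a matching dataset under 'obs', and I have only reasoned through it, not run it on a real converted file.

```r
library(hdf5r)

# Hypothetical helper: attach a "categories" reference to every obs column
# whose levels are stored under obs/__categories.
# `make_ref` builds the object reference, e.g. H5.create_reference from above.
fix_all_categoricals <- function(path, make_ref) {
    h5ad <- H5File$new(path, "r+")
    # Release the write handle even if something fails midway
    on.exit(h5ad$close_all())
    fixed <- character(0)
    for (v in h5ad[["obs/__categories"]]$ls()$name) {
        obs_path <- paste0("obs/", v)
        # Skip variables with no matching column in obs
        if (!h5ad$exists(obs_path)) next
        column <- h5ad[[obs_path]]
        # Skip columns that already carry the attribute
        if (column$attr_exists("categories")) next
        ref <- make_ref(h5ad[[paste0("obs/__categories/", v)]])
        column$create_attr(
            attr_name = "categories",
            robj = ref,
            space = H5S$new(type = "scalar")
        )
        fixed <- c(fixed, v)
    }
    fixed  # names of the variables that were updated
}

# Usage, in a session where H5.create_reference is defined:
# fix_all_categoricals("pbmc3k.h5ad", H5.create_reference)
```

The file is opened and closed inside the function, so this can be run after the h5ad$close_all() call above. Columns that already have a "categories" attribute are skipped, which makes it safe to call more than once.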

Then you can reload the AnnData in Python and see the changes 😉

mvfki · 2024-06-25T18:24:45Z

Should also be related to #137