Introduction
- - Example Workflow +
- Concepts and features
- Package Architecture
- Development Roadmap
- +Instance: A LaminDB instance is a self-contained +environment for storing and managing data and metadata. Think of it like +a database or a project directory. Each instance has its own schema, +storage location, and metadata database. +
- +Module: A module is a collection of related +registries that provide specific functionality. For example, the core +module contains essential registries for general data management, while +the bionty module provides registries for biological entities like genes +and proteins. +
- +Registry: A registry is a centralized collection of +related records, similar to a table in a database. Each registry holds a +specific type of metadata, such as information about artifacts, +transforms, or features. +
- +Record: A record is a single entry within a +registry, analogous to a row in a database table. Each record represents +a specific entity and combines multiple fields of information. +
- +Field: A field is a single piece of information +within a record, like a column in a database table. For example, an +artifact record might have fields for its name, description, and +creation date. +
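The nesting described above (instance, module, registry, record, field) can be pictured with plain R lists. This is only a toy analogy under stated assumptions: it does not use or mirror the {laminr} API, and every name and value in it (example-instance, counts.h5ad, GENE1, and so on) is a hypothetical placeholder.

```r
# Toy model of the hierarchy: instance > module > registry > record > field.
# Plain R lists only; this is NOT the laminr API and all values are made up.
instance <- list(
  name = "example-instance",
  modules = list(
    core = list(
      # A registry is like a table: a collection of records
      Artifact = list(
        # Each record is like a row; its named elements are the fields
        list(name = "counts.h5ad", description = "Raw counts"),
        list(name = "report.html", description = "QC report")
      )
    ),
    bionty = list(
      Gene = list(
        list(symbol = "GENE1")
      )
    )
  )
)

# Navigate module -> registry -> record -> field
first_artifact <- instance$modules$core$Artifact[[1]]
first_artifact$name

# Count the records held by a registry
n_artifacts <- length(instance$modules$core$Artifact)
n_artifacts
```

In {laminr} itself the same navigation is done with the $ operator on a connected instance, so the analogy is mainly useful for keeping the five terms apart.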
- Install the {laminr} package. +
- (Optional) Install suggested dependencies. +
- Architecture
- -
- Feature List and Roadmap +
- Concepts and features
- -
- Example Workflow: CELLxGENE +
- Feature List and Roadmap
- -
- Getting Started +
- Getting started
- Bionty Module
-
diff --git a/articles/laminr.html b/articles/laminr.html
index 3c74ff6..aad87d7 100644
--- a/articles/laminr.html
+++ b/articles/laminr.html
@@ -5,12 +5,12 @@
-
Getting Started • laminr +Getting started • laminr - + Skip to contents @@ -35,7 +35,7 @@Introduction
- - Example Workflow +
- Concepts and features
- Package Architecture
- Development Roadmap
- -Instance: A LaminDB instance is a self-contained -environment for storing and managing data and metadata. Think of it like -a database or a project directory. Each instance has its own schema, -storage location, and metadata database. -
- -Module: A module is a collection of related -registries that provide specific functionality. For example, the core -module contains essential registries for general data management, while -the bionty module provides registries for biological entities like genes -and proteins. -
- -Registry: A registry is a centralized collection of -related records, similar to a table in a database. Each registry holds a -specific type of metadata, such as information about artifacts, -transforms, or features. -
- -Record: A record is a single entry within a -registry, analogous to a row in a database table. Each record represents -a specific entity and combines multiple fields of information. -
- -Field: A field is a single piece of information -within a record, like a column in a database table. For example, an -artifact record might have fields for its name, description, and -creation date. -
- Install the {laminr} package. -
- (Optional) Install suggested dependencies. -
- Render the notebook to HTML (not needed for
.R
+scripts)
+ In RStudio, click the “Knit” button
+-
+
OR From the command line, run:
+ +
+ -
+
OR Use the
rmarkdown
package in +R:--
artifact$cache() # Cache the data locally -#> | | 0% [progress bar output truncated] |======================================================================| 100% -artifact$load() # Load the data into memory -#> ℹ s3://cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad already exists at 
/home/runner/.cache/lamindb/cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad -#> AnnData object with n_obs × n_vars = 51552 × 36398 -#> obs: 'donor_id', 'Predicted_labels_CellTypist', 'Majority_voting_CellTypist', 'Manually_curated_celltype', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid' -#> var: 'gene_symbols', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length' -#> uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'sex_ontology_term_id_colors', 'title' -#> obsm: 'X_umap'
-+Currently, {laminr} primarily supports S3 storage. -Support for other storage backends will be added in the future. For more -information related to planned features and the roadmap, please refer to -the Development vignette -(
+vignette("development", package = "laminr")
).rmarkdown::render("laminr.Rmd")
+ - Save it to your LaminDB instance using the
lamin
+CLI:
+
--Getting Started
+Getting started
- Source:vignettes/laminr.Rmd
+ Source:vignettes/laminr.Rmd
laminr.Rmd
This vignette provides a practical introduction to using the -{laminr} package to interact with LaminDB. We’ll start -with a brief overview of key concepts and then walk through the basic -steps to connect to a LaminDB instance and work with its core -components.
-Key Concepts in LaminDB +
Introduction
-Before diving into the practical usage of {laminr}, -it’s helpful to understand some core concepts in LaminDB. For a more -detailed explanation, refer to the Architecture vignette -(
-vignette("architecture", package = "laminr")
).-
-
This vignette provides a quick introduction to the +{laminr} workflow. For more details about how +{laminr} works, see +
vignette("concepts_features", package = "laminr")
.-Initial setup +
Installation
-Now, let’s set up your environment to use -{laminr}.
- --R setup -
--
-
+
Install {laminr} from CRAN using:
+-install.packages("laminr")
-
-
++
You will also need to install the
+ +lamindb
Python +package:Some functionality requires additional packages. You will be prompted +to install them as needed or you can install them all now with:
+-install.packages("laminr", dependencies = TRUE)
This includes packages like {anndata} for working -with AnnData objects and {s3} for interacting with S3 -storage.
-See the “Initial setup” section of +
vignette("concepts_features", package = "laminr")
for more +details.-Connecting to LaminDB from R +
Connecting to LaminDB
-Connect to the
- -laminlabs/cellxgene
instance from your R -session:The
+db
object now represents your connection to the -LaminDB instance. You can explore the available registries (like -Artifact
,Collection
,Feature
, -etc.) by simply printing thedb
object:Load {laminr} to get started.
+ +++Connect to the default instance +
+The default LaminDB instance is set using the
+lamin
CLI +on the command line:
+lamin connect <owner>/<name>
Once a default instance has been set, connect to it with +{laminr}:
--
db +
db <- connect() +#> ! schema module 'bionty' is not installed → no access to its labels & registries (resolve via `pip install bionty`) +#> → connected lamindb: laminlabs/cellxgene +db #> cellxgene #> Core registries #> $Run @@ -179,194 +133,395 @@
Connecting to LaminDB from R#> $FeatureValue #> Additional modules #> bionty
These registries correspond to Python classes in LaminDB.
-To access registries within specific modules, use the $ operator. For -example, to access the bionty module:
+++Note
+Only the default instance can create new records. This tutorial +assumes you have access to an instance where you have permission to add +data.
+++Connect to other instances +
+It is possible to connect to non-default instances by providing a +slug to the
connect()
function. Instances connected to in +this way can be used to query data but not to make any changes. Connect +to the public CELLxGENE instance:--
db$bionty -#> bionty -#> Registries -#> $Gene -#> $Source -#> $Tissue -#> $Disease -#> $Pathway -#> $Protein -#> $CellLine -#> $CellType -#> $Organism -#> $Ethnicity -#> $Phenotype -#> $CellMarker -#> $DevelopmentalStage -#> $ExperimentalFactor
The
+bionty
and other registries also have corresponding -Python classes.cellxgene <- connect("laminlabs/cellxgene") +cellxgene +#> cellxgene +#> Core registries +#> $Run +#> $User +#> $Param +#> $ULabel +#> $Feature +#> $Storage +#> $Artifact +#> $Transform +#> $Collection +#> $FeatureSet +#> $ParamValue +#> $FeatureValue +#> Additional modules +#> bionty
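The slug passed to connect() has the same owner/name form as the argument to lamin connect. As a minimal sketch of that format, the hypothetical helper below (parse_slug is not part of {laminr}, and how {laminr} handles slugs internally is not shown in this vignette) splits a slug into its two parts and rejects anything that does not match.

```r
# Hypothetical helper, NOT part of laminr: split "<owner>/<name>" into parts
parse_slug <- function(slug) {
  parts <- strsplit(slug, "/", fixed = TRUE)[[1]]
  # Require exactly two non-empty parts
  if (length(parts) != 2 || any(!nzchar(parts))) {
    stop("Expected a slug of the form '<owner>/<name>', got: ", slug)
  }
  list(owner = parts[1], name = parts[2])
}

slug <- parse_slug("laminlabs/cellxgene")
slug$owner
slug$name
```

Checking the slug before connecting gives a clearer error than a failed lookup, which is the only reason for a helper like this.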
-+Working with registries +
Track data provenance
-Let’s use the
-Artifact
registry as an example. This -registry stores datasets, models, and other data entities.To see the available functions for the
+Artifact
-registry, print the registry object:LaminDB can track which scripts or notebooks were used to create +data. Start the tracking process:
--
db$Artifact -#> Artifact -#> Simple fields -#> id: AutoField -#> key: CharField -#> uid: CharField -#> hash: CharField -#> size: BigIntegerField -#> type: CharField -#> suffix: CharField -#> version: CharField -#> is_latest: BooleanField -#> n_objects: BigIntegerField -#> created_at: DateTimeField -#> updated_at: DateTimeField -#> visibility: SmallIntegerField -#> description: CharField -#> n_observations: BigIntegerField -#> Relational fields -#> run: Run (many-to-one) -#> storage: Storage (many-to-one) -#> ulabels: ULabel (many-to-many) -#> transform: Transform (many-to-one) -#> created_by: User (many-to-one) -#> collections: Collection (many-to-many) -#> feature_sets: FeatureSet (many-to-many) -#> input_of_runs: Run (many-to-many) -#> Bionty fields -#> genes: bionty$Gene (many-to-many) -#> tissues: bionty$Tissue (many-to-many) -#> diseases: bionty$Disease (many-to-many) -#> pathways: bionty$Pathway (many-to-many) -#> proteins: bionty$Protein (many-to-many) -#> organisms: bionty$Organism (many-to-many) -#> cell_lines: bionty$CellLine (many-to-many) -#> cell_types: bionty$CellType (many-to-many) -#> phenotypes: bionty$Phenotype (many-to-many) -#> ethnicities: bionty$Ethnicity (many-to-many) -#> cell_markers: bionty$CellMarker (many-to-many) -#> developmental_stages: bionty$DevelopmentalStage (many-to-many) -#> experimental_factors: bionty$ExperimentalFactor (many-to-many)
You can also get a data frame summarising the records associated with -a registry.
-+-
db$Artifact$df(limit = 5) -#> id suffix X_accessor n_objects visibility -#> 1 2846 tiledbsoma 290 1 -#> 2 3665 tiledbsoma 330 1 -#> 3 1270 .h5ad AnnData NA 1 -#> 4 2840 .ipynb <NA> NA 0 -#> 5 2842 .html <NA> NA 0 -#> key -#> 1 cell-census/2023-12-15/soma -#> 2 cell-census/2024-07-01/soma -#> 3 cell-census/2023-07-25/h5ads/7a0a8891-9a22-4549-a55b-c2aca23c3a2a.h5ad -#> 4 <NA> -#> 5 <NA> -#> uid size hash -#> 1 FYMewVq5twKMDXVy0000 635848093433 Mfyw8VuqftX5REITfQH_yg -#> 2 FYMewVq5twKMDXVy0001 870700998221 bzrXBPNvitSVKvb3GG38_w -#> 3 tczTlSHFPOcAcBnfyxKA 1297573950 UlsVvBz9kMzn2r9RdoAAOg -#> 4 JIIPyQX5l9qELPl42d75 36297 gNdUkonYgQJP_Mi3xLzt_g -#> 5 Whyxwf3k2GjJwTPCl1FK 716529 BDGZac3qU3oLVFpO035Qhg -#> description n_observations is_latest X_hash_type -#> 1 Census 2023-12-15 68683222 FALSE md5-d -#> 2 Census 2024-07-01 115556140 TRUE md5-d -#> 3 Supercluster: Hippocampal CA1-3 74979 FALSE md5-n -#> 4 Source of transform G69jtgzKO0eJ6K79 NA FALSE md5 -#> 5 Report of run UAAiLAi0BrLvlKnsuvP3 NA FALSE md5 -#> type created_at X_key_is_virtual -#> 1 dataset 2024-07-12T12:12:16.091881+00:00 FALSE -#> 2 dataset 2024-07-16T12:52:01.424629+00:00 FALSE -#> 3 <NA> 2023-11-28T21:46:12.685907+00:00 FALSE -#> 4 <NA> 2024-01-29T08:32:13.311741+00:00 TRUE -#> 5 <NA> 2024-01-29T08:32:18.346499+00:00 TRUE -#> updated_at version -#> 1 2024-09-17T13:00:13.714256+00:00 2023-12-15 -#> 2 2024-09-17T13:01:23.739635+00:00 2024-07-01 -#> 3 2024-01-24T07:10:21.725547+00:00 2023-07-25 -#> 4 2024-01-29T08:32:13.311792+00:00 0 -#> 5 2024-01-30T09:12:06.027928+00:00 1
db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd")
+Tip
+The ID should be obtained by running +
+db$track(path = "your_file.R")
and copying the ID from the +output.-+Working with records +
Download a dataset
-You can fetch a specific record from a registry using its ID or UID. -For instance, to get the artifact with UID KBW89Mf7IGcekja2hADu:
+Artifacts are objects that contain measurements as well as associated +metadata.
+++
artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk") +artifact +#> Artifact(uid='7dVluLROpalzEh8mNyxk', description='Renal cell carcinoma, pre aPD1, kidney Puck_200727_12', key='cell-census/2023-12-15/h5ads/02faf712-92d4-4589-bec7-13105059cf86.h5ad', id=1742, run_id=22, hash='YNYuokfAoDFxdaRILjmU9w', size=13997860, suffix='.h5ad', storage_id=2, version='2023-12-15', _accessor='AnnData', is_latest=TRUE, transform_id=16, _hash_type='md5-n', created_at='2024-01-11T09:13:23.143694+00:00', created_by_id=1, updated_at='2024-01-24T07:17:47.009288+00:00', visibility=1, n_observations=17612, _key_is_virtual=FALSE)
++Tip
+You can view information about this dataset on Lamin Hub at https://lamin.ai/laminlabs/cellxgene/artifact/7dVluLROpalzEh8mNyxk. +Lamin Hub can also be used to search for other CELLxGENE datasets.
+So far, only the metadata of this artifact has been +retrieved. To download the data itself, run:
--
artifact <- db$Artifact$get("KBW89Mf7IGcekja2hADu")
This artifact contains an
+AnnData
object with myeloid -cell data. You can view its metadata:adata <- artifact$load() +#> | | 0% [progress bar output truncated] |======================================================================| 100% +adata +#> AnnData object with n_obs × n_vars = 17612 × 23254 +#> obs: 'n_genes', 'n_UMIs', 'log10_n_UMIs', 'log10_n_genes', 'Cell_Type', 'cell_type_ontology_term_id', 'organism_ontology_term_id', 'tissue_ontology_term_id', 'assay_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'sex_ontology_term_id', 'donor_id', 'is_primary_data', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 
'sex', 'tissue', 'self_reported_ethnicity', 'development_stage' +#> var: 'gene', 'n_beads', 'n_UMIs', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype' +#> uns: 'Cell_Type_colors', 'schema_version', 'title' +#> obsm: 'X_spatial'
You can see that this artifact contains an
+AnnData
+object.++Work with the data +
+Once you have loaded a dataset you can perform any analysis with it +as you would normally. Here, marker genes are calculated for each of the +provided cell type labels using {Seurat}.
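The Seurat code in this section transposes adata$X before building the object, because AnnData stores observations (cells) as rows and variables (genes) as columns, while Seurat expects genes as rows. The sketch below illustrates just that step on a toy matrix; the cell and gene names and the counts are made up for illustration, and only the {Matrix} package is assumed to be installed.

```r
library(Matrix)

# Toy stand-in for adata$X: 3 cells (rows) x 2 genes (columns); values are made up
x <- matrix(
  c(1, 0,
    0, 2,
    3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"), c("geneA", "geneB"))
)

# Transpose so genes are rows and cells are columns, then store it sparsely
counts <- as(Matrix::t(x), "CsparseMatrix")

dim(counts)      # genes x cells
rownames(counts) # gene names are now the row names
```

Coercing to a column-compressed sparse matrix keeps memory use down for count data, where most entries are zero.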
--
artifact -#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
For artifact records, you can get more detailed information:
+# Create a Seurat object +seurat <- SeuratObject::CreateSeuratObject( + counts = as(Matrix::t(adata$X), "CsparseMatrix"), + meta.data = adata$obs, +) +# Set cell identities to the provided cell type annotation +SeuratObject::Idents(seurat) <- "Cell_Type" +# Normalise the data +seurat <- Seurat::NormalizeData(seurat) +#> Normalizing layer: counts +# Test for marker genes (the output is a data.frame) +markers <- Seurat::FindAllMarkers( + seurat, + features = SeuratObject::Features(seurat)[1:100] # Only test a few features for speed +) +#> Calculating cluster Epithelial +#> Calculating cluster Fibroblast +#> For a (much!) faster implementation of the Wilcoxon Rank Sum Test, +#> (default method for FindMarkers) please install the presto package +#> -------------------------------------------- +#> install.packages('devtools') +#> devtools::install_github('immunogenomics/presto') +#> -------------------------------------------- +#> After installation of presto, Seurat will automatically use the more +#> efficient implementation (no further action necessary). +#> This message will be shown once per session +#> Calculating cluster Myeloid +#> Calculating cluster Tumor +#> Warning: The following tests were not performed: +#> Warning: When testing Epithelial versus all: +#> Cell group 1 has fewer than 3 cells +# Display the marker genes +knitr::kable(markers)
+
+                     p_val       avg_log2FC   pct.1   pct.2   p_val_adj   cluster      gene
+ENSG00000147113      0.0000001    1.8228103   0.019   0.005   0.0011654   Fibroblast   ENSG00000147113
+ENSG00000170004      0.0000002    2.5663044   0.021   0.006   0.0036485   Fibroblast   ENSG00000170004
+ENSG00000196139      0.0000003   -0.8318130   0.053   0.110   0.0058749   Fibroblast   ENSG00000196139
+ENSG00000132170      0.0006719    1.7151510   0.020   0.009   1.0000000   Fibroblast   ENSG00000132170
+ENSG00000205542      0.0007360    0.6442683   0.230   0.195   1.0000000   Fibroblast   ENSG00000205542
+ENSG00000163536      0.0025157    1.8914687   0.012   0.004   1.0000000   Fibroblast   ENSG00000163536
+ENSG00000067064      0.0063090    1.2509162   0.014   0.006   1.0000000   Fibroblast   ENSG00000067064
+ENSG00000105855      0.0068491   -0.5412708   0.022   0.041   1.0000000   Fibroblast   ENSG00000105855
+ENSG00000205542.1    0.0000002    1.3623005   0.310   0.195   0.0046479   Myeloid      ENSG00000205542
+ENSG00000196139.1    0.0015658   -0.5898982   0.040   0.108   1.0000000   Myeloid      ENSG00000196139
+ENSG00000196139.2    0.0000000    0.7939631   0.111   0.050   0.0000224   Tumor        ENSG00000196139
+ENSG00000205542.2    0.0000001   -0.8585382   0.193   0.247   0.0013456   Tumor        ENSG00000205542
+ENSG00000147113.1    0.0000018   -1.4976270   0.005   0.016   0.0415774   Tumor        ENSG00000147113
+ENSG00000170004.1    0.0000073   -2.2898276   0.006   0.018   0.1686987   Tumor        ENSG00000170004
+ENSG00000105855.1    0.0003828    0.7197716   0.041   0.019   1.0000000   Tumor        ENSG00000105855
+ENSG00000053371      0.0038080    0.8347505   0.029   0.014   1.0000000   Tumor        ENSG00000053371
+ENSG00000141385      0.0058269    1.0575502   0.019   0.007   1.0000000   Tumor        ENSG00000141385
+ENSG00000132170.1    0.0072852   -1.3878878   0.009   0.017   1.0000000   Tumor        ENSG00000132170
+ENSG00000163536.1    0.0076905   -1.8629158   0.004   0.010   1.0000000   Tumor        ENSG00000163536
--
artifact$describe()
-#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
-#> Provenance
-#> $storage = 's3://cellxgene-data-public'
-#> $transform = 'Census release 2024-07-01 (LTS)'
-#> $run = '2024-07-16T12:49:41.81955+00:00'
-#> $created_by = 'sunnyosun'
Access specific fields of the record using the
+$
-operator:
+# Plot the marker genes
+Seurat::DotPlot(seurat, features = unique(markers$gene)) +
+  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5))
+#> Warning: Scaling data with a low number of groups may produce misleading
+#> results
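Because the marker table is a plain data frame, it can be filtered with base R before plotting or saving. The sketch below is illustrative only: `markers_subset` is a stand-in built from four rows of the table above, and the 0.05 cutoff on `p_val_adj` is an assumed threshold, not one prescribed by the vignette.

```r
# Stand-in for the `markers` data frame: four rows copied from the table above
markers_subset <- data.frame(
  gene = c("ENSG00000147113", "ENSG00000170004",
           "ENSG00000132170", "ENSG00000196139"),
  cluster = c("Fibroblast", "Fibroblast", "Fibroblast", "Tumor"),
  avg_log2FC = c(1.8228103, 2.5663044, 1.7151510, 0.7939631),
  p_val_adj = c(0.0011654, 0.0036485, 1.0000000, 0.0000224)
)

# Assumed cutoff: adjusted p-value below 0.05 and a positive log fold change
padj_cutoff <- 0.05
keep <- markers_subset$p_val_adj < padj_cutoff & markers_subset$avg_log2FC > 0
significant <- markers_subset[keep, ]

# Order by effect size, largest first
significant <- significant[order(-significant$avg_log2FC), ]
print(significant)
```

The same subsetting works on the full `markers` data frame, since `FindAllMarkers()` returns these columns as shown in the table.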
+Save the results to your instance
+Any results can be saved to the default LaminDB instance.
--
artifact$id
-#> [1] 3659
-artifact$uid
-#> [1] "KBW89Mf7IGcekja2hADu"
-artifact$key
-#> [1] "cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
Some fields of a record contain links to related information.
+seurat_path <- tempfile(fileext = ".rds")
+saveRDS(seurat, seurat_path)
+
+db$Artifact$from_df(
+  markers,
+  description = "Marker genes for renal cell carcinoma dataset"
+)$save()
+
+db$Artifact$from_path(
+  seurat_path,
+  description = "Seurat object for renal cell carcinoma dataset"
+)$save()
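Before registering a serialised file as an artifact, it can be worth confirming that it reads back intact. The following is a minimal base R sketch of that check; `example_object` is a hypothetical stand-in for the Seurat object, and no LaminDB calls are involved.

```r
# Hypothetical stand-in; in the workflow above this would be `seurat`
example_object <- list(
  clusters = c("Fibroblast", "Myeloid", "Tumor"),
  n_features = 100L
)

# Serialise to a temporary .rds file, as in the text
example_path <- tempfile(fileext = ".rds")
saveRDS(example_object, example_path)

# Read the file back and confirm the round trip preserved the object
restored <- readRDS(example_path)
round_trip_ok <- identical(example_object, restored)
file_size <- file.size(example_path)

cat("Round trip OK:", round_trip_ok, "\n")
cat("File size (bytes):", file_size, "\n")
```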
++Finish tracking +
+End the tracking run to generate a timestamp:
--
artifact$storage
-#> Storage(uid='oIYGbD74', root='s3://cellxgene-data-public', id=2, type='s3', region='us-west-2', created_at='2023-09-19T13:17:56.273068+00:00', created_by_id=1, updated_at='2023-10-16T15:04:08.998203+00:00')
-artifact$developmental_stages
-#> RelatedRecords(field_name='developmental_stages', relation_type='many-to-many', related_to='KBW89Mf7IGcekja2hADu')
When these fields are one-to-many or many-to-many relationships, a -summary of the related information can be retrieved as a data frame.
---
artifact$developmental_stages$df()
-#> id uid abbr name synonyms
-#> 1 422 1xebUrrX NA sixth decade human stage NA
-#> 2 423 3yuYMeZt NA seventh decade human stage NA
-#> 3 424 2EztBuvx NA eighth decade human stage NA
-#> created_at updated_at
-#> 1 2023-11-28T23:05:31.450102+00:00 2023-11-28T23:05:31.450106+00:00
-#> 2 2023-11-28T23:05:31.450123+00:00 2023-11-28T23:05:31.450127+00:00
-#> 3 2023-11-28T23:05:31.450144+00:00 2023-11-28T23:05:31.450149+00:00
-#> description
-#> 1 Human Stage That Refers To An Individual Who Is Over 50 And Under 60 Years Old.
-#> 2 Human Stage That Refers To An Individual Who Is Over 60 And Under 70 Years Old.
-#> 3 Human Stage That Refers To An Individual Who Is Over 70 And Under 80 Years Old.
-#> ontology_id
-#> 1 HsapDv:0000240
-#> 2 HsapDv:0000241
-#> 3 HsapDv:0000242
Finally, for artifact records only, you can download the associated -data:
+db$finish()
+Save notebooks and code +
+Save the tracked notebook to your instance:
+-
+
-
+
-
+
All vignettes
Architecture
- Source:vignettes/architecture.qmd
+ Source: vignettes/architecture.qmd
This vignette provides a more detailed introduction to the concepts +and features of {laminr}. We’ll start with a brief +overview of key concepts and then walk through the basic steps to +connect to a LaminDB instance and work with its core components.
+Key Concepts in LaminDB +
+Before diving into the practical usage of {laminr},
+it’s helpful to understand some core concepts in LaminDB. For a more
+detailed explanation, refer to the Architecture vignette
+(vignette("architecture", package = "laminr")
).
-
+
Initial setup +
+Now, let’s set up your environment to use +{laminr}.
+ +R setup +
+-
+
+install.packages("laminr")
-
+
+install.packages("laminr", dependencies = TRUE)
This includes packages like {anndata} for working +with AnnData objects and {s3} for interacting with S3 +storage.
+Connecting to LaminDB from R +
+Connect to the laminlabs/cellxgene
instance from your R
+session:
The db
object now represents your connection to the
+LaminDB instance. You can explore the available registries (like
+Artifact
, Collection
, Feature
,
+etc.) by simply printing the db
object:
+db
+#> cellxgene
+#> Core registries
+#> $Run
+#> $User
+#> $Param
+#> $ULabel
+#> $Feature
+#> $Storage
+#> $Artifact
+#> $Transform
+#> $Collection
+#> $FeatureSet
+#> $ParamValue
+#> $FeatureValue
+#> Additional modules
+#> bionty
These registries correspond to Python classes in LaminDB.
+To access registries within specific modules, use the $ operator. For +example, to access the bionty module:
+
+db$bionty
+#> bionty
+#> Registries
+#> $Gene
+#> $Source
+#> $Tissue
+#> $Disease
+#> $Pathway
+#> $Protein
+#> $CellLine
+#> $CellType
+#> $Organism
+#> $Ethnicity
+#> $Phenotype
+#> $CellMarker
+#> $DevelopmentalStage
+#> $ExperimentalFactor
The bionty
and other registries also have corresponding
+Python classes.
Working with registries +
+Let’s use the Artifact
registry as an example. This
+registry stores datasets, models, and other data entities.
To see the available fields of the Artifact
+registry, print the registry object:
+db$Artifact
+#> Artifact
+#> Simple fields
+#> id: AutoField
+#> key: CharField
+#> uid: CharField
+#> hash: CharField
+#> size: BigIntegerField
+#> type: CharField
+#> suffix: CharField
+#> version: CharField
+#> is_latest: BooleanField
+#> n_objects: BigIntegerField
+#> created_at: DateTimeField
+#> updated_at: DateTimeField
+#> visibility: SmallIntegerField
+#> description: CharField
+#> n_observations: BigIntegerField
+#> Relational fields
+#> run: Run (many-to-one)
+#> storage: Storage (many-to-one)
+#> ulabels: ULabel (many-to-many)
+#> transform: Transform (many-to-one)
+#> created_by: User (many-to-one)
+#> collections: Collection (many-to-many)
+#> feature_sets: FeatureSet (many-to-many)
+#> input_of_runs: Run (many-to-many)
+#> Bionty fields
+#> genes: bionty$Gene (many-to-many)
+#> tissues: bionty$Tissue (many-to-many)
+#> diseases: bionty$Disease (many-to-many)
+#> pathways: bionty$Pathway (many-to-many)
+#> proteins: bionty$Protein (many-to-many)
+#> organisms: bionty$Organism (many-to-many)
+#> cell_lines: bionty$CellLine (many-to-many)
+#> cell_types: bionty$CellType (many-to-many)
+#> phenotypes: bionty$Phenotype (many-to-many)
+#> ethnicities: bionty$Ethnicity (many-to-many)
+#> cell_markers: bionty$CellMarker (many-to-many)
+#> developmental_stages: bionty$DevelopmentalStage (many-to-many)
+#> experimental_factors: bionty$ExperimentalFactor (many-to-many)
You can also get a data frame summarising the records associated with +a registry.
+
+db$Artifact$df(limit = 5)
+#> id suffix X_accessor n_objects visibility
+#> 1 2846 tiledbsoma 290 1
+#> 2 3665 tiledbsoma 330 1
+#> 3 1270 .h5ad AnnData NA 1
+#> 4 2840 .ipynb <NA> NA 0
+#> 5 2842 .html <NA> NA 0
+#> key
+#> 1 cell-census/2023-12-15/soma
+#> 2 cell-census/2024-07-01/soma
+#> 3 cell-census/2023-07-25/h5ads/7a0a8891-9a22-4549-a55b-c2aca23c3a2a.h5ad
+#> 4 <NA>
+#> 5 <NA>
+#> uid size hash
+#> 1 FYMewVq5twKMDXVy0000 635848093433 Mfyw8VuqftX5REITfQH_yg
+#> 2 FYMewVq5twKMDXVy0001 870700998221 bzrXBPNvitSVKvb3GG38_w
+#> 3 tczTlSHFPOcAcBnfyxKA 1297573950 UlsVvBz9kMzn2r9RdoAAOg
+#> 4 JIIPyQX5l9qELPl42d75 36297 gNdUkonYgQJP_Mi3xLzt_g
+#> 5 Whyxwf3k2GjJwTPCl1FK 716529 BDGZac3qU3oLVFpO035Qhg
+#> description n_observations is_latest X_hash_type
+#> 1 Census 2023-12-15 68683222 FALSE md5-d
+#> 2 Census 2024-07-01 115556140 TRUE md5-d
+#> 3 Supercluster: Hippocampal CA1-3 74979 FALSE md5-n
+#> 4 Source of transform G69jtgzKO0eJ6K79 NA FALSE md5
+#> 5 Report of run UAAiLAi0BrLvlKnsuvP3 NA FALSE md5
+#> type created_at X_key_is_virtual
+#> 1 dataset 2024-07-12T12:12:16.091881+00:00 FALSE
+#> 2 dataset 2024-07-16T12:52:01.424629+00:00 FALSE
+#> 3 <NA> 2023-11-28T21:46:12.685907+00:00 FALSE
+#> 4 <NA> 2024-01-29T08:32:13.311741+00:00 TRUE
+#> 5 <NA> 2024-01-29T08:32:18.346499+00:00 TRUE
+#> updated_at version
+#> 1 2024-09-17T13:00:13.714256+00:00 2023-12-15
+#> 2 2024-09-17T13:01:23.739635+00:00 2024-07-01
+#> 3 2024-01-24T07:10:21.725547+00:00 2023-07-25
+#> 4 2024-01-29T08:32:13.311792+00:00 0
+#> 5 2024-01-30T09:12:06.027928+00:00 1
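The result of `df()` is an ordinary R data frame, so the usual subsetting tools apply. As a rough sketch, the block below rebuilds a few columns of the five rows shown above as a stand-in (`artifacts_subset` is a hypothetical name) and keeps only the latest dataset-type records.

```r
# Stand-in for the output of db$Artifact$df(limit = 5), using values shown above
artifacts_subset <- data.frame(
  id = c(2846, 3665, 1270, 2840, 2842),
  description = c(
    "Census 2023-12-15",
    "Census 2024-07-01",
    "Supercluster: Hippocampal CA1-3",
    "Source of transform G69jtgzKO0eJ6K79",
    "Report of run UAAiLAi0BrLvlKnsuvP3"
  ),
  type = c("dataset", "dataset", NA, NA, NA),
  is_latest = c(TRUE, TRUE, FALSE, FALSE, FALSE) & c(FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Keep records typed as datasets; guard against NA in `type`
is_dataset <- !is.na(artifacts_subset$type) & artifacts_subset$type == "dataset"
datasets <- artifacts_subset[is_dataset, ]

# Of those, keep only the latest version
latest <- datasets[datasets$is_latest, ]
print(latest)
```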
Working with records +
+You can fetch a specific record from a registry using its ID or UID. +For instance, to get the artifact with UID KBW89Mf7IGcekja2hADu:
+
+artifact <- db$Artifact$get("KBW89Mf7IGcekja2hADu")
This artifact contains an AnnData
object with myeloid
+cell data. You can view its metadata:
+artifact
+#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
For artifact records, you can get more detailed information:
+
+artifact$describe()
+#> Artifact(uid='KBW89Mf7IGcekja2hADu', description='Myeloid compartment', key='cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad', id=3659, run_id=27, hash='SZ5tB0T4YKfiUuUkAL09ZA', size=691757462, type='dataset', suffix='.h5ad', storage_id=2, version='2024-07-01', _accessor='AnnData', is_latest=TRUE, transform_id=22, _hash_type='md5-n', created_at='2024-07-12T12:34:10.345829+00:00', created_by_id=1, updated_at='2024-07-12T12:40:48.837026+00:00', visibility=1, n_observations=51552, _key_is_virtual=FALSE)
+#> Provenance
+#> $storage = 's3://cellxgene-data-public'
+#> $transform = 'Census release 2024-07-01 (LTS)'
+#> $run = '2024-07-16T12:49:41.81955+00:00'
+#> $created_by = 'sunnyosun'
Access specific fields of the record using the $
+operator:
+artifact$id
+#> [1] 3659
+artifact$uid
+#> [1] "KBW89Mf7IGcekja2hADu"
+artifact$key
+#> [1] "cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad"
Some fields of a record contain links to related information.
+
+artifact$storage
+#> Storage(uid='oIYGbD74', root='s3://cellxgene-data-public', id=2, type='s3', region='us-west-2', created_at='2023-09-19T13:17:56.273068+00:00', created_by_id=1, updated_at='2023-10-16T15:04:08.998203+00:00')
+artifact$developmental_stages
+#> RelatedRecords(field_name='developmental_stages', relation_type='many-to-many', related_to='KBW89Mf7IGcekja2hADu')
When these fields are one-to-many or many-to-many relationships, a +summary of the related information can be retrieved as a data frame.
+
+artifact$developmental_stages$df()
+#> id uid abbr name synonyms
+#> 1 422 1xebUrrX NA sixth decade human stage NA
+#> 2 423 3yuYMeZt NA seventh decade human stage NA
+#> 3 424 2EztBuvx NA eighth decade human stage NA
+#> created_at updated_at
+#> 1 2023-11-28T23:05:31.450102+00:00 2023-11-28T23:05:31.450106+00:00
+#> 2 2023-11-28T23:05:31.450123+00:00 2023-11-28T23:05:31.450127+00:00
+#> 3 2023-11-28T23:05:31.450144+00:00 2023-11-28T23:05:31.450149+00:00
+#> description
+#> 1 Human Stage That Refers To An Individual Who Is Over 50 And Under 60 Years Old.
+#> 2 Human Stage That Refers To An Individual Who Is Over 60 And Under 70 Years Old.
+#> 3 Human Stage That Refers To An Individual Who Is Over 70 And Under 80 Years Old.
+#> ontology_id
+#> 1 HsapDv:0000240
+#> 2 HsapDv:0000241
+#> 3 HsapDv:0000242
Finally, for artifact records only, you can download the associated +data:
+
+artifact$cache() # Cache the data locally
+#> |======================================================================| 100%
+artifact$load() # Load the data into memory
+#> ℹ s3://cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad already exists at /home/runner/.cache/lamindb/cellxgene-data-public/cell-census/2024-07-01/h5ads/fe52003e-1460-4a65-a213-2bb1a508332f.h5ad
+#> AnnData object with n_obs × n_vars = 51552 × 36398
+#> obs: 'donor_id', 'Predicted_labels_CellTypist', 'Majority_voting_CellTypist', 'Manually_curated_celltype', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sex_ontology_term_id', 'tissue_ontology_term_id', 'suspension_type', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid'
+#> var: 'gene_symbols', 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype', 'feature_length'
+#> uns: 'cell_type_ontology_term_id_colors', 'citation', 'default_embedding', 'schema_reference', 'schema_version', 'sex_ontology_term_id_colors', 'title'
+#> obsm: 'X_umap'
Currently, {laminr} primarily supports S3 storage.
+Support for other storage backends will be added in the future. For more
+information related to planned features and the roadmap, please refer to
+the Development vignette
+(vignette("development", package = "laminr")
).
Feature List and Roadmap
- Source:vignettes/development.qmd
+ Source: vignettes/development.qmd