From 6f635709a883e9502cb36a06a9a3eb1a717a30ae Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Tue, 12 Nov 2024 08:44:31 +0100 Subject: [PATCH 01/11] Add CELLxGENE workflow outline --- vignettes/cellxgene_basic.Rmd | 53 +++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 vignettes/cellxgene_basic.Rmd diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd new file mode 100644 index 0000000..ae94a23 --- /dev/null +++ b/vignettes/cellxgene_basic.Rmd @@ -0,0 +1,53 @@ +--- +title: "Basic CELLxGENE workflow" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Wetlab Module} + %\VignetteEncoding{UTF-8} + %\VignetteEngine{knitr::rmarkdown} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +# Introduction + +- CELLxGENE +- What's in this tutorial? + - Connect to CELLxGENE + - Download a dataset + - Convert to Seurat + - Perform simple analysis + - Save and upload results + - Save and upload report (?) + +# Connect to a LaminDB instance + +- Connect to CELLxGENE instance +- Show instance object + +# Downloading a dataset + +- Show Artifact registry +- Go to CELLxGENE Lamin website +- Find a dataset, get the ID +- Cache and load the object locally + +# Convert to Seurat + +- CELLxGENE stores AnnData, we want Seurat +- Do the conversion + +# Analysis + +- Follow the Seurat tutorial to calculate marker genes +- Save results as a text file + +# Add results + +- Add the results as an artifact +- Render and add report (?) 
From e02c1ebcf292bcb4cad9b38c749720b9f57c35b8 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Tue, 12 Nov 2024 16:41:46 +0100 Subject: [PATCH 02/11] Add introduction to CELLxGENE vignette --- vignettes/cellxgene_basic.Rmd | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index ae94a23..9b55962 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -15,16 +15,16 @@ knitr::opts_chunk$set( ``` # Introduction - -- CELLxGENE -- What's in this tutorial? - - Connect to CELLxGENE - - Download a dataset - - Convert to Seurat - - Perform simple analysis - - Save and upload results - - Save and upload report (?) +This vignette demonstrates a basic LaminDB workflow using the public CELLxGENE instance. +[CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) is a standardised collection of scRNA-seq datasets and LaminDB makes it easy to query and access data in this repository. +We will go through the steps of finding and downloading a dataset using **{laminr}**, performing some simple analysis using **{Seurat}** and saving the results in a LaminDB database. + +# Before we start + +Before we begin, please take some time to check out the Getting Started vignette (`vignette("laminr", package = "laminr")`). +In particular, make sure you have run the commands in the "Initial Setup" section.
+ # Connect to a LaminDB instance - Connect to CELLxGENE instance From 9c7ecab5280aaeb51447c38d61270040f378b068 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Tue, 12 Nov 2024 17:12:23 +0100 Subject: [PATCH 03/11] Find and download artifact in CxG vignette --- vignettes/cellxgene_basic.Rmd | 52 +++++++++++++++++++++++++++++++---- 1 file changed, 46 insertions(+), 6 deletions(-) diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index 9b55962..b669589 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -25,17 +25,57 @@ We will go through the steps of finding and downloading a dataset using **{lamin Before we begin, please take some time to check out the Getting Started vignette (`vignette("laminr", package = "laminr")`). In particular, make sure you have run the commands in the "Initial Setup" section. +Once that is done, we can load the **{laminr}** library. + +```{r library} +library(laminr) +``` + # Connect to a LaminDB instance -- Connect to CELLxGENE instance -- Show instance object +We will start by connecting to the CELLxGENE instance. +This gives us an object we can use to interact with the database. + +```{r connect} +cellxgene <- connect("laminlabs/cellxgene") + +cellxgene +``` # Downloading a dataset -- Show Artifact registry -- Go to CELLxGENE Lamin website -- Find a dataset, get the ID -- Cache and load the object locally +In Lamin, artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. +You can see what artifacts are available using the database object. + +```{r list-artifacts} +cellxgene$Artifact$df(limit = 5) +``` + +This is useful, but it's not the nicest or easiest way to find a particular dataset. +Instead, we will use the Lamin Hub website to find the data we want to load. + +1. Open a browser and go to https://lamin.ai/laminlabs/cellxgene +2. On the top toolbar, click the "Artifacts" tab +3.
Use the search field and the filters to find a dataset you are interested in. + - We use the "Suffix" filter to find `.h5ad` files and search for "renal cell carcinoma" +4. Select the entry for the dataset you want to load to open a page with more details +5. Click the copy button at the top right, this copies a command including the ID for the artifact + +Once we have the artifact ID, we can load information about the artifact, similar to what we see on the website. +Notice that we use a slightly different command to what we copied from the website. + +```{r get-artifact} +artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk") +artifact +``` + +So far we have only retrieved the metadata about this object. +To download the data itself we need to run another command. + +```{r load-artifact} +adata <- artifact$load() +adata +``` # Convert to Seurat From 2fb1b9a047f3e10e1243ce6c796508379d315e7c Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Wed, 13 Nov 2024 11:38:07 +0100 Subject: [PATCH 04/11] Create Seurat object and get markers --- vignettes/cellxgene_basic.Rmd | 36 +++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index b669589..1dfa0ec 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -38,13 +38,12 @@ This gives us an object we can use to interact with the database. ```{r connect} cellxgene <- connect("laminlabs/cellxgene") - cellxgene ``` # Downloading a dataset -In Lamin, artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. +In Lamin, artifacts are objects that[;] contain information (single-cell data, images, data frames etc.) as well as associated metadata. You can see what artifacts are available using the database object. 
```{r list-artifacts} @@ -77,15 +76,40 @@ adata <- artifact$load() adata ``` +This dataset has been stored as an [`AnnData`](https://anndata.readthedocs.io) object. +In the next sections we will convert it to a [`Seurat`](https://satijalab.org/seurat/) object and perform some simple analysis. + # Convert to Seurat -- CELLxGENE stores AnnData, we want Seurat -- Do the conversion +There are various approaches for converting between different single-cell objects, some of which are described in the [Interoperability chapter](https://www.sc-best-practices.org/introduction/interoperability.html) of the Single-cell Best Practices book. + +Because we already have the data loaded in memory, the simplest option is to extract the information we need and create a new `Seurat` object. + +```{r create-seurat} +seurat <- SeuratObject::CreateSeuratObject( + counts = Matrix::t(adata$X), + meta.data = adata$obs, +) +seurat +``` # Analysis -- Follow the Seurat tutorial to calculate marker genes -- Save results as a text file +We could perform any normal analysis using **{Seurat}** but as an example we will calculate marker genes for each of the annotated cell types. +To make things a bit quicker we only test the first 1000 genes but if you have a few minutes you can get results for all features. 
+ +```{r markers} +# Set cell identities to the provided cell type annotation +SeuratObject::Idents(seurat) <- "Cell_Type" +# Normalise the data +seurat <- Seurat::NormalizeData(seurat) +# Test for marker genes +markers <- Seurat::FindAllMarkers( + seurat, features = SeuratObject::Features(seurat)[1:1000] +) +# The output is a data.frame +head(markers) +``` # Add results From d6f82b5f48ce25c36666738afbc264ebadc7096e Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Tue, 19 Nov 2024 15:07:15 +0100 Subject: [PATCH 05/11] Connect to default instance in CxG vignette --- vignettes/cellxgene_basic.Rmd | 39 +++++++++++++++++++++++++++++------ 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index 1dfa0ec..416cccb 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -18,7 +18,7 @@ knitr::opts_chunk$set( This vignette demonstrates a basic LaminDB workflow using the public CELLxGENE instance. [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) is a standardised collection of scRNA-seq datasets and LaminDB makes it easy to query and access data in this repository. -We will go through the steps of finding and downloading a dataset using **{laminr}**, performing some simple analysis using **{Seurat}** and saving the results in a LaminDB database. +We will go through the steps of finding and downloading a dataset using **{laminr}**, performing some simple analysis using **{Seurat}** and saving the results in your own LaminDB database. # Before we start @@ -31,20 +31,47 @@ Once that is done, we can load the **{laminr}** library. library(laminr) ``` -# Connect to a LaminDB instance +# Connecting to LaminDB + +The first thing we need to do is connect to the LaminDB database. +For this tutorial, we will connect to a default instance (where we will store results) and to the CELLxGENE instance that we will search for datasets.
+ +## Connect to the default instance + +We will start by connecting to your default LaminDB instance. +You can set the default instance using the `lamin` CLI on the command line: + +```shell +lamin connect / +``` + +Once a default instance has been set, we can connect to it with **{laminr}**: + +```{r connect-default} +db <- connect() +db +``` -We will start by connecting to the CELLxGENE instance. This gives us an object we can use to interact with the database. -```{r connect} +**Note** that only the default instance can create new records. +This tutorial assumes you have access to an instance where you have permission to add data. + +## Connect to the CELLxGENE instance + +We can connect to other instances by providing a slug to the `connect()` function. +Instances connected to in this way can be used to query data but not to make changes. +Let's connect to the CELLxGENE instance: + +```{r connect-cellxgene} cellxgene <- connect("laminlabs/cellxgene") cellxgene ``` # Downloading a dataset -In Lamin, artifacts are objects that[;] contain information (single-cell data, images, data frames etc.) as well as associated metadata. -You can see what artifacts are available using the database object. +In Lamin, artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. +You can see what artifacts are available using the database instance object.
```{r list-artifacts} cellxgene$Artifact$df(limit = 5) From 1d0f8cd7a31f90c89b865356063690f35da3b6cb Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 11:53:47 +0100 Subject: [PATCH 06/11] update vignette --- vignettes/cellxgene_basic.Rmd | 58 +++++++++++++++++++++++++++++++++-- 1 file changed, 55 insertions(+), 3 deletions(-) diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index 416cccb..88ea768 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -57,6 +57,16 @@ This gives us an object we can use to interact with the database. **Note** that only the default instance can create new records. This tutorial assumes you have access to an instance where you have permission to add data. +## Track data provenance + +Before we start, we will track the code that is run in this notebook. + +```{r} +db$track("4p2CNy60f3CR0002", path = "cellxgene_basic.Rmd") +``` + +Tip: The ID should be obtained by running `db$track(path = "cellxgene_basic.Rmd")` and copying the ID from the output. + ## Connect to the CELLxGENE instance We can connect to other instances by providing a slug to the `connect()` function. @@ -138,7 +148,49 @@ markers <- Seurat::FindAllMarkers( head(markers) ``` -# Add results +# Store the results in LaminDB + +Now that we have our results, we can save them to the LaminDB instance. + +```{r} +seu_path <- tempfile(fileext = ".rds") +saveRDS(seurat, seu_path) + +db$Artifact$from_df( + markers, + description = "Marker genes for renal cell carcinoma dataset" +)$save() + +db$Artifact$from_path( + seu_path, + description = "Seurat object for renal cell carcinoma dataset" +)$save() +``` -- Add the results as an artifact -- Render and add report (?) +# Close the connection + +Finally, we can close the connection to the database. 
+ +```{r} +db$finish() +``` + +# Render and upload the notebook + +You can render this notebook to HTML: + +- In RStudio, click the "Knit" button +- From the command line, run: + ```bash + Rscript -e 'rmarkdown::render("cellxgene_basic.Rmd")' + ``` +- Or use the `rmarkdown` package in R: + ```r + rmarkdown::render("cellxgene_basic.Rmd") + ``` + +And then save it to your LaminDB instance using the `lamin` CLI: + +```bash +lamin save cellxgene_basic.Rmd +``` From 24fb7aabadd6f1b94773c0831d8a89de0eb08623 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 12:07:56 +0100 Subject: [PATCH 07/11] run styler --- tests/testthat/test-Artifact.R | 12 ++++++++---- vignettes/cellxgene_basic.Rmd | 7 ++++--- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/tests/testthat/test-Artifact.R b/tests/testthat/test-Artifact.R index ef62d04..0952671 100644 --- a/tests/testthat/test-Artifact.R +++ b/tests/testthat/test-Artifact.R @@ -14,7 +14,8 @@ test_that("creating an artifact from a data frame works", { ) new_artifact <- db$Artifact$from_df( - dataframe, description = dataframe$Description + dataframe, + description = dataframe$Description ) expect_s3_class(new_artifact, "TemporaryArtifact") @@ -33,7 +34,8 @@ test_that("creating an artifact from a file works", { ) new_artifact <- db$Artifact$from_path( - temp_file, description = "laminr test file" + temp_file, + description = "laminr test file" ) expect_s3_class(new_artifact, "TemporaryArtifact") @@ -54,7 +56,8 @@ test_that("creating an artifact from a directory works", { ) new_artifact <- db$Artifact$from_path( - temp_dir, description = "laminr test directory" + temp_dir, + description = "laminr test directory" ) expect_s3_class(new_artifact, "TemporaryArtifact") @@ -76,7 +79,8 @@ test_that("creating an artifact from an AnnData works", { ) new_artifact <- db$Artifact$from_df( - adata, description = adata$uns$Description + adata, + description = adata$uns$Description ) expect_s3_class(new_artifact, 
"TemporaryArtifact") diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/cellxgene_basic.Rmd index 88ea768..0183685 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/cellxgene_basic.Rmd @@ -62,7 +62,7 @@ This tutorial assumes you have access to an instance where you have permission t Before we start, we will track the code that is run in this notebook. ```{r} -db$track("4p2CNy60f3CR0002", path = "cellxgene_basic.Rmd") +db$track("4p2CNy60f3CR0003", path = "cellxgene_basic.Rmd") ``` Tip: The ID should be obtained by running `db$track(path = "cellxgene_basic.Rmd")` and copying the ID from the output. @@ -125,7 +125,7 @@ Because we already have the data loaded in memory, the simplest option is to ext ```{r create-seurat} seurat <- SeuratObject::CreateSeuratObject( counts = Matrix::t(adata$X), - meta.data = adata$obs, + meta.data = adata$obs, ) seurat ``` @@ -142,7 +142,8 @@ SeuratObject::Idents(seurat) <- "Cell_Type" seurat <- Seurat::NormalizeData(seurat) # Test for marker genes markers <- Seurat::FindAllMarkers( - seurat, features = SeuratObject::Features(seurat)[1:1000] + seurat, + features = SeuratObject::Features(seurat)[1:1000] ) # The output is a data.frame head(markers) From 1e471021a230405f5190c987e8347a9dced119f4 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 12:21:09 +0100 Subject: [PATCH 08/11] rename vignette --- _pkgdown.yml | 2 ++ ...{cellxgene_basic.Rmd => example_workflow.Rmd} | 16 ++++++++-------- 2 files changed, 10 insertions(+), 8 deletions(-) rename vignettes/{cellxgene_basic.Rmd => example_workflow.Rmd} (91%) diff --git a/_pkgdown.yml b/_pkgdown.yml index 80f9382..0ab246f 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -19,6 +19,8 @@ navbar: text: Articles menu: - text: Introduction + - text: Example Workflow + href: articles/example_workflow.html - text: Package Architecture href: articles/architecture.html - text: Development Roadmap diff --git a/vignettes/cellxgene_basic.Rmd b/vignettes/example_workflow.Rmd 
similarity index 91% rename from vignettes/cellxgene_basic.Rmd rename to vignettes/example_workflow.Rmd index 0183685..a0f0ff0 100644 --- a/vignettes/cellxgene_basic.Rmd +++ b/vignettes/example_workflow.Rmd @@ -1,8 +1,8 @@ --- -title: "Basic CELLxGENE workflow" +title: "Example Workflow: CELLxGENE" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{Wetlab Module} + %\VignetteIndexEntry{Example Workflow: CELLxGENE} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- @@ -16,7 +16,7 @@ knitr::opts_chunk$set( # Introduction -This vignette demonstrates a basic LaminDB workflow using the public CELLxGENE instance. +This vignette demonstrates a basic workflow for accessing and analysing single-cell RNA-seq data from the CELLxGENE repository using **{laminr}**. [CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) is a standardised collection of scRNA-seq datasets and LaminDB makes it easy to query and access data in this repository. We will go through the steps of finding and downloading a dataset using **{laminr}**, performing some simple analysis using **{Seurat}** and saving the results your own LaminDB database. @@ -62,10 +62,10 @@ This tutorial assumes you have access to an instance where you have permission t Before we start, we will track the code that is run in this notebook. ```{r} -db$track("4p2CNy60f3CR0003", path = "cellxgene_basic.Rmd") +db$track("I8BlHXFXqZOG0000", path = "example_workflow.Rmd") ``` -Tip: The ID should be obtained by running `db$track(path = "cellxgene_basic.Rmd")` and copying the ID from the output. +Tip: The ID should be obtained by running `db$track(path = "example_workflow.Rmd")` and copying the ID from the output. 
## Connect to the CELLxGENE instance @@ -183,15 +183,15 @@ You can render this notebook to HTML: - In RStudio, click the "Knit" button - From the command line, run: ```bash - Rscript -e 'rmarkdown::render("cellxgene_basic.Rmd")' + Rscript -e 'rmarkdown::render("example_workflow.Rmd")' ``` - Or use the `rmarkdown` package in R: ```r - rmarkdown::render("cellxgene_basic.Rmd") + rmarkdown::render("example_workflow.Rmd") ``` And then save it to your LaminDB instance using the `lamin` CLI: ```bash -lamin save cellxgene_basic.Rmd +lamin save example_workflow.Rmd ``` From 3d384454f6d2025acca712fdd062500191cf0380 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 12:44:13 +0100 Subject: [PATCH 09/11] don't try to submit results on the ci --- vignettes/example_workflow.Rmd | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/vignettes/example_workflow.Rmd b/vignettes/example_workflow.Rmd index a0f0ff0..2b8753c 100644 --- a/vignettes/example_workflow.Rmd +++ b/vignettes/example_workflow.Rmd @@ -12,6 +12,10 @@ knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) +# whether or not this code will be used to +# actually upload results to the LaminDB instance +# -> testuser1 is a test account that cannot upload results +submit_eval <- laminr:::.get_user_settings()$handle != "testuser1" ``` # Introduction @@ -61,7 +65,7 @@ This tutorial assumes you have access to an instance where you have permission t Before we start, we will track the code that is run in this notebook. -```{r} +```{r track, eval = submit_eval} db$track("I8BlHXFXqZOG0000", path = "example_workflow.Rmd") ``` @@ -153,7 +157,7 @@ head(markers) Now that we have our results, we can save them to the LaminDB instance. -```{r} +```{r save-results, eval = submit_eval} seu_path <- tempfile(fileext = ".rds") saveRDS(seurat, seu_path) @@ -172,7 +176,7 @@ db$Artifact$from_path( Finally, we can close the connection to the database. 
-```{r} +```{r close, eval = submit_eval} db$finish() ``` From 1e709123bcca815f409ef104309a268dfdcb2e49 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 12:50:44 +0100 Subject: [PATCH 10/11] add seurat to suggests for vignette --- DESCRIPTION | 1 + 1 file changed, 1 insertion(+) diff --git a/DESCRIPTION b/DESCRIPTION index 8654040..762529a 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -33,6 +33,7 @@ Suggests: reticulate, rsvg, s3 (>= 1.1.0), + Seurat, testthat (>= 3.0.0), withr, yaml From 7b24e53494524ed8c045e5f888ddc5f359aecce5 Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Wed, 20 Nov 2024 12:58:41 +0100 Subject: [PATCH 11/11] styler --- vignettes/example_workflow.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/example_workflow.Rmd b/vignettes/example_workflow.Rmd index 2b8753c..5cf3975 100644 --- a/vignettes/example_workflow.Rmd +++ b/vignettes/example_workflow.Rmd @@ -12,7 +12,7 @@ knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) -# whether or not this code will be used to +# whether or not this code will be used to # actually upload results to the LaminDB instance # -> testuser1 is a test account that cannot upload results submit_eval <- laminr:::.get_user_settings()$handle != "testuser1"
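A note on the conversion step added in PATCH 04/11. AnnData stores `X` with cells as rows and genes as columns, while `SeuratObject::CreateSeuratObject()` expects a features-by-cells matrix, which is why the vignette passes `Matrix::t(adata$X)`. The minimal sketch below illustrates that orientation change on a small toy sparse matrix using only the **{Matrix}** package. The cell and gene names are invented for illustration and are not taken from the CELLxGENE dataset. It also shows a slightly more defensive way to select "the first 1000 features": `head(x, 1000)` returns at most 1000 elements, whereas `x[1:1000]` pads the result with `NA` when fewer features exist. This is a suggested alternative under that assumption, not what the vignette currently does.

```r
library(Matrix)

# Toy cells x genes matrix in the AnnData orientation
# (cell and gene names are hypothetical, for illustration only)
x <- Matrix::sparseMatrix(
  i = c(1, 2, 2),
  j = c(1, 2, 3),
  x = c(5, 1, 3),
  dims = c(2, 3),
  dimnames = list(c("cell_1", "cell_2"), c("gene_A", "gene_B", "gene_C"))
)

# Seurat expects genes as rows and cells as columns, so transpose
counts <- Matrix::t(x)
print(dim(counts))       # 3 genes, 2 cells
print(rownames(counts))  # gene names are now the row names

features <- rownames(counts)

# Indexing past the end pads the result with NA values...
padded <- features[1:5]
print(sum(is.na(padded)))

# ...whereas head() returns at most the requested number of elements
subset_features <- head(features, 1000)
print(length(subset_features))
```

Applied to the vignette, the equivalent (hypothetical) change would be `features = head(SeuratObject::Features(seurat), 1000)`, reusing the `SeuratObject::Features()` call the vignette already makes. For a dataset with at least 1000 genes the two forms select the same features.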