From 1783c8a797f7fc08f839d738637c3db6524e4383 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 08:14:41 +0100 Subject: [PATCH 1/9] Replace "interface" with "client" --- DESCRIPTION | 2 +- README.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index 18e0659..930b6d6 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: laminr -Title: Interface for 'LaminDB' +Title: Client for 'LaminDB' Version: 0.2.0 Authors@R: c( person("Robrecht", "Cannoodt", email = "robrecht@data-intuitive.com", role = c("aut", "cre"), diff --git a/README.md b/README.md index 5d32f41..46ff51d 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# {laminr}: An R interface to LaminDB +# {laminr}: An R client for LaminDB [![CRAN status](https://www.r-pkg.org/badges/version/laminr)](https://CRAN.R-project.org/package=laminr) From f63662d3cd47f227ddfb9bbcd6438c7a610eb330 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 08:24:39 +0100 Subject: [PATCH 2/9] Adjust installation instructions in README --- README.md | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index 46ff51d..69010e3 100644 --- a/README.md +++ b/README.md @@ -44,22 +44,30 @@ Get started with **{laminr}** by installing the package from CRAN: install.packages("laminr") ``` -To include all suggested dependencies for enhanced functionality, use: +You will also need to install the `lamindb` Python package: + +```bash +pip install lamindb[aws] +``` + +### Additional packages + +Some functionality requires additional packages. 
To install all of these use: ```r install.packages("laminr", dependencies = TRUE) ``` -This further installs: - -- anndata: For native AnnData support in R -- S3: To fetch datasets from AWS S3 +This will also install these packages for the following tasks: -For now, you will also need to install the `lamindb` Python package: +- **{anndata}** - Native `AnnData` support in R +- **{nanoparquet}** - Reading `.parquet` files +- **{readr}** - Reading CSV/TSV files +- **{reticulate}** - Functionality that requires the Python `lamindb` package +- **{rsvg}** - Reading SVG files +- **{s3}** - Fetching datasets from AWS S3 -```bash -pip install lamindb[aws] -``` +If you choose not to install all packages now you will be prompted to do so whenever one is required. ## Getting started From 6a819df54ed52bdfa699afed09fee3cbdbcd39e8 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 08:32:45 +0100 Subject: [PATCH 3/9] Move "Getting started" to "Concepts and features" --- _pkgdown.yml | 4 ++-- vignettes/{laminr.Rmd => concepts_features.Rmd} | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) rename vignettes/{laminr.Rmd => concepts_features.Rmd} (97%) diff --git a/_pkgdown.yml b/_pkgdown.yml index 0ab246f..267ab3a 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -19,8 +19,8 @@ navbar: text: Articles menu: - text: Introduction - - text: Example Workflow - href: articles/example_workflow.html + - text: Concepts and features + href: articles/concepts_features.html - text: Package Architecture href: articles/architecture.html - text: Development Roadmap diff --git a/vignettes/laminr.Rmd b/vignettes/concepts_features.Rmd similarity index 97% rename from vignettes/laminr.Rmd rename to vignettes/concepts_features.Rmd index 18651f6..332d0a3 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/concepts_features.Rmd @@ -1,5 +1,5 @@ --- -title: "Getting Started" +title: "Concepts and features" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started} @@ 
-14,7 +14,7 @@ knitr::opts_chunk$set( ) ``` -This vignette provides a practical introduction to using the **{laminr}** package to interact with LaminDB. +This vignette provides a more detailed introduction to the concepts and features of **{laminr}**. We'll start with a brief overview of key concepts and then walk through the basic steps to connect to a LaminDB instance and work with its core components. ## Key Concepts in LaminDB From 321cc18cc61096b3639b463af4fd39df141b83bb Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 08:40:21 +0100 Subject: [PATCH 4/9] Move "Example workflow" to "Getting started" --- vignettes/concepts_features.Rmd | 2 +- vignettes/{example_workflow.Rmd => laminr.Rmd} | 11 +++++------ 2 files changed, 6 insertions(+), 7 deletions(-) rename vignettes/{example_workflow.Rmd => laminr.Rmd} (90%) diff --git a/vignettes/concepts_features.Rmd b/vignettes/concepts_features.Rmd index 332d0a3..3d8488b 100644 --- a/vignettes/concepts_features.Rmd +++ b/vignettes/concepts_features.Rmd @@ -2,7 +2,7 @@ title: "Concepts and features" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{Getting Started} + %\VignetteIndexEntry{Concepts and features} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- diff --git a/vignettes/example_workflow.Rmd b/vignettes/laminr.Rmd similarity index 90% rename from vignettes/example_workflow.Rmd rename to vignettes/laminr.Rmd index 5cf3975..2def024 100644 --- a/vignettes/example_workflow.Rmd +++ b/vignettes/laminr.Rmd @@ -1,8 +1,8 @@ --- -title: "Example Workflow: CELLxGENE" +title: "Getting started" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{Example Workflow: CELLxGENE} + %\VignetteIndexEntry{Getting started} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- @@ -19,10 +19,9 @@ submit_eval <- laminr:::.get_user_settings()$handle != "testuser1" ``` # Introduction - -This vignette demonstrates a basic workflow for accessing and analysing 
single-cell RNA-seq data from the CELLxGENE repository using **{laminr}**. -[CZ CELLxGENE Discover](https://cellxgene.cziscience.com/) is a standardised collection of scRNA-seq datasets and LaminDB makes it easy to query and access data in this repository. -We will go through the steps of finding and downloading a dataset using **{laminr}**, performing some simple analysis using **{Seurat}** and saving the results your own LaminDB database. + +This vignettes provides a quick introduction to the **{laminr}** workflow. +For more details about how **{laminr}** works see `vignette("concepts_features", package = "laminr")`. # Before we start From 528f8b8e6619753d57f5ebaae57dd4f5c67fb640 Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 09:59:09 +0100 Subject: [PATCH 5/9] Update the getting started vignette --- vignettes/laminr.Rmd | 138 +++++++++++++++++++++---------------------- 1 file changed, 66 insertions(+), 72 deletions(-) diff --git a/vignettes/laminr.Rmd b/vignettes/laminr.Rmd index 2def024..00ab9e9 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/laminr.Rmd @@ -23,26 +23,40 @@ submit_eval <- laminr:::.get_user_settings()$handle != "testuser1" This vignettes provides a quick introduction to the **{laminr}** workflow. For more details about how **{laminr}** works see `vignette("concepts_features", package = "laminr")`. -# Before we start +# Installation -Before we go begin, please take some time to check out the Getting Started vignette (`vignette("laminr", package = "laminr")`). -In particular, make sure you have run the commands in the "Initial Setup" section. +Install **{laminr}** from CRAN using: -Once that is done, we can load the **{laminr}** library. +```r +install.packages("laminr") +``` -```{r library} -library(laminr) +You will also need to install the `lamindb` Python package: + +```bash +pip install lamindb[aws] +``` + +Some functionality requires additional packages. 
+You will be prompted to install them as needed or you can install them all now with: + +```r +install.packages("laminr", dependencies = TRUE) ``` +See the "Initial setup" section of `vignette("concepts_features", package = "laminr")` for more details. + # Connecting to LaminDB -The first thing we need to do is connect to the LaminDB database. -For this tutorial, we will connect a default instance (where we will store results) and the CELLxGENE instance that we will search for datasets. +Load **{laminr}** to get started. + +```{r library} +library(laminr) +``` ## Connect to the default instance -We will start by connecting to your default LaminDB instance. -You can set set the default instance using the `lamin` CLI on the command line: +The default LaminDB instance is set using the `lamin` CLI on the command line: ```shell lamin connect / @@ -55,60 +69,44 @@ db <- connect() db ``` -This gives us an object we can use to interact with the database. - -**Note** that only the default instance can create new records. +**Note:** Only the default instance can create new records. This tutorial assumes you have access to an instance where you have permission to add data. -## Track data provenance +## Connect to other instances -Before we start, we will track the code that is run in this notebook. - -```{r track, eval = submit_eval} -db$track("I8BlHXFXqZOG0000", path = "example_workflow.Rmd") -``` - -Tip: The ID should be obtained by running `db$track(path = "example_workflow.Rmd")` and copying the ID from the output. - -## Connect to the CELLxGENE instance - -We can connect to other instances by providing a slug to the `connect()` function. +It is possible to connect to non-default instances by providing a slug to the `connect()` function. Instances connected to in this way can be used to query data but cannot make any changes. 
-Let's connect to the CELLxGENE instance: +Let's connect to the public CELLxGENE instance: ```{r connect-cellxgene} cellxgene <- connect("laminlabs/cellxgene") cellxgene ``` -# Downloading a dataset +# Track data provenance -In Lamin, artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. -You can see what artifacts are available using the database instance object. +LaminDB can track which scripts or notebooks were used to create data. +This command starts the tracking process. -```{r list-artifacts} -cellxgene$Artifact$df(limit = 5) +```{r track, eval = submit_eval} +db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd") ``` -This is useful, but it's not the nicest or easiest way to find a particular dataset. -Instead, we will use the Lamin Hub website to find the data we want to load. +**Tip:** The ID should be obtained by running `db$track(path = "your_file.R")` and copying the ID from the output. -1. Open a browser and go to https://lamin.ai/laminlabs/cellxgene -2. On the top toolbar, click the "Artifacts" tab -3. Use the search field and the filters to find a dataset you are interested in. - - We use the "Suffix" filter to find `.h5ad` files and search for "renal cell carcinoma" -4. Select the entry for the dataset you want to load to open a page with more details -5. Click the copy button at the top right, this copies a command including the ID for the artifact +# Download a dataset -Once we have the artifact ID, we can load information about the artifact, similar to what we see on the website. -Notice that we use a slightly different command to what we copied from the website. +Artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. ```{r get-artifact} artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk") artifact ``` -So far we have only retrieved the metadata about this object. 
+**Tip:** You can view information about this dataset at https://lamin.ai/laminlabs/cellxgene/artifact/7dVluLROpalzEh8mNyxk. +This website can also be used to search for other CELLxGENE datasets. + +So far we have only retrieved the metadata about this artifact. To download the data itself we need to run another command. ```{r load-artifact} @@ -116,49 +114,42 @@ adata <- artifact$load() adata ``` -This dataset has been stored as an [`AnnData`](https://anndata.readthedocs.io) object. -In the next sections we will convert it to a [`Seurat`](https://satijalab.org/seurat/) object and perform some simple analysis. - -# Convert to Seurat +You can see that this artifact contains an [`AnnData`](https://anndata.readthedocs.io) object. -There are various approaches for converting between different single-cell objects, some of which are described in the [Interoperability chapter](https://www.sc-best-practices.org/introduction/interoperability.html) of the Single-cell Best Practices book. +# Work with the data -Because we already have the data loaded in memory, the simplest option is to extract the information we need and create a new `Seurat` object. +Once you have loaded a dataset you can perform any analysis with it as you would normally. +As a quick example we calculate marker genes for each of the provided cell type labels using [**{Seurat}**](https://satijalab.org/seurat/). ```{r create-seurat} +# Create a Seurat object seurat <- SeuratObject::CreateSeuratObject( - counts = Matrix::t(adata$X), + counts = as(Matrix::t(adata$X), "CsparseMatrix"), meta.data = adata$obs, ) -seurat -``` - -# Analysis - -We could perform any normal analysis using **{Seurat}** but as an example we will calculate marker genes for each of the annotated cell types. -To make things a bit quicker we only test the first 1000 genes but if you have a few minutes you can get results for all features. 
- -```{r markers} # Set cell identities to the provided cell type annotation SeuratObject::Idents(seurat) <- "Cell_Type" # Normalise the data seurat <- Seurat::NormalizeData(seurat) -# Test for marker genes +seurat <- Seurat::ScaleData(seurat) +# Test for marker genes (the output is a data.frame) markers <- Seurat::FindAllMarkers( seurat, - features = SeuratObject::Features(seurat)[1:1000] + features = SeuratObject::Features(seurat)[1:100] # Only test a few features for speed ) -# The output is a data.frame -head(markers) +# Display the marker genes +knitr::kable(markers) +# Plot the marker genes +Seurat::DoHeatmap(seurat, features = markers$gene) ``` # Store the results in LaminDB -Now that we have our results, we can save them to the LaminDB instance. +Any results can be saved to the default LaminDB instance. ```{r save-results, eval = submit_eval} -seu_path <- tempfile(fileext = ".rds") -saveRDS(seurat, seu_path) +seurat_path <- tempfile(fileext = ".rds") +saveRDS(seurat, seurat_path) db$Artifact$from_df( markers, @@ -166,22 +157,25 @@ db$Artifact$from_df( )$save() db$Artifact$from_path( - seu_path, + seurat_path, description = "Seurat object for renal cell carcinoma dataset" )$save() ``` -# Close the connection +# Stop tracking -Finally, we can close the connection to the database. +When we are done we end the tracking run. -```{r close, eval = submit_eval} +```{r finish, eval = submit_eval} db$finish() ``` -# Render and upload the notebook +# Upload notebooks and code + +Running this code with upload the created datasets but not the code itself. +Do do that: -You can render this notebook to HTML: +1. 
Render the notebook to HTML (not needed for `.R` scripts) - In RStudio, click the "Knit" button - From the command line, run: @@ -196,5 +190,5 @@ You can render this notebook to HTML: And then save it to your LaminDB instance using the `lamin` CLI: ```bash -lamin save example_workflow.Rmd +lamin save laminr.Rmd ``` From 7ad7fd14893e7c09126313df7738b618d5af979c Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 10:12:46 +0100 Subject: [PATCH 6/9] Update getting started vignette content --- vignettes/laminr.Rmd | 28 ++++++++++++++++++++-------- 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/vignettes/laminr.Rmd b/vignettes/laminr.Rmd index 00ab9e9..dd790fb 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/laminr.Rmd @@ -69,8 +69,12 @@ db <- connect() db ``` -**Note:** Only the default instance can create new records. + ## Connect to other instances @@ -92,7 +96,11 @@ This command starts the tracking process. db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd") ``` -**Tip:** The ID should be obtained by running `db$track(path = "your_file.R")` and copying the ID from the output. + # Download a dataset @@ -103,8 +111,12 @@ artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk") artifact ``` -**Tip:** You can view information about this dataset at https://lamin.ai/laminlabs/cellxgene/artifact/7dVluLROpalzEh8mNyxk. + So far we have only retrieved the metadata about this artifact. To download the data itself we need to run another command. @@ -178,16 +190,16 @@ Do do that: 1. 
Render the notebook to HTML (not needed for `.R` scripts) - In RStudio, click the "Knit" button -- From the command line, run: +- **OR** From the command line, run: ```bash - Rscript -e 'rmarkdown::render("example_workflow.Rmd")' + Rscript -e 'rmarkdown::render("laminr.Rmd")' ``` -- Or use the `rmarkdown` package in R: +- **OR** Use the `rmarkdown` package in R: ```r - rmarkdown::render("example_workflow.Rmd") + rmarkdown::render("laminr.Rmd") ``` -And then save it to your LaminDB instance using the `lamin` CLI: +2. Save it to your LaminDB instance using the `lamin` CLI: ```bash lamin save laminr.Rmd From eb05681968a820fbfe84a4a89f336a9bb9a0e23e Mon Sep 17 00:00:00 2001 From: Luke Zappia Date: Thu, 21 Nov 2024 10:59:03 +0100 Subject: [PATCH 7/9] Change Seurat plotting function Avoid scaling so the file stays small --- vignettes/laminr.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/laminr.Rmd b/vignettes/laminr.Rmd index dd790fb..215d06a 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/laminr.Rmd @@ -143,7 +143,6 @@ seurat <- SeuratObject::CreateSeuratObject( SeuratObject::Idents(seurat) <- "Cell_Type" # Normalise the data seurat <- Seurat::NormalizeData(seurat) -seurat <- Seurat::ScaleData(seurat) # Test for marker genes (the output is a data.frame) markers <- Seurat::FindAllMarkers( seurat, @@ -152,7 +151,8 @@ markers <- Seurat::FindAllMarkers( # Display the marker genes knitr::kable(markers) # Plot the marker genes -Seurat::DoHeatmap(seurat, features = markers$gene) +Seurat::DotPlot(seurat, features = unique(markers$gene)) + + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)) ``` # Store the results in LaminDB From 8c3a1c5c3f4c7684bd204f689022d2acd4e7b996 Mon Sep 17 00:00:00 2001 From: zethson Date: Thu, 21 Nov 2024 11:37:00 +0100 Subject: [PATCH 8/9] Wording Signed-off-by: zethson --- vignettes/laminr.Rmd | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-) 
diff --git a/vignettes/laminr.Rmd b/vignettes/laminr.Rmd index 215d06a..f2308e3 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/laminr.Rmd @@ -62,7 +62,7 @@ The default LaminDB instance is set using the `lamin` CLI on the command line: lamin connect / ``` -Once a default instance has been set, we can connect to it with **{laminr}**: +Once a default instance has been set, connect to it with **{laminr}**: ```{r connect-default} db <- connect() @@ -80,7 +80,7 @@ This tutorial assumes you have access to an instance where you have permission t It is possible to connect to non-default instances by providing a slug to the `connect()` function. Instances connected to in this way can be used to query data but cannot make any changes. -Let's connect to the public CELLxGENE instance: +Connect to the public CELLxGENE instance: ```{r connect-cellxgene} cellxgene <- connect("laminlabs/cellxgene") @@ -90,7 +90,7 @@ cellxgene # Track data provenance LaminDB can track which scripts or notebooks were used to create data. -This command starts the tracking process. +Start the tracking process: ```{r track, eval = submit_eval} db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd") @@ -104,7 +104,7 @@ The ID should be obtained by running `db$track(path = "your_file.R")` and copyin # Download a dataset -Artifacts are objects that contain information (single-cell data, images, data frames etc.) as well as associated metadata. +Artifacts are objects that contain measurements as well as associated metadata. ```{r get-artifact} artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk") @@ -114,12 +114,12 @@ artifact -So far we have only retrieved the metadata about this artifact. -To download the data itself we need to run another command. +So far only the metadata of this artifact has been retrieved. 
+To download the data itself, run: ```{r load-artifact} adata <- artifact$load() @@ -131,7 +131,7 @@ You can see that this artifact contains an [`AnnData`](https://anndata.readthedo # Work with the data Once you have loaded a dataset you can perform any analysis with it as you would normally. -As a quick example we calculate marker genes for each of the provided cell type labels using [**{Seurat}**](https://satijalab.org/seurat/). +Here, marker genes are exemplarily calculated for each of the provided cell type labels using [**{Seurat}**](https://satijalab.org/seurat/). ```{r create-seurat} # Create a Seurat object @@ -155,7 +155,7 @@ Seurat::DotPlot(seurat, features = unique(markers$gene)) + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5)) ``` -# Store the results in LaminDB +# Save the results to your instance Any results can be saved to the default LaminDB instance. @@ -174,27 +174,29 @@ db$Artifact$from_path( )$save() ``` -# Stop tracking +# Finish tracking -When we are done we end the tracking run. +End the tracking run to generate a timestamp: ```{r finish, eval = submit_eval} db$finish() ``` -# Upload notebooks and code +## Save notebooks and code -Running this code with upload the created datasets but not the code itself. -Do do that: +Save the tracked notebook to your instance: 1. 
Render the notebook to HTML (not needed for `.R` scripts) - In RStudio, click the "Knit" button -- **OR** From the command line, run: +- **OR** From the command line, run: + ```bash Rscript -e 'rmarkdown::render("laminr.Rmd")' ``` -- **OR** Use the `rmarkdown` package in R: + +- **OR** Use the `rmarkdown` package in R: + ```r rmarkdown::render("laminr.Rmd") ``` From 0be1f034c7ee2f9c419095da9fbd0eb5c9bd35ac Mon Sep 17 00:00:00 2001 From: Robrecht Cannoodt Date: Thu, 21 Nov 2024 13:36:08 +0100 Subject: [PATCH 9/9] Update vignettes/laminr.Rmd Co-authored-by: Luke Zappia --- vignettes/laminr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/laminr.Rmd b/vignettes/laminr.Rmd index f2308e3..c1195d9 100644 --- a/vignettes/laminr.Rmd +++ b/vignettes/laminr.Rmd @@ -131,7 +131,7 @@ You can see that this artifact contains an [`AnnData`](https://anndata.readthedo # Work with the data Once you have loaded a dataset you can perform any analysis with it as you would normally. -Here, marker genes are exemplarily calculated for each of the provided cell type labels using [**{Seurat}**](https://satijalab.org/seurat/). +Here, marker genes are calculated for each of the provided cell type labels using [**{Seurat}**](https://satijalab.org/seurat/). ```{r create-seurat} # Create a Seurat object