Polish the get-started guide #97

Merged
merged 3 commits into from
Nov 22, 2024
vignettes/laminr.Rmd: 114 changes (54 additions, 60 deletions)
@@ -1,8 +1,8 @@
---
title: "Getting started"
title: "Get started"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Getting started}
%\VignetteIndexEntry{Get started}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---
@@ -18,79 +18,52 @@ knitr::opts_chunk$set(
submit_eval <- laminr:::.get_user_settings()$handle != "testuser1"
```

# Introduction
This vignette introduces the basic **{laminr}** workflow.

This vignettes provides a quick introduction to the **{laminr}** workflow.
For more details about how **{laminr}** works see `vignette("concepts_features", package = "laminr")`.
Review comment (Member Author):
This sentence now comes at the very end of the tutorial so that users don't lose time reading boilerplate.


# Installation
# Setup

Install **{laminr}** from CRAN using:
Install **{laminr}** from CRAN:

```r
install.packages("laminr")
```

You will also need to install the `lamindb` Python package:
Install `lamindb` from PyPI:

```bash
pip install lamindb[aws]
pip install 'lamindb[aws]'
```

Some functionality requires additional packages.
Review comment (Member Author):
I think it's super critical that connecting on the command line to configure the default instance comes now.

You will be prompted to install them as needed or you can install them all now with:
Connect to a LaminDB instance on the command line:

```r
install.packages("laminr", dependencies = TRUE)
```shell
lamin connect <owner>/<name>
```

See the "Initial setup" section of `vignette("concepts_features", package = "laminr")` for more details.
This instance acts as the default instance for everything that follows.
Review comment (Member Author):
I hope this is clear enough. If another sentence is needed, please add it.

Any new records or other changes will be added here.

# Connecting to LaminDB
# Connect to the default instance
Review comment (Member Author):
I find it problematic to say "connect to the default instance" in this heading, @lazappi, because that just happened 3 lines above on the command line. Don't you agree?

Review comment (Member Author):
This is how it reads now:

[screenshot]

I felt that if one calls it "Start an R session" and people then see that they create an instance object by calling `connect()` in R, one avoids calling two different things (CLI `connect` vs. R `connect()`) by the same name, "connect to the default instance".

Review comment (Member Author):
Fixed in new PR.


Load **{laminr}** to get started.

```{r library}
library(laminr)
```

## Connect to the default instance

The default LaminDB instance is set using the `lamin` CLI on the command line:

```shell
lamin connect <owner>/<name>
```

Once a default instance has been set, connect to it with **{laminr}**:
Create your default database `db` object for this R session:

```{r connect-default}
db <- connect()
db
```

<div class="alert alert-warning" role="alert">
**Note**

Only the default instance can create new records.
This tutorial assumes you have access to an instance where you have permission to add data.
</div>

## Connect to other instances

It is possible to connect to non-default instances by providing a slug to the `connect()` function.
Instances connected to in this way can be used to query data but cannot make any changes.
Connect to the public CELLxGENE instance:

```{r connect-cellxgene}
cellxgene <- connect("laminlabs/cellxgene")
cellxgene
```
It is used to manage all datasets and metadata entities.

# Track data provenance
# Track data lineage
Review comment (Member Author):
People should get used to calling track() first thing.


LaminDB can track which scripts or notebooks were used to create data.
Starts the tracking process:
To track the current source code, run:

```{r track, eval = submit_eval}
db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd")
@@ -99,12 +72,23 @@ db$track("I8BlHXFXqZOG0000", path = "laminr.Rmd")
<div class="alert alert-info" role="alert">
**Tip**

The ID should be obtained by running `db$track(path = "your_file.R")` and copying the ID from the output.
The UID (here "I8BlHXFXqZOG0000") is obtained by running `db$track(path = "your_file.R")` and copying the UID from the output.
</div>

## Connect to other instances

It is possible to connect to any LaminDB instance for reading data.
Connect to the public CELLxGENE instance:

```{r connect-cellxgene}
cellxgene <- connect("laminlabs/cellxgene")
cellxgene
```

# Download a dataset

Artifacts are objects that contain measurements as well as associated metadata.
Artifacts are objects that bundle data and associated metadata.
An artifact can be any file or folder but is typically a dataset.

```{r get-artifact}
artifact <- cellxgene$Artifact$get("7dVluLROpalzEh8mNyxk")
@@ -114,19 +98,25 @@ artifact
<div class="alert alert-info" role="alert">
**Tip**

You can view information about this dataset on Lamin Hub https://lamin.ai/laminlabs/cellxgene/artifact/7dVluLROpalzEh8mNyxk.
It can also be used to search for other CELLxGENE datasets.
You can view detailed information about this dataset on LaminHub: https://lamin.ai/laminlabs/cellxgene/artifact/7dVluLROpalzEh8mNyxk.

You can search and query more CELLxGENE datasets here: https://lamin.ai/laminlabs/cellxgene/artifacts.
</div>

So far, only the metadata of this artifact has been retrieved.
To download the data itself, run:
To download the dataset and load it into memory, run:

```{r load-artifact}
adata <- artifact$load()
adata
```

You can see that this artifact contains an [`AnnData`](https://anndata.readthedocs.io) object.
This artifact contains an [`AnnData`](https://anndata.readthedocs.io) object.

<div class="alert alert-info" role="alert">
**Tip**

If you prefer a path to a local file or folder, call `path <- artifact$cache()`.
</div>

# Work with the data

@@ -137,10 +127,10 @@ Here, marker genes are calculated for each of the provided cell type labels usin
# Create a Seurat object
seurat <- SeuratObject::CreateSeuratObject(
counts = as(Matrix::t(adata$X), "CsparseMatrix"),
meta.data = adata$obs,
meta.data = adata$obs
)
# Set cell identities to the provided cell type annotation
SeuratObject::Idents(seurat) <- "Cell_Type"
SeuratObject::Idents(seurat) <- "cell_type"
# Normalise the data
seurat <- Seurat::NormalizeData(seurat)
Review comment (Member Author):
I thought that this lets people's memory and storage usage explode; that's why I removed it. It would be a bad experience if uploading or computing took a long time, so keeping the data small would be good.

If it's not actually an issue anymore, it's all good!

Review comment (Member Author):
Ignored in new PR.

# Test for marker genes (the output is a data.frame)
@@ -155,9 +145,9 @@ Seurat::DotPlot(seurat, features = unique(markers$gene)) +
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5))
```

# Save the results to your instance
# Save the results

Any results can be saved to the default LaminDB instance.
Save results as new artifacts to the default LaminDB instance.

```{r save-results, eval = submit_eval}
seurat_path <- tempfile(fileext = ".rds")
@@ -174,19 +164,19 @@ db$Artifact$from_path(
)$save()
```

# Finish tracking
# Mark the analysis as finished

End the tracking run to generate a timestamp:
Mark the analysis run as finished to create a time stamp and upload source code to the hub.

```{r finish, eval = submit_eval}
db$finish()
```

## Save notebooks and code
## Save a notebook report (not needed for `.R` scripts)

Save the tracked notebook to your instance:
Save a run report of your notebook (`.Rmd` or `.qmd` file) to your instance:

1. Render the notebook to HTML (not needed for `.R` scripts)
1. Render the notebook to HTML

- In RStudio, click the "Knit" button
- **OR** From the command line, run:
@@ -206,3 +196,7 @@ Save the tracked notebook to your instance:
```bash
lamin save laminr.Rmd
```

# Further reading

For more details about how **{laminr}** works see `vignette("concepts_features", package = "laminr")`.