Update vignettes #94

Merged: 9 commits, Nov 21, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: laminr
Title: Interface for 'LaminDB'
Title: Client for 'LaminDB'
Version: 0.2.0
Authors@R: c(
person("Robrecht", "Cannoodt", email = "[email protected]", role = c("aut", "cre"),
28 changes: 18 additions & 10 deletions README.md
@@ -1,4 +1,4 @@
# {laminr}: An R interface to LaminDB
# {laminr}: An R client for LaminDB

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/laminr)](https://CRAN.R-project.org/package=laminr)
@@ -44,22 +44,30 @@ Get started with **{laminr}** by installing the package from CRAN:
install.packages("laminr")
```

To include all suggested dependencies for enhanced functionality, use:
You will also need to install the `lamindb` Python package:

```bash
pip install lamindb[aws]
```

### Additional packages

Some functionality requires additional packages. To install all of them, use:

```r
install.packages("laminr", dependencies = TRUE)
```

This further installs:

- anndata: For native AnnData support in R
- S3: To fetch datasets from AWS S3
This also installs the following packages, each of which supports a specific task:

For now, you will also need to install the `lamindb` Python package:
- **{anndata}** - Native `AnnData` support in R
- **{nanoparquet}** - Reading `.parquet` files
- **{readr}** - Reading CSV/TSV files
- **{reticulate}** - Functionality that requires the Python `lamindb` package
- **{rsvg}** - Reading SVG files
- **{s3}** - Fetching datasets from AWS S3

```bash
pip install lamindb[aws]
```
If you choose not to install all of these packages now, you will be prompted to install each one when it is first required.
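
To see in advance which of the optional packages listed above are already available, a check along the following lines may help. This is only a sketch using base R's `requireNamespace()`; it is not how **{laminr}** itself detects or prompts for missing packages, and the variable names are made up for illustration.

```r
# Optional packages listed above (names taken from this README).
optional_pkgs <- c("anndata", "nanoparquet", "readr", "reticulate", "rsvg", "s3")

# requireNamespace() returns TRUE if a package's namespace can be loaded and
# FALSE otherwise. It does not attach the package to the search path.
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any optional packages that are not available.
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All optional packages are installed.")
}
```

Any packages reported as missing can then be installed individually with `install.packages()`.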

## Getting started

4 changes: 2 additions & 2 deletions _pkgdown.yml
@@ -19,8 +19,8 @@ navbar:
text: Articles
menu:
- text: Introduction
- text: Example Workflow
href: articles/example_workflow.html
- text: Concepts and features
href: articles/concepts_features.html
- text: Package Architecture
href: articles/architecture.html
- text: Development Roadmap
160 changes: 160 additions & 0 deletions vignettes/concepts_features.Rmd
@@ -0,0 +1,160 @@
---
title: "Concepts and features"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Concepts and features}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

This vignette provides a more detailed introduction to the concepts and features of **{laminr}**.
We'll start with a brief overview of key concepts and then walk through the basic steps to connect to a LaminDB instance and work with its core components.

## Key Concepts in LaminDB

Before diving into the practical usage of **{laminr}**, it's helpful to understand some core concepts in LaminDB.
For a more detailed explanation, refer to the Architecture vignette (`vignette("architecture", package = "laminr")`).

* **Instance**: A LaminDB instance is a self-contained environment for storing and managing data and metadata. Think of it like a database or a project directory. Each instance has its own schema, storage location, and metadata database.
* **Module**: A module is a collection of related registries that provide specific functionality. For example, the core module contains essential registries for general data management, while the bionty module provides registries for biological entities like genes and proteins.
* **Registry**: A registry is a centralized collection of related records, similar to a table in a database. Each registry holds a specific type of metadata, such as information about artifacts, transforms, or features.
* **Record**: A record is a single entry within a registry, analogous to a row in a database table. Each record represents a specific entity and combines multiple fields of information.
* **Field**: A field is a single piece of information within a record, like a column in a database table. For example, an artifact record might have fields for its name, description, and creation date.
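
The nesting of these concepts can be pictured with ordinary R lists. The sketch below is purely illustrative: it does not use **{laminr}**, and every name and value in it is invented. It only shows how an instance contains modules, which contain registries, which contain records made of fields.

```r
# Illustrative only: plain R lists standing in for LaminDB concepts.
# None of these objects come from {laminr}; all names and values are made up.

# A record is a set of named fields (like one row of a table).
record <- list(uid = "abc123", key = "example.h5ad", description = "Toy dataset")

# A registry collects records of one type (like a table).
artifact_registry <- list(abc123 = record)

# A module groups related registries; an instance holds the modules.
core_module <- list(Artifact = artifact_registry)
instance <- list(core = core_module)

# Walk down the hierarchy with `$`:
# instance -> module -> registry -> record -> field.
field_value <- instance$core$Artifact$abc123$key
print(field_value)
```

In **{laminr}** itself the objects are richer than nested lists, but the same `$` operator is used to move between levels.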

## Initial setup

Now, let's set up your environment to use **{laminr}**.

### Python setup

1. Install the `lamindb` Python package.

```bash
pip install lamindb[aws]
```

2. Connect to a LaminDB instance:

```bash
lamin connect laminlabs/cellxgene
```

### R setup

1. Install the **{laminr}** package.

```r
install.packages("laminr")
```

2. (Optional) Install suggested dependencies.

```r
install.packages("laminr", dependencies = TRUE)
```

This includes packages like **{anndata}** for working with
AnnData objects and **{s3}** for interacting with S3 storage.

## Connecting to LaminDB from R

Connect to the `laminlabs/cellxgene` instance from your R session:

```{r connect}
library(laminr)

db <- connect("laminlabs/cellxgene")
```

The `db` object now represents your connection to the LaminDB
instance. You can explore the available registries (like `Artifact`,
`Collection`, `Feature`, etc.) by simply printing the `db` object:

```{r print_instance}
db
```

These registries correspond to [Python classes in LaminDB](https://docs.lamin.ai/lamindb).

To access registries within specific modules, use the `$` operator. For example, to access the bionty module:

```{r get_module}
db$bionty
```

The registries in `bionty` and other modules also have corresponding [Python classes](https://docs.lamin.ai/bionty).

## Working with registries

Let's use the `Artifact` registry as an example. This registry stores datasets, models, and other data entities.

To see the available functions for the `Artifact` registry, print the registry object:

```{r get_artifact_registry}
db$Artifact
```

You can also get a data frame summarising the records associated with a registry.

```{r artifact_registry_df}
db$Artifact$df(limit = 5)
```

## Working with records

You can fetch a specific record from a registry using its ID or UID. For instance, to get the artifact with UID [KBW89Mf7IGcekja2hADu](https://lamin.ai/laminlabs/cellxgene/artifact/KBW89Mf7IGcekja2hADu):

```{r get_artifact}
artifact <- db$Artifact$get("KBW89Mf7IGcekja2hADu")
```

This artifact contains an `AnnData` object with myeloid cell data. You can view its metadata:

```{r print_artifact}
artifact
```

For artifact records, you can get more detailed information:

```{r describe_artifact}
artifact$describe()
```

Access specific fields of the record using the `$` operator:

```{r access_fields}
artifact$id
artifact$uid
artifact$key
```

Some fields of a record contain links to related information.

```{r artifact_related}
artifact$storage
artifact$developmental_stages
```

When a field links to related records through a one-to-many or many-to-many relationship, a summary of the related records can be retrieved as a data frame.

```{r artifact_related_df}
artifact$developmental_stages$df()
```

Finally, for artifact records only, you can download the associated data:

```{r cache_artifact}
artifact$cache() # Cache the data locally
artifact$load() # Load the data into memory
```

<div class="alert alert-warning" role="alert">
Currently, **{laminr}** primarily supports S3 storage. Support for other storage backends will be added in the future. For more information related to planned features and the roadmap, please refer to the Development vignette (`vignette("development", package = "laminr")`).
</div>