diff --git a/topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md b/topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md
index 8865a794d4e86e..82c21c4e927164 100644
--- a/topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md
+++ b/topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md
@@ -17,7 +17,7 @@ objectives:
 - Create a count-matrix from a 10X fragment file
 - Perform filtering, dimension reduction and clustering on AnnData matrices
 - Generate and filter a cell-by-gene matrix
-- Identify marker genes for the clusters and annotate clusters
+- Identify marker genes for the clusters and annotate the cell types
 time_estimation: 2H
 key_points:
 - Single-cell ATAC-seq can identify open chromatin sites
@@ -176,7 +176,7 @@ SnapATAC2 requires 3 input files for the standard pathway of processing:
 > >
 > > 3. Rename the generated file to `Fragments 500 PBMC`
 > > 4. Now you can continue with either the `fragments_file` from earlier or the new file `Fragments 500 PBMC`.
-> > - {% icon galaxy-info %} Please note that `Fragments 500 PBMC` only contains 500 {PBMC} and thus the clustering will produce different outputs compared to the outputs generated by `fragments_file` (with 5k PBMC).
+> > - {% icon galaxy-info %} Please note that `Fragments 500 PBMC` only contains 500 {PBMC's} and thus the clustering will produce different outputs compared to the outputs generated by `fragments_file` (with 5k PBMC).
 > {: .hands_on}
 {: .details}
@@ -451,7 +451,7 @@ Doublets are removed by calling a customized [**scrublet**](https://github.com/A
 > - The observed features of the "cells" are then compared to the simulated doublets and scored on their doublet probability.
 > - SnapATAC2's *pp.filter_doublets* then removes all cells with a doublet probability >50%.
 >
-> ![Doublet removal with scrublet]({% link topics/single-cell/images/scatac-standard-snapatac2/doublets-and-scrublet.png %} "Scrublet simulates expected doublets and produces doublet scores for each cell.")
+> ![Doublet removal with scrublet]({% link topics/single-cell/images/scatac-standard-snapatac2/doublets-and-scrublet.png %} "Scrublet simulates expected doublets and produces doublet scores for each cell. ({% cite Wolock2019 %})")
 >
 {: .details}
@@ -502,13 +502,14 @@ Dimension reduction is a very important step during the analysis of single cell
 >
 > - Dimension reduction algorithms can be either linear or non-linear.
 > - Linear methods are generally computationally efficient and well scalable.
+>
 > A popular linear dimension reduction algorithm is:
 > - **PCA** (Principle Component Analysis), implemented in **Scanpy** (please check out our [Scanpy]({% link topics/single-cell/tutorials/scrna-scanpy-pbmc3k/tutorial.md %}) tutorial for an explanation).
 > - Nonlinear methods however are well suited for multimodal and complex datasets.
 > - in contrast to linear methods, which often preserve global structures, non-linear methods have a locality-preserving character.
 > - This makes non-linear methods relatively insensitive to outliers and noise, while emphasizing natural clusters in the data ({% cite Belkin2003%})
 > - As such, they are implemented in many algorithms to visualize the data in 2 dimensions (f.ex. **UMAP** embedding).
-> - The nonlinear dimension reduction algorithm, through *spectral embedding*, used in **SnapATAC2** is a very fast and memory efficient non-linear algorithm ({% cite Zhang2024%}).
+> - The nonlinear dimension reduction algorithm, through *matrix-free spectral embedding*, used in **SnapATAC2** is a very fast and memory efficient non-linear algorithm ({% cite Zhang2024%}).
 > - **Spectral embedding** utilizes an iterative algorithm to calculate the **spectrum** (*eigenvalues* and *eigenvectors*) of a matrix without computing the matrix itself.
 {: .details}
@@ -524,7 +525,7 @@ The dimension reduction, produced by the algorithm *tl.spectral*, is required fo
 >
 > > Distance metric
 > >
-> > - The fast and well scalable *matrix-free spectral embedding* algorithm depends on the distance metric: `cosine`
+> > - The fast and well scalable *"matrix-free spectral embedding"* algorithm depends on the distance metric: `cosine`
 > {: .comment}
 >
 > 2. Rename the generated file to `Anndata 5k PBMC spectral` or add the tag {% icon galaxy-tags %} `spectral` to the dataset
@@ -968,7 +969,7 @@ To manually annotate the *Leiden* clusters, we will need to perform multiple ste
 # Conclusion
-{% icon congratulations %} Well done, you’ve made it to the end! You might want to consult your results with this [control history](https://usegalaxy.eu/u/timonschlegel/w/workflow---standard-processing-of-10x-single-cell-atac-seq-data-with-snapatac2), or check out the [full workflow](https://singlecell.usegalaxy.eu/u/timonschlegel/w/2combined-snapatac2) for this tutorial.
+{% icon congratulations %} Well done, you’ve made it to the end! You might want to consult your results with this [control history](https://singlecell.usegalaxy.eu/u/timonschlegel/h/test-of-5k-pbmc-tutorial-workflow), or check out the [full workflow](https://usegalaxy.eu/u/timonschlegel/w/workflow---standard-processing-of-10x-single-cell-atac-seq-data-with-snapatac2) for this tutorial.
 In this tutorial, we produced a count matrix of {scATAC-seq} reads in the `AnnData` format and performed:
 1. Preprocessing: