Metadata and data input
timonschlegel committed Jun 24, 2024
1 parent 03a7cf1 commit e3fa450
Showing 3 changed files with 90 additions and 80 deletions.
4 changes: 4 additions & 0 deletions CONTRIBUTORS.yaml
@@ -2160,6 +2160,10 @@ thomaswollmann:
- deNBI
- elixir-europe

timonschlegel:
name: Timon Schlegel
joined: 2024-05

timothygriffin:
name: Timothy J. Griffin
email: [email protected]
@@ -33,7 +33,34 @@ @article{Batut2018
title = {Community-Driven Data Analysis Training for Biology},
journal = {Cell Systems}
}

@article{Korsunsky2019,
title = {Fast, sensitive and accurate integration of single-cell data with Harmony},
volume = {16},
ISSN = {1548-7105},
url = {http://dx.doi.org/10.1038/s41592-019-0619-0},
DOI = {10.1038/s41592-019-0619-0},
number = {12},
journal = {Nature Methods},
publisher = {Springer Science and Business Media LLC},
author = {Korsunsky, Ilya and Millard, Nghia and Fan, Jean and Slowikowski, Kamil and Zhang, Fan and Wei, Kevin and Baglaenko, Yuriy and Brenner, Michael and Loh, Po-ru and Raychaudhuri, Soumya},
year = {2019},
month = nov,
pages = {1289–1296}
}
@article{Zhang2024,
title = {A fast, scalable and versatile tool for analysis of single-cell omics data},
volume = {21},
ISSN = {1548-7105},
url = {http://dx.doi.org/10.1038/s41592-023-02139-9},
DOI = {10.1038/s41592-023-02139-9},
number = {2},
journal = {Nature Methods},
publisher = {Springer Science and Business Media LLC},
author = {Zhang, Kai and Zemke, Nathan R. and Armand, Ethan J. and Ren, Bing},
year = {2024},
month = jan,
pages = {217–227}
}
@online{gtn-website,
author = {GTN community},
title = {GTN Training Materials: Collection of tutorials developed and maintained by the worldwide Galaxy community},
@@ -2,53 +2,64 @@
layout: tutorial_hands_on

title: Multi-sample batch correction with Harmony and SnapATAC2
subtopic: scmultiomics
priority: 3
level: Intermediate
zenodo_link: ''
questions:
- Which biological questions are addressed by the tutorial?
- Which bioinformatics techniques are important to know for this type of data?
- Why is batch correction important when analyzing data from multiple samples?
- How is batch correction performed on single cell ATAC-seq data?
objectives:
- The learning objectives are the goals of the tutorial
- They will be informed by your audience and will communicate to them and to yourself
what you should focus on during the course
- They are single sentences describing what a learner should be able to do once they
have completed the tutorial
- You can use Bloom's Taxonomy to write effective learning objectives
- Perform batch correction on a collection of single cell ATAC-seq data
- Learn how Harmony integrates different samples
time_estimation: 3H
key_points:
- The take-home messages
- They will appear at the end of the tutorial
- Batch correction is important for integration of data from multiple experiments
- How Harmony works
requirements:
-
type: "internal"
topic_name: single-cell
tutorials:
- scatac-preprocessing-tenx
- scatac-standard-processing-snapatac2
tags:
- 10x
- epigenetics
abbreviations:
scATAC-seq: Single-cell Assay for Transposase-Accessible Chromatin using sequencing
QC: quality control
TSSe: transcription start site enrichment
TSS: transcription start sites
UMAP: Uniform Manifold Approximation and Projection
contributors:
- contributor1
- contributor2
- timonschlegel
gitter: Galaxy-Training-Network/galaxy-single-cell


---


# Introduction

<!-- This is a comment. -->
Performing biological experiments in replicates is one of the cornerstones of modern science. However, when integrating data from multiple single-cell sequencing experiments, technical confounders might impact the results.
To reduce the impact of technical confounders, such as different experimenters, experimental protocols, sequencing lanes or sequencing technologies, batch correction can be beneficial.

General introduction about the topic and then an introduction of the
tutorial (the questions and the objectives). It is nice also to have a
scheme to sum up the pipeline used during the tutorial. The idea is to
give to trainees insight into the content of the tutorial and the (theoretical
and technical) key concepts they will learn.

You may want to cite some publications; this can be done by adding citations to the
bibliography file (`tutorial.bib` file next to your `tutorial.md` file). These citations
must be in bibtex format. If you have the DOI for the paper you wish to cite, you can
get the corresponding bibtex entry using [doi2bib.org](https://doi2bib.org).
In this tutorial, we will perform batch correction on five {scATAC-seq} datasets with the algorithm *Harmony* ({% cite Korsunsky2019 %}) and the tool suite [**SnapATAC2**](https://kzhang.org/SnapATAC2/version/2.5/index.html) ({% cite Zhang2024 %}).
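
To give an intuition for what "batch correction" means, here is a minimal sketch in plain Python. It is deliberately not Harmony: it only removes a constant per-batch offset from a single feature by mean-centering each batch, whereas Harmony works on a low-dimensional embedding and corrects it iteratively. The function name `center_per_batch` and the donor labels are hypothetical and used for illustration only.

```python
from statistics import mean


def center_per_batch(values, batches):
    """Remove per-batch offsets: subtract each batch mean, add back the global mean.

    Toy illustration only; this is NOT the Harmony algorithm.
    """
    global_mean = mean(values)
    batch_means = {
        batch: mean(v for v, b in zip(values, batches) if b == batch)
        for batch in set(batches)
    }
    return [v - batch_means[b] + global_mean for v, b in zip(values, batches)]


# Two hypothetical donors measuring the same three cell states,
# with the second donor shifted by a technical offset of 10.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["donor_a"] * 3 + ["donor_b"] * 3

corrected = center_per_batch(values, batches)
print(corrected)
```

After centering, matching cell states from both donors have the same value, so the difference between donors no longer dominates the signal. Real batch effects are rarely a single constant shift, which is one reason a dedicated method such as Harmony is used in this tutorial.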

With the example you will find in the `tutorial.bib` file, you can add a citation to
this article here in your tutorial like this:
{% raw %} `{% cite Batut2018 %}`{% endraw %}.
This will be rendered like this: {% cite Batut2018 %}, and links to a
[bibliography section](#bibliography) which will automatically be created at the end of the
tutorial.
{% snippet topics/single-cell/faqs/single_cell_omics.md %}

{% snippet faqs/galaxy/tutorial_mode.md %}

**Please follow our
[tutorial to learn how to fill the Markdown]({{ site.baseurl }}/topics/contributing/tutorials/create-new-tutorial-content/tutorial.html)**
> <comment-title></comment-title>
>
> This tutorial is largely based on the ["Multi-sample Pipeline" tutorial from SnapATAC2](https://kzhang.org/SnapATAC2/version/2.5/tutorials/integration.html).
> The data analysis is performed with the same tools shown in the tutorial [Single-cell ATAC-seq standard processing with SnapATAC2]( {% link topics/single-cell/tutorials/scatac-standard-processing-snapatac2/tutorial.md %} ).
> - That tutorial also explains the steps of the ATAC-seq analysis with SnapATAC2 in more detail.
> - We recommend completing that tutorial before continuing with this one.
>
{: .comment}

> <agenda-title></agenda-title>
>
@@ -59,26 +70,9 @@ tutorial.
>
{: .agenda}

# Title for your first section

Give some background about what the trainees will be doing in the section.
Remember that many people reading your materials will likely be novices,
so make sure to explain all the relevant concepts.

## Title for a subsection
Section and subsection titles will be displayed in the tutorial index on the left side of
the page, so try to make them informative and concise!
# Data

# Hands-on Sections
Below are a series of hand-on boxes, one for each tool in your workflow file.
Often you may wish to combine several boxes into one or make other adjustments such
as breaking the tutorial into sections, we encourage you to make such changes as you
see fit, this is just a starting point :)

Anywhere you find the word "***TODO***", there is something that needs to be changed
depending on the specifics of your tutorial.

have fun!
The datasets for this tutorial are colon samples from multiple donors, provided by the [SnapATAC2 documentation](https://kzhang.org/SnapATAC2/version/2.5/tutorials/integration.html).

## Get data

@@ -90,50 +84,35 @@ have fun!
> -> `{{ page.title }}`):
>
> ```
>
> {{ page.zenodo_link }}/files/colon_multisample.tar
> {{ page.zenodo_link }}/files/chrom_sizes.txt
> {{ page.zenodo_link }}/files/gencode.v46.annotation.gtf.gz
> ```
> ***TODO***: *Add the files by the ones on Zenodo here (if not added)*
>
> ***TODO***: *Remove the useless files (if added)*
>
> {% snippet faqs/galaxy/datasets_import_via_link.md %}
>
> {% snippet faqs/galaxy/datasets_import_from_data_library.md %}
>
> 3. Rename the datasets
> 4. Check that the datatype
>
> {% snippet faqs/galaxy/datasets_change_datatype.md datatype="datatypes" %}
> 3. Rename the datasets if necessary
>
> {% snippet faqs/galaxy/datasets_rename.md %}
>
> 5. Add to each database a tag corresponding to ...
> 4. Check that the datatype of the `colon_multisample` files is set to `bed`
>
> {% snippet faqs/galaxy/datasets_add_tag.md %}
> {% snippet faqs/galaxy/datasets_change_datatype.md datatype="datatypes" %}
> 5. Create a dataset collection with all `colon_multisample` datasets.
>
> {% snippet faqs/galaxy/collections_build_list.md name="Colon Multisample" %}
>
{: .hands_on}
# Title of the section usually corresponding to a big step in the analysis
It comes first a description of the step: some background and some theory.
Some image can be added there to support the theory explanation:
![Alternative text](../../images/image_name "Legend of the image")
# SnapATAC2 preprocessing and filtering
The idea is to keep the theory description before quite simple to focus more on the practical part.
With our data imported and the collection built, we can now begin the {scATAC-seq} data preprocessing with SnapATAC2.
***TODO***: *Consider adding a detail box to expand the theory*
> <details-title> More details about the theory </details-title>
>
> But to describe more details, it is possible to use the detail boxes which are expandable
>
{: .details}
The first step is importing the datasets into an AnnData object with the tool *pp.import_data*. Next, the {TSSe} will be calculated. The {TSSe} serves as a {QC} measurement for filtering out low-quality barcodes, so that only droplets containing high-quality cells are kept.
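
The filtering logic described above can be sketched in plain Python. This is an illustration under assumptions, not the SnapATAC2 implementation: the `Barcode` class, the `filter_barcodes` function, the example values and both thresholds are hypothetical, and suitable cutoffs depend on the dataset.

```python
from dataclasses import dataclass


@dataclass
class Barcode:
    name: str
    n_fragments: int  # number of unique fragments assigned to the barcode
    tsse: float       # TSS enrichment score of the barcode


def filter_barcodes(barcodes, min_fragments, min_tsse):
    """Keep barcodes that pass both QC thresholds (hypothetical cutoffs)."""
    return [
        b for b in barcodes
        if b.n_fragments >= min_fragments and b.tsse >= min_tsse
    ]


barcodes = [
    Barcode("cell_1", n_fragments=12000, tsse=14.2),  # passes both checks
    Barcode("cell_2", n_fragments=800, tsse=15.0),    # too few fragments
    Barcode("cell_3", n_fragments=9000, tsse=3.1),    # low TSS enrichment
]

kept = filter_barcodes(barcodes, min_fragments=1000, min_tsse=7.0)
print([b.name for b in kept])
```

A barcode has to pass both checks: a high fragment count alone does not indicate a good cell if the fragments are not enriched around transcription start sites.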
A big step can have several subsections or sub steps:
## Sub-step with **SnapATAC2 Preprocessing**
> <hands-on-title> Task description </hands-on-title>
> <hands-on-title> Preprocessing and Filtering </hands-on-title>
>
> 1. {% tool [SnapATAC2 Preprocessing](toolshed.g2.bx.psu.edu/repos/iuc/snapatac2_preprocessing/snapatac2_preprocessing/2.5.3+galaxy1) %} with the following parameters:
> - *"Method used for preprocessing"*: `Import data fragment files and compute basic QC metrics, using 'pp.import_data'`