Check guidelines for release and add corrections #132

Merged 12 commits on Oct 29, 2024
45 changes: 23 additions & 22 deletions README.md
@@ -5,7 +5,7 @@
</picture>
</h1>

**Multi-steps pipeline dedicated to genetic imputation from simulation to validation**
**Multi-step pipeline dedicated to genetic imputation from simulation to validation**

[![GitHub Actions CI Status](https://github.com/nf-core/phaseimpute/actions/workflows/ci.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/ci.yml)
[![GitHub Actions Linting Status](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/phaseimpute/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/phaseimpute/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
@@ -20,11 +20,11 @@

## Introduction

**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. Different steps are available each corresponding to a dedicated modes.
**nf-core/phaseimpute** is a bioinformatics pipeline to phase and impute genetic data. Different steps are available, each corresponding to a dedicated mode.

### Main steps of the pipeline

The **phaseimpute** pipeline is constituted of 5 main steps:
The **phaseimpute** pipeline consists of 5 main steps:

| Metro map | Modes |
| ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -45,7 +45,7 @@ sample,file,index
SAMPLE_1X,/path/to/.<bam/cram>,/path/to/.<bai,crai>
```

Each row represents a bam or a cram file with its index file. All input files need to be of the same extension.
Each row represents a BAM or CRAM file along with its index file. All input files need to be of the same extension.
For some tools and steps, you will also need to submit a samplesheet with the reference panel.
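
The samplesheet rules stated above (a `sample,file,index` header, one BAM or CRAM file per row, and a single shared extension across all rows) can be checked before launching a run. The following is a minimal sketch, not the pipeline's own validation logic: the function name `check_samplesheet` and the constants are hypothetical, and it only inspects file names, not file contents.

```python
import csv
import io
from pathlib import PurePosixPath

# Header taken from the README example; everything else here is illustrative.
EXPECTED_HEADER = ["sample", "file", "index"]
ALLOWED_EXTENSIONS = {".bam", ".cram"}


def check_samplesheet(text: str) -> str:
    """Return the extension shared by all input files, or raise ValueError."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows or rows[0] != EXPECTED_HEADER:
        raise ValueError(f"header must be {','.join(EXPECTED_HEADER)}")

    extensions = set()
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        _sample, file_path, _index = row
        ext = PurePosixPath(file_path).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"line {line_no}: {file_path} is not a BAM or CRAM file")
        extensions.add(ext)

    # All input files need to be of the same extension.
    if len(extensions) != 1:
        raise ValueError(f"expected exactly one extension, found {sorted(extensions)}")
    return extensions.pop()


sheet = (
    "sample,file,index\n"
    "SAMPLE_A,/data/a.bam,/data/a.bam.bai\n"
    "SAMPLE_B,/data/b.bam,/data/b.bam.bai\n"
)
print(check_samplesheet(sheet))
```

Mixing `.bam` and `.cram` rows in the same sheet makes the function raise, which mirrors the requirement that all inputs share one extension.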

A final samplesheet file for the reference panel may look something like the one below. This is for 3 chromosomes.
@@ -80,18 +80,18 @@ For more details and further functionality, please refer to the [usage documenta
Here is a short description of the different steps of the pipeline.
For more information please refer to the [documentation](https://nf-core.github.io/phaseimpute/usage/).

| steps | Flow chart | Description |
| --------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **--panelprep** | <img src="docs/images/metro/PanelPrep.png" alt="Panel preparation" width="600"/> | The preprocessing mode is responsible to the preparation of the multiple input file that will be used by the phasing process. <br> The main processes are : <br> - **Haplotypes phasing** of the reference panel using [**Shapeit5**](https://odelaneau.github.io/shapeit5/). <br> - **Normalize** the reference panel to select only the necessary variants. <br> - **Chunking the reference panel** in a subset of region for all the chromosomes. <br> - **Extract** the positions where to perform the imputation. |
| **--impute** | <img src="docs/images/metro/Impute.png" alt="Impute target" width="600"/> | The imputation mode is the core mode of this pipeline. <br> It is constituted of 3 main steps: <br> - **Imputation**: Impute the target dataset on the reference panel using either: <br> &emsp; - [**Glimpse1**](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html): It's come with the necessety to compute the genotype likelihoods of the target dataset (done using [BCFTOOLS_mpileup](https://samtools.github.io/bcftools/bcftools.html#mpileup)). <br> &emsp; - [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html) <br> &emsp; - [**Stitch**](https://github.com/rwdavies/stitch) This steps does not require a reference panel but needs to merge the samples. <br> &emsp; - [**Quilt**](https://github.com/rwdavies/QUILT) <br> - **Ligation**: all the different chunks are merged together then all chromosomes are reunited to output one VCF per sample. |
| **--simulate** | <img src="docs/images/metro/Simulate.png" alt="simulate_metro" width="600"/> | The simulation mode is used to create artificial low informative genetic information from high density data. This allow to compare the imputed result to a _truth_ and therefore evaluate the quality of the imputation. <br> For the moment it is possible to simulate: <br> - Low-pass data by **downsample** BAM or CRAM using [SAMTOOLS_VIEW -s](https://www.htslib.org/doc/samtools-view.html) at different depth. |
| **--validate** | <img src="docs/images/metro/Validate.png" alt="concordance_metro" width="600"/> | This mode compare two vcf together to compute a summary of the differences between them. <br> This step use [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html) concordance process. |
| steps | Flow chart | Description |
| ------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **--panelprep** | <img src="docs/images/metro/PanelPrep.png" alt="Panel preparation" width="600"/> | The preprocessing mode is responsible for preparing multiple input files that will be used by the phasing and imputation process. <br> The main processes are : <br> - **Haplotypes phasing** of the reference panel using [**Shapeit5**](https://odelaneau.github.io/shapeit5/). <br> - **Normalize** the reference panel to select only the necessary variants. <br> - **Chunking the reference panel** into a subset of regions for all the chromosomes. <br> - **Extract** the positions where to perform the imputation. |
| **--impute** | <img src="docs/images/metro/Impute.png" alt="Impute target" width="600"/> | The imputation mode is the core mode of this pipeline. <br> It consists of 3 main steps: <br> - **Imputation**: Impute the target dataset on the reference panel using either: <br> &emsp; - [**Glimpse1**](https://odelaneau.github.io/GLIMPSE/glimpse1/index.html): It comes with the necessity to compute the genotype likelihoods of the target dataset (done using [BCFTOOLS_mpileup](https://samtools.github.io/bcftools/bcftools.html#mpileup)). <br> &emsp; - [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html) <br> &emsp; - [**Stitch**](https://github.com/rwdavies/stitch) This step does not require a reference panel but needs to merge the samples. <br> &emsp; - [**Quilt**](https://github.com/rwdavies/QUILT) <br> - **Ligation**: all the different chunks are merged together then all chromosomes are reunited to output one VCF per sample. |
| **--simulate** | <img src="docs/images/metro/Simulate.png" alt="simulate_metro" width="600"/> | The simulation mode is used to create artificial low informative genetic information from high density data. This allows the comparison of the imputed result to a _truth_ and therefore evaluates the quality of the imputation. <br> For the moment it is possible to simulate: <br> - Low-pass data by **downsample** BAM or CRAM using [SAMTOOLS_VIEW -s](https://www.htslib.org/doc/samtools-view.html) at different depth. |
| **--validate** | <img src="docs/images/metro/Validate.png" alt="concordance_metro" width="600"/> | This mode compares two VCF files together to compute a summary of the differences between them. <br> This step uses [**Glimpse2**](https://odelaneau.github.io/GLIMPSE/glimpse2/index.html) concordance process. |
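
The `--simulate` row mentions downsampling with `samtools view -s`. That option takes a single number whose integer part is the random seed and whose fractional part is the proportion of reads to keep. As a rough sketch, the argument for a given target depth could be derived as below; the helper name `samtools_subsample_arg` is hypothetical, and it assumes the mean depth of the input is already known. How the pipeline itself computes this value is not shown in this diff.

```python
def samtools_subsample_arg(current_depth: float, target_depth: float, seed: int = 1) -> str:
    """Build the SEED.FRACTION value for `samtools view -s` (illustrative helper)."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth >= current_depth:
        raise ValueError("target depth must be lower than current depth")

    # Proportion of reads to keep, e.g. 1x from 30x keeps about 3.33% of reads.
    fraction = target_depth / current_depth
    fraction_text = f"{fraction:.4f}"
    if not fraction_text.startswith("0.") or fraction_text == "0.0000":
        raise ValueError("fraction cannot be represented with 4 decimals")

    # Drop the leading "0" so that the seed becomes the integer part.
    return f"{seed}{fraction_text[1:]}"


print(samtools_subsample_arg(30, 1, seed=42))
```

The resulting string would be passed as the value of `-s`, for example when going from 30x to 1x coverage with seed 42.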

## Pipeline output

To see the results of an example test run with a full size dataset refer to the [results](https://nf-co.re/phaseimpute/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](https://nf-co.re/phaseimpute/output).
For more details on the output files and reports, please refer to the [output documentation](https://nf-co.re/phaseimpute/output).

## Credits

@@ -112,41 +112,42 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/phaseimpute for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
If you use nf-core/phaseimpute for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX)

You can cite one of the main imputation methods ([`QUILT`](https://github.com/rwdavies/QUILT)) as follows:
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the main imputation methods as follows:

[`QUILT`](https://github.com/rwdavies/QUILT):

> **Rapid genotype imputation from sequence with reference panels.**
>
> Davies, R. W., Kucka, M., Su, D., Shi, S., Flanagan, M., Cunniff, C. M., Chan, Y. F., & Myers, S.
>
> _Nature genetics_ 2021 June 03. doi: [10.1038/s41588-021-00877-0](https://doi.org/10.1038/s41588-021-00877-0)

You can cite one of the main imputation methods ([`GLIMPSE`](https://github.com/odelaneau/GLIMPSE)) as follows:
[`GLIMPSE`](https://github.com/odelaneau/GLIMPSE):

> **Efficient phasing and imputation of low-coverage sequencing data using large reference panels.**
>
> Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J., & Delaneau, O.
>
> _Nature genetics_ 2021. doi:[]()
> _Nature genetics_ 2021. doi:[10.1038/s41588-020-00756-0](https://doi.org/10.1038/s41588-020-00756-0)

> **Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes**
>
> Rubinacci, S., Hofmeister, R. J., Sousa da Mota, B., & Delaneau, O.
>
> _Nature genetics_ 2023. doi:[]()
> _Nature genetics_ 2023. doi:[10.1038/s41588-023-01438-3](https://doi.org/10.1038/s41588-023-01438-3)

You can cite one of the main imputation methods ([`STITCH`](https://github.com/rwdavies/STITCH)) as follows:
[`STITCH`](https://github.com/rwdavies/STITCH):

> **Rapid genotype imputation from sequence without reference panels.**
>
> Davies, R. W., Flint, J., Myers, S., & Mott, R.
>
> _Nature genetics_ 2016 . doi: []().

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
> _Nature genetics_ 2016 . doi: [10.1038/ng.3594](https://doi.org/10.1038/ng.3594).

You can cite the `nf-core` publication as follows:

2 changes: 0 additions & 2 deletions conf/base.config
@@ -10,7 +10,6 @@

process {

// TODO nf-core: Check the defaults for all processes
cpus = { 1 * task.attempt }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
@@ -24,7 +23,6 @@ process {
// These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
// If possible, it would be nice to keep the same label naming convention when
// adding in your local modules too.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_single {
cpus = { 1 }
2 changes: 0 additions & 2 deletions docs/output.md
@@ -6,8 +6,6 @@ This document describes the output produced by the pipeline. Most of the plots a

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

<!-- TODO nf-core: Write this documentation describing your workflow's output -->

## Pipeline overview

## Panel preparation outputs `--steps panelprep`
2 changes: 0 additions & 2 deletions docs/usage.md
@@ -6,8 +6,6 @@

## Introduction

<!-- TODO nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. -->

## Samplesheet input

You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
1 change: 0 additions & 1 deletion nextflow.config
@@ -212,7 +212,6 @@ profiles {
includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"

// Load nf-core/phaseimpute custom profiles from different institutions.
// TODO nf-core: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
// includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/phaseimpute.config" : "/dev/null"

// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile