Comprehensive pipeline for metagenomic sequencing experiments. Generates HTML report.

ctho1/metagenomics_pipeline

Metagenomics Pipeline

A pipeline for metagenomic sequencing experiments. It takes paired-end .fastq.gz files as input and generates a detailed HTML report with result tables. After quality trimming and filtering of low-complexity sequences with fqtrim, high-quality reads are aligned to the human reference genome using Bowtie 2. Unaligned (non-human) reads are then passed to several metagenomics tools.
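Because the input is paired-end, each sample needs both mate files before trimming can start. The following is a minimal sketch of how file names could be grouped into pairs. The `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention and the function name `pair_fastq_files` are assumptions made for illustration, not something this pipeline is known to require.

```python
import re
from collections import defaultdict

# Assumed naming convention (hypothetical): <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group file names into {sample: (r1, r2)}; raise if a mate is missing."""
    mates = defaultdict(dict)
    for name in filenames:
        match = _PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        mates[match.group("sample")][match.group("mate")] = name

    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate file")
        pairs[sample] = (found["1"], found["2"])
    return pairs
```

Failing early on an incomplete pair avoids spending cluster time on a sample that cannot be processed.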

First, taxonomic classification is performed with Kraken 2 and Centrifuge using different reference databases (standard, viral, EUPATHDB48). Next, reads are aligned against ~12,000 RefSeq virus genomes (as well as ~4,500 human-infecting virus strains related to the RefSeq viruses), and viral integration sites in the human genome are detected using Arriba and STAR. In a last step, de novo assembly is performed with MEGAHIT. Contigs > 1000 bp are automatically classified with Kraken 2 and Centrifuge. Output files can be used for manual downstream analyses such as BLAST or phylogenetic studies. All results are summarized in a comprehensive HTML report.
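The contig length cutoff can also be reproduced by hand when preparing sequences for BLAST. The sketch below reads FASTA lines and keeps contigs longer than 1000 bp. The function names and the default threshold argument are illustrative assumptions; how the pipeline itself applies the filter is not shown in this README.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=1000):
    """Keep contigs strictly longer than min_length bp."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) > min_length]
```

An open file handle can be passed directly as `lines`, since the parser only iterates over it line by line.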

The pipeline is adapted to the SLURM job scheduler for parallel processing of multiple samples. It requires 140 GB of memory (for the Centrifuge index with all non-redundant NCBI sequences) and an adjustable number of CPUs.
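As a rough illustration of these resource requirements, the sketch below builds one `sbatch` command per sample with the 140 GB memory request and a configurable CPU count. The script name `run_pipeline.sh`, the job-name prefix, and the default of 8 CPUs are hypothetical; only the standard `sbatch` options `--job-name`, `--cpus-per-task`, and `--mem` are used.

```python
def build_sbatch_command(sample, script="run_pipeline.sh", cpus=8, mem_gb=140):
    """Return an sbatch argument list for one sample (the command is not executed)."""
    return [
        "sbatch",
        f"--job-name=metagenomics_{sample}",  # hypothetical job-name prefix
        f"--cpus-per-task={cpus}",            # adjustable number of CPUs
        f"--mem={mem_gb}G",                   # 140 GB for the Centrifuge index
        script,                               # hypothetical per-sample script
        sample,
    ]


def build_all(samples, **kwargs):
    """Build one command per sample so the scheduler can run them in parallel."""
    return [build_sbatch_command(s, **kwargs) for s in samples]
```

Each returned list can be handed to `subprocess.run` on a machine where SLURM is available.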

Prerequisites

The following tools need to be installed and available in your $PATH:
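One way to verify this before launching a run is a small PATH check. The sketch below assumes the executable names `fqtrim`, `bowtie2`, `kraken2`, `centrifuge`, `arriba`, `STAR`, and `megahit` for the tools named in the description above. These names are assumptions and the list may be incomplete, so adjust it to your installation.

```python
import shutil

# Assumed executable names for the tools mentioned in the description.
ASSUMED_EXECUTABLES = [
    "fqtrim", "bowtie2", "kraken2", "centrifuge", "arriba", "STAR", "megahit",
]


def missing_tools(executables=ASSUMED_EXECUTABLES, which=shutil.which):
    """Return the executables that cannot be found on the current PATH."""
    return [exe for exe in executables if which(exe) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All assumed tools were found in PATH.")
```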

Additionally, the following reference data is required:

HTML report

[Image: report]
