ResMAG is a user-friendly Snakemake workflow for the analysis of metagenomic data. It integrates multiple bioinformatics tools to cover the key steps of metagenome analysis: binning and bin refinement, metagenome-assembled genome (MAG) reconstruction, taxonomic classification of MAGs, and identification of antibiotic resistance genes.
- Binning Techniques: Employ a collection of five state-of-the-art binning tools to partition metagenomic contigs into individual bins, allowing for comprehensive and accurate analysis.
- MAG Reconstruction: Utilize cutting-edge algorithms to reconstruct high-quality metagenome-assembled genomes (MAGs) from sequencing data.
- Taxonomic Classification: Apply advanced taxonomic classification methods to assign taxonomic labels to MAGs and identify the microbial community composition within the metagenomic samples.
- Antibiotic Resistance Gene Identification: Perform in-depth analysis to detect and characterize antibiotic resistance genes within the metagenomic data, providing valuable insights into antimicrobial resistance profiles.
- Performance Refinement: Continuously optimize the pipeline by incorporating the latest advancements in metagenomics research, ensuring the highest accuracy and efficiency in metagenomic data analysis.
%%{init: {
'theme':'base',
'themeVariables': {
'secondaryColor': '#fff',
'tertiaryColor': '#fff',
'tertiaryBorderColor' : '#fff'}
}}%%
flowchart TB;
subgraph " "
direction TB
%% Nodes
A[/short reads/]
B["<b>QC</b> <br> <i>fastp</i>"]
C["<b>Host read filtering</b> <br> <i>Kraken 2</i>"]
D["<b>Assembly</b> <br> <i>MEGAHIT</i>"]
E["<b>Binning</b>"]
F["<b>Bin refinement</b> <br> <i>DAS Tool</i>"]
G[/MAGs/]
H["<b>Resistance analysis</b> <br> <i>HyDRA</i>"]
I["<b>Taxonomic classification</b> <br> <i>Kaiju and GTDB-Tk</i>"]
J[/MultiQC report/]
K[/Assembly summary/]
%% input & output node design
classDef in_output fill:#fff,stroke:#cde498,stroke-width:4px
class A,G,J,K in_output
%% rule node design
classDef rule fill:#cde498,stroke:#000
class B,C,D,E,F,H,I rule
%% Node links
A --> B
B --> C
B --- J
C --> D
D --> E
D ---- K
E --"<i>MetaBAT 2</i>"--> F
E --"<i>MetaBinner</i>"--> F
E --"<i>MetaCoAG</i>"--> F
E --"<i>Rosella</i>"--> F
E --"<i>Vamb</i>"--> F
F --> G
G --- H
G --- I
end
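The dependencies drawn in the flowchart can also be written down as a small graph, which makes the run order explicit. The following is a minimal sketch in Python; the stage names are illustrative labels read off the diagram, not the actual Snakemake rule names in the repository.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the set of stages it depends on, as drawn in the flowchart.
# The keys are hypothetical labels chosen for this sketch.
PIPELINE = {
    "qc": {"short_reads"},
    "host_read_filtering": {"qc"},
    "assembly": {"host_read_filtering"},
    "binning": {"assembly"},
    "bin_refinement": {"binning"},
    "mags": {"bin_refinement"},
    "resistance_analysis": {"mags"},
    "taxonomic_classification": {"mags"},
    "multiqc_report": {"qc"},
    "assembly_summary": {"assembly"},
}


def execution_order(graph):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

Several orders are valid, because independent branches such as resistance analysis and taxonomic classification can run in either order or in parallel. Snakemake resolves this scheduling itself from the rule inputs and outputs.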
To prepare the workflow
- Clone it to your desired working folder via git or your preferred IDE
- Edit the config/config.yaml file:
  - Specify a project name (project-name)
  - Specify filtering options for human reads (human-filtering)
  - Specify host filtering options if you have a non-human host (host-filtering)
  - Specify options for the GTDB database (see Downloading GTDB)
- Provide sample information in the config/pep/samples.csv file, keeping the header and the .csv format:
sample_name,fq1,fq2
sample1,path/to/your/fastq/sample1_R1.fastq.gz,path/to/your/fastq/sample1_R2.fastq.gz
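A malformed sample sheet is easier to catch before the workflow starts. The sketch below checks a sheet against the header shown above using only the Python standard library. The function name `check_sample_sheet` is hypothetical, and the rules about non-empty fields and unique sample names are assumptions made for illustration, not checks documented by ResMAG.

```python
import csv
import io

# Header taken from the example sample sheet above.
EXPECTED_HEADER = ["sample_name", "fq1", "fq2"]


def check_sample_sheet(handle):
    """Return a list of problems found in a sample sheet (empty if none)."""
    problems = []
    reader = csv.reader(handle)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
        return problems
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        # Assumption: every row needs exactly three non-empty fields.
        if len(row) != 3 or not all(field.strip() for field in row):
            problems.append(f"line {line_no}: expected 3 non-empty fields")
            continue
        # Assumption: sample names should be unique.
        name = row[0]
        if name in seen:
            problems.append(f"line {line_no}: duplicate sample_name '{name}'")
        seen.add(name)
    return problems


example = io.StringIO(
    "sample_name,fq1,fq2\n"
    "sample1,path/to/your/fastq/sample1_R1.fastq.gz,"
    "path/to/your/fastq/sample1_R2.fastq.gz\n"
)
print(check_sample_sheet(example))
```

To check a real file, pass `open("config/pep/samples.csv", newline="")` instead of the in-memory example.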
The GTDB files need to be downloaded and unarchived, which requires about 110 GB of disk space.
- Create a new folder resources/gtdb/ and change to this directory
- Download the latest version of GTDB
- Unarchive the downloaded file
- Once the archive has been unpacked successfully, it can be removed
mkdir -p resources/gtdb/
cd resources/gtdb/
wget https://data.ace.uq.edu.au/public/gtdb/data/releases/latest/auxillary_files/gtdbtk_package/full_package/gtdbtk_data.tar.gz
tar xvzf gtdbtk_data.tar.gz
rm gtdbtk_data.tar.gz
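Because the GTDB data is large, it can help to check the free disk space before starting the download. This is a minimal sketch using the Python standard library. The 110 GB threshold is the approximate figure quoted above, and the function name `has_enough_space` is hypothetical.

```python
import shutil

REQUIRED_GB = 110  # approximate size quoted in this README


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> gigabytes
    return free_gb >= required_gb


if has_enough_space("."):
    print("Enough free space for the GTDB data.")
else:
    print(f"Less than {REQUIRED_GB} GB free; consider another location.")
```

The archive and the unpacked files coexist until the archive is removed, so the threshold is best treated as a lower bound.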
To run the workflow:
snakemake --use-conda --cores all --rerun-incomplete
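When the workflow is launched from a script, the command above can be assembled programmatically. The sketch below builds the same argument list and optionally appends Snakemake's `--dry-run` flag, which lists the planned jobs without executing them. The helper name `snakemake_command` and its parameters are hypothetical.

```python
import shlex


def snakemake_command(cores="all", dry_run=False):
    """Build the argument list for a workflow run."""
    cmd = ["snakemake", "--use-conda", "--cores", str(cores), "--rerun-incomplete"]
    if dry_run:
        # Only show what would be done; no jobs are executed.
        cmd.append("--dry-run")
    return cmd


# Print the command as it would be typed in a shell.
print(shlex.join(snakemake_command(dry_run=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the workflow directory. A dry run first is a cheap way to confirm that the configuration and sample sheet are picked up as expected.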
The usage of this workflow is described in the Snakemake Workflow Catalog.
Bug report
Feature request
ResMAG is released under the BSD 2-Clause License. Please review the license file for more details.
For any questions or feedback, please contact the project maintainer at [email protected] or [email protected]. We appreciate your input and support in using and improving ResMAG.
We would like to express our gratitude towards Adrian Doerr, Alexander Thomas, Johannes Köster, Ann-Kathrin Brüggemann and the IKIM who have contributed to the development and testing of ResMAG. Their valuable insights and feedback have been helpful throughout the creation of the workflow.
CoverM
DAS Tool
fastp
FastQC
Kraken 2
MEGAHIT
MetaBAT 2
MetaBinner
MetaCoAG
minimap2
pandas
Rosella
samtools
VAMB
MultiQC
A paper is on its way. If you use ResMAG in your work before the paper is published, please consider citing this GitHub repository.