Sunbeam is a pipeline written in Snakemake that simplifies and automates many of the steps in metagenomic sequencing analysis. It uses conda to manage dependencies, so it doesn't require pre-existing dependencies or admin privileges, and it can be deployed on most Linux workstations and clusters. To read more, check out our paper in Microbiome.
Overview / Useful commands
- all
- all_qc:
  - cutadapt
  - trimmomatic
  - komplexity
- all_decontam:
  - bwa
  - krakenuniq
- all_metaspades
- all_annotate
- all_assembly
- all_coverage
- all_classify
- all_mapping
- all_reports
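The names listed above are pipeline targets, so a run can be limited to one stage by naming its target. The following is a minimal sketch, not a definitive invocation. It assumes a hypothetical project directory called my_project that was created with sunbeam init, that the config file inside it is named sunbeam_config.yml, and that a target name can be appended to sunbeam run after --configfile. Check sunbeam run --help for the exact form in your installed version. The script prints the commands by default instead of executing them.

```shell
#!/bin/sh
set -eu

# Hypothetical project directory (assumption): replace with your own.
PROJECT_DIR="my_project"
# Assumed name of the config file inside the project directory.
CONFIG="${PROJECT_DIR}/sunbeam_config.yml"
# 1 = only print the command; 0 = actually call sunbeam.
DRY_RUN=1

# Run (or print) the pipeline for a single target such as all_qc.
run_target() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "sunbeam run --configfile ${CONFIG} $1"
    else
        sunbeam run --configfile "${CONFIG}" "$1"
    fi
}

# Quality control first, then host/contaminant read removal.
run_target all_qc
run_target all_decontam
```

Setting DRY_RUN=0 makes the same script call sunbeam directly, which requires an activated Sunbeam environment.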
Sunbeam currently automates the following tasks:
- Quality control, including adaptor trimming, host read removal, and quality filtering;
- Taxonomic assignment of reads to databases using Kraken;
- Assembly of reads into contigs using Megahit;
- Contig annotation using BLAST[n/p/x];
- Mapping of reads to target genomes; and
- ORF prediction using Prodigal.
Sunbeam was designed to be modular and extensible. Some extensions have been built for:
- IGV for viewing read alignments
- KrakenHLL, an alternate read classifier
- Kaiju, a read classifier that uses the Burrows-Wheeler transform (BWT) rather than k-mers
- Anvi'o, a downstream analysis and visualization platform
More extensions can be found at the extension page: https://www.sunbeam-labs.org/.
To get started, see our documentation!
If you use the Sunbeam pipeline in your research, please cite:
EL Clarke, LJ Taylor, C Zhao, A Connell, J Lee, FD Bushman, K Bittinger. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7:46 (2019)
See how people are using Sunbeam:
- Shi, Z et al. Segmented Filamentous Bacteria Prevent and Cure Rotavirus Infection. Cell 179, 644-658.e13 (2019).
- Abbas, AA et al. Redondoviridae, a Family of Small, Circular DNA Viruses of the Human Oro-Respiratory Tract Associated with Periodontitis and Critical Illness. Cell Host Microbe 25, 719–729 (2019).
- Leiby, JS et al. Lack of detection of a human placenta microbiome in samples from preterm and term deliveries. Microbiome 6, 1–11 (2018).
November 19, 2020:
- Added a decontamination step with krakenuniq to remove reads classified as human or PhiX
- New command, sunbeam extend, to automatically install Sunbeam extensions! Use it like: sunbeam extend https://github.com/sunbeam-labs/sbx_report
- sunbeam init and sunbeam config update now add options for extensions you've installed to your default config file! (#247)
- Updated the path to the Illumina adapter sequences from hardcoded to templated (fixes #150 and #152)
- Use the updated kraken2 classifier instead of kraken
- Update other dependencies (trimmomatic -> 0.3.9; grabseqs -> 0.6.1; snakemake -> <5.7.0)
- Added a build manifest, which is run every time on integration testing and can be fed into conda by users to install the most recent successful dependencies
- Updates to documentation (#169, #230, #231)
- Fix missing samtools (#224)
- Integration test updates to schedule weekly builds (#222)
- Fix issues with old paired-end illumina adapters (#221)
- Script updates to use conda commands instead of source commands (#220)
- Add h5py package explicitly to avoid dependency metadata problem (#219)
- Add multiQC to build QC report (#203)
- Use multithreading for cutadapt in QC (#202)
- Correct conda channel priority during install (#201)
- Update documentation to spell out requirements (#199)
- New megahit failure handling (#194)
- Enforce sample wildcard constraints in Snakemake rules (#190)
- Run megahit multithreaded (#189)
- Add implicit dependencies (samtools and bcftools) to environment file to make them explicit
- Increment Snakemake version requirement for compatibility with recent conda
- Specify earlier megahit version to ensure compatibility with existing assembly behavior
- Integration test improvements
- Start a project using resources directly from the SRA using: sunbeam init --data_acc [SRA ###]. For more information, see the docs
- New extension website: https://www.sunbeam-labs.org/
- Improved documentation
- Numerous bugfixes and optimizations
- Minor bugfixes
- Low-complexity reads are now removed by default rather than masked
- Bug fixes related to single-end sequencing experiments
- Documentation updates
- Reports include number of filtered reads per host, rather than in aggregate
- Static binary dependency for komplexity for easier deployment
- Remove max length filter for contigs
- First stable release!
- Support for single-end sequencing experiments
- Low-complexity read masking via komplexity
- Support for extensions
- Documentation on ReadTheDocs.io
- Better assembler (megahit)
- Better ORF finder (prodigal)
- Can remove reads from any number of host/contaminant genomes
- Semantic versioning checks
- Integration tests and continuous deployment
- Erik Clarke (@eclarke)
- Chunyu Zhao (@zhaoc1)
- Jesse Connell (@ressy)
- Louis Taylor (@louiejtaylor)
- Kyle Bittinger (@kylebittinger)