Home

Welcome to the Cecret wiki!

Consensus Extraction and Contig Reconstruction using Enriched libraries against a Template (CECRET)

---
Cecret
---
flowchart LR
fastq --> cleaning
cleaning --> alignment
reference --> alignment
alignment --> A[primer trimming]
B[primer schema] --> A
A --> consensus

This workflow is intended for amplicon-based NGS libraries and a known reference genome. There are options to skip primer removal, but there are no options to skip alignment to a reference.

Several references and primer schemes are supplied with this workflow; they are listed in their corresponding subspecies workflow. More can be added if the reference is small. Please submit an issue to let us know what else we should include.

The primer scheme and reference fasta file may also be supplied by the end user.
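As a minimal sketch of supplying your own files, the snippet below assembles a run command and prints it for review rather than executing it. The file names are hypothetical, and `--reference_genome` and `--primer_bed` are assumed parameter names that should be confirmed against nextflow_schema.json before use.

```shell
# Hypothetical input paths; replace these with your own files.
reads_dir="reads"
reference="my_reference.fasta"
primers="my_primers.bed"

# --reference_genome and --primer_bed are ASSUMED parameter names;
# check nextflow_schema.json for the names your Cecret version expects.
cmd="nextflow run UPHL-BioNGS/Cecret -profile singularity --reads $reads_dir --reference_genome $reference --primer_bed $primers"

# Print the command for review instead of running it.
echo "$cmd"
```

Once the parameter names are confirmed, the printed line can be pasted into a shell that has Nextflow available.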

Introduction

Named after the beautiful Cecret Lake

Location: 40.570°N 111.622°W, Elevation: 9,875 feet (3,010 m), Hiking level: easy

(Image credit: Intermountain Healthcare)

Cecret was originally developed by @erinyoung at the Utah Public Health Laboratory for SARS-CoV-2 sequencing with the artic/Illumina hybrid library prep workflow for MiSeq data with protocols here and here. This Nextflow workflow, however, is flexible for many additional organisms and primer schemes as long as the reference genome is "small" and "good enough." In 2022, @tives82 added contributions for Monkeypox virus, including converting IDT's primer scheme to NC_063383.1 coordinates. We are grateful to everyone who has contributed to this repo.

The library preparation method greatly impacts which bioinformatic tools are recommended for creating a consensus sequence. For example, amplicon-based library preparation methods will require primer trimming and an elevated minimum depth for base-calling. Some bait-derived library preparation methods have a PCR amplification step, and PCR duplicates will need to be removed. This has added complexity and several (admittedly confusing) options to this workflow. Please submit an issue if you run into problems.

It is possible to use this workflow to simply annotate fastas generated from any workflow or downloaded from GISAID or NCBI. There are also options for multiple sequence alignment (MSA) and phylogenetic tree creation from the fasta files.

Cecret is also part of the staphb-toolkit.

Dependencies

Nextflow
Singularity or Docker - set the profile as singularity or docker during runtime
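A quick way to check these dependencies is sketched below. It looks for each tool on `PATH` and prints a suggested command with the matching profile. The helper name `pick_profile` is made up for this example, and the preference for Singularity when both engines are present is an arbitrary choice, not a Cecret recommendation.

```shell
# pick_profile is a hypothetical helper for this example.
# It prints "singularity" or "docker" depending on which container
# engine is found on PATH, and prints nothing if neither is found.
pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo singularity
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  fi
}

profile="$(pick_profile)"

if ! command -v nextflow >/dev/null 2>&1; then
  echo "nextflow not found on PATH" >&2
elif [ -z "$profile" ]; then
  echo "neither singularity nor docker found on PATH" >&2
else
  # Print the suggested command rather than running it.
  echo "nextflow run UPHL-BioNGS/Cecret -profile $profile --reads reads"
fi
```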

General Usage

The default usage of Cecret is to run on fastq files for SARS-CoV-2 sequencing.

nextflow run UPHL-BioNGS/Cecret -profile singularity --reads reads

There are, however, a lot of ways this workflow can be adjusted. Cecret does include 100+ parameters after all. There are also only so many words a typical end user is willing to read to understand how to adjust these parameters for their use case. We've divided this wiki into sections of reading that we think a typical user will be able to absorb, but please create an issue if something is unclear.

Typical use-cases with wiki pages:

A complete list of all params with their default values can be found in [Cecret/nextflow_schema.json](https://github.com/UPHL-BioNGS/Cecret/blob/master/nextflow_schema.json).

Cecret is a nextflow workflow that strings together a variety of tools, and would not be possible without them.

aci - for depth estimation over amplicons
artic network - for aligning and consensus creation of nanopore reads
bwa - for aligning reads to the reference
fastp - for cleaning reads; optional, faster alternative to seqyclean
fastqc - for QC metrics
freyja - for multiple SARS-CoV-2 lineage classifications
heatcluster - for visualization of a SNP matrix
igv-reports - for creating igv-reports for each suspected variant
iqtree2 - for phylogenetic tree generation (optional, relatedness must be set to "true")
ivar - for calling variants and creating a consensus fasta; optional primer trimmer
kraken2 - for read classification
mafft - for multiple sequence alignment (optional, relatedness must be set to "true")
minimap2 - an alternative to bwa
multiqc - summary of results
nextalign - for multiple sequence alignment (optional, relatedness must be set to "true", and msa must be set to "nextalign")
nextclade - for SARS-CoV-2 clade classification
pango-aliasor - to identify parent pangolin lineages
pangolin - for SARS-CoV-2 lineage classification
pangolincollapse - to identify parent pangolin lineages
phytreeviz - for visualization of the phylogenetic tree
samtools - for QC metrics and sorting; optional primer trimmer; optional converting bam to fastq files; optional duplication marking
seqyclean - for cleaning reads
snp-dists - for relatedness determination (optional, relatedness must be set to "true")
vadr - for annotating fastas like NCBI
viridian - for primer detection and trimming
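
Several tools in the list above only run when relatedness is set to "true", and nextalign additionally needs msa set to "nextalign". The sketch below assembles such a command and prints it for review. It assumes these params are passed on the command line as `--relatedness` and `--msa`, in the same way as `--reads`.

```shell
# Aligner for the multiple sequence alignment step.
# "nextalign" is the value named in the tool list above.
msa="nextalign"

# ASSUMPTION: relatedness and msa are passed as --relatedness and --msa.
cmd="nextflow run UPHL-BioNGS/Cecret -profile singularity --reads reads --relatedness true --msa $msa"

# Print the command for review instead of running it.
echo "$cmd"
```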
