The current pipeline takes a directory of gzipped FASTQ files (.fastq.gz) as input. A loader shell script (run_onh_pipeline.sh) feeds them to a Python script (onh_pipeline.py), which uses the Python subprocess module to execute shell commands in the order shown below:
- Trimmomatic
  - Remove adapters
  - Remove leading low-quality or N bases
  - Remove trailing low-quality or N bases
  - Scan the read with an n-base-wide sliding window, cutting when the average quality per base drops below k
  - Drop reads below a given length
- BwaMem
  - Local alignment
- SamFormatConverter
  - Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.
- SortSam
  - Sort a SAM or BAM file
- MarkDuplicates
  - Identify duplicate reads
- AddOrReplaceReadGroups
  - Replace read groups in a BAM file
- BuildBamIndex
  - Generate a BAM index (".bai") file
- Mosdepth
  - Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing
- BaseRecalibrator
  - Detect systematic errors in base quality scores
- PrintReads - Write out sequence read data (for filtering, merging, subsetting etc.)
- VariantFiltration - Filter variant calls based on INFO and FORMAT annotations
- SelectVariants - Select a subset of variants from a larger callset
- HaplotypeCaller - Call germline SNPs and indels via local re-assembly of haplotypes
- GenotypeGVCFs - Perform joint genotyping on gVCF files produced by HaplotypeCaller
- VariantRecalibrator - Build a recalibration model to score variant quality for filtering purposes
- ApplyRecalibration - Apply a score cutoff to filter variants based on a recalibration table
- CalculateGenotypePosteriors - Calculate genotype posterior likelihoods given panel data
- VariantAnnotator
  - Annotate variant calls with context information
- TableAnnovar - Takes an input variant file (such as a VCF file) and generates a tab-delimited output file with many columns, each representing one set of annotations. Additionally, if the input is a VCF file, the program also generates a new output VCF file with the INFO field filled with annotation information.
- VcfAnno
  - vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files. I am using it to annotate:
    - gnomAD minor allele frequency
    - dbSNP IDs
- Genmod - GENMOD is a simple-to-use command line tool for annotating and analyzing genomic variations in the VCF file format. GENMOD can annotate genetic patterns of inheritance in VCFs with single or multiple families of arbitrary size.
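
The subprocess-driven ordering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of onh_pipeline.py: the helper names `find_fastq_files`, `run_steps`, and `placeholder` are hypothetical, and each step runs a stand-in command because the real tool arguments are not given here.

```python
import subprocess
import sys
from pathlib import Path


def find_fastq_files(input_dir):
    """Return the gzipped FASTQ files in input_dir, sorted by name."""
    return sorted(Path(input_dir).glob("*.fastq.gz"))


def run_steps(steps):
    """Run (name, argv) pairs in order, stopping at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, argv in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step prevents the later steps from running.
        subprocess.run(argv, check=True)
        completed.append(name)
    return completed


def placeholder(message):
    """Hypothetical stand-in command; a real step would invoke the tool."""
    return [sys.executable, "-c", f"print({message!r})"]


if __name__ == "__main__":
    # Only the first few steps are shown; the rest would follow the same
    # pattern, in the order of the list above.
    steps = [
        ("Trimmomatic", placeholder("trim reads")),
        ("BwaMem", placeholder("align reads")),
        ("SortSam", placeholder("sort alignments")),
    ]
    print(run_steps(steps))
```

Passing each command as an argument list with `check=True` means a failing step aborts the run instead of handing incomplete output to the next tool, which matters when every step consumes the previous step's file.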