onh_pipeline

The current pipeline takes a directory of gzipped FASTQ files (.fastq.gz) as input. A loader shell script (run_onh_pipeline.sh) passes them to a Python script (onh_pipeline.py), which uses the Python subprocess module to execute shell commands in the order shown below:
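As a rough illustration of running shell commands in a fixed order with the subprocess module, a minimal sketch might look like the following. The `run_steps` helper and the demo commands are hypothetical and are not taken from onh_pipeline.py.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> List[str]:
    """Run each command in order and return the captured stdout of each.

    Hypothetical helper: the real script may structure this differently.
    """
    outputs = []
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
        outputs.append(result.stdout)
    return outputs


if __name__ == "__main__":
    # Stand-in commands; a real run would invoke Trimmomatic, bwa, and so on.
    demo = [
        [sys.executable, "-c", "print('trim')"],
        [sys.executable, "-c", "print('align')"],
    ]
    print(run_steps(demo))
```

Passing each command as a list of arguments (rather than one shell string) avoids quoting problems with file names that contain spaces.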

  1. Trimmomatic
    • Remove adapters
    • Remove leading low quality or N bases
    • Remove trailing low quality or N bases
    • Scan the read with an n-base-wide sliding window, cutting when the average quality per base drops below k
    • Drop reads below a given length
  2. BwaMem
    • local alignment
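The trimming and alignment steps above could map onto command lines roughly as sketched below. This assumes paired-end reads, Trimmomatic's `PE` mode, and commonly used example thresholds; the function names, file naming scheme, and default values are illustrative assumptions, not the pipeline's actual settings.

```python
from typing import List


def trimmomatic_cmd(r1: str, r2: str, prefix: str, adapters: str,
                    window: int = 4, min_avg_q: int = 15, min_len: int = 36,
                    jar: str = "trimmomatic.jar") -> List[str]:
    """Build a paired-end Trimmomatic command (all defaults are assumptions)."""
    return [
        "java", "-jar", jar, "PE", r1, r2,
        f"{prefix}_1P.fastq.gz", f"{prefix}_1U.fastq.gz",
        f"{prefix}_2P.fastq.gz", f"{prefix}_2U.fastq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",       # remove adapters
        "LEADING:3",                               # remove leading low quality or N bases
        "TRAILING:3",                              # remove trailing low quality or N bases
        f"SLIDINGWINDOW:{window}:{min_avg_q}",     # n-base window, average quality k
        f"MINLEN:{min_len}",                       # drop reads below a given length
    ]


def bwa_mem_cmd(reference: str, r1: str, r2: str, threads: int = 4) -> List[str]:
    """Build a bwa mem command; bwa writes SAM to stdout, so the caller redirects it."""
    return ["bwa", "mem", "-t", str(threads), reference, r1, r2]


if __name__ == "__main__":
    print(" ".join(trimmomatic_cmd("s_R1.fastq.gz", "s_R2.fastq.gz", "s", "adapters.fa")))
    print(" ".join(bwa_mem_cmd("ref.fa", "s_1P.fastq.gz", "s_2P.fastq.gz")))
```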

Picard

  1. SamFormatConverter
    • Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.
  2. SortSam
    • Sorts a SAM or BAM file
  3. MarkDuplicates
    • Identifies duplicate reads.
  4. AddOrReplaceReadGroups
    • Replace read groups in a BAM file
  5. BuildBamIndex
    • Generates a BAM index ".bai" file.
  6. Mosdepth
    • fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
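The post-alignment steps above chain together, with each tool reading the previous tool's output. A minimal sketch of that chaining, using Picard's `KEY=value` argument style, is shown below. The `picard_steps` helper, the intermediate file names, and the read-group values (`lib1`, `ILLUMINA`, `unit1`) are placeholders chosen for illustration.

```python
from typing import List


def picard_steps(sam: str, sample: str, jar: str = "picard.jar") -> List[List[str]]:
    """Build the post-alignment commands in order (file names are assumptions)."""
    stem = sam.rsplit(".", 1)[0]
    bam = f"{stem}.bam"
    sorted_bam = f"{stem}.sorted.bam"
    dedup_bam = f"{stem}.dedup.bam"
    rg_bam = f"{stem}.rg.bam"
    picard = ["java", "-jar", jar]
    return [
        # Output format is inferred from the .bam extension.
        picard + ["SamFormatConverter", f"I={sam}", f"O={bam}"],
        picard + ["SortSam", f"I={bam}", f"O={sorted_bam}", "SORT_ORDER=coordinate"],
        picard + ["MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}",
                  f"M={stem}.dup_metrics.txt"],
        # Read-group values here are placeholders, not the pipeline's real ones.
        picard + ["AddOrReplaceReadGroups", f"I={dedup_bam}", f"O={rg_bam}",
                  f"RGID={sample}", "RGLB=lib1", "RGPL=ILLUMINA",
                  "RGPU=unit1", f"RGSM={sample}"],
        picard + ["BuildBamIndex", f"I={rg_bam}"],
        # mosdepth is a separate tool: mosdepth <prefix> <BAM>
        ["mosdepth", stem, rg_bam],
    ]


if __name__ == "__main__":
    for step in picard_steps("sample1.sam", "sample1"):
        print(" ".join(step))
```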

GATK

  1. BaseRecalibrator
    • Detect systematic errors in base quality scores
  2. PrintReads - Write out sequence read data (for filtering, merging, subsetting etc)
  3. VariantFiltration - Filter variant calls based on INFO and FORMAT annotations
  4. SelectVariants - Select a subset of variants from a larger callset
  5. HaplotypeCaller - Call germline SNPs and indels via local re-assembly of haplotypes
  6. GenotypeGVCFs - Perform joint genotyping on gVCF files produced by HaplotypeCaller
  7. VariantRecalibrator - Build a recalibration model to score variant quality for filtering purposes
  8. ApplyRecalibration - Apply a score cutoff to filter variants based on a recalibration table
  9. CalculateGenotypePosteriors - Calculate genotype posterior likelihoods given panel data
  10. VariantAnnotator
    • Annotate variant calls with context information
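To make the base recalibration and calling steps more concrete, the sketch below builds four of the GATK commands listed above. It assumes the GATK 3 command-line style (`-T ToolName`), since tool names such as ApplyRecalibration belong to that generation; the `gatk3_steps` helper, the jar name, and the file names are hypothetical, and the remaining tools in the list are omitted for brevity.

```python
from typing import List


def gatk3_steps(bam: str, reference: str, known_sites: str,
                jar: str = "GenomeAnalysisTK.jar") -> List[List[str]]:
    """Build BQSR and variant-calling commands, assuming GATK 3 syntax."""
    stem = bam.rsplit(".", 1)[0]
    gatk = ["java", "-jar", jar]
    table = f"{stem}.recal.table"
    recal_bam = f"{stem}.recal.bam"
    gvcf = f"{stem}.g.vcf"
    return [
        # Model systematic base-quality errors against known variant sites.
        gatk + ["-T", "BaseRecalibrator", "-R", reference, "-I", bam,
                "-knownSites", known_sites, "-o", table],
        # Write reads with recalibrated qualities applied.
        gatk + ["-T", "PrintReads", "-R", reference, "-I", bam,
                "-BQSR", table, "-o", recal_bam],
        # Call variants per sample in gVCF mode.
        gatk + ["-T", "HaplotypeCaller", "-R", reference, "-I", recal_bam,
                "--emitRefConfidence", "GVCF", "-o", gvcf],
        # Genotype the gVCF to produce a VCF.
        gatk + ["-T", "GenotypeGVCFs", "-R", reference,
                "--variant", gvcf, "-o", f"{stem}.vcf"],
    ]


if __name__ == "__main__":
    for step in gatk3_steps("sample1.rg.bam", "ref.fa", "dbsnp.vcf.gz"):
        print(" ".join(step))
```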

  1. TableAnnovar - takes an input variant file (such as a VCF file) and generates a tab-delimited output file with many columns, each representing one set of annotations. Additionally, if the input is a VCF file, the program also generates a new output VCF file with the INFO field filled with annotation information.
  2. VcfAnno - vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files. I am using it to annotate:
    1. gnomAD minor allele frequency
    2. dbSNP IDs
  3. Genmod - GENMOD is a simple-to-use command line tool for annotating and analyzing genomic variations in the VCF file format. GENMOD can annotate genetic patterns of inheritance in VCF files with single or multiple families of arbitrary size.