onh_pipeline

The current pipeline takes a directory of gzipped FASTQ files (.fastq.gz) as input. A loader shell script (run_onh_pipeline.sh) passes them to a Python script (onh_pipeline.py), which uses the Python subprocess module to execute shell commands in the order shown below:
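
As a rough illustration of that control flow (not the actual contents of onh_pipeline.py), the sketch below collects the .fastq.gz files in a directory and runs a list of commands in order with subprocess. The function names `find_fastq_files` and `run_steps` are hypothetical and chosen only for this example.

```python
import subprocess
from pathlib import Path


def find_fastq_files(input_dir):
    """Return the gzipped FASTQ files in input_dir, sorted by name."""
    return sorted(Path(input_dir).glob("*.fastq.gz"))


def run_steps(steps):
    """Run (name, command) pairs in order and return the names that completed.

    Each command is a list of arguments, as accepted by subprocess.run.
    """
    completed = []
    for name, command in steps:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed step stops the steps that come after it.
        subprocess.run(command, check=True)
        completed.append(name)
    return completed
```

Stopping at the first failed command is a design choice of this sketch; whether the real script does the same is not stated here.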

Trimmomatic
- Remove adapters
- Remove leading low quality or N bases
- Remove trailing low quality or N bases
- Scan the read with an n-base-wide sliding window, cutting when the average quality per base drops below k
- Drop reads below a given length
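
The five Trimmomatic steps above map onto one command line. The sketch below builds it as an argument list, assuming paired-end mode and a jar named trimmomatic.jar; neither is stated in this README. The default thresholds are the example values from the Trimmomatic manual, not necessarily the ones this pipeline uses, and `trimmomatic_command` is a hypothetical helper name.

```python
def trimmomatic_command(r1, r2, out_prefix, adapters, jar="trimmomatic.jar",
                        leading=3, trailing=3, window=4, window_quality=15,
                        min_length=36):
    """Build a paired-end Trimmomatic command as an argument list."""
    return [
        "java", "-jar", jar, "PE",
        r1, r2,
        # Paired and unpaired outputs for each mate.
        f"{out_prefix}_1P.fastq.gz", f"{out_prefix}_1U.fastq.gz",
        f"{out_prefix}_2P.fastq.gz", f"{out_prefix}_2U.fastq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",          # remove adapters
        f"LEADING:{leading}",                        # trim low-quality or N bases at the start
        f"TRAILING:{trailing}",                      # trim low-quality or N bases at the end
        f"SLIDINGWINDOW:{window}:{window_quality}",  # n-base window, quality cutoff k
        f"MINLEN:{min_length}",                      # drop reads below this length
    ]
```

Trimmomatic applies the steps in the order they appear on the command line, which is why the list mirrors the order of the bullets.
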
BwaMem
- Local alignment of the trimmed reads to the reference genome
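
A minimal sketch of the alignment step, assuming paired-end reads and a reference that has already been indexed with `bwa index`. The helper names, file names, and thread count are hypothetical; `bwa mem` writes SAM to standard output, so the sketch redirects it to a file.

```python
import subprocess


def bwa_mem_command(reference, r1, r2, threads=4):
    """Build a bwa mem command for one pair of FASTQ files."""
    return ["bwa", "mem", "-t", str(threads), reference, r1, r2]


def run_bwa_mem(reference, r1, r2, out_sam, threads=4):
    """Run bwa mem and capture its SAM output in out_sam."""
    with open(out_sam, "w") as sam:
        subprocess.run(bwa_mem_command(reference, r1, r2, threads),
                       stdout=sam, check=True)
```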

Picard

SamFormatConverter
- Convert a BAM file to a SAM file, or SAM to BAM. Input and output formats are determined by file extension.
SortSam
- Sorts a SAM or BAM file
MarkDuplicates
- Identifies duplicate reads.
AddOrReplaceReadGroups
- Replace read groups in a BAM file
BuildBamIndex
- Generates a BAM index ".bai" file.
Mosdepth
- Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
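
The Picard and Mosdepth steps above could be chained as follows. This is a sketch, assuming Picard's classic `KEY=VALUE` argument syntax and a jar named picard.jar; the intermediate file names and read-group values (`lib1`, `illumina`, `unit1`) are placeholders, and `picard_command` and `picard_steps` are hypothetical helpers.

```python
def picard_command(tool, jar="picard.jar", **options):
    """Build a Picard command using KEY=VALUE arguments."""
    return ["java", "-jar", jar, tool] + [f"{key}={value}" for key, value in options.items()]


def picard_steps(sample):
    """Return the post-alignment commands for one sample, in pipeline order."""
    return [
        picard_command("SamFormatConverter", I=f"{sample}.sam", O=f"{sample}.bam"),
        picard_command("SortSam", I=f"{sample}.bam", O=f"{sample}.sorted.bam",
                       SORT_ORDER="coordinate"),
        picard_command("MarkDuplicates", I=f"{sample}.sorted.bam",
                       O=f"{sample}.dedup.bam", M=f"{sample}.dup_metrics.txt"),
        picard_command("AddOrReplaceReadGroups", I=f"{sample}.dedup.bam",
                       O=f"{sample}.rg.bam", RGID=sample, RGLB="lib1",
                       RGPL="illumina", RGPU="unit1", RGSM=sample),
        picard_command("BuildBamIndex", I=f"{sample}.rg.bam"),
        # mosdepth takes an output prefix followed by the BAM or CRAM file.
        ["mosdepth", sample, f"{sample}.rg.bam"],
    ]
```

Each command in the returned list can be passed to `subprocess.run(command, check=True)` in turn.
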

GATK

BaseRecalibrator
- Detect systematic errors in base quality scores
PrintReads
- Write out sequence read data (for filtering, merging, subsetting etc.)
VariantFiltration
- Filter variant calls based on INFO and FORMAT annotations
SelectVariants
- Select a subset of variants from a larger callset
HaplotypeCaller
- Call germline SNPs and indels via local re-assembly of haplotypes
GenotypeGVCFs
- Perform joint genotyping on gVCF files produced by HaplotypeCaller
VariantRecalibrator
- Build a recalibration model to score variant quality for filtering purposes
ApplyRecalibration
- Apply a score cutoff to filter variants based on a recalibration table
CalculateGenotypePosteriors
- Calculate genotype posterior likelihoods given panel data
VariantAnnotator
- Annotate variant calls with context information
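
To make the GATK stage more concrete, here is a sketch of a few of the tools above as command lists. It assumes a GATK 3-style invocation (`-T <tool>` on GenomeAnalysisTK.jar), which the tool names PrintReads and ApplyRecalibration suggest but this README does not confirm. The flags are written from GATK 3 conventions and should be checked against the installed version; file names and helper names are hypothetical.

```python
def gatk_command(tool, reference, jar="GenomeAnalysisTK.jar", extra=()):
    """Build a GATK 3-style command: java -jar <jar> -T <tool> -R <reference> ..."""
    return ["java", "-jar", jar, "-T", tool, "-R", reference, *extra]


def per_sample_gatk_steps(sample, reference, known_sites):
    """Base quality recalibration followed by per-sample gVCF calling."""
    bam = f"{sample}.bam"
    table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf"
    return [
        # Model systematic errors in base quality scores.
        gatk_command("BaseRecalibrator", reference,
                     extra=["-I", bam, "-knownSites", known_sites, "-o", table]),
        # Write out reads with the recalibrated qualities applied.
        gatk_command("PrintReads", reference,
                     extra=["-I", bam, "-BQSR", table, "-o", recal_bam]),
        # Call variants per sample in gVCF mode.
        gatk_command("HaplotypeCaller", reference,
                     extra=["-I", recal_bam, "--emitRefConfidence", "GVCF", "-o", gvcf]),
    ]


def joint_genotyping_command(gvcfs, reference, out_vcf):
    """Joint genotyping over the gVCFs produced by HaplotypeCaller."""
    extra = []
    for gvcf in gvcfs:
        extra += ["--variant", gvcf]
    extra += ["-o", out_vcf]
    return gatk_command("GenotypeGVCFs", reference, extra=extra)
```

The split between per-sample steps and a single joint genotyping command reflects that GenotypeGVCFs consumes the gVCFs from all samples at once.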

TableAnnovar
- Takes an input variant file (such as a VCF file) and generates a tab-delimited output file with many columns, each representing one set of annotations. If the input is a VCF file, the program also generates a new output VCF file with the INFO field filled with annotation information.
VcfAnno
- Annotates a VCF with any number of INFO fields from any number of VCF or BED files. It is used here to add:
1. gnomAD minor allele frequency
2. dbSNP IDs
Genmod
- A command-line tool for annotating and analyzing genomic variation in the VCF file format. Genmod can annotate genetic patterns of inheritance in VCFs with single or multiple families of arbitrary size.
