Hybrid DeepVariant + GATK germline variant calling model for the detection of rare pathogenic germline variants

dannyrabiz/HybridVC

Hybrid Germline Variant Caller

Introduction

This repository hosts the code for a hybrid germline variant caller that combines GATK and DeepVariant outputs to improve the detection accuracy of rare pathogenic variants in whole-exome sequencing (WES) and whole-genome sequencing (WGS) data.

Accurately identifying rare germline variants is important for the detection and treatment of genetic diseases such as cancer. Although overall variant calling accuracy is consistently above 99% in WES and WGS samples, accuracy drops notably for rare variants: the sensitivity and specificity of the leading germline variant callers, GATK and DeepVariant, have been shown to fall to approximately 70% and 50%, respectively.

This project addresses the issue with a hybrid model that combines GATK and DeepVariant. The model was trained and tested on a large cohort of patients with metastatic prostate cancer.

Installation

The scripts themselves need no installation step. They are written in Python, so a Python interpreter is required to run them.

Requirements

  • Python 3.x
  • GATK v4.x
  • DeepVariant v1.x

Please ensure you have the correct versions of Python, GATK and DeepVariant installed before running the scripts.
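
As a quick way to check the requirements above, the following sketch reports the Python version and looks for the external tools on PATH. The executable names `gatk` and `run_deepvariant` are assumptions about a typical installation, not something this repository specifies; DeepVariant in particular is often run through a container, in which case the lookup will report it as not found.

```python
import shutil
import sys


def check_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    tools = check_tools(["gatk", "run_deepvariant"])
    print("Python", sys.version.split()[0])
    for name, path in tools.items():
        print(f"{name}: {path or 'not found on PATH'}")
```

The check only confirms that an executable is present; it does not verify the GATK or DeepVariant version.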

Usage

To run the hybrid model, use the following command:

python Hybrid_Model.py /path/GATK/Sample.vcf /path/DeepVariant/Sample.vcf SampleID Output_Hybrid.vcf

The script requires four inputs:

  1. /path/GATK/Sample.vcf : File path for the VCF file generated by GATK.
  2. /path/DeepVariant/Sample.vcf : File path for the VCF file generated by DeepVariant.
  3. SampleID : A unique identifier for the sample being analyzed.
  4. Output_Hybrid.vcf : The desired filename for the output VCF file generated by the hybrid model.
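
For running several samples, the command above can be wrapped in a small helper. This is a minimal sketch, assuming `Hybrid_Model.py` is in the current working directory and takes exactly the four positional arguments listed above; `build_command` and `run_hybrid` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(gatk_vcf, deepvariant_vcf, sample_id, output_vcf,
                  script="Hybrid_Model.py"):
    """Return the argument list for one run, in the documented argument order."""
    return [sys.executable, script, str(gatk_vcf), str(deepvariant_vcf),
            sample_id, str(output_vcf)]


def run_hybrid(gatk_vcf, deepvariant_vcf, sample_id, output_vcf):
    """Check that both input VCFs exist, then invoke the script."""
    for path in (gatk_vcf, deepvariant_vcf):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Input VCF not found: {path}")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(
        build_command(gatk_vcf, deepvariant_vcf, sample_id, output_vcf),
        check=True,
    )
```

Checking the input paths before launching the script turns a mistyped path into an immediate, readable error.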

Files

  • Hybrid_Model.py: This is the main script for the hybrid germline variant caller. It takes in the VCF files generated by GATK and DeepVariant, and outputs a VCF file with the predicted variants from the hybrid model.
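
To get a feel for the two inputs before running the model, it can help to see which sites the callers agree on. The sketch below is not the logic of `Hybrid_Model.py`, which this README does not describe; it is only an illustrative comparison that keys each record on CHROM, POS, REF and ALT from the standard VCF columns. The function names are hypothetical, and the comparison assumes uncompressed, tab-delimited VCF text with no normalization of indel representation.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF data lines, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines (both "##" metadata and "#CHROM").
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list ALT alleles separated by commas.
        for alt in alts.split(","):
            keys.add((chrom, pos, ref, alt))
    return keys


def compare_callsets(gatk_lines, deepvariant_lines):
    """Split variant keys into shared and caller-specific sets."""
    gatk = variant_keys(gatk_lines)
    deepvariant = variant_keys(deepvariant_lines)
    return {
        "both": gatk & deepvariant,
        "gatk_only": gatk - deepvariant,
        "deepvariant_only": deepvariant - gatk,
    }
```

Both functions accept any iterable of lines, so an open file handle for each VCF can be passed directly.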

Contact

For any questions or feedback, feel free to reach out to us.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
