This repository hosts the code for a novel hybrid germline variant caller that combines GATK and DeepVariant outputs to improve the detection of rare pathogenic variants in whole-exome sequencing (WES) and whole-genome sequencing (WGS) data.
Accurately identifying rare germline variants is a significant challenge in the detection and treatment of genetic diseases such as cancer. Although overall variant calling accuracy consistently exceeds 99% in WES and WGS samples, accuracy drops notably for rare variants: the sensitivity and specificity of the leading germline variant callers, GATK and DeepVariant, have been shown to fall to approximately 70% and 50%, respectively.
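For reference, the standard definitions of the metrics quoted above, in terms of true/false positive and negative calls (TP, FP, TN, FN), are given below. Many variant-calling benchmarks report precision rather than specificity, because true negatives are hard to count across a genome; which definition underlies the figures above is not stated in this README.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
```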
The hybrid model is designed to address this gap by combining GATK and DeepVariant. It was trained and tested on a large cohort of patients with metastatic prostate cancer.
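Any combination of two callers starts by matching their calls. The sketch below is a hypothetical illustration of that first step only, not the hybrid model in `Hybrid_Model.py`: it keys each ALT allele by `(CHROM, POS, REF, ALT)` and splits the two call sets into shared and caller-specific variants. The function names and the `pass_only` option (which drops DeepVariant `RefCall` and other non-PASS records) are assumptions made for illustration.

```python
"""Hypothetical sketch: compare GATK and DeepVariant call sets by site key.

This is NOT the hybrid model implemented in Hybrid_Model.py; it only shows
how calls from the two callers can be matched before any combination step.
"""
import gzip
from typing import Dict, Iterator, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def iter_variant_keys(vcf_path: str, pass_only: bool = True) -> Iterator[VariantKey]:
    """Yield one key per ALT allele from a plain or gzip-compressed VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank, meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            chrom, pos, ref = fields[0], int(fields[1]), fields[3]
            alts, filt = fields[4], fields[6]
            # Keep unfiltered (".") and PASS records; drop e.g. RefCall, LowQual.
            if pass_only and filt not in ("PASS", "."):
                continue
            for alt in alts.split(","):
                # Ignore placeholder alleles that are not real variants.
                if alt not in (".", "*", "<NON_REF>"):
                    yield (chrom, pos, ref, alt)


def compare_call_sets(gatk_vcf: str, dv_vcf: str,
                      pass_only: bool = True) -> Dict[str, Set[VariantKey]]:
    """Split variants into those called by both tools or by only one."""
    gatk = set(iter_variant_keys(gatk_vcf, pass_only))
    dv = set(iter_variant_keys(dv_vcf, pass_only))
    return {
        "shared": gatk & dv,
        "gatk_only": gatk - dv,
        "deepvariant_only": dv - gatk,
    }
```

Exact key matching is sensitive to how each caller represents indels and multiallelic sites, so normalizing both VCFs first (for example with `bcftools norm`) would likely give cleaner overlaps.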
No installation is necessary. The scripts are written in Python and have the following requirements:
- Python 3.x
- GATK v4.x
- DeepVariant v1.x
Please ensure you have the correct versions of Python, GATK and DeepVariant installed before running the scripts.
To run the hybrid model, use the following command:
python Hybrid_Model.py /path/GATK/Sample.vcf /path/DeepVariant/Sample.vcf SampleID Output_Hybrid.vcf
The script requires four inputs:
- /path/GATK/Sample.vcf: File path for the VCF file generated by GATK.
- /path/DeepVariant/Sample.vcf: File path for the VCF file generated by DeepVariant.
- SampleID: A unique identifier for the sample being analyzed.
- Output_Hybrid.vcf: The desired filename for the output VCF file generated by the hybrid model.
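When several samples need processing, the usage command can be wrapped in a small loop. The sketch below is hypothetical: it assumes `Hybrid_Model.py` sits in the current working directory, that each sample has one `<SampleID>.vcf` per caller directory (mirroring the example paths), and that outputs are named `<SampleID>_Hybrid.vcf`. Adjust these to your own layout.

```python
"""Hypothetical wrapper: run Hybrid_Model.py for several samples.

Assumes one VCF per sample named <SampleID>.vcf under separate GATK and
DeepVariant folders, and writes <SampleID>_Hybrid.vcf into out_dir.
"""
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(gatk_vcf: Path, dv_vcf: Path, sample_id: str,
                  output_vcf: Path) -> List[str]:
    """Return the argument list for one run, checking that inputs exist."""
    for vcf in (gatk_vcf, dv_vcf):
        if not vcf.is_file():
            raise FileNotFoundError(f"Input VCF not found: {vcf}")
    # Argument order follows the documented usage:
    # GATK VCF, DeepVariant VCF, SampleID, output VCF.
    return [sys.executable, "Hybrid_Model.py",
            str(gatk_vcf), str(dv_vcf), sample_id, str(output_vcf)]


def run_batch(gatk_dir: Path, dv_dir: Path, sample_ids: List[str],
              out_dir: Path) -> None:
    """Run the hybrid model once per sample, stopping on the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for sample_id in sample_ids:
        cmd = build_command(gatk_dir / f"{sample_id}.vcf",
                            dv_dir / f"{sample_id}.vcf",
                            sample_id,
                            out_dir / f"{sample_id}_Hybrid.vcf")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

For example, `run_batch(Path("/path/GATK"), Path("/path/DeepVariant"), ["Sample"], Path("hybrid_out"))` would reproduce the single-sample command above, writing to `hybrid_out/Sample_Hybrid.vcf`. Using `sys.executable` runs the script with the same interpreter as the wrapper.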
- Hybrid_Model.py: The main script for the hybrid germline variant caller. It takes the VCF files generated by GATK and DeepVariant as input and outputs a VCF file with the variants predicted by the hybrid model.
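To get a quick overview of the output VCF, records can be tallied by their FILTER value. This is a minimal sketch that assumes only the standard VCF column layout (FILTER is the seventh column); which FILTER values the hybrid model actually writes is not documented here, and `summarize_filters` is a hypothetical helper name.

```python
"""Hypothetical helper: count output VCF records by FILTER value."""
import gzip
from collections import Counter


def summarize_filters(vcf_path: str) -> Counter:
    """Return a Counter mapping each FILTER value to its record count."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts: Counter = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank, meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1  # FILTER is the 7th fixed VCF column
    return counts
```

For example, `summarize_filters("Output_Hybrid.vcf")` returns a `Counter` whose keys are the FILTER values present (such as `PASS`) and whose values are record counts.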
For any questions or feedback, feel free to reach out to us.
This project is licensed under the MIT License - see the LICENSE.md file for details.