VNtyper - A pipeline to genotype MUC1-VNTR

VNtyper genotypes the MUC1 coding VNTR in ADTKD-MUC1 from short-read sequencing (SRS) data. The pipeline embeds two variant-calling algorithms: Kestrel (mapping-free, k-mer-based genotyping) and code-adVNTR.

Installation & Requirements

The tool can be downloaded by cloning the GitHub repository:

# Create a directory for the download and move into it
mkdir vntyper
cd vntyper
git clone https://github.com/hassansaei/VNtyper.git
# Move into the cloned source directory
cd VNtyper

The following commands will automatically download and install all prerequisites:

chmod u+x install_prerequisites.sh
bash install_prerequisites.sh
# or, equivalently:
./install_prerequisites.sh

The requirements are as follows:

  1. Python >= 3.9 and the following libraries:

    • pandas: pip3 install pandas
    • numpy: pip3 install numpy
    • regex: pip3 install regex
    • biopython: pip3 install biopython
    • setuptools==58: pip3 install setuptools==58
    • pysam: pip3 install pysam
  2. Install (BWA)

  3. Download the chr1.fa file from (UCSC genome browser)

  4. Index fasta file with BWA

  5. Install Singularity

  6. Download (Kestrel)

  7. Build a Singularity image for code-adVNTR

  8. Download (VNTR database) for code-adVNTR

  9. The MUC1 VNTR motif dictionary and index files are provided in the File directory
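
As a quick sanity check after installation, the requirements above can be verified from Python. This is a minimal sketch, not part of VNtyper: the executable names `bwa` and `singularity` are assumptions about how the tools are exposed on PATH, and `Bio` is the import name of biopython.

```python
import importlib.util
import shutil
import sys

# Import names for the Python libraries listed above (biopython is imported as "Bio").
REQUIRED_MODULES = ["pandas", "numpy", "regex", "Bio", "setuptools", "pysam"]
# Assumption: these are the executable names under which the tools are installed.
REQUIRED_TOOLS = ["bwa", "singularity"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True when the interpreter is at least the required version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names):
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Python >= 3.9:", python_is_supported())
    print("Missing Python modules:", missing_modules(REQUIRED_MODULES))
    print("Missing executables:", missing_tools(REQUIRED_TOOLS))
```

Empty lists mean that nothing from the checked set is missing. The script does not check Kestrel, the reference files, or the code-adVNTR database.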

VNtyper docker image

A Docker image is also provided and can be pulled from Docker Hub. You have to create a directory on the host machine to store both your inputs and outputs. Instructions for installing Docker on Linux can be found (here).

mkdir shared
sudo docker pull saei/vntyper:1.2.0

The image file can also be downloaded and loaded manually:

  1. (VNtyper_1.2.0)
sudo docker load -i Docker_VNtyper_1.2.0.tar
# Or import the archive instead:
cat Docker_vntyper_v1.2.0.tar | docker import - vntyper:1.2.0

Run Docker with only the k-mer method:

sudo docker run --rm -it -v /PATH/TO/shared:/SOFT/shared saei/vntyper:1.2.0 \
-t 8 --bam -p /SOFT/VNtyper/ -ref /SOFT/VNtyper/Files/chr1.fa \
-ref_VNTR /SOFT/VNtyper/Files/MUC1-VNTR.fa \
-a /SOFT/shared/SAMPLE.bam -w /SOFT/shared/ -o SAMPLE_NAME --ignore_advntr

Run Docker with both methods:

sudo docker run --rm -it -v /PATH/TO/shared:/SOFT/shared saei/vntyper:1.2.0 \
-t 8 --bam -p /SOFT/VNtyper/ -ref /SOFT/VNtyper/Files/chr1.fa \
-ref_VNTR /SOFT/VNtyper/Files/MUC1-VNTR.fa -m /SOFT/VNtyper/Files/hg19_genic_VNTRs.db \
-a /SOFT/shared/SAMPLE.bam -w /SOFT/shared/ -o SAMPLE_NAME

Execution

Use the following command to see the help for running the tool:

python3 VNtyper.py --help 

usage: VNtyper_FV.py [-h] -ref Referense [-r1 FASTQ1] [-r2 FASTQ2] -o OUTPUT -ref_VNTR Referense [-t THREADS] -p TOOLS_PATH -w WORKING_DIR [-m REFERENCE_VNTR]
                     [--ignore_advntr] [--bam] [--fastq] [-a ALIGNMENT]

Given raw fastq files, this pipeline genotype MUC1-VNTR using kestrel (Mapping-free genotyping) and Code-adVNTR mathods

optional arguments:
  -h, --help            show this help message and exit
  -ref Referense, --reference_file Referense
                        FASTA-formatted reference file and indexes
  -r1 FASTQ1, --fastq1 FASTQ1
                        Fastq file first pair
  -r2 FASTQ2, --fastq2 FASTQ2
                        Fastq file second pair
  -o OUTPUT, --output OUTPUT
                        Output file name
  -ref_VNTR Referense, --reference_VNTR Referense
                        MUC1-specific reference file
  -t THREADS, --threads THREADS
                        Number of threads (CPU)
  -p TOOLS_PATH, --tools_path TOOLS_PATH
                        Path to the VNtyper directory
  -w WORKING_DIR, --working_dir WORKING_DIR
                        the path to the output
  -m REFERENCE_VNTR, --reference_vntr REFERENCE_VNTR
                        adVNTR reference vntr database
  --ignore_advntr       Skip adVNTR genotyping of MUC1-VNTR
  --bam                 BAM file as an input
  --fastq               Paired-end fastq files as an input
  -a ALIGNMENT, --alignment ALIGNMENT
                        Alignment File (with an index file .bai)


[Note] Because the program uses the Python 3.9 logging system, it cannot be run with earlier versions of Python.

Running only kmer-based genotyping:

python3 VNtyper.py --bam -ref Files/chr1.fa -a SAMPLE.bam -o SAMPLE_NAME -ref_VNTR Files/MUC1-VNTR.fa -t Threads -p VNtyper/ -w WORKING_DIRECTORY --ignore_advntr

[Note] This method is much faster than the second method (code-adVNTR).

Running both genotyping methods:

python3 VNtyper.py --bam -ref Files/chr1.fa -a SAMPLE.bam -o SAMPLE_NAME -ref_VNTR Files/MUC1-VNTR.fa -t Threads -p VNtyper/  -w WORKING_DIRECTORY -m Files/vntr_data/hg19_genic_VNTRs.db

[Note] The code-adVNTR method is much slower than the k-mer-based method.
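
The two invocations above differ only in their last option: `--ignore_advntr` for the k-mer-only run, or `-m` with the adVNTR database for both methods. The sketch below assembles either command line from Python, using only flags listed in the help text. The helper name `build_vntyper_command` and its default paths are illustrative assumptions that mirror the example commands; they are not part of VNtyper.

```python
import shlex


def build_vntyper_command(bam, output_name, working_dir, threads=8,
                          tools_path="VNtyper/", advntr_db=None):
    """Assemble the argument list for a BAM-based VNtyper run.

    Hypothetical helper: the reference paths below are copied from the
    example commands and assume the script is run from the source directory.
    """
    command = [
        "python3", "VNtyper.py", "--bam",
        "-ref", "Files/chr1.fa",
        "-ref_VNTR", "Files/MUC1-VNTR.fa",
        "-a", str(bam),
        "-o", output_name,
        "-t", str(threads),
        "-p", tools_path,
        "-w", str(working_dir),
    ]
    if advntr_db is None:
        # No database given: run only the k-mer-based (Kestrel) genotyping.
        command.append("--ignore_advntr")
    else:
        # Database given: run both genotyping methods.
        command += ["-m", str(advntr_db)]
    return command


if __name__ == "__main__":
    kmer_only = build_vntyper_command("SAMPLE.bam", "SAMPLE_NAME", "WORKING_DIRECTORY")
    print(shlex.join(kmer_only))
```

The returned list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with file names that contain spaces.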

Results from the high-coverage 1000G project

We analyzed the MUC1 region in 2300 samples from the 1000G 30X project. The results of this analysis can be found (here).

Evaluating MUC1 VNTR region coverage using samtools

for f in *.bam; do samtools depth -b MUC_hg19.bed "$f" | awk '{sum+=$3} END {if (NR > 0) print sum/NR; else print "NA"}' > "$f.coverage"; done

MUC_hg19.bed is provided. The -b MUC_hg19.bed option can also be replaced by: -r chr1:155160500-155162000
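
For readers who prefer Python to awk, the averaging step can be sketched as follows. It assumes the default three-column, tab-separated output of `samtools depth` for a single BAM file (chromosome, position, depth). The function name `mean_depth` is illustrative.

```python
def mean_depth(depth_lines):
    """Return the mean of the depth column (column 3), or None for empty input."""
    total = 0
    count = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += int(fields[2])
        count += 1
    return total / count if count else None


if __name__ == "__main__":
    # Two made-up positions, for illustration only.
    example = ["chr1\t155160501\t30\n", "chr1\t155160502\t50\n"]
    print(mean_depth(example))  # 40.0
```

As with the awk one-liner, the mean is taken over the positions that `samtools depth` reports. Without the `-a` option, positions with zero coverage are omitted, so the value can overestimate the coverage of a poorly covered region.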

Sample BAM files (MUC1 8C positive)

We provide five MUC1 8C-positive BAM files (example_1.bam to example_5.bam) for evaluation. Link to the BAM files: (Bam). Examples 1 to 3 are from the NTI cohort, and examples 4 and 5 are from the renome cohort.

Output

The tool creates a folder for each case in the user-specified working directory. Inside this folder are a directory for temporary and log files and the final output:

  • Temp folder: Fastp QC report (.html) and log file for VNtyper
  • The output of VNtyper: '*_Final_result.tsv'

The Kestrel output is a VCF file, which is processed by VNtyper; the final result is stored in *_Final_result.tsv. The result file contains the motifs, the variant types, the position of each variant, and its corresponding depth. The output of code-adVNTR is a BED or VCF file with variant information and a p-value.
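
To collect the results of several cases, the final tables can be located and loaded with the Python standard library. This is a sketch under two assumptions: each `*_Final_result.tsv` sits somewhere below the working directory, and its first line is a header row. No column names are assumed, because the exact columns are not listed here.

```python
import csv
from pathlib import Path


def find_final_results(working_dir):
    """Return all *_Final_result.tsv files below the working directory, sorted."""
    return sorted(Path(working_dir).rglob("*_Final_result.tsv"))


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by its header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    # "WORKING_DIRECTORY" is the placeholder passed to -w in the commands above.
    for result_path in find_final_results("WORKING_DIRECTORY"):
        rows = read_tsv(result_path)
        print(result_path, len(rows), "rows")
```

If the file turns out to have no header row, `csv.reader` with `delimiter="\t"` is the safer choice.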

##NOTE: This tool is for research use only.

##NOTE: When genotyping the MUC1 VNTR from WES data, use clinically boosted WES data.

Citations

  1. Saei H, Morinière V, Heidet L, Gribouval O, Lebbah S, Tores F, Mautret-Godefroy M, Knebelmann B, Burtey S, Vuiblet V, Antignac C, Nitschké P, Dorval G. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data in autosomal dominant tubulointerstitial kidney disease. iScience. 2023 Jun 17;26(7):107171. doi: 10.1016/j.isci.2023.107171. PMID: 37456840; PMCID: PMC10338300.
  2. Audano PA, Ravishankar S, Vannberg FO. Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics. 2018 May;34(10):1659-1665. doi: 10.1093/bioinformatics/btx753.
  3. Park J, Bakhtiari M, Popp B, Wiesener M, Bafna V. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience. 2022 Jul 19;25(8):104785. doi: 10.1016/j.isci.2022.104785. PMID: 35982790; PMCID: PMC9379575.