You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
conda activate hmmer
cd ~/virus1/temp/MGV/mgv/viral_detection_pipeline
cp ~/virus/viwrap_result/paper/vOTUs.fna ./input/
#Call viral genes
./prodigal.linux -i input/vOTUs.fna -a input/vOTUs.faa -d input/vOTUs.ffn -p meta -f gff > input/vOTUs.gff #5251
#Output files include proteins (.faa), genes (.ffn), and gene coordinates (.gff). Genes called in metagenomic mode (-p meta).
#Run HMMER on imgvr and pfam databases
hmmsearch -Z 1 --cpu 16 --noali --tblout output/imgvr.out input/imgvr.hmm input/vOTUs.faa
hmmsearch -Z 1 --cut_tc --cpu 16 --noali --tblout output/pfam.out input/pfam.hmm input/vOTUs.faa
#Here the -Z 1 flag is specified to make E-values comparable between databases and between samples. The IMG/VR database contains viral genes while the Pfam database contains non-viral genes.
#Count genes hitting viral and microbial marker genes
python count_hmm_hits.py input/vOTUs.fna input/vOTUs.faa output/imgvr.out output/pfam.out > output/hmm_hits.tsv
#Each gene assigned according to its best hit with E-value <1e-10. We found that several of the HMMs from IMG/VR HMMs are commonly found in non-viral genomes and several of the HMMs from Pfam are commonly found in viral genomes. Those HMMs are excluded from the analysis.
#Run VirFinder
Rscript virfinder.R input/vOTUs.fna output/virfinder.tsv
#VirFinder scores each contig using a machine-learning algorithm based on kmers https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0283-5
#Quantify strand switch rate of genes
python strand_switch.py input/vOTUs.fna input/vOTUs.faa > output/strand_switch.tsv
#Here the code scans the proteins from each genome in genomic order. It counts the nuber of strand switches (+ to - or - to +) and divides by the total number of genes.
#Create master table of sequence features
python master_table.py output/hmm_hits.tsv output/virfinder.tsv output/strand_switch.tsv > output/master_table.tsv
#Predict viral contigs
python viral_classify.py \
--features output/master_table.tsv \
--in_base input/vOTUs \
--out_base output/vOTUs #vOTUs.tsv #最终得到2897个病毒
During the run no error was reported, all steps were 5251, after running the last step python viral_classify.py, finally only got 2897 viruses, in the literature the result was 4422, where is the problem, I see viral_classify.py used to a rules file classification_rules.tsv, is that file different from what you guys were using. Thanks for answering the questions!
The text was updated successfully, but these errors were encountered:
Your result sounds quite different. You could run a tool like Vibrant to get closer to the set of 4422 vOTUs used in the analysis (see article "Furthermore, we also parsed vOTUs using VIBRANT (v.1.2.1, default settings) and found a 4240/4422 (96%) viral prediction agreement.").
hi,
I download the vOTU.fna (There are 5251 vOTUs in there.)data in the text and used the MGV pipeline(https://github.com/snayfach/MGV/tree/master/viral_detection_pipeline), like this
During the run no error was reported, all steps were 5251, after running the last step python viral_classify.py, finally only got 2897 viruses, in the literature the result was 4422, where is the problem, I see viral_classify.py used to a rules file classification_rules.tsv, is that file different from what you guys were using. Thanks for answering the questions!
The text was updated successfully, but these errors were encountered: