Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How virus abundance is calculated? #3

Open
AndAvia opened this issue Oct 31, 2024 · 0 comments
Open

How virus abundance is calculated? #3

AndAvia opened this issue Oct 31, 2024 · 0 comments

Comments

@AndAvia
Copy link

AndAvia commented Oct 31, 2024

hi there,
You guys did a great job with the visualization part of the article and I've been reproducing it lately. But I don't know how the viral abundance files you provided (Figures 2 and 3 viral abundance matrix..txt) were obtained. I checked your code.
`

  makeblastdb -in HQMQLQ.bins.fna -dbtype nucl -out HQMQLQdb
  blastn -query HQMQLQ.bins.fna -db HQMQLQdb -outfmt '6 std qlen slen' \
  -max_target_seqs 10000 -perc_identity 90 \
  -out virusxvirus.m6 -num_threads 40

  python /app/anicalc.py -i virusxvirus.m6 -o virusxvirus.ani.tsv
  python /app/aniclust.py --fna HQMQLQ.bins.fna --ani virusxvirus.ani.tsv --out virus.clusters --min_ani 95 --min_qcov 0 --min_tcov 85
  
  cut -f1 virus.clusters  > viralreps.txt
  seqtk subseq HQMQLQ.bins.fna viralreps.txt > viralreps.fna 

  ### Combine with MGVdb 
  pigz -fdc ${MGVdb} > mgvreps.fna

  #cat viralreps.fna mgvreps.fna > viruses.fna
  cat viralreps.fna > viruses.fna
  makeblastdb -in viruses.fna -dbtype nucl -out HQMQLQdbreps 

  ### Blast Genecatalogue against the viral references
  gunzip -fdc ${Genecatalogue} > MGX.genecat.fna

  blastn -task megablast -db HQMQLQdbreps -query MGX.genecat.fna -out HQMQLQ.genecat.m6 -outfmt 6 -num_threads 40 \
  -evalue 0.01 -qcov_hsp_perc 80 -perc_identity 90 -word_size 20 -reward 2 -penalty -3 -max_target_seqs 10 -max_hsps 1

`Question 1: I would like to know how to get the MGX.genecat.fna file inside?

Rscript --vanilla --slave Rscripts/get_viral_abundance_matrix.R HQMQLQ.genecat.m6 TPMmatrix.txt matrices

Question 2: I would like to know if the TPMmatrix.txt file for this R script input was obtained by directly quantifying the 5251 vOTUs with coverm, or was it obtained by some other method?

I would like to follow your quantitative approach to my own data,so your Q&A is very important to me, thank you most sincerely!

@AndAvia AndAvia closed this as completed Nov 4, 2024
@AndAvia AndAvia reopened this Nov 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant