Binning question, how to use vamb? #54

ChaoXianSen · 2023-10-10T10:39:27Z

Hi !
Before operating phamb, i use vamb process binning,
Vamb rum mode: vamb --outdir output63
--fasta R63.contigs.fa.gz
--bamfiles R63_sort.bam
-o C

report err.log : Traceback (most recent call last):
File "/public/home/bioinfo_wang/00_software/miniconda3/envs/avamb/bin/vamb", line 33, in
sys.exit(load_entry_point('vamb', 'console_scripts', 'vamb')())
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 1395, in main
run(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 834, in run
cluster(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 665, in cluster
clusternumber, ncontigs = vamb.vambtools.write_clusters(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 440, in write_clusters
for clustername, contigs in clusters:
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 701, in binsplit
for newbinname, splitheaders in _split_bin(binname, headers, separator):
File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 676, in _split_bin
raise KeyError(f"Separator '{separator}' not in sequence label: '{header}'")
KeyError: "Separator 'C' not in sequence label: 'k141_84347'"

But, the reuslt contain ‘k141_84347 ’ ：
‘ less contignames |grep "k141_84347" -A2 -B2 ' --> 'k141_512747
k141_170723
k141_84347
k141_170724
k141_512748'

the vamb operation result file contain :
'0 Oct 9 23:52 vae_clusters.tsv # why the file is empty？
7.7M Oct 9 23:52 contignames
2.6M Oct 9 23:52 lengths.npz
41K Oct 9 23:52 log.txt
77M Oct 9 23:52 latent.npz
815K Oct 9 23:51 model.pt
894 Oct 9 14:40 mask.npz
2.3M Oct 9 14:40 abundance.npz
252M Oct 9 14:38 composition.npz'

Thanks!

joacjo · 2023-10-10T11:13:09Z

Hi Chao

I see, this will not fly. It looks like there is an issue in the steps where you run vamb, your contignames has to be named with a strict format that makes you able to identify the original sample origin of each sample.
When that works and you get a non zero vae_clusters.tsv file, the downstream step using phamb should be straight forward.

Maybe @enryH - Can help you identify what went wrong.

enryH · 2023-10-10T11:49:08Z

KeyError: "Separator 'C' not in sequence label: 'k141_84347'"

So what is your separator? I guess you know VAMB better than me:)

ChaoXianSen · 2023-10-10T11:51:53Z

the file R63.contigs.fa.gz format like this:

k141_514373 flag=1 multi=3.0000 len=356
CCATAAATCTGATTTTAGTCAAAAAAATATGCAGTTTTTCAAAAAGGGTGTATAATTCTTTCGTTACATGAAATATTTTGGAGGTGCTATTTTTATGAAAA
k141_0 flag=1 multi=2.0000 len=345
AGGTGAAGATGACCGAAGAGGAGATTAAGGCCCGTGAGTTTGCCAAAGCGGCGCAGAAGGAGAAGGAGGACCGTGAGGCCAAGAAAGCGCTGG

joacjo · 2023-10-24T14:21:39Z

Hi Chao - did you manage to run VAMB as described in the example on the VAMB Github repo? I think that will solve many of your issues posted here. I hope that you are trying to process more than just one sample of course.

Concatenate assemblies from different samples

python concatenate.py /path/to/catalogue.fna.gz /path/to/assemblies/sample1/contigs.fasta /path/to/assemblies/sample2/contigs.fasta  [ ... ]

Do the mapping

minimap2 -d catalogue.mmi /path/to/catalogue.fna.gz; # make index
minimap2 -t 8 -N 5 -ax sr catalogue.mmi --split-prefix mmsplit /path/to/reads/sample1.fw.fq.gz /path/to/reads/sample1.rv.fq.gz | samtools view -F 3584 -b --threads 8 > /path/to/bam/sample1.bam

run VAMB

vamb --outdir path/to/outdir --fasta /path/to/catalogue.fna.gz --bamfiles /path/to/bam/*.bam -o C

It will be hard to help if the naming of the contigs in the fasta file and in the clusters.tsv file are not correctly formatted.

ChaoXianSen · 2023-10-25T02:56:21Z

Hi Chao - did you manage to run VAMB as described in the example on the VAMB Github repo? I think that will solve many of your issues posted here. I hope that you are trying to process more than just one sample of course.

Concatenate assemblies from different samples
python concatenate.py /path/to/catalogue.fna.gz /path/to/assemblies/sample1/contigs.fasta /path/to/assemblies/sample2/contigs.fasta  [ ... ]
Do the mapping
minimap2 -d catalogue.mmi /path/to/catalogue.fna.gz; # make index
minimap2 -t 8 -N 5 -ax sr catalogue.mmi --split-prefix mmsplit /path/to/reads/sample1.fw.fq.gz /path/to/reads/sample1.rv.fq.gz | samtools view -F 3584 -b --threads 8 > /path/to/bam/sample1.bam
run VAMB
vamb --outdir path/to/outdir --fasta /path/to/catalogue.fna.gz --bamfiles /path/to/bam/*.bam -o C
It will be hard to help if the naming of the contigs in the fasta file and in the clusters.tsv file are not correctly formatted.

Dear@joacjo
Thanks for your reply! I have solved this problem.
But, I have another question:
Run the RF model, the result file 'resultdir/vamb_bins/vamb_bins.1.fna' , how to cluster to get the vOTUs ?
which tools and databse can we use to get the vOTUs ?

Thanks a lot !

joacjo · 2023-10-25T09:28:58Z

Hi Chao

The most straight forward to way to get vOTUs is following the approach described by the CheckV people

Copied from https://bitbucket.org/berkeleylab/checkv/src/master/

Rapid genome clustering based on pairwise ANI

First, create a blast+ database:
makeblastdb -in <my_seqs.fna> -dbtype nucl -out <my_db>

Next, use megablast from blast+ package to perform all-vs-all blastn of sequences:
blastn -query <my_seqs.fna> -db <my_db> -outfmt '6 std qlen slen' -max_target_seqs 10000 -o <my_blast.tsv> -num_threads 32

Next, calculate pairwise ANI by combining local alignments between sequence pairs:
anicalc.py -i <my_blast.tsv> -o <my_ani.tsv>

Finally, perform UCLUST-like clustering using the MIUVIG recommended-parameters (95% ANI + 85% AF):
aniclust.py --fna <my_seqs.fna> --ani <my_ani.tsv> --out <my_clusters.tsv> --min_ani 95 --min_tcov 85 --min_qcov 0

Relevant ani-scripts can be found the checkv repo.

ChaoXianSen · 2023-10-27T10:13:49Z

Dear@joacjo
Thanks a lot ! Thanks for your reminding. I'll go over the instructions of checkV in detail.

But i have another problem:
' hmmsearch --cpu 16
-E 1.0e-05
-o R1_output.txt
--tblout ./R1_annotations/all.hmmVOG.tbl
AllVOG.hmm ./R1_proteins.faa '

This step runs very slowly ,is there anything I can do to speed up ?

I provid 16CPUs and 240G memeoy. ( #SBATCH -n 16 #SBATCH --mem=240G )
the file 'AllVOG.hmm' is about 3.2G, file 'R1_proteins.faa' is about 630M;

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Binning question, how to use vamb? #54

Binning question, how to use vamb? #54

ChaoXianSen commented Oct 10, 2023

joacjo commented Oct 10, 2023

enryH commented Oct 10, 2023

ChaoXianSen commented Oct 10, 2023

joacjo commented Oct 24, 2023 •

edited

Loading

ChaoXianSen commented Oct 25, 2023

joacjo commented Oct 25, 2023 •

edited

Loading

ChaoXianSen commented Oct 27, 2023

Binning question, how to use vamb? #54

Binning question, how to use vamb? #54

Comments

ChaoXianSen commented Oct 10, 2023

joacjo commented Oct 10, 2023

enryH commented Oct 10, 2023

ChaoXianSen commented Oct 10, 2023

joacjo commented Oct 24, 2023 • edited Loading

ChaoXianSen commented Oct 25, 2023

joacjo commented Oct 25, 2023 • edited Loading

ChaoXianSen commented Oct 27, 2023

joacjo commented Oct 24, 2023 •

edited

Loading

joacjo commented Oct 25, 2023 •

edited

Loading