After a talk with @shiraz-shah, we should investigate whether we can estimate the abundance more accurately.
He recommended filtering the alignments with msamtools (at least 80% query coverage and 95% identity over at least 80 bp) and then using msamtools profile to estimate abundances. This is supposedly more accurate than the naive read counting that CoverM does.
We do not yet know whether more accurate abundance estimation actually leads to better binning. So, we should test the following and compare to our current defaults:
95% identity over 80 bp and an 80% query coverage filter, applied with CoverM
The same filters can be applied using CoverM with `coverm contig --min-read-aligned-length 80 --min-read-percent-identity 95 --min-read-aligned-percent 80`
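To make the three thresholds concrete, here is a minimal Python sketch of the per-alignment check they describe. The `Alignment` class and `passes_filter` function are hypothetical names used only for illustration. The way identity is computed here (matches divided by aligned read bases) is an assumption: CoverM and msamtools may treat gaps and clipping differently.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one read alignment (not a CoverM or msamtools type)."""
    read_length: int     # full length of the read in bp
    aligned_length: int  # number of read bases covered by the alignment
    matches: int         # aligned bases identical to the contig


def passes_filter(aln, min_aligned_length=80, min_identity=95.0,
                  min_aligned_percent=80.0):
    """Return True if the alignment meets all three thresholds."""
    if aln.aligned_length < min_aligned_length:
        return False
    # Assumption: identity is measured over the aligned part of the read only.
    identity = 100.0 * aln.matches / aln.aligned_length
    # Share of the read that is aligned (query coverage).
    aligned_percent = 100.0 * aln.aligned_length / aln.read_length
    return identity >= min_identity and aligned_percent >= min_aligned_percent
```

For example, a 150 bp read with 140 aligned bases and 138 matches passes, while a 150 bp read with only 100 aligned bases fails the 80% query coverage threshold even if those bases match almost perfectly.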
@Las02 if you have time, it would be good to also test this (lower priority than the current strobealign tests)
In our experience, the individual abundance estimates for each contig are about 50% noise and 50% signal if the mappings are not QC'ed with msamtools filter. In terms of presence/absence, ten times as many contigs are found in a sample if the above QC is not applied. We have benchmarked this using the CAMI data set, and we can see that those extra detections are all noise.
Additionally, msamtools profile iteratively redistributes ambiguous read mappings to the correct contig based on its unique matches. So if you have two contigs that are 95% identical, normally they would both get reads assigned, but msamtools can tell whether one contig or the other is present by looking at the reads that map to the dissimilar portions of the contigs. If both are present, their abundances are adjusted so that they become more accurate.
These two steps will make your abundances much, much more accurate, and I can't help but wonder whether it would suddenly make VAMB more accurate than the competition.
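The redistribution idea can be sketched in a few lines of Python. This is a simplified illustration and not msamtools' actual implementation: the function name `redistribute`, its inputs, and the fixed iteration count are all assumptions, and contig length normalization is ignored.

```python
def redistribute(unique_counts, ambiguous_reads, iterations=10):
    """Share ambiguous reads among contigs in proportion to current abundance.

    unique_counts: dict mapping contig -> number of uniquely mapped reads.
    ambiguous_reads: list of tuples, each holding the contigs one read maps to.
    Hypothetical sketch; msamtools profile may work differently in detail.
    """
    # Start from the unique matches only.
    abundance = {c: float(n) for c, n in unique_counts.items()}
    for _ in range(iterations):
        updated = {c: float(n) for c, n in unique_counts.items()}
        for contigs in ambiguous_reads:
            total = sum(abundance.get(c, 0.0) for c in contigs)
            for c in contigs:
                if total > 0:
                    share = abundance.get(c, 0.0) / total
                else:
                    # No evidence for any candidate: split the read equally.
                    share = 1.0 / len(contigs)
                updated[c] = updated.get(c, 0.0) + share
        abundance = updated
    return abundance
```

With `unique_counts = {"A": 10, "B": 0}` and five reads that map to both contigs, all five reads end up on `A`, which mirrors the case where only one of two near-identical contigs is actually present. If both contigs have unique reads, the shared reads are split in proportion to them.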
Probably the msamtools pipeline should be: