sourmash tax question - genus level matches? #3497
Comments
hi @jodyphelan, some hot takes per @bluegenes (who wrote this part of tax):
You might also consider using k=21 to get better genus-level matching, and/or using a protein molecule type/sketch (but we don't provide standardized protein databases yet, so you'd have to sketch your own matching genomes).
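A rough sketch of why a smaller k tends to help at genus level, assuming a simple independent-mutation model in which the expected fraction of shared k-mers between two genomes is roughly ANI^k. The function name `expected_containment` and the 85% ANI value are illustrative assumptions, not part of sourmash:

```python
def expected_containment(ani: float, ksize: int) -> float:
    """Approximate fraction of k-mers shared between two genomes.

    Assumes each base matches independently with probability `ani`,
    so a k-mer survives intact with probability ani ** ksize.
    """
    if not 0.0 <= ani <= 1.0:
        raise ValueError("ani must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return ani ** ksize


if __name__ == "__main__":
    # Hypothetical genus-level divergence of 85% ANI at several k-mer sizes.
    for k in (21, 31, 51):
        print(f"k={k}: expected containment ~ {expected_containment(0.85, k):.4f}")
```

Under this model the expected overlap shrinks quickly as k grows, so at the same divergence a k=21 sketch retains several times more shared k-mers than a k=31 sketch.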
Thanks @ctb, I've managed to boost the proportion in Mycobacterium up to 11% by using k=21.
@jodyphelan this looks great! I will say that an 11.1% overlap in k=21 space between microbial genomes is actually surprisingly stringent (despite looking "low") and would match ~family or genus level, as the cANI suggests.
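To make the link between overlap and cANI concrete, here is a minimal sketch of the usual point estimate, assuming the same simple mutation model (containment is roughly ANI^k, so ANI is roughly containment^(1/k)). The function name `containment_to_ani` is hypothetical, and the values sourmash itself reports may differ slightly because it also accounts for sketch sampling:

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment.

    Inverts containment ~= ani ** ksize under an
    independent-mutation assumption.
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


if __name__ == "__main__":
    # The 11.1% overlap at k=21 discussed in this thread.
    print(f"estimated ANI: {containment_to_ani(0.111, 21):.3f}")
```

With these inputs the estimate comes out at about 0.90, which is why an overlap that looks low can still indicate a fairly close relationship.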
If a hypothetical new species was sequenced which had max 85% with all other species in the genus, would it be possible with sourmash to identify that it belonged to that genus?
To give a more concrete example, this is the output from gather with gtdb-rs207.genomic-reps.dna.k31.zip: