Update README.md
raufs authored Aug 15, 2023
1 parent 4fc1057 commit e8d8694
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -55,7 +55,7 @@ pip install -e .

This program will perform dereplication of genomes using skani average nucleotide identity (ANI) and aligned fraction (AF) estimates and a dynamic programming based approach. It assesses pairwise ANI estimates and chooses which genomes to keep if they are deemed redundant to each other based on assembly N50 (keeping the more contiguous assembly) and connectedness (favoring genomes deemed similar to a greater number of alternate genomes).

-Compared to [dRep](https://github.com/MrOlm/drep) by [Olm et al. 2017](https://www.nature.com/articles/ismej2017126) and [galah](https://github.com/wwood/galah), skDER does not use a divide-and-conquer approach based on primary clustering with MASH or dashing followed by greedy clustering of more precise ANI estimates (for instance computed using FastANI) in a secondary round. It leverages advances in accurate yet speedy ANI calculations by [skani](https://github.com/bluenote-1577/skani) by [Shaw and Yu](https://www.biorxiv.org/content/10.1101/2023.01.18.524587v2) to simply do one round of clustering and is primarily designed for selecting distinct genomes for a taxonomic group for comparative genomics rather than for metagenomic application.
+Compared to [dRep](https://github.com/MrOlm/drep) by [Olm et al. 2017](https://www.nature.com/articles/ismej2017126) and [galah](https://github.com/wwood/galah), skDER does not use a divide-and-conquer approach based on primary clustering with MASH or dashing followed by greedy clustering/dereplication based on more precise ANI estimates (for instance computed using FastANI) in a secondary round. skDER instead leverages advances in accurate yet speedy ANI calculations by [skani](https://github.com/bluenote-1577/skani) by [Shaw and Yu](https://www.biorxiv.org/content/10.1101/2023.01.18.524587v2) to simply take a "one-round" approach. skDER is also primarily designed for selecting distinct genomes for a taxonomic group for comparative genomics rather than for metagenomic application.

It can still be used for metagenomic application if users are cautious and filter out MAGs which have high levels of contamination, which can be assessed using CheckM for instance. To support this application and in particular the realization that most MAGs likely suffer from incompleteness, we have introduced a parameter/cutoff for the max alignment fraction difference for each pair of genomes. For example, if the AF for genome 1 to genome 2 is 95% (95% of genome 1 is contained in genome 2) and the AF for genome 2 to genome 1 is 80%, then the difference is 15%. Because the default value for the difference cutoff is 10%, in that example the genome with the larger value will automatically be regarded as redundant and become disqualified as a potential representative genome.
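
The alignment fraction difference rule in the paragraph above can be illustrated with a minimal sketch. This is not skDER's actual code: the function name `redundant_by_af_difference`, its parameter names, and the handling of a difference exactly equal to the cutoff are assumptions made only for illustration. The 10% default is taken from the README text.

```python
from typing import Optional


def redundant_by_af_difference(
    af_1_to_2: float,
    af_2_to_1: float,
    max_af_diff: float = 10.0,  # default cutoff stated in the README (percent)
) -> Optional[int]:
    """Return 1 or 2 for the genome flagged as redundant, or None.

    Hypothetical helper, not part of skDER. af_1_to_2 is the percentage of
    genome 1 contained in genome 2; af_2_to_1 is the reverse direction.
    """
    difference = abs(af_1_to_2 - af_2_to_1)
    # Assumption: only a difference strictly above the cutoff triggers the rule.
    if difference <= max_af_diff:
        return None
    # The genome with the larger AF is mostly contained in the other genome,
    # so it is the one regarded as redundant.
    return 1 if af_1_to_2 > af_2_to_1 else 2


# README example: 95% vs. 80% gives a difference of 15%, above the 10% cutoff,
# so genome 1 (the larger value) is disqualified as a representative.
flagged = redundant_by_af_difference(95.0, 80.0)
```

With the README's numbers, `flagged` is `1`; a pair such as 95% and 90% differs by only 5%, so neither genome is flagged by this rule alone.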
