Reinstate obsoleted commands e.g. calc_distmx, fastx_getseqs etc. #8
Comments
As noted in README.md: Compared to earlier versions, functionality which is sufficiently covered by other open-source projects has been removed. In particular, there is no support for OTU table manipulation or diversity analysis which is well supported by other tools such as QIIME and DADA2. The goal here is to simplify the package as much as reasonably possible to encourage collaborators to join the open-source project. Binaries for usearch versions 5 through 11 are provided at https://github.com/rcedgar/usearch_old_binaries/, licensed under CC0-1.0 (public domain). There are no plans to provide source code for the older versions. If you find the obsoleted commands useful, then you can use the older binaries.
Alright. I would just consider all the |
Let's leave this one open so that people can see it; this is already the second time a similar issue has been opened.
Hi @rcedgar! What about |
I think cross-talk is too hard to measure accurately, and you cannot filter it without losing too many low-abundance species. I don't use it myself, but it's all a judgement call: it's impossible to account for all the errors and biases in amplicon sequencing, so it's totally up to the user what they feel comfortable with. If you think it is useful, you can use one of the older binaries; they are all in the public domain now.
Ok, but many people just use a static cutoff value on read abundances to filter out cross-talk. I work with metazoan metabarcoding, and because of multicellularity I think compositional analysis is very rough. I'm happy with just presence/absence, basically. I understand that when you do real microbial compositional analysis, whether there are a handful of reads or none for certain OTUs doesn't matter that much, so skipping cross-talk filtering altogether isn't a big deal. But for presence/absence, it matters a lot more. Surely, using UNCROSS2 with lenient parameters must be better than just applying a static read cutoff and discarding all abundances below, say, 10? And speaking of which... I can't find any documentation on how to adjust the parameters of |
Setting an abundance cutoff makes a trade-off between FPs and FNs where in general you cannot get a meaningful estimate of the FP rate or the FN rate. Perhaps you could estimate rates from mock community control samples with a strongly skewed abundance curve, but including control samples seems not to be a widely accepted practice, and it's unclear to me how dependent cross-talk rates are on the index sequences plus abundance biases such as operon count, primer differences and so on. So I don't see a principled way to set parameters or make a recommendation; it seems to me it is up to the user (plus PIs, referees, editors, funders etc.) to figure out what seems reasonable to them.
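To make the FP/FN trade-off concrete, here is a minimal sketch in Python of a static read-abundance cutoff applied to a single mock-community sample. Everything in it is an assumption made for illustration: the OTU names, the read counts, the set of truly present OTUs, and the helper names apply_cutoff and cutoff_errors are hypothetical. It is not part of usearch and does not reproduce UNCROSS2.

```python
def apply_cutoff(counts, min_reads):
    """Return the OTUs whose read count is at least min_reads (presence/absence call)."""
    return {otu for otu, n in counts.items() if n >= min_reads}


def cutoff_errors(counts, truly_present, min_reads):
    """Count false positives and false negatives for one cutoff value.

    counts        -- dict mapping OTU label to read count in the sample
    truly_present -- set of OTU labels known to be in the mock community
    """
    detected = apply_cutoff(counts, min_reads)
    false_pos = detected - truly_present   # called present, but not in the mock
    false_neg = truly_present - detected   # in the mock, but filtered out
    return len(false_pos), len(false_neg)


# Hypothetical mock sample with a skewed abundance curve.
# Otu5 and Otu6 stand in for cross-talk reads from other samples.
mock_counts = {"Otu1": 5000, "Otu2": 300, "Otu3": 12, "Otu4": 4, "Otu5": 7, "Otu6": 2}
mock_truth = {"Otu1", "Otu2", "Otu3", "Otu4"}

for cutoff in (1, 5, 10):
    fp, fn = cutoff_errors(mock_counts, mock_truth, cutoff)
    print(f"cutoff={cutoff}: FP={fp} FN={fn}")
```

With these made-up counts, raising the cutoff removes the spurious OTUs but also drops the genuine low-abundance Otu4, which is the loss of rare species described above. Real rates would depend on the actual mock community and sequencing run.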
Yeah, unfortunately not. And I don't have any control over what happens in the wet lab to generate the data I work with. So given there are no mock samples, using UNCROSS2 still seems better than applying a strict read abundance cutoff, right? Is it possible to adjust the parameters of |
Can fastx_getseqs be included too? It seems to have been removed compared to version 11:
Unknown command-line option -fastx_getseqs