Reinstate obsoleted commands e.g. calc_distmx, fastx_getseqs etc. #8

Open
KasperSkytte opened this issue Jun 18, 2024 · 8 comments

Comments

@KasperSkytte

Can the fastx_getseqs command be included too? It seems to have been removed compared to version 11:
Unknown command-line option -fastx_getseqs

@KasperSkytte KasperSkytte changed the title fastx_getseqs include fastx_getseqs Jun 18, 2024
@rcedgar rcedgar changed the title include fastx_getseqs Reinstate obsoleted commands e.g. calc_distmx, fastx_getseqs etc. Jun 18, 2024
@rcedgar
Owner

rcedgar commented Jun 18, 2024

As noted in README.md:

Compared to earlier versions, functionality which is sufficiently covered by other open-source projects has been removed. In particular, there is no support for OTU table manipulation or diversity analysis which is well supported by other tools such as QIIME and DADA2. The goal here is to simplify the package as much as reasonably possible to encourage collaborators to join the open-source project.

Binaries for usearch versions 5 through 11 are provided at https://github.com/rcedgar/usearch_old_binaries/, licensed under CC0-1.0 (public domain). There are no plans to provide source code for the older versions.

If you find the obsoleted commands useful, then you can use the older binaries.

@KasperSkytte
Author

Alright. I would just have expected all the fastx_* commands to be treated the same way, since some of them are still included. But anyway, I'll use an earlier version then, thanks.

@rcedgar
Owner

rcedgar commented Jun 18, 2024

Let's leave this one open so that people can see it; this is already the second time a similar issue has been opened.

@rcedgar rcedgar reopened this Jun 18, 2024
@hjarnek

hjarnek commented Sep 8, 2024

Hi @rcedgar! What about otutab_xtalk? I have not seen any other software with functionality similar to the UNCROSS2 algorithm. Or do you no longer recommend using it?

@rcedgar
Owner

rcedgar commented Sep 8, 2024

I think cross-talk is too hard to measure accurately, and you cannot filter it without losing too many low-abundance species. I don't use it myself, but it's all a judgement call, it's impossible to account for all the errors and biases in amplicon sequencing so it's totally up to the user what they feel comfortable with. If you think it is useful, you can use one of the older binaries, they are all licensed in the public domain now.

@hjarnek

hjarnek commented Sep 8, 2024

> I think cross-talk is too hard to measure accurately, and you cannot filter it without losing too many low-abundance species. I don't use it myself, but it's all a judgement call, it's impossible to account for all the errors and biases in amplicon sequencing so it's totally up to the user what they feel comfortable with. If you think it is useful, you can use one of the older binaries, they are all licensed in the public domain now.

Ok, but many people just use a static cutoff value on read abundances to filter out cross-talk. I work with metazoan metabarcoding, and because of multicellularity I think compositional analysis is very rough. I'm happy with just presence/absence, basically. I understand that when you do real microbial compositional analysis, whether there are a handful of reads or no reads of certain OTUs doesn't matter that much, so skipping cross-talk filtering altogether isn't a big deal. But for presence/absence, it matters a lot more. Surely, using UNCROSS2 with lenient parameters must be better than just applying a static read cutoff and discarding all abundances below, say, 10?

And speaking of which... I can't find any documentation on how to adjust the parameters of otutab_xtalk in USEARCH11, although you talk a bit about the effects of adjusting them in your paper. Am I missing something?
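For reference, the static read-abundance cutoff mentioned above can be sketched in a few lines of Python. This is only an illustration of the simple filter being contrasted with UNCROSS2, not of UNCROSS2 or otutab_xtalk itself. The function names, the nested-dict table layout, and the toy counts are all assumptions made up for this example.

```python
def apply_static_cutoff(otu_table, min_reads=10):
    """Set every count below min_reads to zero (hypothetical helper)."""
    return {
        otu: {sample: (n if n >= min_reads else 0) for sample, n in counts.items()}
        for otu, counts in otu_table.items()
    }


def presence_absence(otu_table):
    """Convert read counts to 1 (present) or 0 (absent)."""
    return {
        otu: {sample: int(n > 0) for sample, n in counts.items()}
        for otu, counts in otu_table.items()
    }


# Toy counts invented for illustration only; not real data.
toy_table = {
    "Otu1": {"S1": 5000, "S2": 3},
    "Otu2": {"S1": 0, "S2": 12},
    "Otu3": {"S1": 9, "S2": 10},
}

filtered = apply_static_cutoff(toy_table, min_reads=10)
calls = presence_absence(filtered)
```

Note that the cutoff treats the 3 reads of Otu1 in S2 (plausibly cross-talk from the 5000 reads in S1) exactly like the 9 reads of Otu3 in S1, which is the limitation being discussed: the filter ignores how abundant the OTU is in other samples.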

@rcedgar
Owner

rcedgar commented Sep 8, 2024

Setting an abundance cutoff makes a trade-off between FPs and FNs where in general you cannot get a meaningful estimate of the FP rate or the FN rate. Perhaps you could estimate rates from mock community control samples with a strongly skewed abundance curve, but including control samples seems not to be a widely accepted practice, and it's unclear to me how dependent cross-talk rates are on the index sequences plus abundance biases such as operon count, primer differences and so on. So I don't see a principled way to set parameters or make a recommendation, it seems to me it is up to the user (plus PIs, referees, editors, funders etc.) to figure out what seems reasonable to them.
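The trade-off described above can be illustrated with a rough Python sketch, assuming a mock community control sample whose true members are known. The function name, OTU labels, and counts are hypothetical and invented for this example. It also assumes that every non-mock OTU called in the mock sample is a false positive, which may not hold in practice.

```python
def count_errors(observed, expected_otus, min_reads):
    """Count false positives and false negatives in a mock sample.

    observed: dict mapping OTU name to read count in the mock sample.
    expected_otus: set of OTUs known to be in the mock community.
    min_reads: static cutoff; an OTU is called present if count >= min_reads.
    """
    called = {otu for otu, n in observed.items() if n >= min_reads}
    false_positives = len(called - expected_otus)
    false_negatives = len(expected_otus - called)
    return false_positives, false_negatives


# Toy mock sample with a strongly skewed abundance curve (invented numbers).
mock_observed = {"MockA": 20000, "MockB": 150, "MockC": 4, "Otu7": 6, "Otu9": 1}
mock_expected = {"MockA", "MockB", "MockC"}

# Map each cutoff to its (false positives, false negatives) pair.
results = {
    cutoff: count_errors(mock_observed, mock_expected, cutoff)
    for cutoff in (1, 5, 10)
}
```

In this toy case, raising the cutoff removes the spurious OTUs but also drops the low-abundance mock member, so false positives are traded for false negatives. Whether counts from one mock sample transfer to real samples is exactly the open question raised above.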

@hjarnek

hjarnek commented Sep 8, 2024

> [...] but including control samples seems not to be a widely accepted practice

Yeah, unfortunately not. And I don't have any control over what happens in the wet lab to generate the data I work with. So given there are no mock samples, using UNCROSS2 still seems better than applying a strict read abundance cutoff, right? Is it possible to adjust the parameters of otutab_xtalk in USEARCH11, like s, Nmin and fmax, to make it more lenient? Maybe if the source code was included here, it could be a starting point for people to tweak according to their needs? I just want to get away from the static abundance cutoffs.

3 participants