Develop a swarm-plugin for Qiime 2 #89
Sounds like a good idea!
Any update?
With swarm 3.0 fast approaching (#122), the increasing popularity of Exact Sequence Variants, and the publication of the Qiime 2 paper, this might be the perfect time to build a q2-swarm plugin. 🚀
@colinbrislawn you are right, but I don't really know where to begin. Would you help me kickstart that plugin?
Thanks @frederic-mahe! I'm honored you reached out to me, but I'm not sure where to begin either. I guess I would look to the q2-vsearch plugin as a template, then build from there. https://github.com/qiime2/q2-vsearch @thermokarst, could you make us an official q2-swarm repo and invite us as contributors?
Hey there @colinbrislawn! This plugin idea sounds really interesting, and good news, no need for us to make you a repo! Since QIIME 2 is decentralized, you can create the plugin wherever you want, then you can share it with users by registering it at the QIIME 2 Library! The Library entry can contain instructions letting users know how to get your plugin and install it. |
Summary of steps in Fred's metabarcoding pipeline, as I understand it, and what's already wrapped in Qiime2:
This is a fully featured pipeline that differs from what's already in Qiime2 in a number of ways, specifically the per-sample derep. One easy way forward is to make a q2-swarm plugin that replaces only the vsearch `cluster-features-de-novo` step. This is in contrast to the DADA2 plugin, which implements its full, unique SOP. Either way, adding
I should have pointed that out sooner; here is my current swarm-based pipeline. The way the pipeline is described (and the scripts numbered) might be confusing. The beginning is quite similar to the old pipeline you were referring to:
I realize that replicating the whole pipeline in Qiime2 might not be easy, so I agree we should aim for an easier first target.
In my own work, I only use:

```sh
list_local_clusters() {
    # retain only clusters with more than 2 reads
    # (do not use the fastidious option here)
    "${SWARM}" \
        --differences 1 \
        --threads "${THREADS}" \
        --usearch-abundance \
        --log /dev/null \
        --output-file /dev/null \
        --statistics-file - \
        "${SAMPLE}.fas" | \
        awk 'BEGIN {FS = OFS = "\t"} $2 > 2' > "${SAMPLE}.stats"
}

clustering() {
    # swarm 3 or more recent
    "${SWARM}" \
        --differences 1 \
        --fastidious \
        --usearch-abundance \
        --threads "${THREADS}" \
        --internal-structure "${OUTPUT_STRUCT}" \
        --output-file "${OUTPUT_SWARMS}" \
        --statistics-file "${OUTPUT_STATS}" \
        --seeds "${OUTPUT_REPRESENTATIVES}" \
        "${FINAL_FASTA}" 2> "${OUTPUT_LOG}"
}
```

The input file is a dereplicated fasta file with abundance annotations (
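For a future plugin, the awk filter in `list_local_clusters()` could be ported to Python. Here is a minimal sketch, assuming (as the awk program does) that swarm's statistics file is tab-separated with each cluster's total read abundance in the second column; the function name is hypothetical:

```python
def filter_clusters(stats_lines):
    """Keep statistics lines for clusters with more than 2 reads,
    mirroring `awk 'BEGIN {FS = OFS = "\\t"} $2 > 2'`."""
    return [line for line in stats_lines
            if int(line.rstrip("\n").split("\t")[1]) > 2]


# Example: a 120-read cluster is kept, a 2-read cluster is dropped.
stats = ["3\t120\tseed1", "1\t2\tseed2"]
kept = filter_clusters(stats)
```

In a plugin this would run over the statistics file produced by each per-sample swarm call, rather than over an in-memory list.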
Thank you, this is extremely helpful! I like the idea of starting small with the q2-swarm plugin.
Naturally! I'm not sure how best to track feature counts through per-sample derep and clustering. I understand what per-sample derep does and why it's faster to do this double-derep step. We could get counts for the feature table by remapping reads, like we did historically, but that loses the efficiency of the per-sample derep and ignores the internal structure of the swarms.
My pipeline must be confusing for anyone but me, sorry about that. The loop processes each pair of fastq files in 6 steps:
The double-derep step allows me to keep track of the origin of each unique sequence. The fasta files are parsed when building the final occurrence table.
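To illustrate how per-sample counts can survive the double-derep, here is a minimal Python sketch. It is only an illustration of the bookkeeping, not Fred's actual implementation (which parses the per-sample fasta files): each unique sequence maps to its per-sample abundances, which is exactly the information needed for the final occurrence table after clustering.

```python
from collections import defaultdict

def dereplicate_samples(samples):
    """samples: {sample_name: [read_sequence, ...]}.
    Returns {sequence: {sample_name: count}}, i.e. a global
    dereplication that still records each sequence's origin."""
    table = defaultdict(lambda: defaultdict(int))
    for sample, reads in samples.items():
        # per-sample dereplication: identical reads collapse to a count
        for seq in reads:
            table[seq][sample] += 1
    return table


# Example: "ACGT" occurs in both samples, "TTTT" only in s1.
table = dereplicate_samples({"s1": ["ACGT", "ACGT", "TTTT"],
                             "s2": ["ACGT"]})
```

Summing a sequence's inner dict gives its global abundance (the value swarm sees), while the inner dict itself provides the per-sample breakdown for the feature table.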
Qiime 2 now offers an interface for third-party plugins. Plugin creation does not seem complicated: the plugin is a Python 3 wrapper exposing some or all of swarm's functionality.
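As a starting point for such a wrapper, here is a hedged sketch of how a plugin might assemble the swarm command line before handing it to `subprocess.run`. The function name and signature are hypothetical; the flags are the ones used in the `clustering()` function earlier in this thread (note that swarm's `--fastidious` mode only applies with `--differences 1`):

```python
def swarm_command(input_fasta, output_swarms, seeds, statistics,
                  internal_structure, threads=1, differences=1,
                  fastidious=True):
    """Build the argument list for a swarm invocation; a plugin
    would pass this list to subprocess.run()."""
    cmd = ["swarm", "--differences", str(differences)]
    if fastidious:
        # fastidious clustering is only valid with d = 1
        cmd.append("--fastidious")
    cmd += ["--usearch-abundance",
            "--threads", str(threads),
            "--internal-structure", internal_structure,
            "--output-file", output_swarms,
            "--statistics-file", statistics,
            "--seeds", seeds,
            input_fasta]
    return cmd


# Example invocation mirroring clustering():
cmd = swarm_command("final.fas", "out.swarms", "reps.fas",
                    "out.stats", "struct.txt", threads=4)
```

Keeping command construction in a pure function like this makes the wrapper easy to unit-test without running swarm itself.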