plastome assembly from eDNA #318
-
Thanks for reaching out with such detail. Given the target coverage, I would stick with relatively small word sizes. What is the smallest word size you have tried? If the issues persist, I recommend using …
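One way to see why small word sizes suit modest coverage is the standard relation between base coverage and k-mer (word) coverage: C_w = C × (L − w + 1) / L. Here L is the read length and w is the word size. The sketch below applies this relation to numbers mentioned in this thread: reads up to 250 bp, the target at roughly 40x, and other diatoms below 10x. It assumes uniform coverage and error-free reads. It is a back-of-the-envelope illustration only, not how GetOrganelle computes anything internally.

```python
def word_coverage(base_cov: float, read_len: int, word_size: int) -> float:
    """Expected number of reads that fully contain a given word (k-mer).

    Standard base-to-k-mer coverage relation: C_w = C * (L - w + 1) / L.
    Returns 0.0 when the word is longer than the reads.
    """
    if word_size > read_len:
        return 0.0
    return base_cov * (read_len - word_size + 1) / read_len


if __name__ == "__main__":
    read_len = 250  # upper end of the 220-250 bp reads described in this thread
    for label, cov in (("target (~40x)", 40.0), ("other diatoms (~10x)", 10.0)):
        for w in (60, 120, 210):
            print(f"{label:22s} w={w:3d} -> {word_coverage(cov, read_len, w):5.2f}x")
```

Under this approximation, w = 210 pushes the ~10x plastomes down to about 1.6x word coverage. That is consistent with a large word size suppressing parallel contigs. However, it also leaves the target at only about 6.6x, which means fewer words per position from which to extend.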
-
Thanks for your quick reply! I've experimented with various word sizes: 30, 60, 90, 120, 150, 180, and 210. A word size of 60 gave me the longest diatom scaffold, approximately 50,000 bp long. However, the rest of the plastome was fragmented into very small pieces with numerous potential connections. This made it very difficult to distinguish our target sequences from those of other species.

I tried the assembly-from-graph approach but consistently received the following error (even after changing --disentangle-time-limit):

Interestingly, the resulting scaffolds with word size 60 (5 in total) were nearly identical to those assembled with larger word sizes.

I will try out the join_spades_fastg_by_blast.py approach. Thanks for your help!

Cheers,
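One rough way to separate target pieces from those of other species is to split assembly-graph edges by coverage. The thread describes the target above 40x and the other plastomes below 10x. The sketch below assumes the SPAdes FASTG header naming (`EDGE_<id>_length_<len>_cov_<cov>`). SPAdes reports k-mer coverage in these headers, not base depth, so the cutoff must be chosen on that scale. The value of 20 used here is hypothetical, and the function names are illustrative. Edges with intermediate or mixed depth, such as regions shared between related plastomes, will not split cleanly this way.

```python
import re
from typing import Dict, Iterable, List, Tuple

# SPAdes-style FASTG header, e.g. ">EDGE_3_length_5120_cov_42.7:EDGE_8_length_...;"
# A trailing "'" after the coverage marks the reverse-complement copy of an edge.
HEADER_RE = re.compile(r"^>EDGE_(\d+)_length_(\d+)_cov_([0-9.]+)(')?")


def edge_stats(lines: Iterable[str]) -> Dict[int, Tuple[int, float]]:
    """Map edge id -> (length, coverage), skipping reverse-complement records."""
    stats: Dict[int, Tuple[int, float]] = {}
    for line in lines:
        m = HEADER_RE.match(line)
        if m is None or m.group(4):  # sequence line, or the primed (reverse) copy
            continue
        stats[int(m.group(1))] = (int(m.group(2)), float(m.group(3)))
    return stats


def split_by_depth(stats: Dict[int, Tuple[int, float]],
                   cutoff: float) -> Tuple[List[int], List[int]]:
    """Return (high-depth edge ids, low-depth edge ids) around a coverage cutoff."""
    high = sorted(e for e, (_, cov) in stats.items() if cov >= cutoff)
    low = sorted(e for e, (_, cov) in stats.items() if cov < cutoff)
    return high, low


if __name__ == "__main__":
    # Toy headers, not real data; in practice read them from the .fastg file.
    toy = [
        ">EDGE_1_length_1000_cov_45.0:EDGE_2_length_200_cov_8.0;",
        "ACGTACGT",
        ">EDGE_1_length_1000_cov_45.0';",
        ">EDGE_2_length_200_cov_8.0;",
    ]
    print(split_by_depth(edge_stats(toy), cutoff=20.0))
```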
-
Do you know what this means?

```
get_organelle_from_assembly.py --expected-max-size 150000 -F other_pt -g extended_K105.assembly_graph.fastg.extend-RefPlastomes.label.selection.gfa -o ManualCorrection --overwrite --no-slim --disentangle-time-limit 28800

GetOrganelle v1.7.7.0

get_organelle_from_assembly.py isolates organelle genomes from assembly graph.
Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]

2024-03-14 18:36:21,323 - INFO: Processing assembly graph ...
2024-03-14 18:36:21,354 - INFO: Extracting other_pt from the assemblies ...

Total cost 5.94 s
```
-
Hello,
I'm working with eDNA extracted from water samples, aiming to assemble diatom plastomes. Initially, we were skeptical that this was feasible because of the complexities inherent to eDNA samples, such as DNA fragmentation, short paired-end sequencing, and the presence of multiple organisms. However, we seem to be making progress in assembling the plastome of one particular diatom species. It appears to have been more abundant than the others at the time of sampling.
I consistently obtain 5 to 7 scaffolds, each with a depth above 40x, that together span the entire plastome. Some "contamination" is evident, possibly from other diatom plastomes with lower coverage (below 10x), and it produces parallel contigs. Increasing the word size to 210, however, seems to mitigate this issue. My standard parameters are my own database of similar diatom species (-s and --genes), the maximum read count, more rounds (set to 40), and narrow extension steps (J = 1 and M = 1). I also use k-mers of 21, 45, 65, 85, 105, and 127.
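The parameters above could be written out as a `get_organelle_from_reads.py` call. The sketch below only builds and prints the command, one per word size, which makes a sweep easy. Several details are assumptions:

- The read, seed, and gene file names are hypothetical.
- `-F other_pt` is borrowed from the `get_organelle_from_assembly.py` command elsewhere in this thread.
- Translating "maximum read count" into `--max-reads inf` is my guess at what was meant.

Check `get_organelle_from_reads.py -h` for your installed version before running anything.

```python
import shlex
from typing import List, Sequence

KMERS = (21, 45, 65, 85, 105, 127)  # -k values listed in this thread


def build_getorganelle_cmd(r1: str, r2: str, outdir: str, seed: str, genes: str,
                           word_size: int, kmers: Sequence[int] = KMERS) -> List[str]:
    """Assemble (but do not run) a get_organelle_from_reads.py command line."""
    return [
        "get_organelle_from_reads.py",
        "-1", r1, "-2", r2,
        "-o", outdir,
        "-F", "other_pt",       # custom organelle type, as in the thread's other command
        "-s", seed,             # custom seed database
        "--genes", genes,       # custom label database
        "-w", str(word_size),
        "-R", "40",             # extension rounds
        "-J", "1", "-M", "1",   # narrow jump step and mesh size
        "-k", ",".join(str(k) for k in kmers),
        "--max-reads", "inf",   # assumption: how "maximum read count" was set
    ]


if __name__ == "__main__":
    # Hypothetical file names; one command per word size for a parameter sweep.
    for w in (60, 120, 210):
        cmd = build_getorganelle_cmd("sample_R1.fq.gz", "sample_R2.fq.gz",
                                     f"getorg_w{w}", "diatom_seeds.fasta",
                                     "diatom_genes.fasta", w)
        print(shlex.join(cmd))
```

To actually run one of these, pass the returned list to `subprocess.run(cmd, check=True)`. This avoids shell quoting issues.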
Given that my reads are 220-250 bp long, I'm also considering testing a wider k-mer range.
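When widening the k-mer range, note that SPAdes expects odd k-mer sizes and, in its default build, rejects values of 128 or more. With 220-250 bp reads, the upper end is therefore already capped at 127, so a "wider" range can mostly add intermediate values. A small hypothetical helper like `spades_kmer_list` below generates such a list. The start value and step are arbitrary choices.

```python
from typing import List


def spades_kmer_list(read_len: int, start: int = 21, step: int = 22,
                     max_k: int = 127) -> List[int]:
    """Odd k-mer sizes from `start` upward, capped below the read length and at max_k.

    max_k=127 reflects SPAdes' default limit (odd values below 128).
    """
    if start % 2 == 0 or step % 2 == 1:
        raise ValueError("start must be odd and step even so every k stays odd")
    top = min(max_k, read_len - 1)
    if top % 2 == 0:
        top -= 1  # largest allowed odd value
    ks = list(range(start, top + 1, step))
    if ks and ks[-1] != top:
        ks.append(top)  # always include the largest usable k
    return ks


if __name__ == "__main__":
    # Comma-separated form, as passed to -k.
    print(",".join(map(str, spades_kmer_list(250))))  # 21,43,65,87,109,127
```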
I'm currently investigating why the scaffolds break, to determine whether longer scaffolds can be recovered. A smaller word size produced a single longer scaffold, but also many smaller scaffolds and potential connections. I have also attempted semi-manual completion, without success so far.
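For the semi-manual joining step, one simple check is whether any two scaffold ends overlap directly, in either orientation. The sketch below is a brute-force suffix/prefix comparison, and its helper names are hypothetical. It only detects exact end overlaps within a bounded length (here 30 to 500 bases, both arbitrary defaults). Breaks caused by repeats, gaps, or sequencing errors will not show up, and any candidate join still needs checking against the reads or the assembly graph.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Complement table for DNA, keeping case and N.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def longest_overlap(a: str, b: str, min_len: int, max_len: int) -> int:
    """Longest n in [min_len, max_len] with a[-n:] == b[:n]; 0 if none."""
    for n in range(min(len(a), len(b), max_len), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def candidate_joins(scaffolds: Dict[str, str], min_len: int = 30,
                    max_len: int = 500) -> List[Tuple[str, str, int]]:
    """Return (left, right, overlap) for every pair of oriented scaffolds whose
    ends overlap exactly. '+' is the sequence as given, '-' its reverse
    complement; each join is reported twice (once per strand)."""
    oriented: Dict[str, str] = {}
    for name, seq in scaffolds.items():
        oriented[name + "+"] = seq
        oriented[name + "-"] = revcomp(seq)
    hits: List[Tuple[str, str, int]] = []
    for left, right in permutations(oriented, 2):
        if left[:-1] == right[:-1]:
            continue  # same scaffold; circularization is not checked here
        n = longest_overlap(oriented[left], oriented[right], min_len, max_len)
        if n:
            hits.append((left, right, n))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real scaffolds.
    toy = {"s1": "AAAACCCCGGGGTTTTACGT", "s2": "TTTTACGTCATCATCAT"}
    for hit in candidate_joins(toy, min_len=5):
        print(hit)
```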
I've explored various avenues already, but I'm open to new ideas or suggestions from anyone who might have additional insights.
Looking forward to your responses.
Cheers,
Manon
get_org.log.txt