plastome assembly from eDNA #318

GeertsManon · 2024-03-14T16:12:37Z

GeertsManon
Mar 14, 2024

Hello,

I'm working with eDNA extracted from water samples, aiming to assemble plastomes of diatoms. Initially, we were skeptical about the feasibility of this task due to the inherent complexities associated with eDNA samples, such as DNA fragmentation, short paired-end sequencing and the presence of multiple organisms. However, we seem to be making progress in assembling the plastome of a specific diatom species. It appears that this particular species was abundantly present compared to others at the time of sampling.

Consistently, I obtain 5 to 7 scaffolds, each with a depth exceeding 40x, that together span the entire plastome. Some "contamination" is evident, possibly from other diatom plastomes with lower coverage (below 10x), which produces parallel contigs. However, increasing the word size (-w) to 210 seems to mitigate this. My standard parameters are: my own database of similar diatom species (-s and --genes), the maximum read count, an increased number of rounds (-R 40), and narrow extension steps (-J 1 and -M 1). Additionally, I use k-mers of 21, 45, 65, 85, 105, and 127.
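For readers who want to reproduce a comparable run, the parameters described above could be assembled roughly as follows. This is only a sketch: the file names are hypothetical placeholders, `-F other_pt` is assumed from the later command in this thread, and `--max-reads inf` is one possible reading of "maximum read count". The function only builds the argument list and does not execute anything.

```python
import shlex
from typing import List, Sequence


def build_getorganelle_cmd(
    fq1: str,
    fq2: str,
    out_dir: str,
    seed_fasta: str,
    label_fasta: str,
    word_size: int = 210,
    kmers: Sequence[int] = (21, 45, 65, 85, 105, 127),
    max_rounds: int = 40,
) -> List[str]:
    """Build (but do not run) a get_organelle_from_reads.py command line."""
    return [
        "get_organelle_from_reads.py",
        "-1", fq1,
        "-2", fq2,
        "-o", out_dir,
        "-F", "other_pt",        # assumption: same organelle type as later in the thread
        "-s", seed_fasta,        # custom seed database of related diatoms
        "--genes", label_fasta,  # custom label database of related diatoms
        "-w", str(word_size),    # word size used during read extension
        "-k", ",".join(str(k) for k in kmers),  # SPAdes k-mer list
        "-R", str(max_rounds),   # maximum number of extension rounds
        "-J", "1",               # jump step
        "-M", "1",               # mesh size
        "--max-reads", "inf",    # assumption: no cap on the number of reads
    ]


if __name__ == "__main__":
    # All paths below are hypothetical placeholders.
    cmd = build_getorganelle_cmd(
        "reads_R1.fq.gz", "reads_R2.fq.gz", "plastome_w210",
        "diatom_seeds.fasta", "diatom_genes.fasta",
    )
    print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to vary one parameter at a time (for example the word size) and to pass the result to `subprocess.run` without shell quoting problems.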

Given that my reads are 220-250 bp long, I'm considering testing a wider k-mer range as well.
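One constraint worth keeping in mind when widening the range: SPAdes, which GetOrganelle calls for assembly, only accepts odd k-mer sizes below 128, so the list above already reaches the upper limit regardless of read length. A small check along these lines (the function name is hypothetical) can catch invalid candidates before a long run:

```python
from typing import List, Sequence, Tuple


def check_kmers(
    kmers: Sequence[int], min_read_len: int, max_k: int = 127
) -> List[Tuple[int, str]]:
    """Return (k, reason) pairs for every k-mer size that should not be used."""
    problems = []
    for k in kmers:
        if k % 2 == 0:
            problems.append((k, "must be odd"))
        elif k > max_k:
            problems.append((k, f"exceeds SPAdes maximum of {max_k}"))
        elif k >= min_read_len:
            problems.append((k, "not shorter than the shortest read"))
    return problems


if __name__ == "__main__":
    # The list from the post passes for reads of at least 220 bp.
    print(check_kmers([21, 45, 65, 85, 105, 127], min_read_len=220))
    # A hypothetical wider list: 64 is even and 151 is above the limit.
    print(check_kmers([21, 64, 151], min_read_len=220))
```

In practice, "wider" therefore means adding intermediate or smaller values rather than going beyond 127.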

I'm currently investigating why the scaffolds break where they do, to see whether longer scaffolds can be retrieved. A smaller word size produced a single longer scaffold, but also numerous smaller scaffolds and potential connections. My attempts at semi-manual completion have not succeeded so far.

I've explored various avenues already, but I'm open to new ideas or suggestions from anyone who might have additional insights.

Looking forward to your responses.

Cheers,
Manon

get_org.log.txt

JianjunJin · 2024-03-14T17:01:49Z

JianjunJin
Mar 14, 2024
Collaborator

Thanks for reaching out with such details.

Given the target coverage, I would stick with the relatively small word size values. What is the smallest word size that you have tried?

Should the issues persist, I recommend utilizing join_spades_fastg_by_blast.py (which you might have encountered in the FAQ) to introduce gaps between the breakpoints. This approach allows for an assessment of the potential gap lengths according to a reference. Subsequently, these results can serve as a foundation for further investigations, such as examining reads associated with these breakpoints or engaging in gap-closure efforts.

0 replies

GeertsManon · 2024-03-14T17:25:00Z

GeertsManon
Mar 14, 2024
Author

Thanks for your quick reply!

I've experimented with various word sizes, including 30, 60, 90, 120, 150, 180, and 210. Using a word size of 60, I managed to obtain the longest diatom scaffold, approximately 50,000 base pairs in length. However, the remaining sections of the plastome were fragmented into very small pieces, resulting in numerous potential connections and making it very challenging to distinguish our target sequences from those of other species. I tried the assembly-from-graph approach but consistently received the following error (even after altering --disentangle-time-limit):
INFO: Disentangling failed: RuntimeError: maximum recursion depth exceeded while calling a Python object

Interestingly, the resulting scaffolds (totaling 5) with word size 60 were nearly identical to those assembled with higher word sizes.

I will try out the join_spades_fastg_by_blast.py approach.

Thanks for your help!

Cheers,
Manon

3 replies

JianjunJin Mar 14, 2024
Collaborator

Could you provide the log and the graph of the -w 60 result?

GeertsManon Mar 14, 2024
Author

get_org.log.txt

These are both files + a zoom of X30 on the target sequences. As you can see, many other connections are made.
The scaffolds of length 48282 and 17040 bp are supposed to be the LSC.
The scaffolds of length 19645 and 7620 bp are supposed to be the SSC.
The scaffolds of length 12318, 2934, and 2521 bp (and some smaller ones) are supposed to be the IR.

I'm looking forward to your reply.
Thanks already for your time and effort!

Cheers,
Manon
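As a quick sanity check on the scaffold lengths listed above, the regions can be summed to estimate the plastome size. The sketch assumes a standard quadripartite layout (LSC + SSC + two IR copies) and ignores the unlisted smaller IR fragments and any gaps, so the result is a lower bound rather than a measured genome size.

```python
from typing import Dict, List

# Scaffold lengths (bp) as reported in the post above.
REGIONS: Dict[str, List[int]] = {
    "LSC": [48282, 17040],
    "SSC": [19645, 7620],
    "IR": [12318, 2934, 2521],  # "some smaller ones" are not included
}


def region_totals(regions: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the scaffold lengths assigned to each region."""
    return {name: sum(lengths) for name, lengths in regions.items()}


def plastome_estimate(totals: Dict[str, int]) -> int:
    """LSC + SSC + two copies of the IR (assumed quadripartite structure)."""
    return totals["LSC"] + totals["SSC"] + 2 * totals["IR"]


if __name__ == "__main__":
    totals = region_totals(REGIONS)
    print(totals)                     # {'LSC': 65322, 'SSC': 27265, 'IR': 17773}
    print(plastome_estimate(totals))  # 128133
```

The estimate of about 128 kb stays under the 150000 passed as --expected-max-size elsewhere in this thread, which suggests the listed scaffolds are at least compatible with a single complete plastome.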

GeertsManon Mar 14, 2024
Author

This is the selection:

GeertsManon · 2024-03-14T17:38:21Z

GeertsManon
Mar 14, 2024
Author

Do you know what this means?

get_organelle_from_assembly.py --expected-max-size 150000 -F other_pt -g extended_K105.assembly_graph.fastg.extend-RefPlastomes.label.selection.gfa -o ManualCorrection --overwrite --no-slim --disentangle-time-limit 28800

GetOrganelle v1.7.7.0

get_organelle_from_assembly.py isolates organelle genomes from assembly graph.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
PLATFORM: Linux tier2-p-login-3 4.18.0-513.9.1.el8_9.x86_64 #1 SMP Wed Nov 29 18:55:19 UTC 2023 x86_64 x86_64
PYTHON LIBS: GetOrganelleLib 1.7.7.0; numpy 1.24.3; sympy 1.12; scipy 1.10.1
DEPENDENCIES:
GETORG_PATH=/user/leuven/347/vsc34774/.GetOrganelle
WORKING DIR: /lustre1/scratch/347/vsc34774/DIATOM/6_Assembly_GetOrganelle/Plastomes/Cyclotella-cryptica/ALL-TrimmedReads_W120
/data/leuven/347/vsc34774/miniconda3/envs/GetOrganelle/bin/get_organelle_from_assembly.py --expected-max-size 150000 -F other_pt -g extended_K105.assembly_graph.fastg.extend-RefPlastomes.label.selection.gfa -o ManualCorrection --overwrite --no-slim --disentangle-time-limit 28800

2024-03-14 18:36:21,323 - INFO: Processing assembly graph ...
2024-03-14 18:36:21,353 - INFO: Processing assembly graph finished.

2024-03-14 18:36:21,354 - INFO: Extracting other_pt from the assemblies ...
2024-03-14 18:36:21,354 - INFO: Disentangling ManualCorrection/initial_assembly_graph.gfa as a circular genome ...
2024-03-14 18:36:22,663 - INFO: Disentangling failed: RuntimeError: maximum recursion depth exceeded while calling a Python object
2024-03-14 18:36:22,666 - INFO: Disentangling ManualCorrection/initial_assembly_graph.gfa as a circular genome ...
2024-03-14 18:36:23,984 - INFO: Disentangling failed: RuntimeError: maximum recursion depth exceeded while calling a Python object
2024-03-14 18:36:23,987 - INFO: Disentangling ManualCorrection/initial_assembly_graph.gfa as a/an other_pt-insufficient graph ...
2024-03-14 18:36:25,677 - INFO: Disentangling failed: RuntimeError: maximum recursion depth exceeded while calling a Python object
2024-03-14 18:36:25,681 - INFO: If you have questions for us, please provide us with the get_org.log.txt file and the post-slimming graph in the format you like!
2024-03-14 18:36:25,682 - INFO: Extracting other_pt from the assemblies failed.

Total cost 5.94 s
Thank you!
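A note on the error itself: in the log above each disentangling attempt fails after roughly 1.3 to 1.7 seconds, far below the 28800-second limit, which is why changing --disentangle-time-limit has no effect. The message is Python's generic recursion-depth error (`RecursionError`, a subclass of `RuntimeError`). The sketch below does not reproduce GetOrganelle's internals; it only illustrates, on a hypothetical linear graph, how a recursive traversal hits this limit while an equivalent iterative one does not.

```python
import sys
from typing import Dict, Optional


def chain_length_recursive(graph: Dict[int, int], node: int) -> int:
    """Count nodes along a chain with one Python stack frame per node."""
    nxt = graph.get(node)
    if nxt is None:
        return 1
    return 1 + chain_length_recursive(graph, nxt)


def chain_length_iterative(graph: Dict[int, int], node: Optional[int]) -> int:
    """Same result using a loop, so the depth of the graph does not matter."""
    length = 0
    while node is not None:
        length += 1
        node = graph.get(node)
    return length


if __name__ == "__main__":
    # A chain slightly longer than the interpreter's recursion limit.
    n = sys.getrecursionlimit() + 100
    graph = {i: i + 1 for i in range(n - 1)}

    print(chain_length_iterative(graph, 0) == n)
    try:
        chain_length_recursive(graph, 0)
    except RecursionError as err:
        # RecursionError is a RuntimeError, matching the type shown in the log.
        print(type(err).__name__, isinstance(err, RuntimeError))
```

Whether simplifying the input graph avoids the error in this particular case is something only a --verbose run or the graph itself can confirm.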

1 reply

JianjunJin Mar 14, 2024
Collaborator

It would be more helpful to rerun with --verbose for debugging.
But I believe it is generally caused by the complexity of the graph.
Could you provide the extended_K105.assembly_graph.fastg.extend-RefPlastomes.label.selection.gfa or its visualized form?
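To get a first impression of how complex a graph is before sharing or visualizing it, the GFA file can be summarized with a few lines of Python. This is a minimal sketch that assumes a GFA1 file with tab-separated `S` (segment) and `L` (link) records; the summary keys are chosen here for illustration and are not GetOrganelle output.

```python
from collections import Counter
from typing import Dict, Iterable


def gfa_summary(lines: Iterable[str]) -> Dict[str, int]:
    """Count segments, links, total sequence length and the busiest segment."""
    seg_lengths: Dict[str, int] = {}
    degree: Counter = Counter()
    n_links = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # "*" means the sequence is not stored in the file.
            seg_lengths[fields[1]] = 0 if fields[2] == "*" else len(fields[2])
        elif fields[0] == "L" and len(fields) >= 5:
            n_links += 1
            degree[fields[1]] += 1  # link endpoint: source segment
            degree[fields[3]] += 1  # link endpoint: target segment
    return {
        "segments": len(seg_lengths),
        "links": n_links,
        "total_bp": sum(seg_lengths.values()),
        "max_degree": max(degree.values(), default=0),
    }


if __name__ == "__main__":
    # Tiny made-up graph; replace with open("<your file>.gfa") for real data.
    example = [
        "H\tVN:Z:1.0",
        "S\t1\tACGTAC",
        "S\t2\tGGA",
        "S\t3\t*",
        "L\t1\t+\t2\t+\t0M",
        "L\t1\t+\t3\t-\t0M",
        "L\t1\t-\t3\t+\t0M",
    ]
    print(gfa_summary(example))
```

A high link count relative to the number of segments, or single segments with many link endpoints, is a rough indicator of the kind of tangled graph discussed here.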
