9. details on new modules: lsaBGC‐Reconcile and lsaBGC‐Sociate
lsaBGC-Reconcile constructs gene phylogenies for all orthogroups associated with BGCs. It overlays onto each phylogeny the GCF context in which orthogroup members are found, including non-BGC contexts (since orthogroup determination is performed at the genome-wide level). It also overlays the populations/clades in which orthogroup members are found, which is where the module gets its name: the phylogenies show whether orthogroup alleles separate according to their population or are interspersed across populations, which might indicate horizontal transfer.
lsaBGC-Reconcile also adds a tab to the final consolidated spreadsheet produced by lsaBGC-Pan, which provides an overview of information pertaining to each BGC-associated orthogroup.
lsaBGC-Reconcile builds on ideas and studies from the Barona-Gomez lab, namely their studies introducing EvoMining and CORASON:
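To illustrate what "separated according to population" versus "interspersed" means, here is a minimal sketch that checks whether each population's alleles form a single clade in a rooted gene tree. It is not lsaBGC-Reconcile's implementation: the nested-tuple tree representation and the names `leaves`, `clades`, and `monophyletic_populations` are hypothetical and chosen only for this example.

```python
def leaves(node):
    """Return the set of leaf names under a node.

    A leaf is a string; an internal node is a tuple of child nodes.
    """
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every node (leaves included) in the rooted tree."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def monophyletic_populations(tree, leaf_to_population):
    """Map each population to True if its alleles form exactly one clade."""
    members_by_population = {}
    for leaf, population in leaf_to_population.items():
        members_by_population.setdefault(population, set()).add(leaf)
    all_clades = {frozenset(c) for c in clades(tree)}
    return {
        population: frozenset(members) in all_clades
        for population, members in members_by_population.items()
    }


# Hypothetical alleles a1-a3 from population A and b1-b2 from population B.
populations = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

# Alleles separate by population.
separated_tree = ((("a1", "a2"), "a3"), ("b1", "b2"))
# Alleles are interspersed across populations.
interspersed_tree = ((("a1", "b1"), "a2"), ("a3", "b2"))

separated = monophyletic_populations(separated_tree, populations)
interspersed = monophyletic_populations(interspersed_tree, populations)
```

In the first tree both populations come out as monophyletic; in the second neither does, which is the kind of pattern that might hint at horizontal transfer. The check assumes a rooted tree, so the result depends on where the root is placed.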
- EvoMining reveals the origin and fate of natural product biosynthetic enzymes by Sélem-Mojica et al. 2019
- A computational framework to explore large-scale biosynthetic diversity by Navarro-Muñoz, Sélem-Mojica, Mullowney, et al. 2020
- Phylogenomic analysis of natural products biosynthetic gene clusters allows discovery of arseno-organic metabolites in model streptomycetes by Cruz-Morales et al. 2016
lsaBGC-Sociate performs genome-wide association studies (GWAS) to find orthogroups and GCFs that are associated or de-sociated with focal GCFs. Essentially, it treats focal GCF presence as the phenotype and looks for co-occurring and co-absent features. It uses pyseer underneath, specifically the lmm model, to perform the association analysis, and then uses an auxiliary program in the zol suite to annotate the consensus sequences of associated or de-sociated orthogroups. Multiple testing is accounted for via Bonferroni correction, and self-hits (e.g. orthogroups from the focal GCF) are filtered out.
⚠️ GWAS generally benefits heavily from the inclusion of more samples and from traits being interspersed phylogenetically. Although we use the "lmm" model in pyseer to adjust p-values for the phylogenetic dispersion of orthogroups/GCFs associated with focal GCFs, and apply Bonferroni multiple testing correction, you can still end up with false positives when working with a small number of samples. Do not perform this assessment with fewer than 20 samples, and ideally incorporate at least 100 samples if this module is your primary interest.
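The Bonferroni correction and self-hit filtering described above can be sketched as follows. This is an illustration under stated assumptions, not lsaBGC-Sociate's code: the function name `significant_associations`, the `(feature, p-value)` input format, the default `alpha` of 0.05, and the choice to count self-hits among the tests when computing the threshold are all assumptions made for this example.

```python
def significant_associations(results, focal_features, alpha=0.05):
    """Return feature IDs that pass Bonferroni correction and are not self-hits.

    results: list of (feature_id, p_value) tuples, one per tested feature.
    focal_features: set of feature IDs belonging to the focal GCF (self-hits).
    """
    if not results:
        return []
    # Bonferroni: divide the significance level by the number of tests.
    # Assumption: every tested feature counts as a test, self-hits included.
    threshold = alpha / len(results)
    return [
        feature
        for feature, p_value in results
        if p_value < threshold and feature not in focal_features
    ]


# Hypothetical p-values for four orthogroups; OG4 belongs to the focal GCF.
example_results = [("OG1", 1e-6), ("OG2", 0.02), ("OG3", 0.004), ("OG4", 1e-9)]
hits = significant_associations(example_results, focal_features={"OG4"})
```

With four tests the corrected threshold is 0.05 / 4 = 0.0125, so `OG1` and `OG3` are retained, `OG2` fails the threshold, and `OG4` is dropped as a self-hit despite its small p-value.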
Annotations simply require an E-value < 1e-5; among the annotations that pass, the best one for the consensus sequence of an orthogroup is selected based on score or bitscore.
lsaBGC-Sociate builds primarily on ideas and studies from the Ziemert, McInerney, and Weber labs. In particular, users interested in this type of association/de-sociation analysis should check out the following studies:
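As a rough sketch of that selection rule, the snippet below filters candidate annotations by E-value and keeps the highest-scoring one. The dictionary keys (`name`, `evalue`, `score`) and the function name `best_annotation` are hypothetical; the actual zol program's input and output formats are not shown here.

```python
def best_annotation(candidates, max_evalue=1e-5):
    """Return the highest-scoring annotation with E-value < max_evalue, or None."""
    passing = [hit for hit in candidates if hit["evalue"] < max_evalue]
    if not passing:
        return None
    return max(passing, key=lambda hit: hit["score"])


# Hypothetical annotation candidates for one orthogroup consensus sequence.
candidates = [
    {"name": "annotation_A", "evalue": 1e-10, "score": 50.0},
    {"name": "annotation_B", "evalue": 1e-6, "score": 80.0},
    {"name": "annotation_C", "evalue": 1e-3, "score": 200.0},  # fails E-value cutoff
]
chosen = best_annotation(candidates)
```

Here `annotation_C` has the highest score but is excluded by the E-value cutoff, so `annotation_B` is chosen over `annotation_A` because its score is higher, even though its E-value is weaker.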
- Function-related replacement of bacterial siderophore pathways by Bruns et al. 2018
- Coinfinder: detecting significant associations and dissociations in pangenomes by Whelan et al. 2020
- Pangenome analysis of Enterobacteria reveals richness of secondary metabolite gene clusters and their associated gene sets by Mohite et al. 2022
- Prokaryotic Pangenomes Act as Evolving Ecosystems by McInerney et al. 2022
- Goldfinder: Unraveling Networks of Gene Co-occurrence and Avoidance in Bacterial Pangenomes by Gavriilidou, Paulitz, Resl, et al. 2024
- Elucidation of genes enhancing natural product biosynthesis through co-evolution analysis by Wang, Chen, Cruz-Morales et al. 2024
- Beyond the Biosynthetic Gene Cluster Paradigm: Genome-Wide Coexpression Networks Connect Clustered and Unclustered Transcription Factors to Secondary Metabolic Pathways by Kwon et al. 2021