9. details on new modules: lsaBGC‐Reconcile and lsaBGC‐Sociate

Rauf Salamzade edited this page Sep 9, 2024 · 16 revisions

lsaBGC-Reconcile

lsaBGC-Reconcile constructs gene phylogenies for all orthogroups associated with BGCs. It then overlays the GCF context in which each orthogroup is found, including non-BGC contexts (orthogroup determination is performed at the genome-wide level, so members of an orthogroup can occur outside BGCs). It also overlays the populations/clades in which each orthogroup is found. This is where the name comes from: the module shows whether an orthogroup's alleles separate according to their population or are interspersed across populations, which might indicate horizontal transfer.
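The "separated by population versus interspersed" question can be illustrated with a small sketch. This is not the lsaBGC-Reconcile implementation; it assumes a rooted gene tree written as nested tuples of leaf names and a hypothetical `leaf_to_pop` mapping from each leaf to its population, and it simply asks whether each population's alleles form a single clade.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree` and record every clade's leaf set in `out`.

    `tree` is a rooted tree given as nested tuples; leaves are strings.
    (Illustrative representation only, not an lsaBGC data structure.)
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def populations_forming_clades(tree, leaf_to_pop):
    """Map each population to True if its alleles form one clade in the tree."""
    clades = set()
    leaf_sets(tree, clades)

    # Group leaves (alleles) by their assigned population.
    members = {}
    for leaf, pop in leaf_to_pop.items():
        members.setdefault(pop, set()).add(leaf)

    return {pop: frozenset(leaves) in clades for pop, leaves in members.items()}


# Hypothetical example: alleles a1-a3 from population A, b1-b2 from population B.
pops = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
separated = ((("a1", "a2"), "a3"), ("b1", "b2"))
interspersed = ((("a1", "b1"), ("a2", "b2")), "a3")

print(populations_forming_clades(separated, pops))
print(populations_forming_clades(interspersed, pops))
```

In the first tree each population forms its own clade; in the second the alleles are interspersed, the pattern that might point to horizontal transfer. Real gene trees are often unrooted and carry branch support values, both of which this sketch ignores.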

lsaBGC-Reconcile also produces a tab in the final consolidated spreadsheet generated by lsaBGC-Pan, which provides an overview of the information for each BGC-associated orthogroup.


Influence

lsaBGC-Reconcile builds on ideas and studies by the Barona-Gomez lab, namely their studies introducing EvoMining and CORASON.

lsaBGC-Sociate

lsaBGC-Sociate performs genome-wide association studies (GWAS) to find orthogroups and GCFs that are associated or de-sociated with focal GCFs. Essentially, it treats focal GCF presence as the phenotype and looks for co-occurring and co-absent features. It uses pyseer underneath, specifically the lmm model, to perform the association analysis, and then uses an auxiliary program in the zol suite to annotate the consensus sequences of associated or de-sociated orthogroups. Multiple testing is accounted for via Bonferroni correction, and self-hits (e.g. orthogroups from the focal GCF) are filtered out.
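The post-processing described above (self-hit filtering and Bonferroni correction) can be sketched as follows. This is a minimal illustration, not lsaBGC-Sociate's code: the `Hit` record, the `alpha=0.05` threshold, the use of the effect-size sign to label a feature as associated or de-sociated, and the choice to remove self-hits before counting tests are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    feature: str   # orthogroup or GCF identifier (hypothetical names)
    beta: float    # effect size; its sign is assumed to give the direction
    pvalue: float  # unadjusted p-value from the association test


def filter_hits(hits, focal_features, alpha=0.05):
    """Drop self-hits, Bonferroni-adjust p-values, and keep significant features.

    Assumption: self-hits are removed before counting the number of tests.
    """
    tested = [h for h in hits if h.feature not in focal_features]
    n_tests = len(tested)

    results = []
    for h in tested:
        # Bonferroni: multiply each p-value by the number of tests, capped at 1.
        adjusted = min(1.0, h.pvalue * n_tests)
        if adjusted < alpha:
            direction = "associated" if h.beta > 0 else "de-sociated"
            results.append((h.feature, direction, adjusted))

    # Most significant features first.
    return sorted(results, key=lambda r: r[2])


# Hypothetical results for one focal GCF whose own orthogroup is OG_focal.
hits = [
    Hit("OG_1", 1.2, 0.001),
    Hit("OG_2", -0.8, 0.004),
    Hit("OG_3", 0.5, 0.03),
    Hit("OG_focal", 2.0, 1e-9),
]
significant = filter_hits(hits, focal_features={"OG_focal"})
for feature, direction, adjusted in significant:
    print(feature, direction, adjusted)
```

With three tests remaining after the self-hit is dropped, OG_3 no longer passes the threshold once its p-value is multiplied by three, which shows how conservative the correction is.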

⚠️ GWAS generally benefits heavily from including more samples and from traits being interspersed phylogenetically. Although we use the "lmm" model in pyseer to adjust p-values for the phylogenetic dispersion of orthogroups/GCFs associated with focal GCFs, and apply Bonferroni multiple testing correction, you can still end up with false positives when working with a small number of samples. Do not attempt this analysis if you have fewer than 20 samples, and ideally include at least 100 samples if this module is your primary interest.

Annotations only require an E-value < 1e-5; among the annotations that pass, the best one for the consensus sequence of an orthogroup is selected based on score or bitscore.
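The selection rule can be sketched in a few lines. The tuple layout `(description, evalue, bitscore)` and the function name are hypothetical, chosen only for illustration; the actual annotation step is handled by the zol suite.

```python
def best_annotation(hits, max_evalue=1e-5):
    """Return the passing hit with the highest bitscore, or None if none pass.

    `hits` is an iterable of (description, evalue, bitscore) tuples
    (an assumed layout, not the zol output format).
    """
    passing = [h for h in hits if h[1] < max_evalue]
    if not passing:
        return None
    return max(passing, key=lambda h: h[2])


# Hypothetical hits for one orthogroup consensus sequence.
hits = [
    ("polyketide synthase", 1e-50, 310.0),
    ("acyl carrier protein", 1e-3, 400.0),  # fails the E-value cutoff
    ("ketoreductase", 1e-20, 120.0),
]
print(best_annotation(hits))
```

The E-value acts only as a pass/fail filter here, so a hit with a higher bitscore is still discarded if its E-value is above the cutoff.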


Influence

lsaBGC-Sociate builds on ideas and studies primarily from the Ziemert, McInerney, and Weber labs. Users interested in this type of association/de-sociation analysis should consult the studies from these labs.