Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proportion of sites under selection #1772

Open
mridna opened this issue Dec 4, 2024 · 2 comments
Open

Proportion of sites under selection #1772

mridna opened this issue Dec 4, 2024 · 2 comments

Comments

@mridna
Copy link

mridna commented Dec 4, 2024

Dear @spond,

I wanted your advice on how best to calculate the proportion of sites under positive selection across the entire phylogeny (~30 species) for ~2000 genes. Our intention is to compare different functional categories and see if there are differences in how many sites are under positive selection. We are not concerned about what specific sites are under selection. What we are ultimately trying to check is even if the whole gene is not evolving under positive selection, are there still some sites under positive selection and does this proportion vary based on function. With this in mind, how would you approach calculating proportion of positively selected sites? We are currently thinking about this in two ways:

  1. Proportion of sites with omega > 1 from BUSTED. Here we are a little stumped on how to proceed. We have the model averaged p-values for each alignment, but should the proportion be taken from the best fitting model, since it seems to be different based on BUSTED settings (with/without SRV and MH)? And if there is no significant p-value for positive selection, should we consider that there are no sites with omega > 1 even if BUSTED shows some codons are evolving in this rate class.

  2. Use number of significant positively selected sites from MEME (+MH) and normalise it for number of sites in the alignment to arrive at a significant proportion of sites under positive selection.

Hope this was explained clearly!

Best regards,
Mridula

@spond
Copy link
Member

spond commented Dec 6, 2024

Dear @mridna,

Well, a principled (but not something that you could easily) method would be use your functional class (categorical) as a part of the phylogenetic fit. In other words, ω and its weight would be functions of the functional class (e.g., as a random effect), and all 2000 genes would be fit jointly to directly estimate the effect.

Both approaches you have are sensible but ad hoc.

  1. For BUSTED, I would recommend the following

    • Use the estimate of the weight allocated to ω > 1 directly. You can model average the weight the same way you average the p-value if you already have multiple models.
    • Consider some measure of ω value as well. For example, a gene with ω = 10 and weight of 1% is probably more "selected" than a gene with ω = 2 and weight of 1%. Use some variance stabilized transform of ω e.g. log (&omega). For example you could compute something like log (ω) * weight (ω) for ω > 1 and take that as a measure of "selection"
    • I would not filter by p-value.
  2. For MEME, you need to select a cutoff p-value to decide what sites are selected. Obviously this is going to have a major effect on the estimates of what fraction of the gene is subject to selection. Also, if you want a more comparable estimate to BUSTED, you could weight each site by what fraction of the site has ω > 1 (MEME outputs this as p+).

As an example, here are 10 randomly selected genes from https://pubmed.ncbi.nlm.nih.gov/30620335/ using different measures of "selected weight". I ran BUSTED-E (https://www.biorxiv.org/content/10.1101/2024.11.13.620707v1) and MEME. The alignments, snakefile to run the tools, and a simple Python script to compute various results are in the attached archive.

Mostly, just be aware that there will be a number of hard-to-quantify factors influencing the estimates. You should also explore how the definition of the "fraction of alignment" influences your downstream analyses. If you get very different results based on what you select, then probably there's not much sense in placing too much faith in any associations you find.

image

Best,
Sergei

issue1772.zip

@mridna
Copy link
Author

mridna commented Dec 9, 2024

Dear Sergei,

Thank you so much for your detailed response and suggestions, I really appreciate it!

Best regards,
Mridula

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants