Contrast-FEL analyses #53
Comments
Dear @JJohnSmith, If I understand your procedure correctly then,
If this is the case, then
Would you mind sharing some of the alignments and trees and the corresponding results? It'll be much more informative if I could base my comments on the specific data.
Best,
Dear @spond,
I apologize if I was not clear enough and my description was confusing. I am not comparing mammals with archosaurs. I am comparing mammals with mammals in one analysis, and within archosaurs I am comparing crocodylians with birds. I am analyzing around 50 genes and unfortunately I have neither the time nor the computational power to run the analysis on a combined dataset of mammals and archosaurs.
Above are the trees, and in red are the branches I labeled as foreground to be tested. The alignments of the mammalian sequences are quite complete. I used Guidance, which automatically removed columns with a confidence score below 0.93, and I then manually removed columns that contained less than 80% info. For the archosaur alignments I lowered the Guidance threshold slightly, to 0.85, because of the higher divergence between crocodylians and birds.
Above is a summary of the results. I also conducted tests with aBSREL and BUSTED. It seems reasonable that I got a lot more positively selected sites in archosaurs considering the phylogenetic divergence between crocodylians and birds, right? Also, since I lowered the confidence score threshold in Guidance when building the alignments, some regions of lower confidence could lead to false positives. I intend to look closely at each positively selected site in the alignments to check whether alignment problems could be causing false positives.
My main concern is that there isn't enough statistical power in my analyses, since I am only testing 2 branches in mammals and only 1 branch in archosaurs, and that this may produce unreliable results with high rates of false positives and false negatives.
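The manual column filter mentioned above can be sketched roughly as follows. This is only an illustration, assuming "less than 80% info" means that fewer than 80% of the sequences have a non-gap character in that column; the function name `drop_sparse_columns`, its parameters, and the toy alignment are hypothetical, and this is not what Guidance itself does.

```python
def drop_sparse_columns(seqs, min_fraction=0.8, gap_chars="-?"):
    """Keep only columns where at least min_fraction of sequences are non-gap.

    seqs: dict mapping sequence name -> aligned sequence (equal lengths).
    Assumption: "info" means a non-gap, non-missing character.
    """
    if not seqs:
        return {}
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(seqs[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")

    keep = []
    for i in range(length):
        informative = sum(1 for n in names if seqs[n][i] not in gap_chars)
        if informative / len(names) >= min_fraction:
            keep.append(i)

    # Rebuild each sequence from the retained column indices only.
    return {n: "".join(seqs[n][i] for i in keep) for n in names}


# Toy alignment (made-up data): the fourth column is mostly gaps.
toy = {
    "sp1": "ATG-A",
    "sp2": "ATG-A",
    "sp3": "ATGCA",
    "sp4": "A-G-A",
    "sp5": "ATG-A",
}
filtered = drop_sparse_columns(toy, min_fraction=0.8)
```

For codon-based analyses the same test would presumably need to be applied to whole triplets rather than single columns, so that the reading frame is preserved.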
Considering that this is for a master's degree dissertation with constraints on time and computational power, do you think that these particular subsets of foreground and background branches could make the results completely unreliable due to the lack of statistical power? I also think it is important to mention that it is not yet known whether the genes I am analyzing directly regulate the trait of interest. I am just exploring that hypothesis because the genes were identified in a transcriptomic analysis in a mammalian species and the trait is also found in crocodylians. Thank you so much for taking the time to respond.
Hi,
I am new to this field and I am currently working on my master's thesis, focusing on the evolution of a specific morphological trait. I've opted to analyze 60 genes across a smaller number of species due to time constraints. I created two separate datasets: one for mammals (15 species) and another for archosaurs (12 species). In the mammalian dataset, there are two groups exhibiting the trait, whereas in the archosaur dataset, only one group does.
For selection analyses, I used several methods in HyPhy, specifically BUSTED, aBSREL, RELAX, and Contrast-FEL. For each group with the trait, I labeled the branch leading to its most recent common ancestor as a foreground branch. Consequently, I have two foreground branches in the mammalian dataset and one foreground branch in the archosaur dataset.
I've noticed that Contrast-FEL detected significantly more positively selected sites in the archosaur dataset compared to the mammalian dataset. When I re-ran the analyses on the mammalian dataset, this time with just one foreground branch (I removed the other group from the alignments), the number of positively selected sites in Contrast-FEL increased.
I came across this information on the HyPhy website under Contrast-FEL specifications:
"Rules of thumb for when this method is likely to work well, and when it is not.
- Generally, you need 10 or more branches in each set to be able to have any statistical power.
- Too little divergence is also likely to severely throttle statistical power."
It appears that my dataset configuration is problematic, lacking the necessary number of branches in each set, leading to low statistical power and inconsistent results. Given my time constraints, I cannot make significant changes to the datasets.
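To make the comparison with the rule of thumb concrete, here is a rough sketch that counts how many branches fall in each set for a labeled Newick tree. It assumes branches are tagged with the `{Label}` suffix convention used for HyPhy branch partitions and that taxon names contain no commas or parentheses; the label `Foreground`, the helper names, and the toy tree are hypothetical.

```python
import re


def count_labeled_branches(newick, label):
    """Count nodes (tips or internal) that carry a {label} tag."""
    return len(re.findall(r"\{" + re.escape(label) + r"\}", newick))


def count_all_branches(newick):
    """Count branches in a Newick string with simple, unquoted names.

    tips = commas + 1 and internal nodes = closing parentheses; every node
    except the root has one branch, so branches = commas + closing parentheses.
    """
    return newick.count(",") + newick.count(")")


def branch_set_sizes(newick, label, min_per_set=10):
    """Return set sizes and whether both meet the quoted rule of thumb."""
    foreground = count_labeled_branches(newick, label)
    background = count_all_branches(newick) - foreground
    enough = foreground >= min_per_set and background >= min_per_set
    return {"foreground": foreground, "background": background, "enough": enough}


# Toy tree (made-up): one internal branch tagged as foreground.
toy_tree = "((A,B){Foreground},(C,D),E);"
sizes = branch_set_sizes(toy_tree, "Foreground")
```

With one or two tagged branches, the foreground set is far below the quoted threshold of 10 however many background branches there are.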
I was wondering if lowering the q-value threshold from 0.2 to 0.1 (or another value within that range) would be a viable approach to mitigate these issues.
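As a rough illustration of what changing the cutoff does, the sketch below assumes the q-values behave like Benjamini-Hochberg adjusted p-values (an assumption here, not a statement about HyPhy's internals) and uses made-up per-site p-values. The function names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def sites_called(pvalues, threshold):
    """Indices of sites whose q-value is at or below the threshold."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q <= threshold]


# Made-up per-site p-values, for illustration only.
toy_p = [0.001, 0.02, 0.05, 0.12, 0.5]
at_02 = sites_called(toy_p, 0.2)
at_01 = sites_called(toy_p, 0.1)
```

Under this assumption, the sites reported at 0.1 are always a subset of those reported at 0.2. A stricter cutoff can only remove sites from the list; it does not add statistical power or address the small number of foreground branches.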
Any advice would be greatly appreciated.
Thank you so much!