Not able to reproduce the best fit PRS for plink #27

Open
ranijames opened this issue Sep 30, 2021 · 17 comments

Comments

@ranijames

Hi Sam,
Thanks for the great tutorial. I have been using PLINK to compute polygenic risk scores. However, with the height dataset and the EUR PLINK files, I am not able to reproduce the results, in particular the best-fit PRS obtained from the linear regression model in the R script.

@choishingwan
Owner

choishingwan commented Sep 30, 2021 via email

@ranijames
Author

For example, the best-fit PRS threshold according to the tutorial is 0.3, whereas I get 0.5:
prs.result[which.max(prs.result$R2),]
  Threshold        R2            P     BETA       SE
7       0.5 0.1634566 9.256151e-26 55830.85 5004.534
OK, I see. I just want to make sure that all the steps described are appropriate for the analysis. I plan to follow them for our in-house datasets, so as a validation of the whole procedure I first ran it on the provided GWAS summary file and PLINK datasets.
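
For readers following along, the selection step quoted above (take the row with the largest R2) can be sketched in Python. This is only an illustration of the idea, not the tutorial's R script: the function names `r_squared` and `best_threshold` are hypothetical, the scores and phenotypes are made up, and covariates, which a real analysis would normally adjust for, are ignored.

```python
# Sketch: pick the P-value threshold whose PRS explains the most
# phenotypic variance. All numbers are invented for illustration.

def r_squared(x, y):
    """R^2 of a simple linear regression y ~ x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def best_threshold(prs_by_threshold, phenotype):
    """Return the threshold with the highest R^2, plus all R^2 values."""
    results = {t: r_squared(scores, phenotype)
               for t, scores in prs_by_threshold.items()}
    best = max(results, key=results.get)
    return best, results

# Hypothetical phenotype and per-threshold scores for five samples.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
prs_by_threshold = {
    0.001: [0.2, 0.1, 0.4, 0.3, 0.5],
    0.3: [0.1, 0.2, 0.3, 0.4, 0.5],
    0.5: [0.5, 0.1, 0.4, 0.2, 0.3],
}

best, results = best_threshold(prs_by_threshold, phenotype)
print("best threshold:", best)
```

If two runs of the same pipeline disagree on the best threshold, comparing the full `results` table threshold by threshold is usually more informative than comparing only the winning row.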

@choishingwan
Owner

choishingwan commented Sep 30, 2021 via email

@ranijames
Author

OK, thanks a lot for the update and for double-checking this. I appreciate your time and help.
I will re-run everything and make sure the steps are the same. I have converted the script into a Snakemake workflow, so let's see whether I missed something.

@ranijames
Author

ranijames commented Oct 4, 2021

Hi Sam,
I have now been able to validate my output against what is documented. Thanks for your time and patience.
I have a question. Do the base and target datasets come from different individuals or from the same individuals/samples? I read that they come from two sources: the target data are simulated from the 1000 Genomes data, and the base data are from your own lab. My understanding is that the phenotype in the base dataset should correspond to the phenotype in the genotype-phenotype (target) dataset. Is that right?
In the paper, I see that the target and base datasets are independent. In my case, the phenotype of interest comes from a clinical trial that we ran internally, and the target data come from the same patients. So my base and target datasets both come from the same patients. Does that make sense?
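
Since the paper mentioned above treats the base and target datasets as independent, one quick sanity check is to look for sample IDs that appear in both. The following is a minimal Python sketch under stated assumptions: the helper `overlapping_samples` and the patient IDs are hypothetical, and in practice the IDs would be read from the base cohort's sample list and the target data's sample file.

```python
# Sketch: list sample IDs shared between the base and target cohorts.
# The IDs below are invented placeholders.

def overlapping_samples(base_ids, target_ids):
    """Return the sorted list of IDs present in both cohorts."""
    return sorted(set(base_ids) & set(target_ids))

base_ids = ["P001", "P002", "P003"]
target_ids = ["P002", "P003", "P004"]

shared = overlapping_samples(base_ids, target_ids)
print(f"{len(shared)} shared sample(s): {shared}")
```

An empty result only shows that the ID labels differ; it cannot by itself rule out the same person being enrolled under two different identifiers.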

@choishingwan
Owner

choishingwan commented Oct 4, 2021 via email

@ranijames
Author

ranijames commented Oct 4, 2021 via email

@choishingwan
Owner

choishingwan commented Oct 4, 2021 via email

@ranijames
Author

Thanks a lot for the paper. I have another question: is it possible to compute a gene-based polygenic score for each patient, rather than a score based on each individual variant?

@choishingwan
Owner

choishingwan commented Oct 4, 2021 via email

@ranijames
Author

ranijames commented Oct 4, 2021 via email

@choishingwan
Owner

choishingwan commented Oct 4, 2021 via email

@ranijames
Author

ranijames commented Oct 4, 2021 via email

@choishingwan
Owner

choishingwan commented Oct 4, 2021 via email

@ranijames
Author

ranijames commented Oct 4, 2021 via email

@ranijames
Author

ranijames commented Oct 5, 2021 via email

@choishingwan
Owner

choishingwan commented Oct 5, 2021 via email
