Not able to reproduce the best fit PRS for plink #27

ranijames · 2021-09-30T14:38:23Z

Hi Sam,
Thanks for the great tutorial. I have been trying PLINK for the polygenic risk score (PRS). However, with the height dataset and the EUR PLINK files, I am not able to reproduce the results, in particular the best-fit PRS from the linear regression model in the R script.

choishingwan · 2021-09-30T15:12:38Z

What did you get? I haven't kept the tutorial up to date lately, and I know, for example, that the pre-QCed data for the later steps weren't updated.

ranijames · 2021-09-30T15:18:47Z

So, for example, the best PRS threshold according to the tutorial is 0.3, and what I get is 0.5:

    prs.result[which.max(prs.result$R2),]
      Threshold        R2            P     BETA       SE
    7       0.5 0.1634566 9.256151e-26 55830.85 5004.534

OK, I see. I just want to make sure that all the steps mentioned are appropriate for analysis. I am following them for our in-house datasets, so as a validation of all the steps I first used the provided GWAS summary file and PLINK datasets.

choishingwan · 2021-09-30T15:58:54Z

If I repeat the analysis stated in the tutorial using the provided data set (I re-downloaded everything to ensure it is correct), I still got the same result stated in the tutorial:

      Threshold        R2           P     BETA       SE
    5       0.3 0.1612372 2.77407e-25 45316.19 4107.777

And if I use PRSice with info filtering disabled, I will also get the same result. So you might want to double check.

Sam
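For readers following along: the "best-fit" row in both outputs is simply the p-value threshold whose PRS gives the largest R2, which is what `which.max(prs.result$R2)` selects in the R script. A minimal Python sketch of that selection step is below. The table values are made up for illustration and are not the tutorial's results; `best_threshold` is a hypothetical helper name.

```python
# Hypothetical (threshold, R2) pairs; NOT the tutorial's actual numbers.
results = [
    (0.001, 0.02),
    (0.05, 0.08),
    (0.1, 0.11),
    (0.3, 0.16),
    (0.5, 0.15),
]

def best_threshold(rows):
    """Return the (threshold, R2) row with the largest R2.

    Like which.max in R, the first row wins if several rows tie.
    """
    return max(rows, key=lambda row: row[1])

best = best_threshold(results)
print(f"best threshold = {best[0]}, R2 = {best[1]}")
```

Because only the maximum is reported, it can help to print the whole table when two runs disagree, so you can see how close the neighbouring thresholds are.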

ranijames · 2021-09-30T16:05:41Z

Ok, thanks a lot for the update and for double-checking this. I appreciate your time and help.
I will re-run it once more and make sure the steps are the same. I have converted the script into Snakemake, so let's see if I missed something.

ranijames · 2021-10-04T10:48:53Z

Hi Sam,
I could now validate my output with what is documented. Thanks for your time and patience.
I have a question. Are the base and target datasets from different individuals, or from the same individuals/samples? I read that they come from two sources: the target data is simulated from 1000 Genomes and the base is from your own lab. My understanding is that the phenotype (base) dataset should correspond to the phenotype-genotype (target) dataset, is that right?
In the paper, I see that the target and base datasets are independent. In my case, my phenotype of interest is from a clinical trial that we ran internally, and the target is from the same patients. Hence, I have both base and target datasets from the same patients. Does that make sense?

choishingwan · 2021-10-04T11:30:18Z

You should never use the same sample for both the base and target.

And the base data from the tutorial was from the GIANT consortium, with some modification.
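To illustrate why sample overlap matters, here is a small simulation sketch, under simplifying assumptions of my own: genotypes are drawn uniformly from 0/1/2, SNPs are independent, and the phenotype is pure noise with no genetic signal at all. Effect sizes are estimated in one sample and then scored both in that same sample and in an independent one. All function names are hypothetical and this is not the tutorial's pipeline.

```python
import random

def corr_sq(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab * sab / (saa * sbb)

def simulate(n, m, rng):
    """n individuals, m SNPs coded 0/1/2, and a phenotype that is pure noise."""
    geno = [[rng.choice((0, 1, 2)) for _ in range(m)] for _ in range(n)]
    pheno = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return geno, pheno

def gwas_betas(geno, pheno):
    """Per-SNP least-squares slope of phenotype on allele count."""
    n, m = len(geno), len(geno[0])
    mp = sum(pheno) / n
    betas = []
    for j in range(m):
        col = [row[j] for row in geno]
        mc = sum(col) / n
        sxy = sum((x - mc) * (y - mp) for x, y in zip(col, pheno))
        sxx = sum((x - mc) ** 2 for x in col)
        betas.append(sxy / sxx)
    return betas

def prs(geno, betas):
    """Additive score: sum of effect size times allele count per individual."""
    return [sum(b * g for b, g in zip(betas, row)) for row in geno]

rng = random.Random(1)
base_geno, base_pheno = simulate(200, 300, rng)    # used as "base" sample
indep_geno, indep_pheno = simulate(200, 300, rng)  # independent "target"

betas = gwas_betas(base_geno, base_pheno)
r2_same = corr_sq(prs(base_geno, betas), base_pheno)
r2_indep = corr_sq(prs(indep_geno, betas), indep_pheno)
print(f"same sample R2 = {r2_same:.3f}; independent sample R2 = {r2_indep:.3f}")
```

Even though there is no true signal, the score should look strongly predictive when it is evaluated in the sample the effect sizes came from, and close to zero in the independent sample. That inflation is the reason for keeping base and target separate.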

-- Dr Shing Wan Choi Instructor Genetics and Genomic Sciences Icahn School of Medicine, Mount Sinai, NYC

ranijames · 2021-10-04T13:06:41Z

I am confused, then: in our case we do not have different base and target datasets. The base dataset is the GWAS output from PLINK on the same cohort, and the target is the same cohort as well. How does this overlap cause a problem in the result? Also, we have a binary phenotype rather than a continuous one. In that case, is it fine to use logistic regression to find the best PRS fit?

choishingwan · 2021-10-04T13:49:44Z

See pitfall 1 in this paper: https://www.nature.com/articles/nrg3457

Yes, logistic regression for binary traits.
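As a rough illustration of the binary-trait case, the sketch below fits a one-predictor logistic regression of case/control status on a PRS and reports Nagelkerke's pseudo-R2, which is one common choice for binary outcomes. Everything here is an assumption for illustration: the data are simulated, covariates are omitted, and the Newton-Raphson fitter is a hand-written stand-in for `glm(..., family = binomial)` in R.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of P(y = 1) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p             # gradient, intercept
            g1 += (yi - p) * xi      # gradient, slope
            h00 += w                 # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_lik(x, y, b0, b1):
    """Bernoulli log-likelihood of the fitted model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def nagelkerke_r2(ll_null, ll_full, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_null / n))

# Simulated data: 500 individuals, standardised PRS, made-up true effect.
rng = random.Random(7)
scores = [rng.gauss(0.0, 1.0) for _ in range(500)]
status = [1 if rng.random() < sigmoid(-0.5 + 0.8 * s) else 0 for s in scores]

b0, b1 = fit_logistic(scores, status)
prevalence = sum(status) / len(status)
null_b0 = math.log(prevalence / (1.0 - prevalence))  # intercept-only model
ll_null = log_lik(scores, status, null_b0, 0.0)
ll_full = log_lik(scores, status, b0, b1)
r2 = nagelkerke_r2(ll_null, ll_full, len(status))
print(f"slope = {b1:.3f}, Nagelkerke R2 = {r2:.3f}")
```

In a real analysis the null model would normally contain the covariates (for example sex and principal components), and the PRS R2 would be the difference between the full and null models.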

ranijames · 2021-10-04T19:54:49Z

Thanks a lot for the paper. I have another question: is it possible to have a gene-based polygenic score, rather than one based on each variant within each patient?

choishingwan · 2021-10-04T19:58:39Z

Do you mind elaborating? Do you mean you want to calculate PRS using only one gene? You can use PRSet to calculate pathway specific scores, but that might be a bit different from a "gene" based PRS?

ranijames · 2021-10-04T20:14:46Z

Yes, what I mean is that we need a weighted score for each gene. Currently, from both tools, we have a score for each patient based on each variant/SNP. If we collapse the variants by gene and run the analysis, would that make sense? Or, if we simply apply the polygenic risk formula from Wikipedia to the collapsed genes, would that still make sense?

https://wikimedia.org/api/rest_v1/media/math/render/svg/7da94c1dc4f882b5cb293ac8415cf9d94f8639b7

In the end, we need a score for each gene within each sample/individual. Can that still be used as a polygenic risk score? I would like to hear your opinion on this. Thanks again for your valuable remarks.

choishingwan · 2021-10-04T20:21:17Z

It is implemented as PRSet. You can check our webpage.

The problem with going down to gene level is that each gene will likely explain such a small amount of the phenotypic variance that the score will likely not be useful. If you group the genes into pathways / gene sets, that might provide more power.

Sam
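To make the idea of a set-level score concrete, here is a toy sketch that restricts the usual weighted sum (effect size times effect-allele count) to the SNPs mapped to each gene set. It is only an illustration of the grouping step, not PRSet's implementation, which does considerably more; the SNP IDs, effect sizes, set names, and genotypes below are all invented.

```python
# Hypothetical inputs: effect sizes from a base GWAS, a SNP-to-set mapping,
# and effect-allele counts (0/1/2) per individual.
effect = {"rs1": 0.12, "rs2": -0.05, "rs3": 0.30, "rs4": 0.08}
gene_sets = {"set_A": ["rs1", "rs2"], "set_B": ["rs3", "rs4"]}
genotypes = {
    "ind1": {"rs1": 2, "rs2": 0, "rs3": 1, "rs4": 1},
    "ind2": {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 0},
}

def set_scores(genotypes, effect, gene_sets):
    """Weighted allele-count sum per individual, computed separately per set."""
    scores = {}
    for ind, calls in genotypes.items():
        scores[ind] = {
            name: sum(effect[snp] * calls[snp] for snp in snps)
            for name, snps in gene_sets.items()
        }
    return scores

scores = set_scores(genotypes, effect, gene_sets)
for ind, per_set in scores.items():
    print(ind, per_set)
```

A single gene is just a gene set with very few SNPs in it, which is why its score tends to carry little signal on its own.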

ranijames · 2021-10-04T20:42:41Z

Thanks again for your suggestions and time. By using pathways, do you mean taking the genes that pass a specific threshold and running a pathway enrichment analysis on them, and that this gives more meaningful results? Also, what specifically do you mean by a "small amount" of phenotypic variance?

choishingwan · 2021-10-04T20:50:31Z

Use pathways (collections of genes based on biochemical signalling or other biological processes) instead of individual genes.

For most genome-wide PRS, an R2 of 0.3 is already really nice. If you are using a gene that represents X% of the genome, your R2 is likely to be around 0.3 * X% (maybe slightly higher than that). When you go down to gene level, X is going to be very small, so your resulting R2 is likely to be too small to be useful.
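A quick back-of-the-envelope version of that proportional argument is sketched below. The genome fractions are invented for illustration, and `expected_subset_r2` is a hypothetical helper; the real relationship depends on where the causal variants sit, so treat the output as an order-of-magnitude guide only.

```python
def expected_subset_r2(genome_wide_r2, fraction_of_genome):
    """Rough estimate: R2 shrinks in proportion to the share of the genome used."""
    return genome_wide_r2 * fraction_of_genome

# Hypothetical coverage values, expressed as fractions rather than percentages.
examples = [
    ("pathway covering 5% of the genome", 0.05),
    ("single gene covering 0.005% of the genome", 0.00005),
]

for label, fraction in examples:
    print(f"{label}: R2 ~ {expected_subset_r2(0.3, fraction):.2e}")
```

With these made-up fractions the pathway keeps an R2 in the 1e-2 range, while the single gene drops to the 1e-5 range, which is the gap the comment above is describing.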

ranijames · 2021-10-04T20:53:25Z

Thanks a lot. Makes sense to me. Thanks again for your time and patience.

ranijames · 2021-10-05T17:57:39Z

By the way, where can I find the reference for the PLINK PRS score formula mentioned in the documentation? I searched for it in PLINK's manual but could not find it. It would be great if you could share the source. Thanks.

choishingwan · 2021-10-05T18:59:13Z

Our website has it: prsice.info
