Clarification on GTEx CpG Methylation Data in “vis_toil_TvsN” Function #362

Open
quiquemedina opened this issue Dec 19, 2024 · 3 comments
@quiquemedina

Dear Developers,

I have a question regarding the source of the GTEx CpG methylation data used in the “vis_toil_TvsN” function. Specifically, I was unable to confirm the source of the CpG methylation data integrated into this function.

Could you clarify if the GTEx CpG methylation data in “vis_toil_TvsN” is derived from the Methylation EPIC Array, as reported in the article available at https://www.nature.com/articles/s41588-022-01249-y? My understanding is that the mentioned source provides methylation data for only nine (n = 9) tissues. However, the “vis_toil_TvsN” function appears to include data for more than nine tumor versus normal tissue pairs, which raises some inconsistencies.

For the TCGA tissues, the methylation datasets are 450K or 27K.

Additionally, I noticed that in the UCSCXenaShiny browser (accessible at https://shixiangwang.shinyapps.io/ucscxenashiny/), the output data table for the “vis_toil_TvsN” function lists the metric as “tpm”. This labeling seems misleading, as it does not correctly reflect CpG methylation beta values. Could this discrepancy be clarified or corrected to avoid confusion?

Thank you for your help and for the development of these valuable tools. I look forward to your response.
Best regards,

Enrique

image

image

@ShixiangWang
Member

@quiquemedina Thanks for your question, as always :). The name "tpm" is indeed misleading, as we rename all molecular values to that name. The unit in the plot is correct.

For the first question, @lishensuo please take a look.

@lishensuo
Collaborator

For the first question, only gene expression and transcript expression datasets are available for the GTEx cohort on the UCSC Xena platform. These datasets have been integrated into the TCGA and PCAWG cohorts using the TOIL algorithm, as indicated on the respective page.

For TCGA methylation analysis in the series of quick modules, we use the jhu-usc.edu_PANCAN_HumanMethylation450.betaValue_whitelisted.tsv.synapse_download_5096262.xena dataset from the pancanAtlasHub of UCSC Xena by default (excluding GTEx data). This dataset encompasses nearly all TCGA cancer types, as illustrated in the following figure.

For more on the methylation dataset used in our app, please refer to the previous issue, #359.

library(UCSCXenaShiny)  # get_pancan_methylation_value(), tcga_clinical_fine
library(dplyr)
library(ggplot2)

# Query TP53 methylation values from the 450K dataset
dat_methy_450k <- get_pancan_methylation_value("TP53", type = "450K")
dat_methy_450k <- dat_methy_450k$data
names(dat_methy_450k)  # sample IDs covered by the dataset

# Count tumor and normal samples per cancer type and plot them
tcga_clinical_fine %>%
  dplyr::filter(Sample %in% names(dat_methy_450k)) %>%
  dplyr::mutate(Group = ifelse(Code == "NT", "Normal", "Tumor")) %>%
  dplyr::group_by(Cancer) %>%
  dplyr::count(Group) %>%
  as.data.frame() %>%
  ggplot(aes(x = Cancer, y = n, fill = Group)) +
  geom_col() +
  ggtitle("jhu-usc.edu_PANCAN_HumanMethylation450") +
  theme(legend.position = "top") +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1)) +
  theme(text = element_text(size = 15))
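
The counting step in the snippet above can also be tried offline. The following is only a minimal sketch in base R, assuming a made-up table: `toy_clinical` and `samples_with_methylation` are hypothetical stand-ins for `tcga_clinical_fine` and the sample IDs returned by the methylation query, and the rows are not real TCGA samples.

```r
# Hypothetical stand-in for tcga_clinical_fine (invented rows, not real TCGA data)
toy_clinical <- data.frame(
  Sample = c("S1", "S2", "S3", "S4", "S5"),
  Cancer = c("BRCA", "BRCA", "BRCA", "LUAD", "LUAD"),
  Code   = c("TP", "NT", "TP", "TP", "NT"),
  stringsAsFactors = FALSE
)

# Hypothetical set of samples that have methylation values
samples_with_methylation <- c("S1", "S2", "S3", "S5")

# Keep only samples present in the methylation data
kept <- toy_clinical[toy_clinical$Sample %in% samples_with_methylation, ]

# Same rule as in the snippet above: code "NT" is normal, everything else is tumor
kept$Group <- ifelse(kept$Code == "NT", "Normal", "Tumor")

# Cross-tabulate cancer type by group; unlike dplyr::count(),
# table() also keeps combinations with zero samples
counts <- as.data.frame(
  table(Cancer = kept$Cancer, Group = kept$Group),
  responseName = "n"
)
print(counts)
```

A cancer type with no normal samples in such a table would have no tumor versus normal comparison available from this dataset alone.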

image

@lishensuo
Collaborator

For the second question, we apologize for the misleading label. However, the figure you provided is from our V1 Shiny app, which we do not plan to update. In the V2 version, we have adjusted the figure layout and will further refine the plot soon.
