Input of RNA-seq to Human-GEM #277
-
Thank you for developing this amazing model. I need to figure out: can we use raw RNA-seq counts instead of TPM? Is there any particular reason that we have to use TPM? In your paper, TPM/FPKM were used. Copied from your paper: "...The Cancer Genome Atlas (TCGA) using the TCGAbiolinks package (70) in R (71), and converted to TPM. Tumor blood (TB) sample RNA-Seq data were retrieved for acute myeloid leukemia (LAML) as it had no TP samples, and the paired-normal tissue data for skin cutaneous melanoma (SKCM) was excluded because there was only one sample. Healthy tissue RNA-Seq data (TPM) were retrieved from the Genotype-Tissue Expression (GTEx) project (v7) database (https://www.gtexportal.org/home/datasets)..." Thank you again! Any input will be greatly appreciated!
Replies: 1 comment 1 reply
-
Hi @studymeow, I assume your question is asking about the use of TPM for constructing context-specific models using tINIT.

The general approach of tINIT is to estimate which reactions are "active" (i.e., their corresponding enzyme(s) are present and able to catalyze the reaction) in a given cell or tissue type, and try to include only those reactions while excluding all others. This decision whether to keep or remove a reaction must be based on some form of evidence/information. In the case of tINIT, the information used is most often transcriptomic and/or proteomic data, where we assume that if a transcript/protein is detected at some minimal abundance, then its corresponding reaction is likely to be active.

This "minimal abundance" is the tricky part - how do we know what transcript or protein abundance is necessary for the reaction to be considered "active"? Unfortunately there isn't a good answer, because it will depend on the enzyme and many other factors. We therefore have to make some simplifying assumptions. In the case of our Human1 paper, we decided to use a threshold of 1 TPM, where genes expressed at a level above this threshold received a positive score and tINIT tried to keep them in the model, and vice versa. While the 1 TPM choice was largely arbitrary, it was roughly based on the TPM distributions that we tended to see among the RNA-seq datasets we used.

We chose to use TPM because it is relatively representative of how the cell is allocating its resources, and is normalized such that it is (generally) more comparable across different samples, and therefore a single value threshold is reasonable. Although we used TPM, you can really use any metric (gene count, FPKM, protein level, etc.) that you feel reflects the likelihood that a gene or protein's corresponding reaction is active. However, keep in mind that gene count is not as comparable between different genes, so a single threshold is less appropriate for that data type.
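To make the single-threshold idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the gene names, counts, and gene lengths are made up, and the log-ratio score is a stand-in that is positive above the threshold and negative below it, not the scoring function actually used inside tINIT. It also shows why raw counts and TPM can disagree: two genes with the same count get very different TPM values when their lengths differ.

```python
import math

def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for a single sample.

    counts and lengths_bp are dicts keyed by gene (hypothetical inputs).
    """
    # Step 1: reads per kilobase, which corrects for gene length.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Step 2: rescale so the values in the sample sum to one million.
    return {g: v / total * 1e6 for g, v in rpk.items()}

def threshold_score(value, threshold=1.0, floor=1e-3):
    """Illustrative score: > 0 above the threshold, < 0 below it.

    The floor avoids log(0) for undetected genes (an assumption).
    """
    return math.log2(max(value, floor) / threshold)

# Made-up toy data: geneA and geneB have identical raw counts,
# but geneB is ten times longer.
counts = {"geneA": 500, "geneB": 500, "geneC": 0}
lengths_bp = {"geneA": 1000, "geneB": 10000, "geneC": 2000}

tpm = counts_to_tpm(counts, lengths_bp)
scores = {g: threshold_score(v, threshold=1.0) for g, v in tpm.items()}

for gene in counts:
    print(gene, counts[gene], round(tpm[gene], 1), round(scores[gene], 2))
```

In this toy sample, geneA ends up with ten times the TPM of geneB despite the identical counts, which is the sense in which a single cutoff on raw counts treats long and short genes unevenly.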
One could instead choose to use gene-specific thresholds; e.g., the average gene count across many different cell/tissue types. So this is a very long way to say that you can certainly use gene counts, probably without a drastic change in performance/outcome, as long as you understand the assumptions involved. And it goes without saying (but I'll say it anyway) that any model/outcome obtained, whether you use TPM or gene counts or something else, should always be treated as a rough approximation of the system - not as truth.
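The gene-specific threshold idea could be sketched roughly as follows. This is a hypothetical example, not part of tINIT or RAVEN: the tissue names and counts are invented, the per-gene threshold is simply the mean across samples, and it assumes the counts are already comparable between samples (for example, similar sequencing depth), which would need checking on real data.

```python
import math
from statistics import mean

def gene_specific_thresholds(expr_by_sample):
    """Per-gene threshold = mean expression of that gene across all samples."""
    genes = next(iter(expr_by_sample.values())).keys()
    return {g: mean(s[g] for s in expr_by_sample.values()) for g in genes}

def score_sample(sample, thresholds, floor=0.5):
    """Illustrative log-ratio score of each gene against its own threshold.

    The floor (an assumption) avoids dividing by, or taking the log of, zero.
    """
    return {
        g: math.log2(max(sample[g], floor) / max(thresholds[g], floor))
        for g in sample
    }

# Made-up raw counts for two genes in three tissues.
expr = {
    "liver":  {"geneA": 900, "geneB": 10},
    "muscle": {"geneA": 100, "geneB": 30},
    "brain":  {"geneA": 200, "geneB": 50},
}

thresholds = gene_specific_thresholds(expr)   # geneA: 400, geneB: 30
liver_scores = score_sample(expr["liver"], thresholds)

for gene, score in liver_scores.items():
    print(gene, thresholds[gene], round(score, 2))
```

Here geneA scores positive in liver and geneB scores negative, each judged against its own typical level. A single count cutoff would instead score geneA as expressed in every tissue and could miss geneB entirely, depending on where the cutoff is placed.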