Created tutorial for MultiGSEA #5567

Open
wants to merge 12 commits into
base: main
from
5 changes: 4 additions & 1 deletion CONTRIBUTORS.yaml
Original file line number Diff line number Diff line change
@@ -2800,11 +2800,14 @@ tbrown91:
orcid: 0000-0001-8293-4816
joined: 2024-08

tStehling:
name: Thorben Stehling
joined: 2024-11

rmassei:
name: Riccardo Massei
email: [email protected]
orcid: 0000-0003-2104-9519
joined: 2024-11
affiliations:
- nfdi4bioimage

Binary file added topics/proteomics/images/p-value.png
@@ -0,0 +1,27 @@
---
destination:
type: library
name: GTN - Material
description: Galaxy Training Network Material
synopsis: Galaxy Training Network Material. See https://training.galaxyproject.org
items:
- name: The new topic
description: Summary
items:
- name: Using MultiGSEA
items:
- name: 'DOI: 10.5281/zenodo.14216972'
description: latest
items:
- url: https://zenodo.org/api/records/14216972/files/metabolome.tsv/content
src: url
ext: auto
info: https://zenodo.org/records/14216972
- url: https://zenodo.org/api/records/14216972/files/proteome.tsv/content
src: url
ext: auto
info: https://zenodo.org/records/14216972
- url: https://zenodo.org/api/records/14216972/files/transcriptome.tsv/content
src: url
ext: auto
info: https://zenodo.org/records/14216972
@@ -0,0 +1,3 @@
---
layout: faq-page
---
26 changes: 26 additions & 0 deletions topics/proteomics/tutorials/multiGSEA-tutorial/tutorial.bib
@@ -0,0 +1,26 @@

# This is the bibliography file for your tutorial.
#
# To add bibliography (bibtex) entries here, follow these steps:
# 1) Find the DOI for the article you want to cite
# 2) Go to https://doi2bib.org and fill in the DOI
# 3) Copy the resulting bibtex entry into this file
#
# To cite the example below, in your tutorial.md file
# use {% cite Batut2018 %}
#
# If you want to cite an online resource (website etc)
# you can use the 'online' format (see below)
#
# You can remove the examples below

@misc{https://doi.org/10.18129/b9.bioc.multigsea,
doi = {10.18129/B9.BIOC.MULTIGSEA},
url = {https://bioconductor.org/packages/multiGSEA},
author = {Canzler, Sebastian and Hackerm{\"u}ller, J{\"o}rg},
title = {multiGSEA},
publisher = {Bioconductor},
year = {2020}
}


121 changes: 121 additions & 0 deletions topics/proteomics/tutorials/multiGSEA-tutorial/tutorial.md
@@ -0,0 +1,121 @@
---
layout: tutorial_hands_on

title: Using MultiGSEA
subtopic: multi-omics
tags:
- multi-omics
- transcriptomics
- proteomics
- metabolomics
zenodo_link: 'https://zenodo.org/records/14216972'
questions:
- How to use MultiGSEA for GSEA-based pathway enrichment for multiple omics layers?
objectives:
- Perform GSEA-based pathway enrichment for transcriptomics, proteomics, and metabolomics data.
- Understand how to combine p-values across multiple omics layers.
time_estimation: 1H
key_points:
- MultiGSEA provides an integrated workflow for pathway enrichment analysis across multi-omics data.
- It supports pathway definitions from several databases and robust ID mapping.
contributors:
- tStehling


---


The multiGSEA package was designed to run a robust GSEA-based pathway enrichment for multiple omics layers. The enrichment is calculated for each omics layer separately and aggregated p-values are calculated afterwards to derive a composite multi-omics pathway enrichment.

Pathway definitions can be downloaded from up to eight different pathway databases by means of the graphite Bioconductor package (Sales, Calura, and Romualdi 2018). Transcript and protein features can be mapped to Entrez Gene IDs, UniProt, Gene Symbol, RefSeq, and Ensembl IDs. The mapping is performed with the AnnotationDbi package (Pagès et al. 2019) and is currently supported for 11 model organisms, including human, mouse, and rat. Metabolite features can be converted to Comptox Dashboard IDs (DTXCID, DTXSID), CAS numbers, PubChem IDs (CID), HMDB, KEGG, ChEBI, DrugBank IDs, or common metabolite names through the AnnotationHub package metaboliteIDmapping, which provides a comprehensive ID mapping for more than 1.1 million entries.

This tutorial covers a simple example workflow illustrating how the multiGSEA package works. The omics data sets used throughout the example were originally provided by Quirós et al. (2017). In their publication, the authors analyzed the mitochondrial response to four different toxicants, including Actinonin, Diclofenac, FCCB, and Mito-Block (MB), within the transcriptome, proteome, and metabolome layers.
In this tutorial we will solely focus on the Actinonin data set.


> <agenda-title></agenda-title>
>
> In this tutorial, we will cover:
>
> 1. TOC
> {:toc}
>
{: .agenda}

# Preparing the Data

To perform pathway enrichment with MultiGSEA, you need one omics data set per layer in TSV format. Each data set contains four columns: the feature (denoted as Symbol), the log2 fold change (logFC), the p-value (pValue), and the adjusted p-value (adj.pValue). We'll use example data provided on Zenodo.
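
As a quick sanity check before uploading, the column layout described above can be verified with a few lines of Python. This is a minimal sketch and not part of the Galaxy tool: the function name `missing_columns` and the two example rows are made up for illustration, and only the four column names come from the description above.

```python
import csv
import io

# Column names as described in the tutorial text.
REQUIRED_COLUMNS = ("Symbol", "logFC", "pValue", "adj.pValue")


def missing_columns(tsv_text):
    """Return the required columns that are absent from the TSV header."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Hypothetical two-row table, invented for illustration only.
example = (
    "Symbol\tlogFC\tpValue\tadj.pValue\n"
    "GENE_A\t1.25\t0.001\t0.01\n"
    "GENE_B\t-0.40\t0.200\t0.35\n"
)

print(missing_columns(example))            # prints []
print(missing_columns("Symbol\tlogFC\n"))  # prints ['pValue', 'adj.pValue']
```

An empty list means the header contains all four expected columns; anything else names the columns that still need to be added or renamed.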
Review comment (Contributor):

Maybe Sebastian can tell us a few xrefs which methods can give the needed values for Transcriptomics, Metabolomics, and Proteomics.


## Get data

### Data Upload

> <hands-on-title> Getting datasets </hands-on-title>
> 1. Create a new history for this tutorial.
>
> {% snippet faqs/galaxy/histories_create_new.md %}
>
> 2. Import the datasets from [Zenodo]({{ page.zenodo_link }}) into your Galaxy instance:
> ```
> https://zenodo.org/records/14216972/files/transcriptome.tsv
> https://zenodo.org/records/14216972/files/proteome.tsv
> https://zenodo.org/records/14216972/files/metabolome.tsv
> ```
{: .hands_on}
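
If you prefer to generate the list of URLs rather than copy it, the pattern used in the box above can be sketched in Python. The helper name `zenodo_file_url` is hypothetical; the record ID and file names are the ones listed in this tutorial.

```python
from urllib.parse import quote

RECORD_ID = "14216972"
FILENAMES = ("transcriptome.tsv", "proteome.tsv", "metabolome.tsv")


def zenodo_file_url(record_id, filename):
    """Build a file URL following the pattern shown in the hands-on box."""
    return f"https://zenodo.org/records/{record_id}/files/{quote(filename)}"


urls = [zenodo_file_url(RECORD_ID, name) for name in FILENAMES]
print("\n".join(urls))
```

The printed lines can be pasted into Galaxy's upload dialog in the same way as the URLs listed above.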


# Running MultiGSEA

In this step, you'll use the MultiGSEA tool to perform GSEA-based pathway enrichment on the uploaded datasets.

> <hands-on-title> Run multiGSEA </hands-on-title>
>
> 1. Run {% tool [multiGSEA](toolshed.g2.bx.psu.edu/repos/iuc/multigsea/multigsea/1.12.0+galaxy0) %} with the following parameters:
> - *"Select transcriptomics data"*: `Enabled`
> - {% icon param-file %} *"Transcriptomics data"*: `transcriptome.tsv`
> - {% icon param-select %} *"Gene ID format in transcriptomics data"*: `SYMBOL`
> - *"Select proteomics data"*: `Enabled`
> - {% icon param-file %} *"Proteomics data"*: `proteome.tsv`
> - {% icon param-select %} *"Gene ID format in proteomics data"*: `SYMBOL`
> - *"Select metabolomics data"*: `Enabled`
> - {% icon param-file %} *"Metabolomics data"*: `metabolome.tsv`
> - {% icon param-select %} *"Metabolite ID format"*: `HMDB`
> - *"Supported organisms"*: `Homo sapiens (Human)`
> - *"Pathway databases"*: `KEGG`
> - *"Combine p-values method"*: `Stouffer`
> - *"P-value correction method"*: `BH`
>
> > <tip-title>About the parameters</tip-title>
> > - **Pathway databases**: Each database provides pathway definitions in its own format, so select the database that is relevant to your analysis. For this tutorial we choose `KEGG`.
> > - **Combine p-values method**: To measure a pathway response more comprehensively, multiGSEA can compute an aggregated p-value over multiple omics layers. No single aggregation method performs best under all circumstances, so Loughin proposed basic recommendations on which method to use depending on the structure and expectation of the problem. Choose Fisher's method if small p-values should be emphasized, Stouffer's method if all p-values should be treated equally, and Edgington's method if large p-values should be emphasized. For this tutorial we choose `Stouffer`. The figure below illustrates the difference between the three methods.
Review comment (Contributor):

We need a reference for Loughin?

> > ![P-Value](../../images/p-value.png "P-value methods")
> > - **P-value correction method**: Type I and type II errors depend on each other. Reducing type I errors through a p-value adjustment will likely increase the chance of making a type II error, so an appropriate trade-off has to be made. Choose one of the available correction methods. For this tutorial we choose `BH` (Benjamini-Hochberg), which controls the false discovery rate.
> {: .tip}
>
{: .hands_on}
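
To make the difference between the three aggregation methods concrete, the following Python sketch implements textbook versions of Fisher's, Stouffer's, and Edgington's methods using only the standard library. It is an illustration under stated assumptions, not the code that multiGSEA runs: the function names and the example p-values are hypothetical, all p-values are assumed to lie strictly between 0 and 1, and the omics layers are assumed to be independent and equally weighted.

```python
import math
from statistics import NormalDist


def fisher(pvalues):
    """Fisher's method: sensitive to small p-values."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X/2 with X = -2 * sum(ln p)
    # Closed-form survival function of a chi-square with 2k degrees of freedom.
    series = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * series


def stouffer(pvalues):
    """Stouffer's method: treats all p-values equally (unweighted z-scores)."""
    norm = NormalDist()
    z = sum(-norm.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return norm.cdf(-z)  # upper-tail probability of the combined z-score


def edgington(pvalues):
    """Edgington's method: sensitive to large p-values (sum of p-values)."""
    k = len(pvalues)
    s = sum(pvalues)
    # CDF of the sum of k independent uniform variables, evaluated at s.
    total = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    )
    return total / math.factorial(k)


# Hypothetical p-values for one pathway in three omics layers.
example = [0.001, 0.8, 0.8]
for name, method in (("Fisher", fisher), ("Stouffer", stouffer), ("Edgington", edgington)):
    print(f"{name}: {method(example):.4f}")
```

With one very small and two large p-values, Fisher's method gives the smallest combined p-value and Edgington's the largest, with Stouffer's in between. This is the behavior that motivates choosing `Stouffer` when no layer should dominate the result.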

Review comment (Member):

would it be useful to discuss the output of the tool here? again both on technical level (what is the format, what do the contents mean?) and biological (what can we learn from the output)



> <question-title></question-title>
>
> 1. What file format is required for the input data in MultiGSEA?
> 2. What is the purpose of the “Combine p-values method” parameter, and which method was selected in this tutorial?
> 3. Why is it important to select pathway databases (e.g., KEGG) when using MultiGSEA?
>
> > <solution-title></solution-title>
> >
> > 1. The required file format is TSV.
> > 2. The “Combine p-values method” parameter determines how p-values are aggregated across omics layers. In this tutorial, Stouffer's method was selected so that the p-values from all layers are treated equally.
> > 3. Selecting pathway databases ensures that the analysis uses appropriate and relevant pathway definitions for enrichment.
> >
> {: .solution}
>
{: .question}
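
The `BH` option selected in this tutorial refers to the Benjamini-Hochberg procedure. The following Python sketch shows the textbook step-up adjustment. It is meant as an illustration only and is not taken from the tool's source; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical combined p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a bigger one.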

# Conclusion

In this tutorial, you explored the capabilities of MultiGSEA for performing pathway enrichment analysis across multiple omics layers, including transcriptomics, proteomics, and metabolomics data. By following the steps, you learned how to:

- Prepare and upload the required omics datasets.
- Configure and execute the MultiGSEA tool within Galaxy.
- Combine p-values from different omics layers to derive a unified perspective on pathway enrichment.
@@ -0,0 +1,3 @@
---
layout: workflow-list
---
@@ -0,0 +1 @@
{"a_galaxy_workflow": "true", "annotation": "The", "comments": [], "format-version": "0.1", "name": "multiGSEA Workflow", "report": {"markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n"}, "steps": {"0": {"annotation": "The input files (transcriptome.tsv, proteome.tsv, metabolome.tsv) are converted into the output file using the preset parameters of the tool.", "content_id": null, "errors": null, "id": 0, "input_connections": {}, "inputs": [{"description": "The input files (transcriptome.tsv, proteome.tsv, metabolome.tsv) are converted into the output file using the preset parameters of the tool.", "name": "transcriptome.tsv"}], "label": "transcriptome.tsv", "name": "Input dataset", "outputs": [], "position": {"left": 0, "top": 0}, "tool_id": null, "tool_state": "{\"optional\": false, \"tag\": \"\"}", "tool_version": null, "type": "data_input", "uuid": "cb1f6368-5d1a-43a9-a5c3-12186ae326d9", "when": null, "workflow_outputs": []}, "1": {"annotation": "", "content_id": null, "errors": null, "id": 1, "input_connections": {}, "inputs": [{"description": "", "name": "proteome.tsv"}], "label": "proteome.tsv", "name": "Input dataset", "outputs": [], "position": {"left": 1, "top": 90}, "tool_id": null, "tool_state": "{\"optional\": false, \"tag\": null}", "tool_version": null, "type": "data_input", "uuid": "ac32b15a-c6df-4e1d-bcfb-94c909c0b3ee", "when": null, "workflow_outputs": []}, "2": {"annotation": "", "content_id": null, "errors": null, "id": 2, "input_connections": {}, "inputs": [{"description": "", "name": "metabolome.tsv"}], "label": "metabolome.tsv", "name": "Input dataset", "outputs": [], "position": {"left": 2, "top": 180}, "tool_id": null, "tool_state": "{\"optional\": false, \"tag\": null}", "tool_version": null, "type": "data_input", "uuid": "3a38df50-6904-4366-958a-7ccfa12539da", "when": null, 
"workflow_outputs": []}, "3": {"annotation": "", "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/multigsea/multigsea/1.12.0+galaxy0", "errors": null, "id": 3, "input_connections": {"metabolomics_data|metabolomics": {"id": 2, "output_name": "output"}, "proteomics_data|proteomics": {"id": 1, "output_name": "output"}, "transcriptomics_data|transcriptomics": {"id": 0, "output_name": "output"}}, "inputs": [{"description": "runtime parameter for tool multiGSEA", "name": "metabolomics_data"}, {"description": "runtime parameter for tool multiGSEA", "name": "proteomics_data"}, {"description": "runtime parameter for tool multiGSEA", "name": "transcriptomics_data"}], "label": null, "name": "multiGSEA", "outputs": [{"name": "output", "type": "tabular"}], "position": {"left": 332, "top": 44.51666259765624}, "post_job_actions": {}, "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/multigsea/multigsea/1.12.0+galaxy0", "tool_shed_repository": {"changeset_revision": "e48b10ce08b8", "name": "multigsea", "owner": "iuc", "tool_shed": "toolshed.g2.bx.psu.edu"}, "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"combine_pvalues\": \"stouffer\", \"databases\": \"all\", \"metabolomics_data\": {\"selector\": \"true\", \"__current_case__\": 0, \"metabolomics\": {\"__class__\": \"ConnectedValue\"}, \"metabolome_ids\": \"HMDB\"}, \"organism\": \"hsapiens\", \"padj_method\": \"BH\", \"proteomics_data\": {\"selector\": \"true\", \"__current_case__\": 0, \"proteomics\": {\"__class__\": \"ConnectedValue\"}, \"proteome_ids\": \"SYMBOL\"}, \"transcriptomics_data\": {\"selector\": \"true\", \"__current_case__\": 0, \"transcriptomics\": {\"__class__\": \"ConnectedValue\"}, \"transcriptome_ids\": \"SYMBOL\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", "tool_version": "1.12.0+galaxy0", "type": "tool", "uuid": "f1ba258f-1a9e-410e-8366-0d3966843fff", "when": null, "workflow_outputs": []}}, "tags": [], "uuid": 
"d3d42f71-18e4-4c0a-9b21-052f37907c61", "version": 3}
2 changes: 1 addition & 1 deletion topics/statistics/index.md
@@ -1,4 +1,4 @@
---
layout: topic
topic_name: statistics
---
---
2 changes: 1 addition & 1 deletion topics/statistics/metadata.yaml
@@ -15,4 +15,4 @@ requirements:
editorial_board:
- marziacremona
- cumbof
- anuprulez
- anuprulez