
Commit

Merge pull request #5214 from galaxyproject/subinamehta-patch-4
Created a learning pathway for GTA- proteogenomics.md
shiltemann authored Aug 9, 2024
2 parents 62d550f + 0740f45 commit 47215d4
Showing 7 changed files with 76 additions and 27 deletions.
16 changes: 10 additions & 6 deletions learning-pathways/clinical-metaproteomics.md
@@ -8,7 +8,10 @@ title: Clinical metaproteomics workflows within Galaxy
description: |
This learning path aims to teach you the basics of how to perform metaproteomics analysis of the clinical data within the Galaxy platform. You will learn how to use Galaxy for analysis and will be guided through the most common first steps of any metaproteomics database generation to searching the database, verifying the proteins/peptides, and data analysis.
priority: 3
cover-image: shared/images/proteomics.png
cover-image-alt: image of a 3D protein folding structure


editorial_board:
- subinamehta

@@ -27,24 +30,25 @@ pathway:
The identified peptides and proteins from various software will be combined later to perform verification.
tutorials:
- name: clinical-mp-2-discovery
-topic: proteomics
+topic: proteomics


- section: "Module 3: Verification"
description: |
-Here we use the PepQuery tool to verify the presence of the peptides as well as validate that the peptides/proteins
+Here we use the PepQuery tool to verify the presence of the peptides as well as validate that the peptides/proteins
identified are indeed of microbial origin.
tutorials:
- name: clinical-mp-3-verification
topic: proteomics

- section: "Module 4: Quantitation"
description: |
-In this module, we perform quantitative analysis of our data using MaxQuant. Quantitative analysis will help us identify
-differentially abundant proteins present in the sample and their abundance in various conditions.
+In this module, we perform quantitative analysis of our data using MaxQuant. Quantitative analysis will help us identify
+differertially abundant proteins present in the sample and their abundance in various conditions.
tutorials:
- name: clinical-mp-4-quantitation
-topic: proteomics
+topic: proteomics

- section: "Module 5: Data Interpretation"
description: |
42 changes: 42 additions & 0 deletions learning-pathways/proteogenomics.md
@@ -0,0 +1,42 @@
---
layout: learning-pathway
tags: [beginner]
type: use


title: Proteogenomics
description: |
This learning path aims to teach you the basics of how to perform proteogenomics analysis of the Mass spectrometry data within the Galaxy platform. You will learn how to use Galaxy for analysis and will be guided through the most common first steps of any proteogenomics database generation to searching the database, followed by novel peptide data analysis.
cover-image: shared/images/proteomics.png
cover-image-alt: image of a 3D protein folding structure

editorial_board:
- subinamehta

pathway:
- section: "Module 1: Database generation"
description: |
Get a first look at the Galaxy platform for data analysis. We start with a short introduction to familiarize you with the Galaxy interface, and then proceed with understanding how to generate a customized database for proteogenomics.
tutorials:
- name: proteogenomics-dbcreation
topic: proteomics

- section: "Module 2: Database searching"
description: |
This section helps to guide the users through an MSMS dataset search against the customized database generated in the first module. The identified peptides and proteins will be then analyzed later in the novel peptide analysis.
tutorials:
- name: proteogenomics-dbsearch
topic: proteomics

- section: "Module 3: Novel Peptide Analysis"
description: |
The last module in the proteogenomics tutorial is to identify "novel peptides" using BlastP and to localize the peptides to their genomic coordinates. Both inputs from modules 1 and 2 are required to run this tutorial.
tutorials:
- name: proteogenomics-novel-peptide-analysis
topic: proteomics

---

New to Galaxy and/or the field of metaproteomics? Follow this learning path to get familiar with the basics!
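The `pathway:` front matter above pairs each tutorial `name` with a `topic`. A minimal sketch, assuming the `topics/<topic>/tutorials/<name>/tutorial.md` layout seen in the README links of this commit, resolves each entry to its source file. `FRONT_MATTER` is an abridged, re-indented copy of the new file and `tutorial_sources` is a hypothetical helper, not part of the GTN tooling; it uses a regular expression instead of a YAML parser to stay dependency-free, so it only handles the simple `- name:` / `topic:` pairs used here.

```python
import re

# Abridged copy of the pathway front matter added in this commit
# (re-indented; descriptions omitted).
FRONT_MATTER = """\
pathway:
  - section: "Module 1: Database generation"
    tutorials:
      - name: proteogenomics-dbcreation
        topic: proteomics
  - section: "Module 2: Database searching"
    tutorials:
      - name: proteogenomics-dbsearch
        topic: proteomics
  - section: "Module 3: Novel Peptide Analysis"
    tutorials:
      - name: proteogenomics-novel-peptide-analysis
        topic: proteomics
"""

# A "- name: X" line directly followed by a "topic: Y" line.
ENTRY = re.compile(r"^\s*- name:\s*(\S+)\s*\n\s*topic:\s*(\S+)", re.MULTILINE)


def tutorial_sources(front_matter: str) -> list[str]:
    """Resolve each pathway entry to topics/<topic>/tutorials/<name>/tutorial.md."""
    return [
        f"topics/{topic}/tutorials/{name}/tutorial.md"
        for name, topic in ENTRY.findall(front_matter)
    ]


for path in tutorial_sources(FRONT_MATTER):
    print(path)
```

A real YAML parser would be the safer choice for anything beyond this fixed shape; the regex is only meant to make the name/topic pairing visible.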

Binary file added shared/images/proteomics.png
2 changes: 1 addition & 1 deletion topics/proteomics/README.md
@@ -13,7 +13,7 @@ topic | features
[Label-free versus Labelled - How to Choose Your Quantitation Method](tutorials/labelfree-vs-labelled/tutorial.md)| [:book:](tutorials/labelfree-vs-labelled/tutorial.md)
[Metaproteomics](tutorials/metaproteomics/tutorial.md)| [:book:](tutorials/metaproteomics/tutorial.md)
[metaQuantome Data creation](tutorials/metaquantome-data-creation/tutorial.md)| [:book:](tutorials/metaquantome-data-creation/tutorial.md)
-[RNA-seq Database creation](tutorials/proteogenomics-dbcreation/tutorial.md)| [:book:](tutorials/proteogenomics-dbcreation/tutorial.md)
+[Proteogenomics RNA-seq Database creation](tutorials/proteogenomics-dbcreation/tutorial.md)| [:book:](tutorials/proteogenomics-dbcreation/tutorial.md)
[Proteogenomics Database searching](tutorials/proteogenomics-dbsearch/tutorial.md)| [:book:](tutorials/proteogenomics-dbsearch/tutorial.md)
[Proteogenomics Novel Peptide Analysis](tutorials/proteogenomics-novel-peptide-analysis/tutorial.md)| [:book:](tutorials/proteogenomics-novel-peptide-analysis/tutorial.md)
[metaQuantome-Taxonomy](tutorials/metaquantome-taxonomy/tutorial.md) | [:book:](tutorials/metaquantome-taxonomy/tutorial.md)
@@ -81,7 +81,7 @@ In this tutorial, protein and the total RNA sample was obtained from the early d
>
> {% snippet faqs/galaxy/histories_create_new.md %}
>
-> 2. Import the Uniprot FASTA, FASTQ file and the GTF file from Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1489208.svg)](https://doi.org/10.5281/zenodo.1489208)
+> 2. Import the Uniprot FASTA, FASTQ file and the GTF file from Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13270741.svg)](https://doi.org/10.5281/zenodo.13270741)
> ```
> https://zenodo.org/records/1489208/files/Trimmed_ref_5000_uniprot_cRAP.fasta
> https://zenodo.org/record/1489208/files/FASTQ_ProB_22LIST.fastqsanger
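Inside Galaxy, these links are pasted into the upload dialog. For fetching the same files outside Galaxy, a minimal sketch using only the Python standard library could look like this; `local_name` and `fetch_all` are hypothetical helpers. Both `record/` and `records/` URL forms appear in the list, and the sketch simply keeps the last path component as the local file name.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Two of the Zenodo links listed in the upload step above.
URLS = [
    "https://zenodo.org/records/1489208/files/Trimmed_ref_5000_uniprot_cRAP.fasta",
    "https://zenodo.org/record/1489208/files/FASTQ_ProB_22LIST.fastqsanger",
]


def local_name(url: str) -> str:
    """Return the last path component of a '.../files/<name>' URL."""
    return os.path.basename(urlparse(url).path)


def fetch_all(urls: list[str], dest: str = ".") -> list[str]:
    """Download each URL into dest and return the local file paths."""
    os.makedirs(dest, exist_ok=True)
    paths = []
    for url in urls:
        path = os.path.join(dest, local_name(url))
        urllib.request.urlretrieve(url, path)  # needs network access
        paths.append(path)
    return paths
```

Calling `fetch_all(URLS, "data")` would place the files under a local `data/` directory; the directory name is only an example.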
19 changes: 11 additions & 8 deletions topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.md
@@ -67,12 +67,15 @@ In this tutorial, we perform proteogenomic database searching using the Mass Spe
> <hands-on-title>data upload and organization</hands-on-title>
>
> 1. Create a **new history** and name it something meaningful (e.g. *Proteogenomics DB search*)
-> 2. Import the four MGF MS/MS files and the Trimmed_ref_5000_uniprot_cRAP.FASTA sequence file from Zenodo.[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1489208.svg)](https://doi.org/10.5281/zenodo.1489208)
+> 2. Import the four MGF MS/MS files and the Trimmed_ref_5000_uniprot_cRAP.FASTA sequence file from Zenodo.[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13270741.svg)](https://doi.org/10.5281/zenodo.13270741)
> ```
> https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f4.mgf
> https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f5.mgf
> https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f8.mgf
> https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f9.mgf
+> https://zenodo.org/records/13270741/files/Uniprot_cRAP_SAV_indel_translatedbed.FASTA
+> https://zenodo.org/records/13270741/files/Reference_Protein_Accessions.tabular
+>
> ```
>
> {% snippet faqs/galaxy/datasets_import_via_link.md %}
@@ -87,7 +90,7 @@ In this tutorial, we perform proteogenomic database searching using the Mass Spe
# Match peptide sequences
-The search database labelled `Uniprot_cRAP_SAV_indel_translatedbed.FASTA` is the input database that
+The search database labeled `Uniprot_cRAP_SAV_indel_translatedbed.FASTA` is the input database that
will be used to match MS/MS to peptide sequences via a sequence database search.
For this, the sequence database-searching program called [SearchGUI](https://compomics.github.io/projects/searchgui.html) will be used.The generated dataset collection of the three *MGF files* in the history is used as the MS/MS input. We will walk through a number of these settings in order to utilize SearchGUI on these example MGF files.
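SearchGUI takes the MGF files as its MS/MS input. To show what those files contain, here is a minimal reader for the Mascot Generic Format: each spectrum sits between `BEGIN IONS` and `END IONS`, starts with `KEY=VALUE` header lines such as `TITLE`, `PEPMASS` and `CHARGE`, and then lists peaks as "m/z intensity" pairs. This illustrates the format only; it is not how SearchGUI parses its input, `read_mgf` is a hypothetical helper, and the sample spectrum is made up.

```python
def read_mgf(lines):
    """Yield one dict per spectrum from Mascot Generic Format (MGF) lines."""
    spectrum = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and lines starting with an MGF comment character.
        if not line or line[0] in "#;!/":
            continue
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None:
            continue  # global parameters outside any spectrum are ignored here
        elif "=" in line:
            key, value = line.split("=", 1)
            spectrum["params"][key.upper()] = value
        else:
            parts = line.split()
            mz = float(parts[0])
            intensity = float(parts[1]) if len(parts) > 1 else 0.0
            spectrum["peaks"].append((mz, intensity))


SAMPLE = """\
BEGIN IONS
TITLE=example_scan_1
PEPMASS=523.28
CHARGE=2+
110.07 1500.0
175.12 3200.5
END IONS
"""

for spec in read_mgf(SAMPLE.splitlines()):
    print(spec["params"]["TITLE"], len(spec["peaks"]))
```

Because `read_mgf` is a generator, it can stream large MGF files line by line via `open(path)` without loading every spectrum into memory.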
@@ -240,7 +243,7 @@ The mzidentml output from the Peptide shaker is converted into an sqlite databas
>
{: .hands_on}
-The next step is to remove known peptides from the list of PSM's that we acquired from the Peptide Shaker results. For that we need to perform Query tabular to extract list of known peptides from the UniProt and cRAP database.
+The next step is to remove known peptides from the list of PSMs that we acquired from the Peptide Shaker results. For that, we need to perform Query tabular to extract the list of known peptides from the UniProt and cRAP database.
## Query Tabular
@@ -327,7 +330,7 @@ The next step is to remove known peptides from the list of PSM's that we acquire
>
> - *"include query result column headers"*: `Yes`
>
-> - Click **Run Tool** and inspect the query results file after it turned green.
+> - Click **Run Tool** and inspect the query results file after it turns green.
>
{: .hands_on}
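The Query Tabular steps load tabular datasets into SQLite tables and run SQL over them. A minimal sketch of the known-peptide filter using Python's built-in `sqlite3` module is shown below; the table names `psm` and `known`, the column `sequence`, the helper `novel_candidates`, and the example sequences are all hypothetical, and the SQL used in the tutorial itself differs.

```python
import sqlite3


def novel_candidates(psm_peptides, known_peptides):
    """Return distinct PSM peptide sequences that are absent from the known list."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE psm (sequence TEXT)")
    con.execute("CREATE TABLE known (sequence TEXT)")
    con.executemany("INSERT INTO psm VALUES (?)", [(s,) for s in psm_peptides])
    con.executemany("INSERT INTO known VALUES (?)", [(s,) for s in known_peptides])
    rows = con.execute(
        "SELECT DISTINCT sequence FROM psm "
        "WHERE sequence NOT IN (SELECT sequence FROM known) "
        "ORDER BY sequence"
    ).fetchall()
    con.close()
    return [row[0] for row in rows]


print(novel_candidates(["PEPTIDEK", "LVNELTEFAK", "NOVELPEPR"], ["LVNELTEFAK"]))
```

`NOT IN` returns no rows at all if the subquery yields a `NULL`, so `NOT EXISTS` is the safer form when the known-peptide column may contain empty values.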
@@ -361,7 +364,7 @@ The output from this step is that the resultant peptides would be those which do
> ```
> - *"include query result column headers"*: `Yes`
>
-> - Click **Run Tool** and inspect the query results file after it turned green.
+> - Click **Run Tool** and inspect the query results file after it turns green.
>
> ![QT](../../images/QT_output.png)
>
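The remaining peptides are written to FASTA before BLAST-P. Outside Galaxy, that conversion can be sketched as below. Using the peptide sequence itself as the FASTA identifier (so BLAST hits map straight back to the peptide), upper-casing the input, and dropping duplicates are assumptions of this sketch, and `peptides_to_fasta` is a hypothetical helper.

```python
def peptides_to_fasta(peptides):
    """Return FASTA text with one record per unique peptide, in first-seen order."""
    seen = set()
    records = []
    for raw in peptides:
        seq = raw.strip().upper()
        if not seq or seq in seen:
            continue  # skip empty cells and repeated peptides
        seen.add(seq)
        records.append(f">{seq}\n{seq}")  # the sequence doubles as the record ID
    return "\n".join(records) + "\n" if records else ""


print(peptides_to_fasta(["NOVELPEPR", "PEPTIDEK"]), end="")
```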
@@ -392,11 +395,11 @@ The output FASTA file is going to be subjected to BLAST-P analysis.
-# **Conclusion**
+# Conclusion
-This completes the walkthrough of the proteogenomics database search workflow. This tutorial is a guide to perform database searching with mass spectronetry files and have peptides ready for Blast-P analysis, you can perform follow up analysis using the next GTN "Proteogenomics Novel Peptide Analysis".
-Researchers can use this workflow with their data also, please note that the tool parameters, reference genomes and the workflow will be needed to be modified accordingly.
+This completes the walkthrough of the proteogenomics database search workflow. This tutorial is a guide to performing database searching with mass spectrometry files and having peptides ready for Blast-P analysis, you can perform follow-up analysis using the next GTN "Proteogenomics Novel Peptide Analysis".
+Researchers can use this workflow with their data also, please note that the tool parameters, reference genomes, and the workflow will need to be modified accordingly.
This workflow was developed by the Galaxy-P team at the University of Minnesota. For more information about Galaxy-P or our ongoing work, please visit us at [galaxyp.org](https://galaxyp.org)
@@ -70,7 +70,7 @@ All the files to run this workflow can be obtained from the [second tutorial]({%
> - **Mz to sqlite**
> - **Genomic mapping sqlite**
>
-> If you do not have these files from the previous tutorials in this series, you can import them from Zenodo:
+> If you do not have these files from the previous tutorials in this series, you can import them from [Zenodo](https://doi.org/10.5281/zenodo.13270741)
> ```
> https://zenodo.org/record/1489208/files/Peptides_for_Blast-P_analysis.tabular
> https://zenodo.org/record/1489208/files/PeptideShaker_PSM.tabular
@@ -83,9 +83,9 @@ All the files to run this workflow can be obtained from the [second tutorial]({%
# Peptide Selection
-[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a web based tool used to compare biological sequences. BlastP, matches protein sequences against a protein database. More specifically, it looks at the amino acid sequence of proteins and can detect and evaluate the amount of differences between say, an experimentally derived sequence and all known amino acid sequences from a database. It can then find the most similar sequences and allow for identification of known proteins or for identification of potential peptides associated with novel proteoforms.
+[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a web-based tool used to compare biological sequences. BlastP, matches protein sequences against a protein database. More specifically, it looks at the amino acid sequence of proteins and can detect and evaluate the amount of differences between say, an experimentally derived sequence and all known amino acid sequences from a database. It can then find the most similar sequences and allow for the identification of known proteins or for the identification of potential peptides associated with novel proteoforms.
-The first step in this tutorial is to perfrom BLAST-P analysis using the NCBI-NR database. The output from BLASTP will determine the identification of the novel peptides. The result is a tabular file with 25 columns containing all the information regarding the alignment of these peptides with the sequences in the NCBI-NR database.
+The first step in this tutorial is to perform BLAST-P analysis using the NCBI-NR database. The output from BLASTP will determine the identification of the novel peptides. The result is a tabular file with 25 columns containing all the information regarding the alignment of these peptides with the sequences in the NCBI-NR database.
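The 25-column BLAST-P output can also be post-processed programmatically. The column order below is an assumption (the 12 standard BLAST tabular columns followed by 13 extended ones); check it against the header of your own output. The "novel" rule used here, a peptide with no hit that is both 100% identical and full-length, is likewise a simplified assumption for illustration rather than the tutorial's exact filter, and `novel_peptides` is a hypothetical helper.

```python
import csv
import io

# Assumed order of the 25 columns: the 12 standard BLAST tabular columns
# followed by 13 extended ones. Verify against your own output.
COLUMNS = (
    "qseqid sseqid pident length mismatch gapopen qstart qend sstart send "
    "evalue bitscore sallseqid score nident positive gaps ppos qframe sframe "
    "qseq sseq qlen slen salltitles"
).split()


def novel_peptides(tabular_text: str) -> list[str]:
    """Return query IDs that have no hit with 100% identity over their full length."""
    exact_hit: dict[str, bool] = {}
    rows = csv.reader(io.StringIO(tabular_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        if not row or row[0].startswith("#") or row[0] == "qseqid":
            continue  # skip blank, comment and header lines
        hit = dict(zip(COLUMNS, row))
        is_exact = (
            float(hit["pident"]) == 100.0
            and int(hit["length"]) == int(hit["qlen"])
        )
        query = hit["qseqid"]
        exact_hit[query] = exact_hit.get(query, False) or is_exact
    return sorted(query for query, exact in exact_hit.items() if not exact)
```

A peptide is kept only if none of its hits is exact, which is why the dictionary ORs the per-hit result across all rows for the same query.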
> <hands-on-title>NCBI BLAST+ blastp</hands-on-title>
>
@@ -200,15 +200,15 @@ Once this step is completed, a tabular output containing novel proteoforms are d
# Multiomics Visualization Platform (MVP)
-The Multiomics Visualization Platform is a Galaxy visualization plugin that allows the user to browse the selected proteomics data. It uses the SQlite database which allows the data to be filtered and aggregated in a user defined manner. It allows various features such as; the PSM can be displayed with a lorikeet spectral view, the selected peptide can be displayed in a protein view and an IGV browser is also available for the selected protein. The step by step guide shown below will provide a walkthrough on how to use this plugin (NOTE: the example shown below is a representative peptide which is subjected to change, so while you are running this tool please take a look at the "Novel Peptide" output from the previous steps).
+The Multiomics Visualization Platform is a Galaxy visualization plugin that allows the user to browse the selected proteomics data. It uses the SQlite database which allows the data to be filtered and aggregated in a user-defined manner. It allows various features such as; the PSM can be displayed with a lorikeet spectral view, the selected peptide can be displayed in a protein view and an IGV browser is also available for the selected protein. The step-by-step guide shown below will provide a walkthrough on how to use this plugin (NOTE: the example shown below is a representative peptide that is subjected to change, so while you are running this tool please take a look at the "Novel Peptide" output from the previous steps).
> <hands-on-title>Guide to MVP</hands-on-title>
>
-> The spectra belonging to these "Novel peptides" can be viewed using MVP,this can be achieved by selecting the output from the `mz to sqlite tool` (Generated in the second workflow).
-> Here is a step by step guide to obtain the proteogenomic view of the "Novel peptides".
+> The spectra belonging to these "Novel peptides" can be viewed using MVP, this can be achieved by selecting the output from the `mz to sqlite tool` (Generated in the second workflow).
+> Here is a step-by-step guide to obtain the proteogenomic view of the "Novel peptides".
>
>
-> 1) Click on the **Visualize in MVP application**, it will open up options for visualization application in the center pane, Select **MVP Application** from the options (or Right click to open in a new window).
+> 1) Click on the **Visualize in MVP application**, it will open up options for visualization application in the center pane, Select **MVP Application** from the options (or Right-click to open in a new window).
>
> ![mz to sqlite](../../images/Visualize.png){:width="20%"}
>
@@ -333,7 +333,7 @@ Given chromosomal locations of peptides in a BED file, PepPointer classifies the
>
{: .hands_on}
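PepPointer assigns each peptide a genomic category from its BED coordinates. The core idea, an interval-overlap test against annotated features, can be sketched as follows. The category names, their priority order, and the `Feature`/`classify` names are assumptions for illustration, not PepPointer's actual rules or code. Coordinates follow the BED convention (0-based start, exclusive end), so two intervals overlap when each one starts before the other ends.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    kind: str   # e.g. "CDS", "UTR", "intron"


# Hypothetical priority: when a peptide overlaps several kinds, the first listed wins.
PRIORITY = ("CDS", "UTR", "exon", "intron")


def classify(chrom, start, end, features):
    """Classify a BED-style peptide interval by the annotation it overlaps."""
    kinds = {
        f.kind for f in features
        if f.chrom == chrom and f.start < end and start < f.end
    }
    for kind in PRIORITY:
        if kind in kinds:
            return kind
    return "intergenic"  # no overlap with any prioritized feature


features = [Feature("chr1", 100, 200, "CDS"), Feature("chr1", 200, 500, "intron")]
print(classify("chr1", 190, 214, features))
```

A linear scan is enough for a handful of peptides; genome-wide annotation would call for an interval tree or sorted-interval search instead.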
-The final tool for this workflow generates a tabular output that summarizes the information after running these workflows. The final summary output consists of the Peptide sequence, the spectra associated with the peptides, the protein accession number, chromosome number, Start and Stop of the genomic coordinate, the annotation, the genomic coordinate entry for viewing in Integrative Genomics Viewer (IGV), MVP or UCSC genome browser and the URL for viewing it on UCSC genome browser. This summary is created with the help of the query tabular tool.
+The final tool for this workflow generates a tabular output that summarizes the information after running these workflows. The final summary output consists of the Peptide sequence, the spectra associated with the peptides, the protein accession number, the chromosome number, Start and Stop of the genomic coordinate, the annotation, the genomic coordinate entry for viewing in Integrative Genomics Viewer (IGV), MVP or UCSC genome browser and the URL for viewing it on UCSC genome browser. This summary is created with the help of the query tabular tool.
# Final Summary Output
@@ -377,17 +377,17 @@ The final tool for this workflow generates a tabular output that summarizes the
>
> - *"include query result column headers"*: `Yes`
>
-> 2. Click **Run Tool** and inspect the query results file after it turned green. If everything went well, it should look similiar:
+> 2. Click **Run Tool** and inspect the query results file after it turns green. If everything goes well, it should look similar:
>
> ![Final Summary](../../images/final_summary.png){:width="100%"}
>
> The Final summary displays a tabular output containing the list of novel peptides and its corresponding protein. It also provides the users with the chromosomal location of the novel proteoform along with the peptide's start and end position. The output also features the strand information, gene annotation and the genomic coordinates in a specific format that could be used on IGV or UCSC browser. It also provides the user with a UCSC Genome Browser link, which the user can directly copy and paste it on a web browser to learn more about the novel proteoform. Here we are demonstrating the use of proteogenomics workflow on an example trimmed mouse dataset. This study explores the possibilities for downstream biological /functional analysis of peptides corresponding to novel proteoforms.
>
{: .hands_on}
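The summary's IGV/UCSC coordinate strings and browser links can be derived from BED-style peptide coordinates. A minimal sketch, assuming 0-based BED input and the mm10 mouse assembly (both are assumptions; use the assembly your annotation was built on), is shown below. It relies on the UCSC `hgTracks` page's `db` and `position` query parameters; `igv_locus` and `ucsc_url` are hypothetical helpers, and the output is not necessarily the exact format the Query Tabular step produces.

```python
from urllib.parse import urlencode


def igv_locus(chrom, bed_start, bed_end):
    """BED is 0-based half-open; IGV/UCSC positions are 1-based inclusive."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


def ucsc_url(chrom, bed_start, bed_end, db="mm10"):
    """Link to the UCSC Genome Browser at the peptide's position (db is assumed)."""
    query = urlencode({"db": db, "position": igv_locus(chrom, bed_start, bed_end)})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"


print(igv_locus("chr1", 100, 124))
print(ucsc_url("chr1", 100, 124))
```

Only the start coordinate shifts by one: a BED end of 124 already equals the last 1-based base, which is the usual source of off-by-one errors when pasting BED coordinates into a genome browser.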
-### Conclusion
+## Conclusion
-This completes the proteogenomics workflow analysis. This training workflow uses mouse data. For any other organism the data, tool paramters and the workflow will need to be modified accordingly.This workflow is also available at [usegalaxy.eu](https://usegalaxy.eu/).
+This completes the proteogenomics workflow analysis. This training workflow uses mouse data. For any other organism, the data, tool parameters and workflow will need to be modified accordingly. This workflow is also available at [usegalaxy.eu](https://usegalaxy.eu/). All the tools are in the most stable version here (published in 2018), the tools are subject to changes and upgrades, hence there could be minor formatting that would be required.
This workflow was developed by the Galaxy-P team at the University of Minnesota.
For more information about Galaxy-P or our ongoing work, please visit us at [galaxyp.org](http://galaxyp.org)
