diff --git a/learning-pathways/clinical-metaproteomics.md b/learning-pathways/clinical-metaproteomics.md index 78aac5046df789..ad9b78175211f7 100644 --- a/learning-pathways/clinical-metaproteomics.md +++ b/learning-pathways/clinical-metaproteomics.md @@ -8,7 +8,10 @@ title: Clinical metaproteomics workflows within Galaxy description: | This learning path aims to teach you the basics of how to perform metaproteomics analysis of the clinical data within the Galaxy platform. You will learn how to use Galaxy for analysis and will be guided through the most common first steps of any metaproteomics database generation to searching the database, verifying the proteins/peptides, and data analysis. -priority: 3 +cover-image: shared/images/proteomics.png +cover-image-alt: image of a 3D protein folding structure + + editorial_board: - subinamehta @@ -27,12 +30,12 @@ pathway: The identified peptides and proteins from various software will be combined later to perform verification. tutorials: - name: clinical-mp-2-discovery - topic: proteomics + topic: proteomics - section: "Module 3: Verification" description: | - Here we use the PepQuery tool to verify the presence of the peptides as well as validate that the peptides/proteins + Here we use the PepQuery tool to verify the presence of the peptides as well as validate that the peptides/proteins identified are indeed of microbial origin. tutorials: - name: clinical-mp-3-verification @@ -40,11 +43,12 @@ pathway: - section: "Module 4: Quantitation" description: | - In this module, we perform quantitative analysis of our data using MaxQuant. Quantitative analysis will help us identify - differentially abundant proteins present in the sample and their abundance in various conditions. + In this module, we perform quantitative analysis of our data using MaxQuant. Quantitative analysis will help us identify + differentially abundant proteins present in the sample and their abundance in various conditions. 
+ tutorials: - name: clinical-mp-4-quantitation - topic: proteomics + topic: proteomics - section: "Module 5: Data Interpretation" description: | diff --git a/learning-pathways/proteogenomics.md b/learning-pathways/proteogenomics.md new file mode 100644 index 00000000000000..b63b6156332b94 --- /dev/null +++ b/learning-pathways/proteogenomics.md @@ -0,0 +1,42 @@ +--- +layout: learning-pathway +tags: [beginner] +type: use + + +title: Proteogenomics +description: | + This learning path aims to teach you the basics of how to perform proteogenomics analysis of the Mass spectrometry data within the Galaxy platform. You will learn how to use Galaxy for analysis and will be guided through the most common first steps of any proteogenomics database generation to searching the database, followed by novel peptide data analysis. + +cover-image: shared/images/proteomics.png +cover-image-alt: image of a 3D protein folding structure + +editorial_board: +- subinamehta + +pathway: + - section: "Module 1: Database generation" + description: | + Get a first look at the Galaxy platform for data analysis. We start with a short introduction to familiarize you with the Galaxy interface, and then proceed with understanding how to generate a customized database for proteogenomics. + tutorials: + - name: proteogenomics-dbcreation + topic: proteomics + + - section: "Module 2: Database searching" + description: | + This section helps to guide the users through an MSMS dataset search against the customized database generated in the first module. The identified peptides and proteins will be then analyzed later in the novel peptide analysis. + tutorials: + - name: proteogenomics-dbsearch + topic: proteomics + + - section: "Module 3: Novel Peptide Analysis" + description: | + The last module in the proteogenomics tutorial is to identify "novel peptides" using BlastP and to localize the peptides to their genomic coordinates. Both inputs from modules 1 and 2 are required to run this tutorial. 
+ tutorials: - name: proteogenomics-novel-peptide-analysis + topic: proteomics + +--- + +New to Galaxy and/or the field of proteogenomics? Follow this learning path to get familiar with the basics! + diff --git a/shared/images/proteomics.png b/shared/images/proteomics.png new file mode 100644 index 00000000000000..315da0d6ada96a Binary files /dev/null and b/shared/images/proteomics.png differ diff --git a/topics/proteomics/README.md b/topics/proteomics/README.md index 1af3988dfb6cc0..4b1ae1d981cf2a 100644 --- a/topics/proteomics/README.md +++ b/topics/proteomics/README.md @@ -13,7 +13,7 @@ topic | features [Label-free versus Labelled - How to Choose Your Quantitation Method](tutorials/labelfree-vs-labelled/tutorial.md)| [:book:](tutorials/labelfree-vs-labelled/tutorial.md) [Metaproteomics](tutorials/metaproteomics/tutorial.md)| [:book:](tutorials/metaproteomics/tutorial.md) [metaQuantome Data creation](tutorials/metaquantome-data-creation/tutorial.md)| [:book:](tutorials/metaquantome-data-creation/tutorial.md) -[RNA-seq Database creation](tutorials/proteogenomics-dbcreation/tutorial.md)| [:book:](tutorials/proteogenomics-dbcreation/tutorial.md) +[Proteogenomics RNA-seq Database creation](tutorials/proteogenomics-dbcreation/tutorial.md)| [:book:](tutorials/proteogenomics-dbcreation/tutorial.md) [Proteogenomics Database searching](tutorials/proteogenomics-dbsearch/tutorial.md)| [:book:](tutorials/proteogenomics-dbsearch/tutorial.md) [Proteogenomics Novel Peptide Analysis](tutorials/proteogenomics-novel-peptide-analysis/tutorial.md)| [:book:](tutorials/proteogenomics-novel-peptide-analysis/tutorial.md) [metaQuantome-Taxonomy](tutorials/metaquantome-taxonomy/tutorial.md) | [:book:](tutorials/metaquantome-taxonomy/tutorial.md) diff --git a/topics/proteomics/tutorials/proteogenomics-dbcreation/tutorial.md b/topics/proteomics/tutorials/proteogenomics-dbcreation/tutorial.md index 3a93daf078fc70..ded935882ed12a 100644 --- 
a/topics/proteomics/tutorials/proteogenomics-dbcreation/tutorial.md +++ b/topics/proteomics/tutorials/proteogenomics-dbcreation/tutorial.md @@ -81,7 +81,7 @@ In this tutorial, protein and the total RNA sample was obtained from the early d > > {% snippet faqs/galaxy/histories_create_new.md %} > -> 2. Import the Uniprot FASTA, FASTQ file and the GTF file from Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1489208.svg)](https://doi.org/10.5281/zenodo.1489208) +> 2. Import the Uniprot FASTA, FASTQ file and the GTF file from Zenodo [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13270741.svg)](https://doi.org/10.5281/zenodo.13270741) > ``` > https://zenodo.org/records/1489208/files/Trimmed_ref_5000_uniprot_cRAP.fasta > https://zenodo.org/record/1489208/files/FASTQ_ProB_22LIST.fastqsanger diff --git a/topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.md b/topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.md index 00ab9c0e1ba4d3..c831f90862ff62 100644 --- a/topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.md +++ b/topics/proteomics/tutorials/proteogenomics-dbsearch/tutorial.md @@ -67,12 +67,15 @@ In this tutorial, we perform proteogenomic database searching using the Mass Spe > data upload and organization > > 1. Create a **new history** and name it something meaningful (e.g. *Proteogenomics DB search*) -> 2. Import the four MGF MS/MS files and the Trimmed_ref_5000_uniprot_cRAP.FASTA sequence file from Zenodo.[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1489208.svg)](https://doi.org/10.5281/zenodo.1489208) +> 2. 
Import the four MGF MS/MS files and the Trimmed_ref_5000_uniprot_cRAP.FASTA sequence file from Zenodo.[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13270741.svg)](https://doi.org/10.5281/zenodo.13270741) > ``` > https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f4.mgf > https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f5.mgf > https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f8.mgf > https://zenodo.org/record/1489208/files/Mo_Tai_Trimmed_mgfs__Mo_Tai_iTRAQ_f9.mgf +> https://zenodo.org/records/13270741/files/Uniprot_cRAP_SAV_indel_translatedbed.FASTA +> https://zenodo.org/records/13270741/files/Reference_Protein_Accessions.tabular +> > ``` > > {% snippet faqs/galaxy/datasets_import_via_link.md %} @@ -87,7 +90,7 @@ In this tutorial, we perform proteogenomic database searching using the Mass Spe # Match peptide sequences -The search database labelled `Uniprot_cRAP_SAV_indel_translatedbed.FASTA` is the input database that +The search database labeled `Uniprot_cRAP_SAV_indel_translatedbed.FASTA` is the input database that will be used to match MS/MS to peptide sequences via a sequence database search. For this, the sequence database-searching program called [SearchGUI](https://compomics.github.io/projects/searchgui.html) will be used.The generated dataset collection of the three *MGF files* in the history is used as the MS/MS input. We will walk through a number of these settings in order to utilize SearchGUI on these example MGF files. @@ -240,7 +243,7 @@ The mzidentml output from the Peptide shaker is converted into an sqlite databas > {: .hands_on} -The next step is to remove known peptides from the list of PSM's that we acquired from the Peptide Shaker results. For that we need to perform Query tabular to extract list of known peptides from the UniProt and cRAP database. +The next step is to remove known peptides from the list of PSMs that we acquired from the Peptide Shaker results. 
For that, we need to perform Query tabular to extract the list of known peptides from the UniProt and cRAP database. ## Query Tabular @@ -327,7 +330,7 @@ The next step is to remove known peptides from the list of PSM's that we acquire > > - *"include query result column headers"*: `Yes` > -> - Click **Run Tool** and inspect the query results file after it turned green. +> - Click **Run Tool** and inspect the query results file after it turns green. > {: .hands_on} @@ -361,7 +364,7 @@ The output from this step is that the resultant peptides would be those which do > ``` > - *"include query result column headers"*: `Yes` > -> - Click **Run Tool** and inspect the query results file after it turned green. +> - Click **Run Tool** and inspect the query results file after it turns green. > > ![QT](../../images/QT_output.png) > @@ -392,11 +395,11 @@ The output FASTA file is going to be subjected to BLAST-P analysis. -# **Conclusion** +# Conclusion -This completes the walkthrough of the proteogenomics database search workflow. This tutorial is a guide to perform database searching with mass spectronetry files and have peptides ready for Blast-P analysis, you can perform follow up analysis using the next GTN "Proteogenomics Novel Peptide Analysis". -Researchers can use this workflow with their data also, please note that the tool parameters, reference genomes and the workflow will be needed to be modified accordingly. +This completes the walkthrough of the proteogenomics database search workflow. This tutorial is a guide to performing database searching with mass spectrometry files and having peptides ready for Blast-P analysis. You can perform follow-up analysis using the next GTN tutorial, "Proteogenomics Novel Peptide Analysis". +Researchers can also use this workflow with their own data; please note that the tool parameters, reference genomes, and the workflow will need to be modified accordingly. This workflow was developed by the Galaxy-P team at the University of Minnesota. 
For more information about Galaxy-P or our ongoing work, please visit us at [galaxyp.org](https://galaxyp.org) diff --git a/topics/proteomics/tutorials/proteogenomics-novel-peptide-analysis/tutorial.md b/topics/proteomics/tutorials/proteogenomics-novel-peptide-analysis/tutorial.md index 67be973d456fc5..dd5504edce4559 100644 --- a/topics/proteomics/tutorials/proteogenomics-novel-peptide-analysis/tutorial.md +++ b/topics/proteomics/tutorials/proteogenomics-novel-peptide-analysis/tutorial.md @@ -70,7 +70,7 @@ All the files to run this workflow can be obtained from the [second tutorial]({% > - **Mz to sqlite** > - **Genomic mapping sqlite** > -> If you do not have these files from the previous tutorials in this series, you can import them from Zenodo: +> If you do not have these files from the previous tutorials in this series, you can import them from [Zenodo](https://doi.org/10.5281/zenodo.13270741) > ``` > https://zenodo.org/record/1489208/files/Peptides_for_Blast-P_analysis.tabular > https://zenodo.org/record/1489208/files/PeptideShaker_PSM.tabular @@ -83,9 +83,9 @@ All the files to run this workflow can be obtained from the [second tutorial]({% # Peptide Selection -[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a web based tool used to compare biological sequences. BlastP, matches protein sequences against a protein database. More specifically, it looks at the amino acid sequence of proteins and can detect and evaluate the amount of differences between say, an experimentally derived sequence and all known amino acid sequences from a database. It can then find the most similar sequences and allow for identification of known proteins or for identification of potential peptides associated with novel proteoforms. +[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) is a web-based tool used to compare biological sequences. BlastP, matches protein sequences against a protein database. 
More specifically, it looks at the amino acid sequence of proteins and can detect and evaluate the amount of differences between say, an experimentally derived sequence and all known amino acid sequences from a database. It can then find the most similar sequences and allow for the identification of known proteins or for the identification of potential peptides associated with novel proteoforms. -The first step in this tutorial is to perfrom BLAST-P analysis using the NCBI-NR database. The output from BLASTP will determine the identification of the novel peptides. The result is a tabular file with 25 columns containing all the information regarding the alignment of these peptides with the sequences in the NCBI-NR database. +The first step in this tutorial is to perform BLAST-P analysis using the NCBI-NR database. The output from BLASTP will determine the identification of the novel peptides. The result is a tabular file with 25 columns containing all the information regarding the alignment of these peptides with the sequences in the NCBI-NR database. > NCBI BLAST+ blastp > @@ -200,15 +200,15 @@ Once this step is completed, a tabular output containing novel proteoforms are d # Multiomics Visualization Platform (MVP) -The Multiomics Visualization Platform is a Galaxy visualization plugin that allows the user to browse the selected proteomics data. It uses the SQlite database which allows the data to be filtered and aggregated in a user defined manner. It allows various features such as; the PSM can be displayed with a lorikeet spectral view, the selected peptide can be displayed in a protein view and an IGV browser is also available for the selected protein. The step by step guide shown below will provide a walkthrough on how to use this plugin (NOTE: the example shown below is a representative peptide which is subjected to change, so while you are running this tool please take a look at the "Novel Peptide" output from the previous steps). 
+The Multiomics Visualization Platform is a Galaxy visualization plugin that allows the user to browse the selected proteomics data. It uses the SQLite database, which allows the data to be filtered and aggregated in a user-defined manner. It offers several features: the PSM can be displayed with a lorikeet spectral view, the selected peptide can be displayed in a protein view, and an IGV browser is also available for the selected protein. The step-by-step guide shown below will provide a walkthrough on how to use this plugin (NOTE: the example shown below is a representative peptide that is subject to change, so while you are running this tool please take a look at the "Novel Peptide" output from the previous steps). > Guide to MVP > -> The spectra belonging to these "Novel peptides" can be viewed using MVP,this can be achieved by selecting the output from the `mz to sqlite tool` (Generated in the second workflow). -> Here is a step by step guide to obtain the proteogenomic view of the "Novel peptides". +> The spectra belonging to these "Novel peptides" can be viewed using MVP by selecting the output from the `mz to sqlite tool` (generated in the second workflow). +> Here is a step-by-step guide to obtain the proteogenomic view of the "Novel peptides". > > -> 1) Click on the **Visualize in MVP application**, it will open up options for visualization application in the center pane, Select **MVP Application** from the options (or Right click to open in a new window). +> 1) Click on **Visualize in MVP application**; this opens the visualization options in the center pane. Select **MVP Application** from the options (or right-click to open in a new window). 
> > ![mz to sqlite](../../images/Visualize.png){:width="20%"} > @@ -333,7 +333,7 @@ Given chromosomal locations of peptides in a BED file, PepPointer classifies the > {: .hands_on} -The final tool for this workflow generates a tabular output that summarizes the information after running these workflows. The final summary output consists of the Peptide sequence, the spectra associated with the peptides, the protein accession number, chromosome number, Start and Stop of the genomic coordinate, the annotation, the genomic coordinate entry for viewing in Integrative Genomics Viewer (IGV), MVP or UCSC genome browser and the URL for viewing it on UCSC genome browser. This summary is created with the help of the query tabular tool. +The final tool for this workflow generates a tabular output that summarizes the information after running these workflows. The final summary output consists of the Peptide sequence, the spectra associated with the peptides, the protein accession number, the chromosome number, Start and Stop of the genomic coordinate, the annotation, the genomic coordinate entry for viewing in Integrative Genomics Viewer (IGV), MVP or UCSC genome browser and the URL for viewing it on UCSC genome browser. This summary is created with the help of the query tabular tool. # Final Summary Output @@ -377,7 +377,7 @@ The final tool for this workflow generates a tabular output that summarizes the > > - *"include query result column headers"*: `Yes` > -> 2. Click **Run Tool** and inspect the query results file after it turned green. If everything went well, it should look similiar: +> 2. Click **Run Tool** and inspect the query results file after it turns green. 
If everything goes well, it should look similar: > > ![Final Summary](../../images/final_summary.png){:width="100%"} > @@ -385,9 +385,9 @@ The final tool for this workflow generates a tabular output that summarizes the > {: .hands_on} -### Conclusion +## Conclusion -This completes the proteogenomics workflow analysis. This training workflow uses mouse data. For any other organism the data, tool paramters and the workflow will need to be modified accordingly.This workflow is also available at [usegalaxy.eu](https://usegalaxy.eu/). +This completes the proteogenomics workflow analysis. This training workflow uses mouse data. For any other organism, the data, tool parameters and workflow will need to be modified accordingly. This workflow is also available at [usegalaxy.eu](https://usegalaxy.eu/). All the tools are in the most stable version here (published in 2018), the tools are subject to changes and upgrades, hence there could be minor formatting that would be required. This workflow was developed by the Galaxy-P team at the University of Minnesota. For more information about Galaxy-P or our ongoing work, please visit us at [galaxyp.org](http://galaxyp.org)