Merge pull request #522 from Proteobench/add_peaks_support

PEAKS integration
Proteobench · Jan 13, 2025 · 7982fae
2 parents 8188166 + 27e83e5

Showing 9 changed files with 135 additions and 2 deletions.
diff --git a/docs/available-modules/2-quant-lfq-ion-dda.md b/docs/available-modules/2-quant-lfq-ion-dda.md
@@ -64,7 +64,7 @@ The module is flexible in terms of what workflow the participants can run. Howev
 
 When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs. By doing so, your workflow outputs, parameters and calculated metrics will be stored and publicly available. 
 
-To submit your run for public usage, you need to upload the parameter file associated to your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, and i2MassChroQ (see bellow for more tool-specific details). Please fill the `Comments for submission` if needed, and confirm that the metadata is correct (correspond to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button 
+To submit your run for public usage, you need to upload the parameter file associated with your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, AlphaPept, PEAKS and i2MassChroQ (see below for more tool-specific details). Please fill in `Comments for submission` if needed, and confirm that the metadata is correct (i.e. corresponds to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button 
 `I really want to upload it` will appear to trigger the submission.
 
 After upload, you will get a link to the pull request associated with your data. Please copy it and save it. With this link, you can get the unique identifier of your run (for example `ProlineStudio__20240106_141919`), and follow the advancement of your submission and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (it will take a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure. 
@@ -81,6 +81,7 @@ Table 2 provides an overview of the required input files for public submission.
 |MaxQuant|evidence.txt|mqpar.xml|
 |Proline Studio|<result file>.xlsx|<result file>.xlsx|
 |Sage|lfq.tsv|results.json|
+|PEAKS|lfq_features.csv|parameters.txt|
 
 ### AlphaPept
 1. Load folder that contains the data files.
@@ -129,6 +130,12 @@ For public submission, you can upload the same excel export, just make sure to h
 MSAngel allows building pipelines for bottom-up MS analysis with a choice of search engines, validation strategy and the Proline quantification. 
 More information can be found [here](https://www.profiproteomics.fr/ms-angel/)
 
+### PEAKS (work in progress)
+When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to condition "A" and Samples 4 to 6 to condition "B".
+Set Enzyme to trypsin, Instrument to Orbitrap (Orbi-Orbi), Fragment to HCD and Acquisition to DDA.
+In the workflow section, use the PEAKS Q (de novo assisted search quantification) option. Set the parameters in "Data refine" and "DB search". In the "Quantification" tab, use the "Label Free" option, then either add all samples individually or group samples by condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%.
+Once the workflow has run successfully, make sure to check "All Search Parameters" and "Feature Vector CSV" under Label Free Quantification Exports in the "Export" tab.
+
 ### Sage
 
 1. Convert .raw files into .mzML using MSConvert or ThermoRawFileParser **(do not change the file names)**

diff --git a/docs/available-modules/4-quant-lfq-ion-dia-aif.md b/docs/available-modules/4-quant-lfq-ion-dia-aif.md
@@ -77,7 +77,8 @@ Table 2 provides an overview of the required input files for public submission.
 |DIA-NN|*_report.tsv|*report.log.txt|
 |FragPipe|*_report.tsv|fragpipe.workflow|
 |MaxDIA|evidence.txt|mqpar.xml|
-|Spectronaut|*.tsv|*.txt
+|Spectronaut|*.tsv|*.txt|
+|PEAKS|lfq.dia.peptides.csv|parameters.txt|
 
 
 After upload, you will get a link to the pull request associated with your data. Please copy it and save it. With this link, you can get the unique identifier of your run (for example `Proline__20240106_141919`), and follow the advancement of your submission and add comments to communicate with the ProteoBench maintainers. If everything looks good, your submission will be reviewed and accepted (it will take a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure. 
@@ -113,6 +114,13 @@ By default, MaxDIA uses a contaminants-only fasta file that is located in the so
 
 For this module, use the "evidence.txt" output in the "txt" folder of MaxQuant search outputs. For public submission, please upload the "mqpar.xml" file associated with your search.
 
+### [PEAKS](https://www.bioinfor.com/) (work in progress)
+When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to condition "A" and Samples 4 to 6 to condition "B".
+Set Enzyme to trypsin, Instrument to Orbitrap (Orbi-Orbi), Fragment to HCD and Acquisition to DIA.
+In the workflow section, use the Quantification option. While we do not propose using a custom spectral library, one can be defined in the "Spectral library" tab. Define the search parameters in the "DB search" tab.
+In the "Quantification" tab, use the "Label Free" option, then either add all samples individually or group samples by condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%.
+Once the workflow has run successfully, make sure to check "All Search Parameters" and "Peptide CSV" under Label Free Quantification Exports in the "Export" tab.
+
 #### Troubleshooting: 
 
 Since the Thermo DIA data .raw files were acquired using a staggered window approach it is highly recommended to convert and demultiplex the .RAW files first into .mzML using MSConvert.

diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DDA/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DDA/parse_settings_peaks.toml
@@ -0,0 +1,37 @@
+[mapper]
+"Accession" = "Proteins"
+"Peptide" = "Sequence"
+"z" = "Charge"
+
+[condition_mapper]
+"Sample 1 Normalized Area" = "A"
+"Sample 2 Normalized Area" = "A"
+"Sample 3 Normalized Area" = "A"
+"Sample 4 Normalized Area" = "B"
+"Sample 5 Normalized Area" = "B"
+"Sample 6 Normalized Area" = "B"
+
+[run_mapper]
+"Sample 1 Normalized Area" = "Condition_A_Sample_Alpha_01"
+"Sample 2 Normalized Area" = "Condition_A_Sample_Alpha_02"
+"Sample 3 Normalized Area" = "Condition_A_Sample_Alpha_03"
+"Sample 4 Normalized Area" = "Condition_B_Sample_Alpha_01"
+"Sample 5 Normalized Area" = "Condition_B_Sample_Alpha_02"
+"Sample 6 Normalized Area" = "Condition_B_Sample_Alpha_03"
+
+[species_mapper]
+"_YEAST" = "YEAST"
+"_ECOLI" = "ECOLI"
+"_HUMAN" = "HUMAN"
+
+[modifications_parser]
+"parse_column" = "Sequence"
+"before_aa" = false
+"isalpha" = true
+"isupper" = true
+"pattern"="(?<=\\().+?(?=\\))"
+"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}
+
+[general]
+"contaminant_flag" = "Cont_"
+"decoy_flag" = false
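The `[modifications_parser]` table above describes PEAKS sequences in which each mass shift sits in parentheses after the residue it modifies (`before_aa = false`). The following is a minimal sketch of what that configuration implies, not the parser ProteoBench itself uses: the function name `parse_modifications` and the example peptide are invented for illustration, while the regex and the dictionary are copied from the settings file.

```python
import re

# Same regex as the "pattern" key after TOML unescaping:
# it matches the text between "(" and ")".
PATTERN = re.compile(r"(?<=\().+?(?=\))")

# Copied from "modification_dict" in the settings file.
MODIFICATION_DICT = {
    "+57.02": "Carbamidomethyl",
    "+15.99": "Oxidation",
    "-17.026548": "Gln->pyro-Glu",
    "-18.010565": "Glu->pyro-Glu",
    "+42.01": "Acetyl",
}


def parse_modifications(sequence):
    """Return (bare_sequence, [(position, name), ...]) for a PEAKS-style sequence.

    Hypothetical helper: positions are 1-based indices into the bare sequence,
    and unknown mass shifts are kept verbatim instead of being renamed.
    """
    mods = []
    removed = 0  # number of annotation characters "(...)" seen so far
    for match in PATTERN.finditer(sequence):
        # match.start() is just after "("; everything before "(" minus the
        # annotations already seen is the count of residues up to this one.
        position = match.start() - 1 - removed
        shift = match.group()
        mods.append((position, MODIFICATION_DICT.get(shift, shift)))
        removed += len(shift) + 2  # the shift plus both parentheses
    bare = re.sub(r"\(.+?\)", "", sequence)
    return bare, mods


if __name__ == "__main__":
    # Invented example peptide, not taken from a real PEAKS export.
    print(parse_modifications("AC(+57.02)DM(+15.99)K"))
```

Keeping unknown shifts verbatim is a design choice of this sketch, so that a modification missing from the dictionary stays visible rather than being dropped silently.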
diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DIA/AIF/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/ion/DIA/AIF/parse_settings_peaks.toml
@@ -0,0 +1,37 @@
+[mapper]
+"Accession" = "Proteins"
+"Peptide" = "Sequence"
+"z" = "Charge"
+
+[condition_mapper]
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "A"
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "A"
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "A"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "B"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "B"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = "B"
+
+[run_mapper]
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "Condition_A_Sample_Alpha_01"
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "Condition_A_Sample_Alpha_02"
+"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "Condition_A_Sample_Alpha_03"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "Condition_B_Sample_Alpha_01"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "Condition_B_Sample_Alpha_02"
+"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = "Condition_B_Sample_Alpha_03"
+
+[species_mapper]
+"_YEAST" = "YEAST"
+"_ECOLI" = "ECOLI"
+"_HUMAN" = "HUMAN"
+
+[modifications_parser]
+"parse_column" = "Sequence"
+"before_aa" = false
+"isalpha" = true
+"isupper" = true
+"pattern"="(?<=\\().+?(?=\\))"
+"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}
+
+[general]
+"contaminant_flag" = "Cont_"
+"decoy_flag" = false
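Because the DIA settings above key every intensity column on the original raw file name, an export from a project with renamed files would not match any of the mapper entries. As a rough pre-upload check, the sketch below lists which expected columns are absent from a CSV header. The helper `missing_columns` and the constant names are hypothetical and not part of ProteoBench; the column names are taken from the `[mapper]` and `[condition_mapper]` tables.

```python
import csv
import io

# Raw file names as they appear in the [condition_mapper] keys.
RAW_FILES = [
    f"LFQ_Orbitrap_AIF_Condition_{condition}_Sample_Alpha_0{replicate}.raw"
    for condition in "AB"
    for replicate in (1, 2, 3)
]

# Columns the settings file refers to: identifiers plus one area column per run.
EXPECTED_COLUMNS = ["Accession", "Peptide", "z"] + [
    f"{raw_file} Normalized Area" for raw_file in RAW_FILES
]


def missing_columns(csv_text):
    """Return the expected column names that are not in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [column for column in EXPECTED_COLUMNS if column not in header]


if __name__ == "__main__":
    # Header-only example built from the expected names, minus the last one.
    example_header = ",".join(EXPECTED_COLUMNS[:-1]) + "\n"
    print(missing_columns(example_header))
```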
diff --git a/proteobench/io/parsing/io_parse_settings/Quant/lfq/peptidoform/DDA/parse_settings_peaks.toml b/proteobench/io/parsing/io_parse_settings/Quant/lfq/peptidoform/DDA/parse_settings_peaks.toml
@@ -0,0 +1,36 @@
+[mapper]
+"Accession" = "Proteins"
+"Peptide" = "Sequence"
+
+[condition_mapper]
+"Area Sample 1" = "A"
+"Area Sample 2" = "A"
+"Area Sample 3" = "A"
+"Area Sample 4" = "B"
+"Area Sample 5" = "B"
+"Area Sample 6" = "B"
+
+[run_mapper]
+"Area Sample 1" = "Condition_A_Sample_Alpha_01"
+"Area Sample 2" = "Condition_A_Sample_Alpha_02"
+"Area Sample 3" = "Condition_A_Sample_Alpha_03"
+"Area Sample 4" = "Condition_B_Sample_Alpha_01"
+"Area Sample 5" = "Condition_B_Sample_Alpha_02"
+"Area Sample 6" = "Condition_B_Sample_Alpha_03"
+
+[species_mapper]
+"_YEAST" = "YEAST"
+"_ECOLI" = "ECOLI"
+"_HUMAN" = "HUMAN"
+
+[modifications_parser]
+"parse_column" = "Sequence"
+"before_aa" = false
+"isalpha" = true
+"isupper" = true
+"pattern"="(?<=\\().+?(?=\\))"
+"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}
+
+[general]
+"contaminant_flag" = "Cont_"
+"decoy_flag" = false
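The `[species_mapper]` and `[general]` tables above pair accession suffixes with species labels and name a contaminant prefix. One plausible reading, assumed here rather than confirmed by the diff, is plain substring matching on the protein accession string. The function `annotate_proteins` and the example accessions below are invented for illustration.

```python
# Copied from the [species_mapper] and [general] tables.
SPECIES_MAPPER = {"_YEAST": "YEAST", "_ECOLI": "ECOLI", "_HUMAN": "HUMAN"}
CONTAMINANT_FLAG = "Cont_"


def annotate_proteins(proteins):
    """Flag species and contaminants by substring match (an assumption).

    `proteins` is the accession string of one row, possibly listing several
    proteins. Returns a dict of booleans keyed by species label, plus
    "contaminant" and "multi_species".
    """
    flags = {species: tag in proteins for tag, species in SPECIES_MAPPER.items()}
    flags["multi_species"] = sum(flags[s] for s in SPECIES_MAPPER.values()) > 1
    flags["contaminant"] = CONTAMINANT_FLAG in proteins
    return flags


if __name__ == "__main__":
    # Invented accession strings in a UniProt-like style.
    print(annotate_proteins("sp|X00001|ABC_YEAST"))
    print(annotate_proteins("Cont_X00002;sp|X00003|DEF_HUMAN"))
```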
diff --git a/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml b/proteobench/io/parsing/io_parse_settings/parse_settings_files.toml
@@ -6,10 +6,12 @@
 "ProlineStudio" = "parse_settings_proline.toml"
 "MSAngel" = "parse_settings_msangel.toml"
 "Sage" = "parse_settings_sage.toml"
+"PEAKS" = "parse_settings_peaks.toml"
 "Custom" = "parse_settings_custom.toml"
 
 [quant_lfq_peptidoform_DDA]
 "WOMBAT" = "parse_settings_wombat.toml"
+"PEAKS" = "parse_settings_peaks.toml"
 "Proteome Discoverer" = "parse_settings_proteomediscoverer.toml"
 "Custom" = "parse_settings_custom.toml"
 
@@ -21,6 +23,7 @@
 "Spectronaut" = "parse_settings_spectronaut.toml"
 "AlphaDIA" = "parse_settings_alphadia.toml"
 "MSAID" = "parse_settings_msaid.toml"
+"PEAKS" = "parse_settings_peaks.toml"
 "Custom" = "parse_settings_custom.toml"
 
 [quant_lfq_ion_DIA_diaPASEF]

diff --git a/proteobench/io/parsing/parse_ion.py b/proteobench/io/parsing/parse_ion.py
@@ -107,6 +107,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame:
         input_data_frame["PG.ProteinGroups"] = input_data_frame["PG.ProteinGroups"].str.join(";")
     elif input_format == "MSAID":
         input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t")
+    elif input_format == "PEAKS":
+        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",")
 
     return input_data_frame
 

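The new `PEAKS` branch in `load_input_file` only reads the comma-separated export; the column renaming and sample-to-condition assignment come from the DDA settings file. The sketch below illustrates how those tables could be applied to such an export, using only the standard library instead of pandas. The function `to_long`, the output keys `run`, `condition` and `intensity`, and the example row are all hypothetical; the mapper contents mirror `parse_settings_peaks.toml`.

```python
import csv
import io

# Mirrors [mapper], [condition_mapper] and [run_mapper] of the DDA settings.
MAPPER = {"Accession": "Proteins", "Peptide": "Sequence", "z": "Charge"}
CONDITION_MAPPER = {
    f"Sample {i} Normalized Area": ("A" if i <= 3 else "B") for i in range(1, 7)
}
RUN_MAPPER = {
    f"Sample {i} Normalized Area":
        f"Condition_{'A' if i <= 3 else 'B'}_Sample_Alpha_0{(i - 1) % 3 + 1}"
    for i in range(1, 7)
}


def to_long(csv_text):
    """Turn a wide PEAKS-style table into one record per (feature, run)."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        identifiers = {new: row[old] for old, new in MAPPER.items()}
        for column, condition in CONDITION_MAPPER.items():
            value = row.get(column)
            if not value:  # column absent or cell empty: no measurement
                continue
            records.append({
                **identifiers,
                "run": RUN_MAPPER[column],
                "condition": condition,
                "intensity": float(value),
            })
    return records


if __name__ == "__main__":
    # Invented two-sample example, not a real PEAKS export.
    example = (
        "Accession,Peptide,z,Sample 1 Normalized Area,Sample 4 Normalized Area\n"
        "X00001|ABC_HUMAN,AC(+57.02)DK,2,1000.0,2000.0\n"
    )
    for record in to_long(example):
        print(record)
```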
diff --git a/proteobench/io/parsing/parse_peptidoform.py b/proteobench/io/parsing/parse_peptidoform.py
@@ -29,6 +29,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame:
     elif input_format == "Custom":
         input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t")
         input_data_frame["proforma"] = input_data_frame["Modified sequence"]
+    elif input_format == "PEAKS":
+        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",")
 
     return input_data_frame
 

diff --git a/proteobench/plotting/plot_quant.py b/proteobench/plotting/plot_quant.py
@@ -88,6 +88,7 @@ def plot_metric(
             "FragPipe (DIA-NN quant)": "#ff7f00",
             "MSAID": "#afff57",
             "Proteome Discoverer": "#8c564b",
+            "PEAKS": "#f781bf",
         },
         mapping: Dict[str, int] = {"old": 10, "new": 20},
         highlight_color: str = "#d30067",