
Commit

Merge pull request #522 from Proteobench/add_peaks_support
PEAKS integration
RobbinBouwmeester authored Jan 13, 2025
2 parents 8188166 + 27e83e5 commit 7982fae
Showing 9 changed files with 135 additions and 2 deletions.
9 changes: 8 additions & 1 deletion docs/available-modules/2-quant-lfq-ion-dda.md
@@ -64,7 +64,7 @@ The module is flexible in terms of what workflow the participants can run. Howev

When you have successfully uploaded and visualized a benchmark run, we strongly encourage you to add the result to the online repository. This way, your run will be available to the entire community and can be compared to all other uploaded benchmark runs. By doing so, your workflow outputs, parameters and calculated metrics will be stored and publicly available.

To submit your run for public usage, you need to upload the parameter file associated to your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, and i2MassChroQ (see bellow for more tool-specific details). Please fill the `Comments for submission` if needed, and confirm that the metadata is correct (correspond to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button
To submit your run for public usage, you need to upload the parameter file associated with your run in the field `Meta data for searches`. Currently, we accept outputs from MaxQuant, FragPipe, Proline Studio, AlphaPept, PEAKS and i2MassChroQ (see below for more tool-specific details). Please fill in the `Comments for submission` field if needed, and confirm that the metadata is correct (i.e., that it corresponds to the benchmark run) before checking the button `I confirm that the metadata is correct`. Then the button
`I really want to upload it` will appear to trigger the submission.

After upload, you will get a link to the pull request associated with your data. Please copy it and save it. With this link, you can get the unique identifier of your run (for example `ProlineStudio__20240106_141919`), follow the progress of your submission, and add comments to communicate with the ProteoBench maintainers. Your submission will then be reviewed and, if everything looks good, accepted (this takes a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure.
@@ -81,6 +81,7 @@ Table 2 provides an overview of the required input files for public submission.
|MaxQuant|evidence.txt|mqpar.xml|
|Proline Studio|<result file>.xlsx|<result file>.xlsx|
|Sage|lfq.tsv|results.json|
|PEAKS|lfq_features.csv|parameters.txt|

### AlphaPept
1. Load folder that contains the data files.
@@ -129,6 +130,12 @@ For public submission, you can upload the same excel export, just make sure to h
MSAngel allows users to build pipelines for bottom-up MS analysis with a choice of search engines, validation strategies and Proline quantification.
More information can be found [here](https://www.profiproteomics.fr/ms-angel/)

### PEAKS (work in progress)
When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to Condition "A" and Samples 4 to 6 to Condition "B".
Set Enzyme to trypsin, Instrument to Orbitrap (Orbi-Orbi), Fragment to HCD and Acquisition to DDA.
In the Workflow section, use the PEAKS Q (de novo assisted search quantification) option. Set the parameters in the "Data refine" and "DB search" tabs. In the "Quantification" tab, use the "Label Free" option, then either add all samples individually or group the samples by condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%.
Once the workflow has run successfully, make sure to check "All Search Parameters" and "Feature Vector CSV" under the Label Free Quantification exports in the "Export" tab.
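Before uploading, it can help to confirm that the PEAKS feature export actually carries the six per-sample columns the parser expects. The sketch below is a hypothetical helper, not part of ProteoBench: it takes the column names `Sample 1 Normalized Area` to `Sample 6 Normalized Area` from the PEAKS parse settings added in this commit and assumes these are the headers written to `lfq_features.csv`; `missing_sample_columns` is an illustrative name.

```python
import io

import pandas as pd

# Assumption: headers taken from the PEAKS parse settings in this commit
# ([condition_mapper] keys); Samples 1-3 are condition A, Samples 4-6 are B.
EXPECTED_SAMPLE_COLUMNS = [f"Sample {i} Normalized Area" for i in range(1, 7)]


def missing_sample_columns(csv_source) -> list:
    """Return the expected per-sample columns absent from a PEAKS CSV export.

    Hypothetical helper: only the header row is read (nrows=0), so large
    exports are not loaded into memory.
    """
    header = pd.read_csv(csv_source, sep=",", nrows=0).columns
    return [column for column in EXPECTED_SAMPLE_COLUMNS if column not in header]


# Usage with a toy export in which Sample 6 was not exported.
toy_export = io.StringIO(
    "Accession,Peptide,z,"
    + ",".join(EXPECTED_SAMPLE_COLUMNS[:5])
    + "\nPROT1_HUMAN,PEPTIDE,2,1,2,3,4,5\n"
)
print(missing_sample_columns(toy_export))  # ['Sample 6 Normalized Area']
```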

### Sage

1. Convert .raw files into .mzML using MSConvert or ThermoRawFileParser **(do not change the file names)**
10 changes: 9 additions & 1 deletion docs/available-modules/4-quant-lfq-ion-dia-aif.md
@@ -77,7 +77,8 @@ Table 2 provides an overview of the required input files for public submission.
|DIA-NN|*_report.tsv|*report.log.txt|
|FragPipe|*_report.tsv|fragpipe.workflow|
|MaxDIA|evidence.txt|mqpar.xml|
|Spectronaut|*.tsv|*.txt
|Spectronaut|*.tsv|*.txt|
|PEAKS|lfq.dia.peptides.csv|parameters.txt|


After upload, you will get a link to the pull request associated with your data. Please copy it and save it. With this link, you can get the unique identifier of your run (for example `Proline__20240106_141919`), follow the progress of your submission, and add comments to communicate with the ProteoBench maintainers. Your submission will then be reviewed and, if everything looks good, accepted (this takes a few working days). Then, your benchmark run will be added to the public runs of this module and plotted alongside all other benchmark runs in the figure.
@@ -113,6 +114,13 @@ By default, MaxDIA uses a contaminants-only fasta file that is located in the so

For this module, use the "evidence.txt" output in the "txt" folder of MaxQuant search outputs. For public submission, please upload the "mqpar.xml" file associated with your search.

### [PEAKS](https://www.bioinfor.com/) (work in progress)
When starting a new project and selecting the .RAW files, there is no need to modify the sample names given by PEAKS. Just make sure that Samples 1 to 3 are assigned to Condition "A" and Samples 4 to 6 to Condition "B".
Set Enzyme to trypsin, Instrument to Orbitrap (Orbi-Orbi), Fragment to HCD and Acquisition to DIA.
In the Workflow section, use the Quantification option. We do not suggest using a custom spectral library, although one can be defined in the "Spectral library" tab. Define the search parameters in the "DB search" tab.
In the "Quantification" tab, use the "Label Free" option, then either add all samples individually or group the samples by condition. In the "Report" tab, make sure both Peptide FDR and Protein Group FDR are set to 1%.
Once the workflow has run successfully, make sure to check "All Search Parameters" and "Peptide CSV" under the Label Free Quantification exports in the "Export" tab.

#### Troubleshooting:

Since the Thermo DIA .raw files were acquired using a staggered window approach, it is highly recommended to first convert the .RAW files into .mzML and demultiplex them using MSConvert.
@@ -0,0 +1,37 @@
[mapper]
"Accession" = "Proteins"
"Peptide" = "Sequence"
"z" = "Charge"

[condition_mapper]
"Sample 1 Normalized Area" = "A"
"Sample 2 Normalized Area" = "A"
"Sample 3 Normalized Area" = "A"
"Sample 4 Normalized Area" = "B"
"Sample 5 Normalized Area" = "B"
"Sample 6 Normalized Area" = "B"

[run_mapper]
"Sample 1 Normalized Area" = "Condition_A_Sample_Alpha_01"
"Sample 2 Normalized Area" = "Condition_A_Sample_Alpha_02"
"Sample 3 Normalized Area" = "Condition_A_Sample_Alpha_03"
"Sample 4 Normalized Area" = "Condition_B_Sample_Alpha_01"
"Sample 5 Normalized Area" = "Condition_B_Sample_Alpha_02"
"Sample 6 Normalized Area" = "Condition_B_Sample_Alpha_03"

[species_mapper]
"_YEAST" = "YEAST"
"_ECOLI" = "ECOLI"
"_HUMAN" = "HUMAN"

[modifications_parser]
"parse_column" = "Sequence"
"before_aa" = false
"isalpha" = true
"isupper" = true
"pattern"="(?<=\\().+?(?=\\))"
"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}

[general]
"contaminant_flag" = "Cont_"
"decoy_flag" = false
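The `[modifications_parser]` settings above suggest that PEAKS writes each modification as a mass shift in parentheses; the regular expression `(?<=\().+?(?=\))` (the TOML string with its escapes resolved) captures the text between the parentheses. The following sketch shows one way that pattern and `modification_dict` could be combined; it is not ProteoBench's parser. It assumes, as `"before_aa" = false` seems to indicate, that a shift applies to the residue immediately before the opening parenthesis, with position 0 meaning the N-terminus. Reading `isalpha`/`isupper` as a residue filter is also an assumption, `parse_peaks_peptide` is a hypothetical name, and unknown shifts are kept as raw strings.

```python
import re

# Pattern from the TOML above, with the TOML string escapes resolved.
MASS_SHIFT = re.compile(r"(?<=\().+?(?=\))")
# Any parenthesised annotation, used to strip shifts out of the sequence.
ANNOTATION = re.compile(r"\(.+?\)")

MODIFICATION_DICT = {
    "+57.02": "Carbamidomethyl",
    "+15.99": "Oxidation",
    "-17.026548": "Gln->pyro-Glu",
    "-18.010565": "Glu->pyro-Glu",
    "+42.01": "Acetyl",
}


def count_residues(text: str) -> int:
    """Count amino-acid letters (alphabetic and upper case) outside annotations."""
    return sum(1 for ch in ANNOTATION.sub("", text) if ch.isalpha() and ch.isupper())


def parse_peaks_peptide(peptide: str):
    """Split a PEAKS-style peptide into a bare sequence and (position, name) pairs.

    Assumption: a shift belongs to the residue just before "(" (1-based
    position); position 0 marks an N-terminal shift such as acetylation.
    """
    sequence = "".join(
        ch for ch in ANNOTATION.sub("", peptide) if ch.isalpha() and ch.isupper()
    )
    modifications = []
    for match in MASS_SHIFT.finditer(peptide):
        # match.start() - 1 is the index of the opening parenthesis.
        position = count_residues(peptide[: match.start() - 1])
        name = MODIFICATION_DICT.get(match.group(), match.group())
        modifications.append((position, name))
    return sequence, modifications


print(parse_peaks_peptide("(+42.01)PEPTM(+15.99)IDEC(+57.02)K"))
# ('PEPTMIDECK', [(0, 'Acetyl'), (5, 'Oxidation'), (9, 'Carbamidomethyl')])
```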
@@ -0,0 +1,37 @@
[mapper]
"Accession" = "Proteins"
"Peptide" = "Sequence"
"z" = "Charge"

[condition_mapper]
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "A"
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "A"
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "A"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "B"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "B"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = "B"

[run_mapper]
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.raw Normalized Area" = "Condition_A_Sample_Alpha_01"
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.raw Normalized Area" = "Condition_A_Sample_Alpha_02"
"LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.raw Normalized Area" = "Condition_A_Sample_Alpha_03"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.raw Normalized Area" = "Condition_B_Sample_Alpha_01"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area" = "Condition_B_Sample_Alpha_02"
"LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.raw Normalized Area" = "Condition_B_Sample_Alpha_03"

[species_mapper]
"_YEAST" = "YEAST"
"_ECOLI" = "ECOLI"
"_HUMAN" = "HUMAN"

[modifications_parser]
"parse_column" = "Sequence"
"before_aa" = false
"isalpha" = true
"isupper" = true
"pattern"="(?<=\\().+?(?=\\))"
"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}

[general]
"contaminant_flag" = "Cont_"
"decoy_flag" = false
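In these DIA settings, each quantity column is named after its raw file, so both `[condition_mapper]` and `[run_mapper]` can be read straight off the column name. As a sketch of how the two tables could be generated rather than typed by hand (a hypothetical helper, not part of this commit), a regular expression over the column names reproduces them. It assumes every per-run column follows the `LFQ_Orbitrap_AIF_Condition_<A|B>_Sample_Alpha_<NN>.raw Normalized Area` pattern shown above.

```python
import re

# Assumed column naming, taken from the keys of the DIA settings above.
RUN_COLUMN = re.compile(
    r"LFQ_Orbitrap_AIF_(Condition_([AB])_Sample_Alpha_\d{2})\.raw Normalized Area"
)


def build_mappers(columns):
    """Return (condition_mapper, run_mapper) dicts for the per-run columns.

    Columns that do not match the naming pattern (e.g. "Peptide") are skipped.
    """
    condition_mapper, run_mapper = {}, {}
    for column in columns:
        match = RUN_COLUMN.fullmatch(column)
        if match is None:
            continue
        run_mapper[column] = match.group(1)  # e.g. "Condition_A_Sample_Alpha_01"
        condition_mapper[column] = match.group(2)  # "A" or "B"
    return condition_mapper, run_mapper


columns = ["Accession", "Peptide", "z"] + [
    f"LFQ_Orbitrap_AIF_Condition_{c}_Sample_Alpha_{i:02d}.raw Normalized Area"
    for c in "AB"
    for i in range(1, 4)
]
conditions, runs = build_mappers(columns)
print(conditions["LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.raw Normalized Area"])  # B
```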
@@ -0,0 +1,36 @@
[mapper]
"Accession" = "Proteins"
"Peptide" = "Sequence"

[condition_mapper]
"Area Sample 1" = "A"
"Area Sample 2" = "A"
"Area Sample 3" = "A"
"Area Sample 4" = "B"
"Area Sample 5" = "B"
"Area Sample 6" = "B"

[run_mapper]
"Area Sample 1" = "Condition_A_Sample_Alpha_01"
"Area Sample 2" = "Condition_A_Sample_Alpha_02"
"Area Sample 3" = "Condition_A_Sample_Alpha_03"
"Area Sample 4" = "Condition_B_Sample_Alpha_01"
"Area Sample 5" = "Condition_B_Sample_Alpha_02"
"Area Sample 6" = "Condition_B_Sample_Alpha_03"

[species_mapper]
"_YEAST" = "YEAST"
"_ECOLI" = "ECOLI"
"_HUMAN" = "HUMAN"

[modifications_parser]
"parse_column" = "Sequence"
"before_aa" = false
"isalpha" = true
"isupper" = true
"pattern"="(?<=\\().+?(?=\\))"
"modification_dict" = {"+57.02" = "Carbamidomethyl", "+15.99" = "Oxidation", "-17.026548" = "Gln->pyro-Glu", "-18.010565" = "Glu->pyro-Glu", "+42.01" = "Acetyl"}

[general]
"contaminant_flag" = "Cont_"
"decoy_flag" = false
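To illustrate how `[mapper]`, `[condition_mapper]` and `[run_mapper]` fit together, here is a hypothetical pandas sketch (not ProteoBench's actual pipeline). It renames the PEAKS identifier columns, reshapes the wide `Area Sample N` columns into one row per peptide and run, and labels each row with its condition and run name. The mapping values are copied from the settings above; the long-format column names `Intensity` and `Raw file` are illustrative assumptions.

```python
import pandas as pd

# Values copied from the PEAKS settings above ([mapper], [condition_mapper], [run_mapper]).
MAPPER = {"Accession": "Proteins", "Peptide": "Sequence"}
CONDITION_MAPPER = {f"Area Sample {i}": ("A" if i <= 3 else "B") for i in range(1, 7)}
RUN_MAPPER = {
    "Area Sample 1": "Condition_A_Sample_Alpha_01",
    "Area Sample 2": "Condition_A_Sample_Alpha_02",
    "Area Sample 3": "Condition_A_Sample_Alpha_03",
    "Area Sample 4": "Condition_B_Sample_Alpha_01",
    "Area Sample 5": "Condition_B_Sample_Alpha_02",
    "Area Sample 6": "Condition_B_Sample_Alpha_03",
}


def to_long_format(peaks_table: pd.DataFrame) -> pd.DataFrame:
    """Rename identifier columns, then melt per-sample areas to one row per run."""
    renamed = peaks_table.rename(columns=MAPPER)
    long_table = renamed.melt(
        id_vars=["Proteins", "Sequence"],
        value_vars=list(CONDITION_MAPPER),
        var_name="column",
        value_name="Intensity",  # hypothetical name for the melted values
    )
    long_table["Condition"] = long_table["column"].map(CONDITION_MAPPER)
    long_table["Raw file"] = long_table["column"].map(RUN_MAPPER)
    return long_table.drop(columns="column")


# Toy input: one peptide with areas 1.0 to 6.0 in Samples 1 to 6.
wide = pd.DataFrame(
    {
        "Accession": ["PROT1_HUMAN"],
        "Peptide": ["PEPTIDEK"],
        **{f"Area Sample {i}": [float(i)] for i in range(1, 7)},
    }
)
long_table = to_long_format(wide)
print(long_table)
```

Melting first and mapping afterwards keeps the per-run metadata in one place (the two dictionaries), so adding a run only means extending the mappers.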
@@ -6,10 +6,12 @@
"ProlineStudio" = "parse_settings_proline.toml"
"MSAngel" = "parse_settings_msangel.toml"
"Sage" = "parse_settings_sage.toml"
"PEAKS" = "parse_settings_peaks.toml"
"Custom" = "parse_settings_custom.toml"

[quant_lfq_peptidoform_DDA]
"WOMBAT" = "parse_settings_wombat.toml"
"PEAKS" = "parse_settings_peaks.toml"
"Proteome Discoverer" = "parse_settings_proteomediscoverer.toml"
"Custom" = "parse_settings_custom.toml"

@@ -21,6 +23,7 @@
"Spectronaut" = "parse_settings_spectronaut.toml"
"AlphaDIA" = "parse_settings_alphadia.toml"
"MSAID" = "parse_settings_msaid.toml"
"PEAKS" = "parse_settings_peaks.toml"
"Custom" = "parse_settings_custom.toml"

[quant_lfq_ion_DIA_diaPASEF]
2 changes: 2 additions & 0 deletions proteobench/io/parsing/parse_ion.py
@@ -107,6 +107,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame:
        input_data_frame["PG.ProteinGroups"] = input_data_frame["PG.ProteinGroups"].str.join(";")
    elif input_format == "MSAID":
        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t")
    elif input_format == "PEAKS":
        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",")

    return input_data_frame

2 changes: 2 additions & 0 deletions proteobench/io/parsing/parse_peptidoform.py
@@ -29,6 +29,8 @@ def load_input_file(input_csv: str, input_format: str) -> pd.DataFrame:
    elif input_format == "Custom":
        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep="\t")
        input_data_frame["proforma"] = input_data_frame["Modified sequence"]
    elif input_format == "PEAKS":
        input_data_frame = pd.read_csv(input_csv, low_memory=False, sep=",")

    return input_data_frame
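Both `load_input_file` functions touched by this commit gain a PEAKS branch that only calls `pd.read_csv` with a comma separator. One possible way to keep such branches short, sketched here as a hypothetical refactoring rather than ProteoBench's code, is a lookup table of separators. Formats that need post-processing after reading (such as the `Custom` branch in `parse_peptidoform.py`, which also sets a `proforma` column) would still need their own handling. Only the `MSAID` and `PEAKS` separators below come from the diff; `SEPARATORS` and `read_tool_output` are illustrative names.

```python
import io

import pandas as pd

# Separators as used in the diff: MSAID is tab-separated, PEAKS comma-separated.
SEPARATORS = {"MSAID": "\t", "PEAKS": ","}


def read_tool_output(input_csv, input_format: str) -> pd.DataFrame:
    """Read a tool export with the separator registered for its format."""
    try:
        sep = SEPARATORS[input_format]
    except KeyError:
        raise ValueError(f"Unsupported input format: {input_format!r}") from None
    return pd.read_csv(input_csv, low_memory=False, sep=sep)


peaks_table = read_tool_output(io.StringIO("Accession,Peptide\nPROT1_HUMAN,PEPTIDEK\n"), "PEAKS")
print(peaks_table.columns.tolist())  # ['Accession', 'Peptide']
```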

1 change: 1 addition & 0 deletions proteobench/plotting/plot_quant.py
@@ -88,6 +88,7 @@ def plot_metric(
        "FragPipe (DIA-NN quant)": "#ff7f00",
        "MSAID": "#afff57",
        "Proteome Discoverer": "#8c564b",
        "PEAKS": "#f781bf",
    },
    mapping: Dict[str, int] = {"old": 10, "new": 20},
    highlight_color: str = "#d30067",