Benchmarking of search engines with the ground truth of spatial proteomics datasets #404
JuliaS92 started this conversation in "Potential new module to discuss"
- Reply: I would suggest also providing a FASTA file with the raw files, so that everybody works with the same sequences (target sequences plus contaminants).
aim of the new module
Benchmarks of search engines are crucial for selecting optimal tools in computational proteomics. Current benchmarks typically assess depth, coefficient of variation, and accuracy in mixed-species experiments. While these mixed-species benchmarks represent significant progress in software evaluation, they address only one of many use cases in proteomics and diverge from the more common single-species experiments. Establishing a testable ground truth in real-life datasets remains challenging. However, spatial proteomics and SEC-MS experiments offer an inherent biochemical ground truth, as members of bona fide protein complexes exhibit near-identical profiles. We propose leveraging this principle for software benchmarking, building upon previous work with dynamic organellar maps. By combining established measures with a carefully selected set of reference datasets, we aim to develop a comprehensive ProteoBench module. This will provide an additional software benchmark that specifically addresses single-species performance and usability in profiling experiments, thereby enhancing the evaluation of proteomics software tools.
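To make the principle concrete, here is a minimal sketch of how within-complex profile similarity could be scored. It assumes a pandas DataFrame of normalized protein profiles (proteins as rows, fractions as columns) and a hypothetical complex annotation mapping; none of these names reflect an existing DOM-ABC or ProteoBench interface.

```python
import itertools
import numpy as np
import pandas as pd

def within_complex_similarity(profiles: pd.DataFrame,
                              complexes: dict[str, list[str]]) -> pd.Series:
    """Median pairwise Pearson correlation of member profiles per complex.

    profiles: proteins (rows) x fractions (columns), already normalized.
    complexes: hypothetical mapping of complex name -> member protein IDs.
    """
    scores = {}
    for name, members in complexes.items():
        present = [p for p in members if p in profiles.index]
        if len(present) < 2:
            continue  # need at least two members to form a pair
        # After transposing, columns are proteins, so corr() gives
        # protein-by-protein correlations across fractions.
        corr = profiles.loc[present].T.corr(method="pearson")
        pairs = [corr.iloc[i, j]
                 for i, j in itertools.combinations(range(len(present)), 2)]
        scores[name] = float(np.median(pairs))
    return pd.Series(scores, name="median_within_complex_r")
```

Members of bona fide complexes should score close to 1; correlations for random protein pairs would provide a null distribution to compare against.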
full description of the new module
DDA DOMS: https://www.ebi.ac.uk/pride/archive/projects/PXD034962
DIA DOMS: https://www.ebi.ac.uk/pride/archive/projects/PXD034971
In theory, several datasets would be suitable for benchmarking different quantification methods with the same concept.
Process the data and upload protein group quantifications
Protein group files
See https://www.nature.com/articles/s41467-023-41000-7
The library can be configured for different data sources and can generate all metrics. One big open question is how to compare robustly between runs, since protein complex coverage can differ. DOM-ABC always requires all files to make the benchmark comparable, so a reduced amount of data would need to be stored for every run. Deciding how to handle this is probably the biggest bottleneck.
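One possible way to handle this, sketched below under the assumption that a small per-run summary is acceptable, is to persist only per-complex scores and a few run-level aggregates instead of the full quantification matrix. The field names and layout are illustrative, not an existing ProteoBench format.

```python
import json
import pandas as pd

def reduced_run_summary(profiles: pd.DataFrame,
                        complex_scores: pd.Series,
                        run_id: str) -> dict:
    """Compact, comparable summary of one search-engine run (illustrative)."""
    return {
        "run_id": run_id,
        "n_proteins": int(profiles.shape[0]),
        # Profiled depth: proteins quantified in every fraction.
        "n_complete_profiles": int(profiles.dropna().shape[0]),
        # Per-complex scores keep the benchmark comparable even when
        # complex coverage differs between runs.
        "complex_scores": complex_scores.round(4).to_dict(),
    }

# Hypothetical usage:
# summary = reduced_run_summary(profiles, scores, "PXD034962_engineX")
# with open("run_summary.json", "w") as fh:
#     json.dump(summary, fh, indent=2)
```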
Several metrics are possible; I would suggest three: profiled depth, complex scatter, and reproducibility, as in Figure S1B of https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-023-41000-7/MediaObjects/41467_2023_41000_MOESM1_ESM.pdf
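A hedged sketch of these three measures, following the figure descriptions rather than the actual DOM-ABC implementation (the distance measure in complex_scatter is a placeholder):

```python
import numpy as np
import pandas as pd

def profiled_depth(profiles: pd.DataFrame) -> int:
    """Number of proteins with a complete (no missing values) profile."""
    return int(profiles.dropna().shape[0])

def complex_scatter(profiles: pd.DataFrame, members: list[str]) -> float:
    """Mean Euclidean distance of member profiles to the complex centroid
    (placeholder for the scatter measure used in the paper)."""
    sub = profiles.loc[[m for m in members if m in profiles.index]].dropna()
    centroid = sub.mean(axis=0)
    return float(np.linalg.norm(sub - centroid, axis=1).mean())

def reproducibility(profiles_a: pd.DataFrame,
                    profiles_b: pd.DataFrame) -> float:
    """Median per-protein Pearson correlation between two replicate maps."""
    shared = profiles_a.index.intersection(profiles_b.index)
    r = [profiles_a.loc[p].corr(profiles_b.loc[p]) for p in shared]
    return float(np.median(r))
```

Only profiled_depth is unambiguous as written; the other two would need to match DOM-ABC's exact definitions before being used for a ranked benchmark.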
potential reviewers
No response
Will you be able to work on the implementation (coding) yourself, with additional help from the ProteoBench maintainers?
any other information
This will hopefully be addressed at the EuBIC Developer meeting 2025 - see the related hackathon proposal here: EuBIC/EuBIC2025#9