add optional ms2rescore step #362

Merged
merged 23 commits into from
May 7, 2024

Conversation

daichengxin
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/quantms branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Mar 24, 2024

nf-core lint overall result: Passed ✅

Posted for pipeline commit bae2e3e

✅ 286 tests passed
❔ 4 tests were ignored

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-05-07 03:23:28

@jpfeuffer
Collaborator

Really cool! Have you tried allowing mokapot as rescoring engine as well?
Currently seems to be unsupported since the mokapot option is missing in the "posterior probabilities" parameter.

@daichengxin
Collaborator Author

> Really cool! Have you tried allowing mokapot as rescoring engine as well? Currently seems to be unsupported since the mokapot option is missing in the "posterior probabilities" parameter.

mokapot was also allowed.

@ypriverol
Member

@daichengxin can you add here some of the details of the current benchmark?

@daichengxin
Collaborator Author

daichengxin commented Apr 28, 2024

Hi all, we ran some benchmarks on PXD006675 and the immunopeptide dataset PXD020620 after adding the rescoring steps. I tested four strategies, all with Comet as the search engine:

  1. Percolator on individual runs, without extra features
  2. Percolator on the whole experiment, without extra features
  3. Percolator on the whole experiment, with ms2pip features
  4. Percolator on the whole experiment, with deeplc and ms2pip features

The statistics in the figures are taken from the Percolator output directory. Figure 1 is from PXD006675. The legend entries map to the strategies as follows: NoMs2rescore_percolator_individual_run is strategy 1, NoMs2rescore_percolator_exp_all is strategy 2, percolator_ms2pip_all_exp is strategy 3, and ms2rescore_all_exp is strategy 4.

At 0.01 FDR, there was no significant difference between rescoring the entire experiment and rescoring individual runs. However, the number of PSMs decreased by 0.6% after rescoring the whole experiment with ms2pip and deeplc features, compared to rescoring without extra features. At 0.05 FDR, the number of PSMs increased by 0.4% after rescoring the whole experiment without ms2pip and deeplc features, compared to individual runs. Again, the number of PSMs decreased by 0.6% after rescoring the whole experiment with ms2pip and deeplc features, compared to rescoring without extra features.

[Figure: PXD006675_ms2rescore_comet]

Source data of the figure:
PXD006675_ms2rescore_pc_comet.csv
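
Percentage changes like the ones quoted above come down to counting PSMs at a q-value cutoff and comparing two runs. The following is only a minimal sketch of that arithmetic, assuming tab-delimited PSM tables with a `q-value` column (as in Percolator's tab output). The toy tables and the helper names `count_psms`, `percent_change`, and `make_table` are hypothetical and are not part of the pipeline or of the benchmark data.

```python
import csv
import io


def count_psms(handle, fdr):
    """Count PSMs with a q-value at or below the given FDR threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: the table has a "q-value" column, as in Percolator's tab output.
    return sum(1 for row in reader if float(row["q-value"]) <= fdr)


def percent_change(baseline, rescored):
    """Relative change in PSM count, in percent of the baseline count."""
    return 100.0 * (rescored - baseline) / baseline


def make_table(q_values):
    """Build a toy tab-delimited PSM table (hypothetical data, illustration only)."""
    lines = ["PSMId\tq-value"]
    lines += [f"psm{i}\t{q}" for i, q in enumerate(q_values)]
    return "\n".join(lines)


# Toy q-values for a run without and with extra rescoring features.
baseline_table = make_table([0.001, 0.005, 0.02, 0.2])
rescored_table = make_table([0.001, 0.004, 0.008, 0.03])

for fdr in (0.01, 0.05):
    before = count_psms(io.StringIO(baseline_table), fdr)
    after = count_psms(io.StringIO(rescored_table), fdr)
    change = percent_change(before, after)
    print(f"FDR {fdr}: {before} -> {after} PSMs ({change:+.1f}%)")
```

With real data, the two tables would be replaced by the PSM-level outputs of the two strategies being compared.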

For PXD020620 (a small immunopeptide dataset):
There was no significant difference in rescoring with or without extra features, either for the entire experiment or for individual runs. One reason could be that the dataset is too small. I am trying to test on PXD019643, a larger immunopeptide dataset.
[Figure: PXD020620_comet_ms2rescore_exp_all]

In addition, I also tested it on the standard benchmark dataset PXD001819, with six conditions in total:

  1. Comet and Sage as search engines: no ms2rescore rescoring, rescoring with ms2pip features, and rescoring with ms2pip and deeplc features, for individual runs.
  2. Comet alone as the search engine: no ms2rescore rescoring, rescoring with ms2pip features, and rescoring with ms2pip and deeplc features, for individual runs.

Weird, the number of identified PSMs is basically the same.

[Figure: ms2rescore]

[Figure: comet_ms2rescore]

Contributor

@jonasscheid jonasscheid left a comment

Well done!
I think one key thing is missing: you need to properly hand the feature_names.tsv from ms2rescore over to psmfeatureextractor. I think the features are currently not parsed properly, so Percolator will just ignore them.
And don't forget to add the license in the ms2rescore python script ;-)

bin/ms2rescore_cli.py (resolved)
conf/modules.config (resolved)
workflows/quantms.nf (resolved)
subworkflows/local/dda_id.nf (outdated, resolved)
subworkflows/local/dda_id.nf (outdated, resolved)
subworkflows/local/dda_id.nf (resolved)

@daichengxin
Collaborator Author

Thanks a lot. It works and boosts peptide identification.
PXD006675: [Figure: PXD006675_ms2rescore_comet]

PXD020620: [Figure: PXD020620_comet_ms2rescore_exp_all]

@ypriverol ypriverol requested a review from jonasscheid May 6, 2024 14:38
@jonasscheid
Contributor

Great stuff! That is what we see as well 😎

@jonasscheid
Contributor

LuciPhor2 was not able to provide an output. Please set debug >= 4 for additional information.
Error: File not found (the file 'SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep2_consensus_fdr_filter_pep_luciphor.idXML' could not be found)
What's going wrong with the failing tests here?

@daichengxin
Collaborator Author

> LuciPhor2 was not able to provide an output. Please set debug >= 4 for additional information.
> Error: File not found (the file 'SF_200217_pPeptideLibrary_pool1_HCDnlETcaD_OT_rep2_consensus_fdr_filter_pep_luciphor.idXML' could not be found)
> What's going wrong with the failing tests here?

There were not enough PSMs to accurately model the data in the minimal example (<50 PSMs). I readjusted the pre-filter levels.

@jpfeuffer
Collaborator

jpfeuffer commented May 7, 2024

And no big improvement for rescoring over all files (vs. per run) right? Good for speed/parallel processing.
Very well done guys.

@daichengxin
Collaborator Author

> And no big improvement for rescoring over all files (vs. per run) right? Good for speed/parallel processing. Very well done guys.

Yes. No major differences between rescoring for independent msrun and the entire experiment in these benchmarking datasets.

@ypriverol
Member

> > And no big improvement for rescoring over all files (vs. per run) right? Good for speed/parallel processing. Very well done guys.
>
> Yes. No major differences between rescoring for independent msrun and the entire experiment in these benchmarking datasets.

This is quite a big surprise. I think @timosachsenberg was the first to mention to me that he had run some benchmarks in the past with Percolator at the msrun level and on the entire experiment, and never saw big differences.

@jpfeuffer
Collaborator

Ideally, we should check at some point on a ground-truth dataset that rescoring, whether over all files or per MS run, does not underestimate the FDR.
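
One way such a check could be sketched, assuming a ground-truth dataset in which every PSM can be labelled as correct or incorrect: compute the observed false discovery proportion (FDP) among the PSMs accepted at a q-value threshold and compare it with that threshold. The field names `q_value` and `is_correct`, the function `observed_fdp`, and the toy PSMs are hypothetical. This is an illustration of the idea, not a benchmark procedure used in this PR.

```python
def observed_fdp(psms, threshold):
    """Fraction of incorrect PSMs among those accepted at the q-value threshold."""
    accepted = [p for p in psms if p["q_value"] <= threshold]
    if not accepted:
        return 0.0
    false_hits = sum(1 for p in accepted if not p["is_correct"])
    return false_hits / len(accepted)


# Hypothetical toy PSMs; a real check needs a large ground-truth dataset.
toy_psms = [
    {"q_value": 0.001, "is_correct": True},
    {"q_value": 0.002, "is_correct": True},
    {"q_value": 0.005, "is_correct": True},
    {"q_value": 0.009, "is_correct": False},
    {"q_value": 0.030, "is_correct": False},
]

for threshold in (0.01, 0.05):
    fdp = observed_fdp(toy_psms, threshold)
    # An observed FDP above the nominal threshold suggests the FDR is underestimated.
    verdict = "underestimated" if fdp > threshold else "not underestimated"
    print(f"threshold {threshold}: observed FDP {fdp:.2f} ({verdict})")
```

On a dataset this small the comparison is not meaningful; the same function would be applied separately to the per-run and the whole-experiment rescoring results.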

@ypriverol
Member

We are planning to do that. I will try to coordinate it with @jonasscheid, who has been working on ms2rescore for a while.

Contributor

@jonasscheid jonasscheid left a comment

Looks good 👍🏼
When mokapot is used, ms2rescore derives some nice QC plots investigating the FDR assumption @jpfeuffer. That could be a start.

@ypriverol
Member

Actually, this triggered a nice discussion. @daichengxin, did you test whether everything works with pmultiqc?

@ypriverol ypriverol merged commit e3c95f0 into bigbio:dev May 7, 2024
15 checks passed
@timosachsenberg

nice work guys!
