add optional ms2rescore step #362
Really cool! Have you tried allowing mokapot as a rescoring engine as well?
mokapot is also supported.
@daichengxin, can you add some details of the current benchmark here?
Hi all, we ran some benchmarks on PXD006675 and the immunopeptide dataset PXD020620 after adding the rescoring steps. I tested four strategies, using Comet as the search engine.
The statistics in the figures are taken from the Percolator output directory.
Figure 1 is from PXD006675 (figure legend and source data not recovered).
For PXD020620 (a small immunopeptide dataset): [results not recovered].
In addition, I also tested it on the standard benchmark dataset PXD001819, with six conditions in total:
Well done!
I think one key thing is missing: you need to properly parse the feature names in ms2rescore's feature_names.tsv and pass them on to PSMFeatureExtractor. I think they are currently not parsed properly, so Percolator will just ignore them.
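A minimal sketch of the kind of parsing being asked for. It assumes, hypothetically, that feature_names.tsv has a header row with a `feature_name` column listing one rescoring feature per row, and that the names are handed to OpenMS PSMFeatureExtractor as a space-separated list via its `-extra` option. Both the column name and the exact hand-off should be checked against the real ms2rescore output and the tool's parameter docs.

```python
import csv
import shlex
from pathlib import Path


def read_feature_names(tsv_path: Path, column: str = "feature_name") -> list[str]:
    """Read rescoring feature names from a TSV file.

    ASSUMPTION: the file has a header row containing `column`; the real
    ms2rescore feature_names.tsv layout may differ.
    """
    names: list[str] = []
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            name = (row.get(column) or "").strip()
            # Skip blank entries and keep only the first occurrence of each name.
            if name and name not in names:
                names.append(name)
    return names


def build_extra_argument(names: list[str]) -> str:
    """Format names as a (hypothetical) `-extra` argument for PSMFeatureExtractor."""
    if not names:
        return ""
    # Quote each name so unusual characters survive the shell.
    return "-extra " + " ".join(shlex.quote(n) for n in names)


if __name__ == "__main__":
    import sys

    print(build_extra_argument(read_feature_names(Path(sys.argv[1]))))
```

Reading the names from the file, rather than hard-coding them in the workflow, keeps the list in sync if the enabled ms2rescore feature generators change.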
And don't forget to add the license header to the ms2rescore Python script ;-)
Co-authored-by: Jonas Scheid <[email protected]>
Great stuff! That is what we see as well 😎
There are not enough PSMs to accurately model the data in the minimal example (<50 PSMs), so I readjusted the pre-filter levels.
And no big improvement from rescoring over all files (vs. per run), right? That's good for speed and parallel processing.
Yes. In these benchmark datasets there are no major differences between rescoring each MS run independently and rescoring the entire experiment.
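To make the per-run versus whole-experiment comparison concrete, here is a minimal sketch using plain target-decoy q-values on hypothetical `(score, is_decoy)` tuples grouped by run. It is not what Percolator or mokapot do internally (they first learn a rescoring model); it only shows how accepted-target counts at a fixed FDR threshold can be compared between the two groupings.

```python
def q_values(psms):
    """Target-decoy q-values for (score, is_decoy) pairs; higher score is better.

    The FDR at each rank is estimated as decoys / targets among PSMs scoring
    at or above it; a q-value is the minimum FDR at which the PSM is accepted.
    Ties are broken by input order (a simplification).
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk from the worst score upwards so each q-value is a running minimum.
    qs = [0.0] * len(psms)
    running = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running = min(running, fdrs[rank])
        qs[order[rank]] = running
    return qs


def accepted_targets(psms, threshold=0.01):
    """Count target PSMs with q-value at or below the threshold."""
    qs = q_values(psms)
    return sum(1 for (_score, decoy), q in zip(psms, qs) if not decoy and q <= threshold)


def compare(runs, threshold=0.01):
    """runs: dict mapping run name -> list of (score, is_decoy).

    Returns (sum of per-run accepted targets, accepted targets when pooled).
    """
    per_run = sum(accepted_targets(p, threshold) for p in runs.values())
    pooled = accepted_targets([p for ps in runs.values() for p in ps], threshold)
    return per_run, pooled
```

Similar counts from the two groupings would be consistent with what the benchmark above reports, though a real comparison would use the engines' own q-values.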
This is quite a big surprise. I think @timosachsenberg was the first to mention to me that he had run benchmarks in the past with Percolator at both the MS-run level and the whole-experiment level and never saw big differences.
Ideally, we should check at some point on a ground-truth dataset that neither whole-experiment rescoring nor per-MS-run rescoring underestimates the FDR.
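One way such a check could look, as a sketch under assumptions: on a ground-truth dataset each target PSM can be labelled correct or incorrect, and the observed false discovery proportion (FDP) among accepted targets is compared with the nominal q-value threshold. The `LabeledPSM` fields and labels are hypothetical; the q-values would come from the Percolator or mokapot output.

```python
from dataclasses import dataclass


@dataclass
class LabeledPSM:
    q_value: float    # q-value reported by the rescoring engine
    is_decoy: bool
    is_correct: bool  # ground-truth label (hypothetical, for illustration)


def false_discovery_proportion(psms, threshold):
    """Fraction of accepted target PSMs that are incorrect per ground truth."""
    accepted = [p for p in psms if not p.is_decoy and p.q_value <= threshold]
    if not accepted:
        return 0.0
    wrong = sum(1 for p in accepted if not p.is_correct)
    return wrong / len(accepted)


def underestimated_thresholds(psms, thresholds=(0.01, 0.05)):
    """Return the thresholds at which the observed FDP exceeds the nominal FDR."""
    return [t for t in thresholds if false_discovery_proportion(psms, t) > t]
```

Running this separately on per-run and whole-experiment results would show whether either mode reports a lower FDR than the ground truth supports.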
We are planning to do that. I will also try to coordinate it with @jonasscheid, who has been working on ms2rescore for a while.
Looks good 👍🏼
@jpfeuffer, when mokapot is used, ms2rescore derives some nice QC plots that investigate the FDR assumption. That could be a start.
Actually, this triggers a nice discussion. @daichengxin, did you test whether everything works with pmultiqc?
Nice work, guys!
PR checklist
nf-core lint
nf-test test main.nf.test -profile test,docker
nextflow run . -profile debug,test,docker --outdir <OUTDIR>
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).