Speedup of peptide parsing and annotation #61
Btw, the tests that involve reading from USI are also breaking on master on my local system.
Thanks for the contribution. The fast-pass for unmodified peptides is a great addition, as well as annotating proteoforms directly. I have a few comments related to the code that should be relatively easy fixes.
I'm a bit less convinced about the simplified grammar though. This can be discussed in a bit more detail, see the respective comments.
@bittremieux I added the suggestions, LMK what you think!
This PR implements 4 main things, all with the purpose of improving speed of spectrum annotation workflows.
Benchmarks
Using some dummy peptide examples, the speedup I see in parsing is:
With mods
29.51it/s -> baseline (greedy loading, no fastpass)
137.54it/s -> + unmod fastpass, cached full parser (~4.7x improvement)
168.48it/s -> + simple parser (1.22x improvement, ~6x from baseline)
Without mods
34.18it/s -> baseline (greedy loading, no fastpass)
995089.92it/s -> + unmod fastpass, cached full parser (~30000x improvement)
1081006.19it/s -> + simple parser (equivalent for practical purposes)
On a heavy annotation workflow I have, these changes dropped the run time from 45 mins to 2.20 :P
LMK what you think!
Best