Update API and add support for small molecules #43
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

Coverage Diff

| | main | #43 | +/- |
|---|---:|---:|---:|
| Coverage | 92.48% | 96.13% | +3.65% |
| Files | 22 | 24 | +2 |
| Lines | 971 | 957 | -14 |
| Hits | 898 | 920 | +22 |
| Misses | 73 | 37 | -36 |
Massive effort! In general I think it looks pretty good. I have some small suggestions and a few questions.
Thanks a lot @bittremieux! It turns out I missed a lot of documentation, and you caught at least two potential bugs 🎉 Sorry that the update includes so many new changes: Ruff was updated and apparently added another rule to the autoformatting.
I'm going to go ahead and merge so I can build atop this. @bittremieux, if there's anything else, please let me know and I'll address it in a fresh PR!
This PR makes some pretty large API changes, but they provide a lot more flexibility than before.
Changed

- `PeptideTransformer*` is now `AnalyteTransformer*` to reflect support for small molecules as well.
- `PeptideDataset` was renamed `AnalyteDataset`.
- `forward()` methods for the Transformer modules have been updated to accept additional keyword arguments. This enables BERT-style masking for training and similar tasks.
- `global_token_hook()` method: this can be overridden in a subclass to customize how a global token (the first element of the sequence) is created using the `*args` and `**kwargs` provided in the forward methods.

Added

- `MoleculeTokenizer`.
- `PeptideIonTokenizer.calculate_precursor_ions()` method. It is implemented entirely in PyTorch, so it should be efficient for use during model training and inference (see Noble-Lab/casanovo#269, "Optimize beam search").

Next PR will be docs!