Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update API and add support for small molecules #43

Merged
merged 18 commits into from
Mar 7, 2024
Merged

Conversation

wfondrie
Copy link
Owner

@wfondrie wfondrie commented Dec 6, 2023

This PR makes some pretty large API changes, but provide a lot more flexibility than before.

Changed

  • PeptideTransformer* is now AnalyteTransformer* to reflect support for small molecules as well.
  • Similarly, PeptideDataset was renamed AnalyteDataset.
  • All the forward() methods for the Transformer modules have been updated to with additional keyword arguments. This enables BERT-style masking for training and such.
  • All transformer modules now have a global_token_hook() method. This can be overwritten in a subclass to customize how a global token (the first element of the sequence) is created using the *args and **kwargs provided in the forward methods.

Added

  • A new tokenizer for small molecules, MoleculeTokenizer.
  • The PeptideIonTokenizer.calculate_precursor_ions() method is fully PyTorch, so it should be efficient for use during model training and inference. (see Optimize beam search Noble-Lab/casanovo#269)

Next PR will be docs!

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

Attention: Patch coverage is 94.65241% with 10 lines in your changes are missing coverage. Please review.

Project coverage is 96.13%. Comparing base (bd2861f) to head (38287c9).

Files Patch % Lines
depthcharge/tokenizers/peptides.py 87.09% 4 Missing ⚠️
depthcharge/tokenizers/tokenizer.py 81.25% 3 Missing ⚠️
depthcharge/transformers/analytes.py 95.65% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #43      +/-   ##
==========================================
+ Coverage   92.48%   96.13%   +3.65%     
==========================================
  Files          22       24       +2     
  Lines         971      957      -14     
==========================================
+ Hits          898      920      +22     
+ Misses         73       37      -36     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wfondrie wfondrie requested a review from bittremieux December 9, 2023 00:40
@wfondrie wfondrie marked this pull request as ready for review December 9, 2023 00:48
@wfondrie wfondrie linked an issue Dec 11, 2023 that may be closed by this pull request
Copy link
Collaborator

@bittremieux bittremieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive effort! In general I think it looks pretty good. I have some small suggestions and a few questions.

depthcharge/data/analyte_datasets.py Outdated Show resolved Hide resolved
depthcharge/mixins.py Outdated Show resolved Hide resolved
depthcharge/tokenizers/molecules.py Outdated Show resolved Hide resolved
depthcharge/tokenizers/molecules.py Outdated Show resolved Hide resolved
depthcharge/tokenizers/molecules.py Outdated Show resolved Hide resolved
depthcharge/transformers/analytes.py Show resolved Hide resolved
depthcharge/transformers/analytes.py Outdated Show resolved Hide resolved
depthcharge/transformers/analytes.py Outdated Show resolved Hide resolved
depthcharge/transformers/analytes.py Outdated Show resolved Hide resolved
depthcharge/transformers/analytes.py Outdated Show resolved Hide resolved
@wfondrie
Copy link
Owner Author

wfondrie commented Mar 7, 2024

Thanks a lot @bittremieux! It turns out a missed a lot of documentation and you caught at least two potential bugs 🎉

Sorry the update has so many new changes - Ruff updated and apparently added another rule to the autoformatting.

@wfondrie wfondrie requested a review from bittremieux March 7, 2024 21:20
@wfondrie
Copy link
Owner Author

wfondrie commented Mar 7, 2024

I'm going to go ahead and merge so I can build atop this. @bittremieux, if there's anything else please let me know and I'll address it in a fresh PR!

@wfondrie wfondrie merged commit b1f25ce into main Mar 7, 2024
6 checks passed
@wfondrie wfondrie deleted the peptide-update branch March 7, 2024 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Peptide Transformer API
2 participants