Update API and add support for small molecules #43

wfondrie · 2023-12-06T20:17:27Z

This PR makes some pretty large API changes, but provides a lot more flexibility than before.

Changed

PeptideTransformer* is now AnalyteTransformer* to reflect support for small molecules as well.
Similarly, PeptideDataset was renamed AnalyteDataset.
All the forward() methods for the Transformer modules have been updated with additional keyword arguments. This enables things like BERT-style masking during training.
All transformer modules now have a global_token_hook() method. This can be overridden in a subclass to customize how a global token (the first element of the sequence) is created using the *args and **kwargs provided in the forward methods.
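
The hook pattern in the last item can be sketched without any dependencies. Everything in this sketch is hypothetical: `ToyEncoder`, its `embed` helper, and the `charge` keyword exist only for illustration and are not depthcharge's actual classes or signatures. Plain Python lists stand in for tensors.

```python
class ToyEncoder:
    """Hypothetical encoder that prepends a global token to each sequence."""

    def __init__(self, d_model):
        self.d_model = d_model

    def embed(self, token):
        # Stand-in for an embedding lookup: a constant vector per token id.
        return [float(token)] * self.d_model

    def global_token_hook(self, tokens, *args, **kwargs):
        # Default behaviour: one all-zero global token per sequence.
        return [[0.0] * self.d_model for _ in tokens]

    def forward(self, tokens, *args, **kwargs):
        # The hook receives the same *args and **kwargs as forward().
        global_tokens = self.global_token_hook(tokens, *args, **kwargs)
        return [
            [glob] + [self.embed(tok) for tok in seq]
            for glob, seq in zip(global_tokens, tokens)
        ]


class ChargeAwareEncoder(ToyEncoder):
    """Subclass that builds the global token from a `charge` keyword."""

    def global_token_hook(self, tokens, *args, charge=None, **kwargs):
        if charge is None:
            return super().global_token_hook(tokens, *args, **kwargs)
        return [[float(z)] * self.d_model for z in charge]


encoder = ChargeAwareEncoder(d_model=2)
# Two sequences of token ids, with one charge value per sequence.
output = encoder.forward([[5, 7], [3]], charge=[2, 3])
```

The subclass only overrides the hook; `forward()` is untouched, so any extra keyword passed to `forward()` reaches the hook unchanged.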

Added

A new tokenizer for small molecules, MoleculeTokenizer.
The PeptideIonTokenizer.calculate_precursor_ions() method is implemented entirely in PyTorch, so it should be efficient for use during model training and inference (see "Optimize beam search", Noble-Lab/casanovo#269).
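
For readers unfamiliar with the quantity involved, the arithmetic behind a precursor m/z is small enough to show in plain Python. This is only a sketch under stated assumptions: the function name `precursor_mz` and the three-residue mass table are illustrative, and the code does not reproduce calculate_precursor_ions(), whose signature is not shown in this PR.

```python
# Monoisotopic residue masses in Da (standard values, illustrative subset).
RESIDUE_MASS = {"G": 57.021464, "A": 71.037114, "S": 87.032028}
WATER = 18.010565  # H2O added to account for the peptide termini
PROTON = 1.007276  # mass of one proton


def precursor_mz(sequence, charge):
    """Return the m/z of an unmodified peptide at the given charge state."""
    # An unknown residue raises KeyError rather than being silently skipped.
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return neutral_mass / charge + PROTON


# m/z of the toy peptide "GAS" at charge states 1 and 2.
mz_values = [precursor_mz("GAS", z) for z in (1, 2)]
```

A tensor-based version would apply the same sum and division across a whole batch at once, which is where the efficiency during training and inference comes from.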

Next PR will be docs!


codecov · 2023-12-06T22:53:48Z

Codecov Report

Attention: Patch coverage is 94.65241%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 96.13%. Comparing base (bd2861f) to head (38287c9).

Files	Patch %	Lines
depthcharge/tokenizers/peptides.py	87.09%	4 Missing ⚠️
depthcharge/tokenizers/tokenizer.py	81.25%	3 Missing ⚠️
depthcharge/transformers/analytes.py	95.65%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #43      +/-   ##
==========================================
+ Coverage   92.48%   96.13%   +3.65%     
==========================================
  Files          22       24       +2     
  Lines         971      957      -14     
==========================================
+ Hits          898      920      +22     
+ Misses         73       37      -36


bittremieux

Massive effort! In general I think it looks pretty good. I have some small suggestions and a few questions.

depthcharge/data/analyte_datasets.py

depthcharge/mixins.py

depthcharge/tokenizers/molecules.py

depthcharge/transformers/analytes.py

wfondrie · 2024-03-07T21:17:43Z

Thanks a lot @bittremieux! It turns out I missed a lot of documentation and you caught at least two potential bugs 🎉

Sorry the update has so many new changes - Ruff updated and apparently added another rule to the autoformatting.

wfondrie · 2024-03-07T21:22:44Z

I'm going to go ahead and merge so I can build atop this. @bittremieux, if there's anything else please let me know and I'll address it in a fresh PR!

wfondrie added 4 commits December 6, 2023 10:26

Add small molecule support and update peptide transformers to analyte transformers

0d7dafc

Merge branch 'main' into peptide-update

58a3f63

Various fixes and new precursor calculation

3840108

Finished tests and fixed some bugs

92663f4

wfondrie added 4 commits December 6, 2023 16:32

Revert spliting peptide datasets

88eb8b6

Fix bugs and improve test coverage

025119e

Make missing residues an error

953cc3c

Add molecule tests

94c6413

wfondrie requested a review from bittremieux December 9, 2023 00:40

wfondrie marked this pull request as ready for review December 9, 2023 00:48

wfondrie linked an issue Dec 11, 2023 that may be closed by this pull request

Update Peptide Transformer API #40

Closed

wfondrie mentioned this pull request Dec 11, 2023

Example code to read .mgf file #44

Closed

wfondrie added 4 commits January 17, 2024 14:11

Added customizable start and stop tokens

1531088

Allow add stop and start even if no token exists

09ed5c0

Update test

b98c798

Add embed method

b2a43e9

bittremieux requested changes Feb 16, 2024


wfondrie added 6 commits March 6, 2024 17:00

Start making Wout's edits

32fa6ba

Most of Wout's edits done

230d60a

Final fixes

30def80

Fix formatting errors

d40d0c9

Bump pre-commit versions

efa9d58

Ruff format update

38287c9

wfondrie requested a review from bittremieux March 7, 2024 21:20

wfondrie merged commit b1f25ce into main Mar 7, 2024
6 checks passed

wfondrie deleted the peptide-update branch March 7, 2024 21:22
