Example code to read .mgf file #44

Closed
GuptaVishu2002 opened this issue Dec 10, 2023 · 6 comments

Comments

@GuptaVishu2002

GuptaVishu2002 commented Dec 10, 2023

Hi, I would like to know how to read and preprocess a .mgf file using the package. Can you please help me by providing an example code for that, which can then be used to pass on other package functions such as Encoder and Transformer? Thank You

@bittremieux
Collaborator

You can read and parse spectra from MGF files using the Dataset functionality. It's as straightforward as this:

from depthcharge.data import SpectrumDataset

dataset = SpectrumDataset("my_file.mgf", "my_file.lance")

This can then be used as any general PyTorch dataset to provide to your model for training, validation, or testing. How you do this specifically depends on how you use PyTorch, Lightning, etc.

Note that the API is currently in heavy development, so there are some breaking changes between various DepthCharge versions. The Lance integration is included in the development version if you install from GitHub, but not in the latest release on PyPI yet.
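
For orientation, an MGF file is plain text: each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` lines such as `PEPMASS` and `CHARGE` followed by one m/z and intensity pair per line. The sketch below is a minimal standard-library reader that shows this structure. It is not DepthCharge's parser, and the function name `read_mgf` and the dictionary layout are assumptions made only for this illustration.

```python
import io
from typing import Iterator, TextIO


def read_mgf(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per spectrum from an open MGF text handle.

    Hypothetical helper for illustration; not part of DepthCharge.
    """
    spectrum = None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            spectrum = {"params": {}, "mz": [], "intensity": []}
        elif line == "END IONS":
            if spectrum is not None:
                yield spectrum
            spectrum = None
        elif spectrum is None:
            continue  # parameters outside a spectrum block are ignored here
        elif "=" in line and not line[0].isdigit():
            # Per-spectrum parameter such as TITLE, PEPMASS, or CHARGE.
            key, value = line.split("=", 1)
            spectrum["params"][key.strip().upper()] = value.strip()
        else:
            # Peak line: m/z followed by an optional intensity.
            parts = line.split()
            spectrum["mz"].append(float(parts[0]))
            spectrum["intensity"].append(float(parts[1]) if len(parts) > 1 else 0.0)


example = """BEGIN IONS
TITLE=spectrum_1
PEPMASS=500.25 1000.0
CHARGE=2+
100.1 10.0
200.2 20.0
END IONS
"""

spectra = list(read_mgf(io.StringIO(example)))
first = spectra[0]
# PEPMASS holds the precursor m/z, optionally followed by its intensity.
precursor_mz = float(first["params"]["PEPMASS"].split()[0])
```

In practice the `SpectrumDataset` call above handles the parsing; the sketch is only meant to show which fields (precursor m/z, charge, peak arrays) an MGF file provides.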

@wfondrie
Owner

Hi @GuptaVishu2002 - I'm planning the next release for after #43 is reviewed and merged, and I'm working on documentation this week. Stay tuned!

@GuptaVishu2002
Author

Hi @bittremieux, @wfondrie - thank you very much for the reply. Looking forward to the updates.

@GuptaVishu2002
Author

GuptaVishu2002 commented Jan 23, 2024

Hi @bittremieux @wfondrie, I hope you are doing well. Would it be possible for you to share sample code showing the recommended way to incorporate arbitrary information (such as precursor_mz, precursor_charge) into the spectrum representation for the transformer (by subclassing the SpectrumTransformerEncoder class and overriding the precursor_hook() method)? Thank you.

@wfondrie
Owner

Hi @GuptaVishu2002 - sorry for the delay! We're still trying to merge a major PR, then I'll get cracking on refreshed and more detailed documentation. Thanks.

For now, the best way to learn how to use the precursor hook is to look at the unit tests:

def test_precursor_hook(batch):
    """Test that the hook works."""

    class MyEncoder(SpectrumTransformerEncoder):
        """A silly class."""

        def precursor_hook(self, mz_array, intensity_array, **kwargs):
            """A silly hook."""
            return kwargs["charge"].expand(self.d_model, -1).T

    model1 = MyEncoder(8, 1, 12)
    emb1, mask1 = model1(**batch)
    assert emb1.shape == (2, 4, 8)
    assert mask1.sum() == 1

    model2 = SpectrumTransformerEncoder(8, 1, 12)
    emb2, mask2 = model2(**batch)
    assert emb2.shape == (2, 4, 8)
    assert mask2.sum() == 1

    for elem in zip(emb1.flatten(), emb2.flatten()):
        if elem:
            assert elem[0] != elem[1]
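
To make the hook's return value concrete, here is a small standalone sketch of what `kwargs["charge"].expand(self.d_model, -1).T` produces. It assumes `charge` is a 1-D tensor with one entry per spectrum, which is what the expression in the test implies; the values and the variable names `d_model`, `charge`, and `precursor_repr` are made up for illustration, and the sketch does not call DepthCharge itself.

```python
import torch

d_model = 8  # embedding width, matching MyEncoder(8, 1, 12) in the test
charge = torch.tensor([2.0, 3.0])  # hypothetical charges for a batch of two spectra

# expand(d_model, -1) views the (2,) tensor as (d_model, 2) without copying;
# .T then transposes it to (2, d_model): one d_model-length vector per spectrum.
precursor_repr = charge.expand(d_model, -1).T
```

Each row is simply that spectrum's charge repeated `d_model` times. A more useful hook could combine learned embeddings of the charge and precursor m/z, as long as it returns the same `(batch, d_model)` shape.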

@wfondrie
Owner

I haven't specifically added how to read an MGF file, but I just added documentation in #47 about working with mass spec data in general. Have a look and let me know if you have other questions!
