
PTM Dephosphorylation Prediction Tool #1525

Open · wants to merge 21 commits into master

Conversation


@haibkhn commented Oct 16, 2024

This pull request introduces a new tool for predicting dephosphorylation. Key features include:

  • Two modes of operation:

    1. Manual hyperparameter selection
    2. Automated hyperparameter search using Optuna (a sketch of this mode follows the list below)
  • Support for three protein language model variants:

    • ProtT5-XL-UniRef50
    • ESM
    • ProtT5-XL-BFD
  • Additional hyperparameter search option:

    • SMAC functionality is included, but note that the latest SMAC version (2.2.0) is not yet available on Anaconda
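
For orientation, here is a minimal sketch of what the automated Optuna mode does conceptually. The parameter names and the train_and_evaluate placeholder are illustrative, not the tool's actual code:

```python
# Illustrative sketch of an Optuna-driven hyperparameter search;
# train_and_evaluate() is a hypothetical placeholder, not the PR's code.
import optuna

def train_and_evaluate(lr: float, batch_size: int, dropout: float) -> float:
    # Placeholder: fine-tune the model with these hyperparameters and
    # return a validation metric. A dummy score keeps the sketch runnable.
    return 1.0 - abs(lr - 1e-4) - dropout * 0.1

def objective(trial: optuna.Trial) -> float:
    # Sample each hyperparameter from a plausible search space.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    return train_and_evaluate(lr, batch_size, dropout)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```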

@anuprulez Please review these changes and let me know if any modifications are needed.

@anuprulez (Contributor)

@haibkhn thanks for the PR. I will have a look and provide my feedback.

@anuprulez (Contributor)

Here are a few obvious comments to start with:

ping @haibkhn
Maybe @bgruening also has more comments. I will look into the details, probably at the end of this week or next week.

@bgruening (Owner) left a comment:

please add a .shed.yml file
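
For reference, a minimal .shed.yml in the style used by other tools in this repository might look like the sketch below; the name, owner, descriptions, and categories are placeholder guesses, not final metadata:

```yaml
# Placeholder sketch of a minimal .shed.yml; all values are guesses.
name: dephosphorylation_prediction
owner: bgruening
description: Predict dephosphorylation sites using protein language models
long_description: |
  Fine-tunes protein language models (ProtT5-XL-UniRef50, ESM,
  ProtT5-XL-BFD) to predict dephosphorylation, with manual or
  Optuna-based hyperparameter search.
categories:
  - Machine Learning
  - Proteomics
```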

@@ -0,0 +1,339 @@
<tool id="hyperparameter_finetune" name="Hyperparameter Search for Finetuning model" version="1.0.0">
@bgruening (Owner)

Suggested change:
- <tool id="hyperparameter_finetune" name="Hyperparameter Search for Finetuning model" version="1.0.0">
+ <tool id="hyperparameter_finetune" name="Hyperparameter Search for Finetuning model" version="1.0.0" profile="23.0">

(The profile attribute declares the minimum Galaxy version the tool targets.)

@anuprulez (Contributor)

I think that while testing, the script tries to download the LLM (ProtT5-XL-UniRef50), which is larger than 10 GB. That could be the reason CI/CD throws a memory-related error:

[screenshot: error_protrans]

It might be possible to provide a remote link to the model, but I think HuggingFace does not allow loading a model from an arbitrary remote location. The models should be hosted on HuggingFace, correct? @haibkhn

Moving to a container-based tool might help, but I am not sure. Could the HuggingFace data table used with the Flux tool help here?
I am looking at #1496, where we store the names of HF models but not the models themselves.

ping @bgruening @arash77

Thanks!

@arash77 (Contributor) commented Oct 22, 2024

I don't think it is possible to fully test the tool with the models on GitHub if the model is big, unless a smaller version of the model is available.
But for production, we can use one of two methods: the one I have used is setting the HF_HOME environment variable, which controls where all Hugging Face models are stored; alternatively, you can pass cache_dir to the from_pretrained function. A sketch of both approaches follows.
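
For illustration, both approaches might look like the following. The cache paths are examples, and Rostlab/prot_t5_xl_uniref50 is assumed here as the HuggingFace id for ProtT5-XL-UniRef50; neither reflects the tool's actual configuration:

```python
# Illustrative sketch of the two caching approaches; paths are examples.
import os

# Option 1: point the whole Hugging Face cache at a persistent location.
# This must be set before transformers/huggingface_hub are imported.
os.environ["HF_HOME"] = "/data/hf_cache"

from transformers import T5EncoderModel, T5Tokenizer

# Option 2: pass cache_dir explicitly to from_pretrained so the model
# weights are downloaded to (and reused from) a known directory.
model = T5EncoderModel.from_pretrained(
    "Rostlab/prot_t5_xl_uniref50", cache_dir="/data/hf_cache/models"
)
tokenizer = T5Tokenizer.from_pretrained(
    "Rostlab/prot_t5_xl_uniref50", cache_dir="/data/hf_cache/models"
)
```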
