-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New biobb_pytorch Molecular dynamics autoencoder wrapper #173
New biobb_pytorch Molecular dynamics autoencoder wrapper #173
Conversation
Co-authored-by: Simon Bray <[email protected]>
Co-authored-by: Simon Bray <[email protected]>
Co-authored-by: Simon Bray <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
Co-authored-by: Björn Grüning <[email protected]>
Please have a look at the linting logs: https://github.com/galaxycomputationalchemistry/galaxy-tools-compchem/actions/runs/9906049158/job/27368207101?pr=173 planemo can also test this with |
|
||
<outputs> | ||
<data format="npy" name="output_reconstructed_data_npy_path" label="output_reconstructed_data_npy_path" /> | ||
<data format="npy" name="output_latent_space_npy_path" label="output_latent_space_npy_path" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this output is optional you can use a filter
here to only define this output when your boolean is true.
`#Required Outputs using file extension` | ||
touch '$output_reconstructed_data_npy_path' && | ||
ln -s '$output_reconstructed_data_npy_path' ./output_reconstructed_data_npy_path.$output_reconstructed_data_npy_path.ext && | ||
|
||
`#Optional Outputs using file extension` | ||
#if $output_latent_space_npy_path: | ||
touch '$output_latent_space_npy_path' && | ||
ln -s '$output_latent_space_npy_path' ./output_latent_space_npy_path.$output_latent_space_npy_path.ext && | ||
#end if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`#Required Outputs using file extension` | |
touch '$output_reconstructed_data_npy_path' && | |
ln -s '$output_reconstructed_data_npy_path' ./output_reconstructed_data_npy_path.$output_reconstructed_data_npy_path.ext && | |
`#Optional Outputs using file extension` | |
#if $output_latent_space_npy_path: | |
touch '$output_latent_space_npy_path' && | |
ln -s '$output_latent_space_npy_path' ./output_latent_space_npy_path.$output_latent_space_npy_path.ext && | |
#end if |
This looks very complicated, I guess you can remove this and try something like below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback. I removed the part you mentioned, used a literal in the command, and utilized the "from_work_dir" attribute. This has significantly simplified things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool glad you found that useful. And sorry this PR takes so long, my hope is that if we do this one properly the rest will be a piece of cake :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I have also removed the comments inside the command section to avoid potential issues with quotes or other special characters in the future. Is there anything else that needs to be changed before we merge?
You could also, and that is the simplest way possible, just write in the command section: And than in your outputs do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea. Is that something that is interesting?
|
||
<inputs> | ||
<param name="input_data_npy_path" type="data" format="npy" optional="False" label="Input NPY file" help="Input data file"/> | ||
<param name="input_model_pth_path" type="data" format="pth" optional="False" label="Input PTH file" help="Input model file"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pickled objects are a bit dangerous, would it be possible to use ONNX https://pytorch.org/docs/stable/onnx.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping, any comments here?
Pickles contain arbitrary code and are a bad exchange format, as you need to trust them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While ONNX is a great choice for model export, as you suggested, and exporting models to ONNX is straightforward, there is currently no built-in functionality in PyTorch for importing ONNX models back into PyTorch. This has been a long-standing request from the PyTorch community since 2019, but as of now, there's no reliable or officially supported method for importing ONNX models back into PyTorch for further training or fine-tuning.
Given this limitation, switching entirely to ONNX would mean losing the ability to easily retrain, modify, or extend our models within the PyTorch ecosystem. For these reasons, despite the risks associated with pickling, we still favor the use of PyTorch’s torch.save() and torch.load() methods for model serialization. We're aware that these methods are not ideal from a security perspective, but they provide the necessary flexibility for ongoing model development.
Our intent developing this tools is to provide a way to create or improve very specialized models not to deploy them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, its a bit of a pain, I had hoped that https://github.com/Talmaj/onnx2pytorch is working for you.
<inputs> | ||
<param name="input_data_npy_path" type="data" format="npy" optional="False" label="Input NPY file" help="Input data file"/> | ||
<param name="input_model_pth_path" type="data" format="pth" optional="False" label="Input PTH file" help="Input model file"/> | ||
<param name="config_json" type="data" format="json" optional="True" label="Configuration file" help="File containing tool settings"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are users supposed to create such JSON files?
How many parameters do such files have in average? Is that 1-2 or more like 20-30?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biobb_object has 38 configurable properties, and in addition, train_mdae introduces 16 specific properties, bringing the total to 54 potential parameters that users could modify in the JSON files. However, all of these properties come with default values, so users only need to tweak those that are relevant to their specific use case. We’ve put considerable effort into documenting these parameters and guiding users to focus on the most impactful ones, ensuring they can make adjustments as needed without being overwhelmed.
; | ||
]]> | ||
</command> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<configfiles> | |
<configfile name="train_config"> | |
{ | |
"properties": { | |
"num_epochs": $num_epoch, | |
"seed": $seed | |
} | |
} | |
</configfile> | |
</configfiles> |
This way you can create those configfiles on the fly and ask your users for the inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s something we’re planning for a future release, where the most relevant properties will be integrated into Galaxy's UI through sliders, multi-select options, number validators, filters, etc., making the configuration process more user-friendly. However, for now, I’d like to keep things as simple as possible and focus on getting my first tool published in the Galaxy Toolshed.
#if $config_json: | ||
ln -s '$config_json' ./config_json.$config_json.ext && | ||
#end if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if $config_json: | |
ln -s '$config_json' ./config_json.$config_json.ext && | |
#end if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s something we’re planning for a future release, where the most relevant properties will be integrated into Galaxy's UI through sliders, multi-select options, number validators, filters, etc., making the configuration process more user-friendly. However, for now, I’d like to keep things as simple as possible and focus on getting my first tool published in the Galaxy Toolshed.
train_mdae | ||
|
||
#if $config_json: | ||
--config ./config_json.$config_json.ext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--config ./config_json.$config_json.ext | |
--config ./$train_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s something we’re planning for a future release, where the most relevant properties will be integrated into Galaxy's UI through sliders, multi-select options, number validators, filters, etc., making the configuration process more user-friendly. However, for now, I’d like to keep things as simple as possible and focus on getting my first tool published in the Galaxy Toolshed.
Co-authored-by: Björn Grüning <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PauAndrio looks good to me. Just this one security issue with the python pickels.
Sorry I thought this was already answered: While ONNX is a great choice for model export, as you suggested, and exporting models to ONNX is straightforward, there is currently no built-in functionality in PyTorch for importing ONNX models back into PyTorch. This has been a long-standing request from the PyTorch community since 2019, but as of now, there's no reliable or officially supported method for importing ONNX models back into PyTorch for further training or fine-tuning. Given this limitation, switching entirely to ONNX would mean losing the ability to easily retrain, modify, or extend our models within the PyTorch ecosystem. For these reasons, despite the risks associated with pickling, we still favor the use of PyTorch’s torch.save() and torch.load() methods for model serialization. We're aware that these methods are not ideal from a security perspective, but they provide the necessary flexibility for ongoing model development. Our intent developing this tools is to provide a way to create or improve very specialized models not to deploy them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PauAndrio and sorry for the long turn-around.
Please change the shed.yml and then we merge.
|
||
<inputs> | ||
<param name="input_data_npy_path" type="data" format="npy" optional="False" label="Input NPY file" help="Input data file"/> | ||
<param name="input_model_pth_path" type="data" format="pth" optional="False" label="Input PTH file" help="Input model file"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, its a bit of a pain, I had hoped that https://github.com/Talmaj/onnx2pytorch is working for you.
owner: chemteam | ||
description: "biobb_pytorch is the Biobb module collection to create and train ML & DL models using the popular [PyTorch](https://pytorch.org/) Python library." | ||
homepage_url: https://github.com/bioexcel/biobb_pytorch | ||
long_description: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you convert this file into a suite-based file? https://github.com/galaxyproject/tools-iuc/blob/main/tools/semibin/.shed.yml
#if $output_performance_npz_path: | ||
--output_performance_npz_path ./output_performance_npz_path.npz | ||
#end if | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; |
Hi, @bgruening, Please, let me know if there is anything else that I have to do to resolve and merge this pull request? Regards, |
Let's get this in. Thanks a lot @PauAndrio! |
891dd7d
into
galaxycomputationalchemistry:master
Wrappers for the new collection of blocks to train and apply autoencoder models to molecular dynamics data.