New biobb_pytorch Molecular dynamics autoencoder wrapper #173

Merged
merged 20 commits into from
Dec 5, 2024
Merged
25 changes: 25 additions & 0 deletions tools/biobb_pytorch/.shed.yml
@@ -0,0 +1,25 @@
name: biobb_pytorch
owner: "Biobb team"
description: "biobb_pytorch is the Biobb module collection to create and train ML & DL models using the popular [PyTorch](https://pytorch.org/) Python library."
homepage_url: https://github.com/bioexcel/biobb_pytorch
long_description: |
Review comment (contributor): Can you convert this file into a suite-based file? https://github.com/galaxyproject/tools-iuc/blob/main/tools/semibin/.shed.yml

biobb_pytorch is the Biobb module collection to create and train ML & DL models using the popular [PyTorch](https://pytorch.org/) Python library.
Biobb (BioExcel building blocks) packages are Python building blocks that
create a new layer of compatibility and interoperability over popular
bioinformatics tools.
The latest documentation of this package can be found on our readthedocs site:
[latest API documentation](http://biobb-pytorch.readthedocs.io/en/latest/).
remote_repository_url: https://github.com/galaxycomputationalchemistry/galaxy-tools-compchem/tree/master/tools/biobb_pytorch
type: unrestricted
categories:
- Molecular Dynamics
- Computational chemistry
- Machine Learning
- Deep Learning
- PyTorch
- Biobb
- Autoencoders
maintainers:
- Pau Andrio
- Genis Bayarri
- Adam Hospital
92 changes: 92 additions & 0 deletions tools/biobb_pytorch/biobb_apply_mdae.xml
@@ -0,0 +1,92 @@
<tool id="biobb_pytorch_apply_mdae" name="ApplyMdae" version="4.2.1" >
<description>Apply a Molecular Dynamics AutoEncoder (MDAE) PyTorch model.</description>

<requirements>
<requirement type="package" version="4.2.1">biobb_pytorch</requirement>
</requirements>

<command detect_errors="exit_code"><![CDATA[

ln -s $input_data_npy_path ./input_data_npy_path.$input_data_npy_path.ext;
ln -s $input_model_pth_path ./input_model_pth_path.$input_model_pth_path.ext;
#if str($config_json) != 'None':
ln -s $config_json ./config_json.$config_json.ext;
#end if

apply_mdae

#if str($config_json) != 'None':
--config ./config_json.$config_json.ext
#end if

--input_data_npy_path ./input_data_npy_path.$input_data_npy_path.ext
--input_model_pth_path ./input_model_pth_path.$input_model_pth_path.ext
--output_reconstructed_data_npy_path $outname_output_reconstructed_data_npy_path
#if str($outname_output_latent_space_npy_path) != 'None':
--output_latent_space_npy_path $outname_output_latent_space_npy_path
#end if
;

if test -f $outname_output_reconstructed_data_npy_path; then mv $outname_output_reconstructed_data_npy_path $output_reconstructed_data_npy_path; fi;
if test -f $outname_output_latent_space_npy_path; then mv $outname_output_latent_space_npy_path $output_latent_space_npy_path; fi;

]]>
</command>

<inputs>
<param name="input_data_npy_path" type="data" format="npy" optional="False" label="input NPY file" help="Path to the input data file. Format: [input].npy"/>
<param name="input_model_pth_path" type="data" format="pth" optional="False" label="input PTH file" help="Path to the input model file. Format: [input].pth"/>
<param name="outname_output_reconstructed_data_npy_path" type="text" value="myapply_mdae.npy" optional="False" label="output reconstructed data NPY name" help="Path to the output reconstructed data file. Format: [output].npy"/>
<param name="outname_output_latent_space_npy_path" type="text" value="myapply_mdae_latent_space.npy" optional="True" label="output latent space NPY name" help="Path to the reduced dimensionality (latent space) file. Format: [output].npy"/>
<param name="config_json" type="data" format="json" optional="True" label="Configuration file" help="File containing tool settings. See below for the syntax"/>
</inputs>

<outputs>
<data name="output_reconstructed_data_npy_path" format="npy" />
<data name="output_latent_space_npy_path" format="npy" />
</outputs>

<tests>
<test>
<param name="config_json" value="config_apply_mdae.json" ftype="json" />
<param name="input_data_npy_path" value="train_mdae_traj.npy" ftype="npy" />
<param name="input_model_pth_path" value="ref_output_model.pth" />
<param name="outname_output_reconstructed_data_npy_path" value="output_reconstructed_data.npy" />
<param name="outname_output_latent_space_npy_path" value="output_latent_space.npy" />
<output name="output_reconstructed_data_npy_path" file="ref_output_reconstructed_data.npy" compare="sim_size" />
<output name="output_latent_space_npy_path" file="ref_output_latent_space.npy" compare="sim_size" />
</test>
</tests>

<help>
.. class:: infomark

Check the syntax of the tool parameters in the original library documentation: https://biobb-pytorch.readthedocs.io/en/latest

-----

.. image:: http://mmb.irbbarcelona.org/biobb/assets/layouts/layout3/img/logo.png
:width: 150

**https://mmb.irbbarcelona.org/biobb**

.. image:: https://bioexcel.eu/wp-content/uploads/2019/08/Bioexcel_logo_no_subheading_660px.png
:width: 150

**https://bioexcel.eu**
</help>

<citations>
<citation type="bibtex">
@misc{githubbiobb,
author = {Andrio, P. and Bayarri, G. and Hospital, A. and Gelpi, J. L.},
year = {2019-21},
title = {biobb: BioExcel building blocks },
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/bioexcel/biobb_pytorch},
}
</citation>
<citation type="doi">10.1038/s41597-019-0177-4</citation>
</citations>
</tool>
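The Cheetah conditionals in the apply_mdae command decide which flags actually reach the apply_mdae executable. The Python sketch below mirrors that logic so the resulting argument list can be inspected outside Galaxy. build_apply_mdae_argv and its parameters are hypothetical names used only for illustration; the extension arguments stand in for the datatype extensions Galaxy substitutes for $input_data_npy_path.ext and the other .ext lookups.

```python
"""Hypothetical sketch: how the apply_mdae command template resolves to argv.

Mirrors the #if blocks of biobb_apply_mdae.xml. The function and its
parameters are illustrative and are not part of biobb_pytorch or Galaxy.
"""
import shlex
from typing import List, Optional


def build_apply_mdae_argv(
    data_ext: str,
    model_ext: str,
    reconstructed_name: str,
    config_ext: Optional[str] = None,
    latent_name: Optional[str] = None,
) -> List[str]:
    argv = ["apply_mdae"]
    # --config is only emitted when a configuration dataset was supplied
    if config_ext is not None:
        argv += ["--config", f"./config_json.{config_ext}"]
    # Inputs are read through the symlinks created by the ln -s lines,
    # named after the parameter plus the Galaxy datatype extension
    argv += ["--input_data_npy_path", f"./input_data_npy_path.{data_ext}"]
    argv += ["--input_model_pth_path", f"./input_model_pth_path.{model_ext}"]
    argv += ["--output_reconstructed_data_npy_path", reconstructed_name]
    # The latent-space output is optional in the wrapper
    if latent_name is not None:
        argv += ["--output_latent_space_npy_path", latent_name]
    return argv


if __name__ == "__main__":
    # Values taken from the <test> section of biobb_apply_mdae.xml
    argv = build_apply_mdae_argv(
        "npy",
        "pth",
        "output_reconstructed_data.npy",
        config_ext="json",
        latent_name="output_latent_space.npy",
    )
    print(shlex.join(argv))
```

Keeping the optional flags behind explicit checks mirrors the wrapper's design: the biobb executable only ever sees arguments for inputs and outputs the user actually requested.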
102 changes: 102 additions & 0 deletions tools/biobb_pytorch/biobb_train_mdae.xml
@@ -0,0 +1,102 @@
<tool id="biobb_pytorch_train_mdae" name="TrainMdae" version="4.2.1" >
<description>Train a Molecular Dynamics AutoEncoder (MDAE) PyTorch model.</description>

<requirements>
<requirement type="package" version="4.2.1">biobb_pytorch</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[

ln -s $input_train_npy_path ./input_train_npy_path.$input_train_npy_path.ext;
#if str($input_model_pth_path) != 'None':
ln -s $input_model_pth_path ./input_model_pth_path.$input_model_pth_path.ext;
#end if
#if str($config_json) != 'None':
ln -s $config_json ./config_json.$config_json.ext;
#end if

train_mdae

#if str($config_json) != 'None':
--config ./config_json.$config_json.ext
Review comment (contributor), suggested change:
- --config ./config_json.$config_json.ext
+ --config ./$train_config

Reply from the PR author: That's something we're planning for a future release, where the most relevant properties will be integrated into Galaxy's UI through sliders, multi-select options, number validators, filters, etc., making the configuration process more user-friendly. For now, however, I'd like to keep things as simple as possible and focus on getting my first tool published in the Galaxy Toolshed.

#end if

--input_train_npy_path ./input_train_npy_path.$input_train_npy_path.ext
#if str($input_model_pth_path) != 'None':
--input_model_pth_path ./input_model_pth_path.$input_model_pth_path.ext
#end if
--output_model_pth_path $outname_output_model_pth_path
#if str($outname_output_train_data_npz_path) != 'None':
--output_train_data_npz_path $outname_output_train_data_npz_path
#end if
#if str($outname_output_performance_npz_path) != 'None':
--output_performance_npz_path $outname_output_performance_npz_path
#end if
;
Review comment (contributor), suggested change:
;


if test -f $outname_output_model_pth_path; then mv $outname_output_model_pth_path $output_model_pth_path; fi;
if test -f $outname_output_train_data_npz_path; then mv $outname_output_train_data_npz_path $output_train_data_npz_path; fi;
if test -f $outname_output_performance_npz_path; then mv $outname_output_performance_npz_path $output_performance_npz_path; fi;

]]>
</command>

Review comment (contributor), suggested change:
<configfiles>
<configfile name="train_config">
{
"properties": {
"num_epochs": $num_epoch,
"seed": $seed
}
}
</configfile>
</configfiles>

This way you can create those config files on the fly and ask your users for the inputs.

Reply from the PR author: Same answer as in the previous thread: this is planned for a future release; for now the goal is to keep things as simple as possible and get the first tool published in the Galaxy Toolshed.

<inputs>
<param name="input_train_npy_path" type="data" format="npy" optional="False" label="input NPY file" help="Path to the input train data file. Format: [input].npy"/>
<param name="input_model_pth_path" type="data" format="pth" optional="True" label="input PTH file" help="Path to the input model file. Format: [input].pth"/>
<param name="outname_output_model_pth_path" type="text" value="mytrain_mdae.pth" optional="False" label="output PTH name" help="Path to the output model file. Format: [output].pth"/>
<param name="outname_output_train_data_npz_path" type="text" value="mytrain_mdae.npz" optional="True" label="output train data NPZ name" help="Path to the output train data file. Format: [output].npz"/>
<param name="outname_output_performance_npz_path" type="text" value="mytrain_mdae_performance.npz" optional="True" label="output performance NPZ name" help="Path to the output performance file. Format: [output].npz"/>
<param name="config_json" type="data" format="json" optional="True" label="Configuration file" help="File containing tool settings. See below for the syntax"/>
</inputs>

<outputs>
<data name="output_model_pth_path" format="pth" />
<data name="output_train_data_npz_path" format="npz" />
<data name="output_performance_npz_path" format="npz" />
</outputs>

<tests>
<test>
<param name="config_json" value="config_train_mdae.json" ftype="json" />
<param name="input_train_npy_path" value="train_mdae_traj.npy" ftype="npy" />
<param name="outname_output_model_pth_path" value="output_model.pth" />
<param name="outname_output_train_data_npz_path" value="output_train_data.npz" />
<param name="outname_output_performance_npz_path" value="output_performance.npz" />
<output name="output_model_pth_path" file="ref_output_model.pth" compare="sim_size" />
<output name="output_train_data_npz_path" file="ref_output_train_data.npz" compare="sim_size" />
<output name="output_performance_npz_path" file="ref_output_performance.npz" compare="sim_size" />
</test>
</tests>

<help>
.. class:: infomark

Check the syntax of the tool parameters in the original library documentation: https://biobb-pytorch.readthedocs.io/en/latest

-----

.. image:: http://mmb.irbbarcelona.org/biobb/assets/layouts/layout3/img/logo.png
:width: 150

**https://mmb.irbbarcelona.org/biobb**

.. image:: https://bioexcel.eu/wp-content/uploads/2019/08/Bioexcel_logo_no_subheading_660px.png
:width: 150

**https://bioexcel.eu**
</help>

<citations>
<citation type="bibtex">
@misc{githubbiobb,
author = {Andrio, P. and Bayarri, G. and Hospital, A. and Gelpi, J. L.},
year = {2019-21},
title = {biobb: BioExcel building blocks },
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/bioexcel/biobb_pytorch},
}
</citation>
<citation type="doi">10.1038/s41597-019-0177-4</citation>
</citations>
</tool>
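The three "if test -f ...; then mv ...; fi;" lines at the end of the train_mdae command stage outputs: train_mdae writes each file under the user-chosen name, and only files that actually exist are moved to the paths Galaxy expects, so a skipped optional output does not fail the job. The sketch below models that step in Python; stage_outputs is a hypothetical helper, not part of Galaxy or biobb_pytorch. One consequence is visible in it: once a file has been moved, a later entry with the same source name finds nothing, which is what happens when two outputs default to the same file name.

```python
"""Hypothetical sketch of the output-staging step in biobb_train_mdae.xml.

Each "if test -f NAME; then mv NAME TARGET; fi;" line is modelled as one
(source, target) pair; a missing optional output is skipped, not an error.
"""
import os
import shutil
import tempfile
from typing import Dict, List


def stage_outputs(renames: Dict[str, str]) -> List[str]:
    """Move each existing source file to its target; return the sources moved."""
    moved: List[str] = []
    for source, target in renames.items():
        # Same check as `test -f`: only existing regular files are moved
        if os.path.isfile(source):
            shutil.move(source, target)  # same role as `mv`
            moved.append(source)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = os.path.join(tmp, "output_model.pth")
        # Pretend train_mdae produced only the model file
        open(model, "wb").close()
        moved = stage_outputs({
            model: os.path.join(tmp, "galaxy_model.dat"),
            os.path.join(tmp, "output_performance.npz"): os.path.join(tmp, "galaxy_perf.dat"),
        })
        print(moved)
```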
5 changes: 5 additions & 0 deletions tools/biobb_pytorch/test-data/config_apply_mdae.json
@@ -0,0 +1,5 @@
{
"properties": {
"batch_size": 1
}
}
6 changes: 6 additions & 0 deletions tools/biobb_pytorch/test-data/config_train_mdae.json
@@ -0,0 +1,6 @@
{
"properties": {
"num_epochs": 50,
"seed": 1
}
}
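Both test-data configuration files share one layout: a single top-level "properties" object holding the tool settings. Along the lines of the reviewer's configfile suggestion, such files could be generated from user-supplied values instead of being uploaded by hand. The helper below is a hypothetical sketch of that idea; only the {"properties": {...}} layout and the example values come from the files above, and which property names biobb_pytorch accepts is defined by its own documentation.

```python
"""Hypothetical helper that writes biobb-style configuration files.

Only the {"properties": {...}} layout is taken from the test-data files;
the accepted property names are documented by biobb_pytorch itself.
"""
import json
import tempfile
from pathlib import Path
from typing import Any


def write_config(path: Path, **properties: Any) -> None:
    """Serialize keyword arguments under a top-level "properties" key."""
    path.write_text(json.dumps({"properties": properties}, indent=2) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Same values as config_apply_mdae.json and config_train_mdae.json
        apply_cfg = Path(tmp) / "config_apply_mdae.json"
        train_cfg = Path(tmp) / "config_train_mdae.json"
        write_config(apply_cfg, batch_size=1)
        write_config(train_cfg, num_epochs=50, seed=1)
        print(train_cfg.read_text())
```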
6 binary test-data files not shown.