-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial config for partial conv in JET of esm2 in bionemo2 #428
base: main
Are you sure you want to change the base?
Conversation
Did we decide to do this with the CLI/argparse entrypoints? This is doable with the pydantic interface as well. |
max_steps: | ||
value: 10000 | ||
script: |- | ||
export NVTE_FUSED_ATTN=1; export NVTE_FLASH_ATTN=0; python scripts/${variant}/${model}/${model}_${variant}.py \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export NVTE_FUSED_ATTN=1; export NVTE_FLASH_ATTN=0; python scripts/${variant}/${model}/${model}_${variant}.py \ | |
python scripts/${variant}/${model}/${model}_${variant}.py \ |
@jstjohn was seeing issues with these ENV variables (at least with our version of CUDNN), so we've removed them for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dorotat-nv yes, also note that the script paths are pretty different now. I believe there's a CLI arg thats like train_esm2
https://github.com/NVIDIA/bionemo-framework/blob/main/sub-packages/bionemo-esm2/pyproject.toml#L24
We also have the new cli
which in two steps creates a config and runs the config. Perhaps it would be good to save our JET config settings in a .json
file and instead just run that: https://github.com/NVIDIA/bionemo-framework/blob/main/sub-packages/bionemo-esm2/pyproject.toml#L21 cc @skothenhill-nv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between bionemo-esm2-train
and train_esm2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bionemo-esm2-train
goes through the pydantic interface and shares a train
method with Geneformer. We can do the same thing for inference.
In anycase I'd definitely prefer if we prioritized using the pydantic interface for things.
Adding the first config of partial convergence training in JET for ESM2 model in BioNeMo 2
The JET pipeline can be submitted for this config by running
python jet/cli.py submit --root-dir ci/benchmarks/partial-conv/ --manifest-template-config-file <PATH_TO_CI_REPO>/jet/manifest_template_config.yaml
where PATH_TO_CI_REPO is an absolute path to the CI repository storing JET CLI tools: https://gitlab-master.nvidia.com/clara-discovery/bionemo-github-ci/-/blob/master/jet/cli.py?ref_type=heads#L78
The PR it is still a draft. I need to adjust number of steps (due to 4h time limit and reducing nodes from 32 to 4). Also, I need to add test section.
In order to make the management of configs more robust, I will add a unit test in BioNeMo2 which checks if the training command can be executed.