initial config for partial conv in JET of esm2 in bionemo2 #428

dorotat-nv · 2024-11-13T15:39:14Z

This PR adds the first JET config for partial-convergence training of the ESM2 model in BioNeMo 2.

The JET pipeline can be submitted for this config by running

python jet/cli.py submit --root-dir ci/benchmarks/partial-conv/ --manifest-template-config-file <PATH_TO_CI_REPO>/jet/manifest_template_config.yaml

where PATH_TO_CI_REPO is an absolute path to the CI repository storing JET CLI tools: https://gitlab-master.nvidia.com/clara-discovery/bionemo-github-ci/-/blob/master/jet/cli.py?ref_type=heads#L78
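For anyone scripting the submission, here is a minimal sketch of assembling the same command in Python. It only uses the subcommand and flags shown above; the helper name `build_submit_command` and the checkout location `/opt/bionemo-github-ci` are placeholders, not part of the CI repo.

```python
from pathlib import Path


def build_submit_command(ci_repo: Path, root_dir: str = "ci/benchmarks/partial-conv/") -> list[str]:
    """Assemble the argv for `jet/cli.py submit` as written in the PR description."""
    # The description asks for an absolute path to the CI repository.
    if not ci_repo.is_absolute():
        raise ValueError(f"PATH_TO_CI_REPO must be absolute, got: {ci_repo}")
    manifest = ci_repo / "jet" / "manifest_template_config.yaml"
    return [
        "python", "jet/cli.py", "submit",
        "--root-dir", root_dir,
        "--manifest-template-config-file", str(manifest),
    ]


if __name__ == "__main__":
    # Placeholder checkout location, assumed for illustration only.
    print(" ".join(build_submit_command(Path("/opt/bionemo-github-ci"))))
```

The resulting list can be passed to `subprocess.run(...)`; a relative path fails early instead of producing a manifest path that JET cannot resolve.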

The PR is still a draft. I need to adjust the number of steps (because of the 4h time limit and the reduction from 32 nodes to 4), and I still need to add the test section.

To make config management more robust, I will add a unit test in BioNeMo2 that checks whether the training command can be executed.
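A rough sketch of what such a unit test could look like, under some assumptions: the config's `script` template uses `${name}` placeholders (as in the diff in this PR), and the values `variant="pretrain"` and `model="esm2"` are illustrative guesses rather than the real config values. The helper names are hypothetical. This version only checks that the command renders and tokenizes cleanly; it does not launch training.

```python
import shlex
from string import Template


def render_script(script_template: str, params: dict[str, str]) -> str:
    """Fill ${name} placeholders; raises KeyError if a parameter is missing."""
    return Template(script_template).substitute(params)


def parse_command(script: str) -> list[str]:
    """Join backslash-continued lines, then split into argv tokens."""
    return shlex.split(script.replace("\\\n", " "))


def test_training_command_is_well_formed():
    # Template copied from the suggested change in this PR (without the trailing
    # continuation); the parameter values are assumptions for illustration.
    template = "python scripts/${variant}/${model}/${model}_${variant}.py"
    params = {"variant": "pretrain", "model": "esm2"}

    argv = parse_command(render_script(template, params))

    assert argv[0] == "python"
    assert argv[1] == "scripts/pretrain/esm2/esm2_pretrain.py"
    # No placeholder should survive rendering.
    assert "$" not in " ".join(argv)


if __name__ == "__main__":
    test_training_command_is_well_formed()
    print("ok")
```

A fuller test would presumably also check that the rendered script path exists in the repository, or invoke the entrypoint in a dry-run mode if one is available.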

skothenhill-nv · 2024-11-13T17:59:21Z

Did we decide to do this with the CLI/argparse entrypoints? This is doable with the pydantic interface as well.

pstjohn · 2024-11-13T19:20:59Z

ci/benchmarks/partial-conv/esm2_pretrain.yaml

+  max_steps:
+    value: 10000
+script: |-
+    export NVTE_FUSED_ATTN=1; export NVTE_FLASH_ATTN=0; python scripts/${variant}/${model}/${model}_${variant}.py \


Suggested change

-    export NVTE_FUSED_ATTN=1; export NVTE_FLASH_ATTN=0; python scripts/${variant}/${model}/${model}_${variant}.py \
+    python scripts/${variant}/${model}/${model}_${variant}.py \

@jstjohn was seeing issues with these environment variables (at least with our version of cuDNN), so we've removed them for now.

jstjohn · 2024-11-13

@dorotat-nv yes, also note that the script paths are pretty different now. I believe there's a CLI command that's something like train_esm2: https://github.com/NVIDIA/bionemo-framework/blob/main/sub-packages/bionemo-esm2/pyproject.toml#L24

We also have the new CLI, which works in two steps: it creates a config and then runs that config. Perhaps it would be good to save our JET config settings in a .json file and just run that instead: https://github.com/NVIDIA/bionemo-framework/blob/main/sub-packages/bionemo-esm2/pyproject.toml#L21 cc @skothenhill-nv

pstjohn · 2024-11-13

What's the difference between bionemo-esm2-train and train_esm2?

skothenhill-nv · 2024-11-13

bionemo-esm2-train goes through the pydantic interface and shares a train method with Geneformer. We can do the same thing for inference.

In any case, I'd definitely prefer that we prioritize the pydantic interface.
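To make the "save the settings in a .json file and run that" idea concrete, here is a small standard-library sketch of the round trip. It uses a plain dataclass as a stand-in for the pydantic models, which it does not reproduce; the class name and the two fields are hypothetical, with values taken from this PR (10000 steps in the diff, 4 nodes in the description).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkSettings:
    # Hypothetical fields; the real schema would come from the pydantic config.
    max_steps: int
    num_nodes: int

    def __post_init__(self):
        # Validate on construction, loosely mimicking what pydantic would do.
        for name, value in asdict(self).items():
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")


def dump_settings(settings: BenchmarkSettings) -> str:
    """Serialize settings to the JSON text that would be checked in."""
    return json.dumps(asdict(settings), indent=2, sort_keys=True)


def load_settings(text: str) -> BenchmarkSettings:
    """Parse and validate settings; unknown keys raise TypeError."""
    return BenchmarkSettings(**json.loads(text))


if __name__ == "__main__":
    saved = dump_settings(BenchmarkSettings(max_steps=10000, num_nodes=4))
    print(saved)
    assert load_settings(saved) == BenchmarkSettings(max_steps=10000, num_nodes=4)
```

The point of the design is that the benchmark definition lives in one validated file, so the JET script only has to pass that file to the run step.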

added initial config for partial conv of bionemo2

835fd20

dorotat-nv requested review from malcolmgreaves and pstjohn as code owners November 13, 2024 15:39

dorotat-nv requested review from jstjohn, sichu2023 and skothenhill-nv November 13, 2024 15:39

dorotat-nv changed the title ~~added initial config for partial conv in JET of esm2 in bionemo2~~ initial config for partial conv in JET of esm2 in bionemo2 Nov 13, 2024

pstjohn reviewed Nov 13, 2024
