
llm performance scripts #11736

Open · wants to merge 34 commits into main
Conversation

malay-nagda
Collaborator

@malay-nagda malay-nagda commented Jan 2, 2025

What does this PR do?

Adds scripts for llm pre-training and fine-tuning, optimized for performance.

Collection: [llm]

Changelog

Added CLI arguments for the HuggingFace token, the NEMO_HOME env var, and the fine-tuning scheme:

-hf/--hf_token    # needed for downloading checkpoints and tokenizers from HF
-nh/--nemo_home   # needed for accessing locally stored checkpoints and tokenizers
-f/--finetuning   # 'lora' or 'sft'; default is 'lora'
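
A minimal sketch of how these flags could be wired up with argparse. The flag names come from the changelog above; the defaults, help strings, and function name are assumptions for illustration, not the PR's exact code:

```python
import argparse
import os


def parse_cli_args(argv=None) -> argparse.Namespace:
    # Hypothetical parser mirroring the flags listed above.
    parser = argparse.ArgumentParser(description="LLM performance script options")
    parser.add_argument(
        "-hf", "--hf_token",
        default=None,
        help="HuggingFace token, needed to download checkpoints and tokenizers from HF",
    )
    parser.add_argument(
        "-nh", "--nemo_home",
        default=os.environ.get("NEMO_HOME"),
        help="Directory with locally stored checkpoints and tokenizers",
    )
    parser.add_argument(
        "-f", "--finetuning",
        choices=["lora", "sft"],
        default="lora",
        help="Fine-tuning scheme: 'lora' or 'sft' (default: 'lora')",
    )
    return parser.parse_args(argv)
```

Reading NEMO_HOME from the environment as the default lets the env var and the CLI flag coexist, with the explicit flag taking precedence.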

Usage

python3 scripts/llm/performance/pretrain_llama3_8b.py -a <slurm_account> -p <slurm_partition>

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and re-add the label.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?
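
For the optional-library checklist item, an import guard typically looks like the sketch below. This is a generic pattern, not code from this PR; `apex` stands in for any optional dependency, and `fused_layer_norm` is a hypothetical example function:

```python
# Generic import-guard pattern for an optional dependency.
try:
    import apex  # noqa: F401

    HAVE_APEX = True
except (ImportError, ModuleNotFoundError):
    HAVE_APEX = False


def fused_layer_norm(x):
    """Use the optional library when present, otherwise fail with a clear message."""
    if not HAVE_APEX:
        raise ImportError("apex is required for fused_layer_norm; please install it")
    return x  # placeholder: real code would call into the optional library here
```

The guard lets the module import cleanly on systems without the optional package, deferring the error to the point of actual use.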

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Malay Nagda <[email protected]>
malay-nagda and others added 2 commits January 7, 2025 17:31
Signed-off-by: Malay Nagda <[email protected]>
@malay-nagda malay-nagda changed the title finetuning llama3 8b finetuning llama3 Jan 7, 2025
scripts/llm/performance/utils.py: 3 code-scanning annotations, fixed
@github-actions github-actions bot added the NLP label Jan 13, 2025
@malay-nagda malay-nagda marked this pull request as ready for review January 13, 2025 09:36
@malay-nagda malay-nagda requested a review from erhoo82 January 13, 2025 09:36
Signed-off-by: Malay Nagda <[email protected]>
malay-nagda and others added 2 commits January 13, 2025 09:57
@malay-nagda malay-nagda changed the title finetuning llama3 llm performance scripts Jan 13, 2025
@malay-nagda malay-nagda requested a review from vysarge January 13, 2025 10:25
malay-nagda and others added 3 commits January 13, 2025 16:21
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Contributor

[🤖]: Hi @malay-nagda 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

It might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you to decide what to do next.

//cc @pablo-garay @ko3n1g

scripts/llm/performance/utils.py: 2 outdated review threads, resolved
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Signed-off-by: Malay Nagda <[email protected]>
Contributor

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.nlp.modules.common.tokenizer_utils
nemo/collections/nlp/modules/common/tokenizer_utils.py:73:0: C0301: Line too long (199/119) (line-too-long)
nemo/collections/nlp/modules/common/tokenizer_utils.py:96:0: C0301: Line too long (149/119) (line-too-long)
nemo/collections/nlp/modules/common/tokenizer_utils.py:131:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/nlp/modules/common/tokenizer_utils.py:233:0: C0301: Line too long (146/119) (line-too-long)
nemo/collections/nlp/modules/common/tokenizer_utils.py:42:0: C0115: Missing class docstring (missing-class-docstring)

-----------------------------------
Your code has been rated at 9.50/10

Mitigation guide:

  • Add sensible and useful docstrings to functions and methods
  • For trivial methods like getter/setters, consider adding # pylint: disable=C0116 inside the function itself
  • To disable multiple functions/methods at once, put a # pylint: disable=C0116 before the first and a # pylint: enable=C0116 after the last.
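
The disable/enable pairing from the mitigation guide looks like this in practice. The function names here are illustrative; C0116 is pylint's missing-function-docstring check:

```python
# pylint: disable=C0116  # suppress missing-docstring checks for the trivial accessors below
def get_value(obj):
    return obj.value


def set_value(obj, value):
    obj.value = value
# pylint: enable=C0116  # docstring checks apply again from here on


def compute_loss(logits, labels):
    """Functions after the enable line are linted normally, so they need docstrings."""
    return sum((p - y) ** 2 for p, y in zip(logits, labels))
```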

By applying these rules, we reduce the occurrence of this message in the future.

Thank you for improving NeMo's documentation!
