KEP-2170: Design Trainer for the LLM Runtimes #2321
Comments
/assign @saileshd1402
We are experimenting with some PyTorch-native and Transformers APIs to design this Trainer.
@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
/assign
What concerns us most 👀:
Please refer to the Kubeflow Training V2 LLM Trainer Design Doc for design details :)
/cc @kubeflow/wg-training-leads @Doris-xm @astefanutti @helenxie-bit @tariq-hasan @akshaychitneni @varshaprasad96 @tarekabouzeid @tarat44 @Syulin7 @sandipanpanda @mszadkow @akhilsaivenkata @tico88612 @danielsuh05 @kannon92 @gavrissh @saileshd1402 @ckyuto @Veer0x1 @oksanabaza @YosiElias @sophie0730 @seanlaii @Bobbins228 @droctothorpe @lowang-bh @mimowo @hkiiita @ChristopheBrown @harshithbelagur @marcmaliar @deepanker13
As part of the Kubeflow Training V2 work, we should design and implement a custom Trainer to fine-tune the LLMs that we plan to support via TrainingRuntimes in upstream Kubeflow.
We should discuss whether to use native PyTorch APIs or HuggingFace Transformers in the LLM Trainer implementation.
The Trainer should allow users to configure LoRA, QLoRA, FSDP, and other important configurations.
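To make the discussion concrete, here is a rough sketch of what a user-facing configuration surface for LoRA, QLoRA, and FSDP could look like. Everything in it is hypothetical: `LlmTrainerConfig`, `LoraOptions`, `FsdpOptions`, and all field names are made up for illustration and are not part of any existing Kubeflow API. The sharding strategy names are assumed to mirror PyTorch's `ShardingStrategy` enum members.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Assumption: these names mirror members of PyTorch's FSDP ShardingStrategy enum.
ALLOWED_SHARDING = {"FULL_SHARD", "SHARD_GRAD_OP", "NO_SHARD", "HYBRID_SHARD"}


@dataclass
class LoraOptions:
    """Hypothetical LoRA settings; field names are illustrative only."""
    rank: int = 8
    alpha: int = 16
    dropout: float = 0.0
    target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"])
    # Hypothetical flag: True would mean a quantized base model (QLoRA-style).
    quantize_base: bool = False


@dataclass
class FsdpOptions:
    """Hypothetical FSDP settings."""
    sharding_strategy: str = "FULL_SHARD"
    cpu_offload: bool = False


@dataclass
class LlmTrainerConfig:
    """Hypothetical top-level config a user would pass to the LLM Trainer."""
    model_name: str
    lora: Optional[LoraOptions] = None
    fsdp: Optional[FsdpOptions] = None

    def validate(self) -> None:
        # Reject obviously invalid values before any training job is created.
        if self.lora is not None:
            if self.lora.rank <= 0:
                raise ValueError("LoRA rank must be positive")
            if not 0.0 <= self.lora.dropout < 1.0:
                raise ValueError("LoRA dropout must be in [0, 1)")
        if self.fsdp is not None:
            if self.fsdp.sharding_strategy not in ALLOWED_SHARDING:
                raise ValueError(
                    f"unknown sharding strategy: {self.fsdp.sharding_strategy}"
                )

    def to_dict(self) -> dict:
        # A plain dict is easy to serialize into a runtime spec or CLI args.
        return asdict(self)


if __name__ == "__main__":
    cfg = LlmTrainerConfig(
        model_name="example/model",  # placeholder, not a real model ID
        lora=LoraOptions(rank=8, quantize_base=True),
        fsdp=FsdpOptions(),
    )
    cfg.validate()
    print(cfg.to_dict())
```

A backend-agnostic config like this would let the PyTorch-native vs. Transformers decision stay an implementation detail: the Trainer would translate the same fields into whichever library is chosen.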
Useful resources:
Part of: #2170
Design Doc
Initial design doc from @Electronic-Waste where we can brainstorm ideas: https://docs.google.com/document/d/1a4xWGVWZo43QKv8tIomoK_XHzBMC_byXBnDb0104htQ/edit?tab=t.0
cc @saileshd1402 @deepanker13 @kubeflow/wg-training-leads
Love this feature?
Give it a 👍 We prioritize the features with most 👍