Update yml file to run 8b tests on presubmit and 70b and 405b tests nightly #387
Conversation
.github/workflows/ci-llama.yaml (outdated)

```yaml
- name: Run llama 70b and 405b tests
  if: github.event_name != 'pull_request'
  run: pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s --longrun --iree-hip-target=gfx942 --html=out/index.html
```
Changing which tests a workflow runs based on the event trigger feels like a recipe for confusion.
- What happens with the `workflow_dispatch` trigger?
- Will developers know that https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci-llama.yaml?query=event%3Apull_request and https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci-llama.yaml?query=event%3Aschedule are testing entirely different things?

Workflows should be as simple and predictable as possible - don't overengineer this. I would split this into one workflow that runs on `push` and `pull_request`, then a separate workflow that runs on `schedule` (and both should support `workflow_dispatch` for debugging).
If you're concerned about duplicating boilerplate... that needs to be fixed too. The nightly workflow should be using nightly release packages, and we need to roll out the new packaging into these workflows. Lines 59-76 should be condensed down to one line.
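As a rough sketch of the suggested trigger split (the file names, branch filter, and cron time below are illustrative assumptions, not taken from the repository):

```yaml
# Hypothetical ci-llama-quick.yaml: fast 8b tests on every push/PR.
on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:   # manual runs for debugging

# Hypothetical ci-llama-nightly.yaml: long 70b/405b tests on a schedule.
# on:
#   schedule:
#     - cron: "0 7 * * *"   # example time only
#   workflow_dispatch:
```

Each file then carries only the jobs for its tier, so a run's purpose is obvious from the workflow name alone.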
Alright, sounds good; will split into two workflow files.
If the tests fit nicely alongside other sharktank tests, whatever runs on `push` and `pull_request` could be part of https://github.com/nod-ai/SHARK-Platform/blob/main/.github/workflows/ci-sharktank.yml.
What were the original reasons for creating a separate workflow?
The original reason for creating a separate workflow was to be able to have a few of the small llama tests (8b) run on `pull_request` in order to catch any regressions in sharktank with all the changes coming in. The larger llama tests (70b and 405b) take much longer, so they would just be run nightly.
CC @saienduri
Yeah let's go with two workflows due to the machine constraint (won't be able to slot nicely into ci-sharktank).
The main thing is the hardcoded paths, so it would have to be its own step that only runs when `matrix.runs_on` is the MI300X machine, which I guess is fine. For the long tests, we only want to run those on a schedule/workflow_dispatch, so having that in its own workflow file makes the most sense.
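A runner-gated step of the kind described might look roughly like this (the matrix key and label value are assumptions based on the comment, not the actual workflow):

```yaml
steps:
  - name: Run llama 8b tests
    # Only run on the MI300X machine; `matrix.runs_on` and the
    # 'mi300x' label are hypothetical names for illustration.
    if: matrix.runs_on == 'mi300x'
    run: pytest sharktank/tests/models/llama/benchmark_amdgpu_test.py -v -s
```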
So, Avi, we can probably add the 8b testing to ci-sharktank.yml and the full llama testing to a new workflow file, ci-sharktank-nightly.yml.
Well we shouldn't have any hardcoded paths. That needs to be fixed. Write tests first for developers to use (on any system that has the right hardware, enough disk space, etc.), then teach the CI to run them.
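One way to avoid hardcoding a machine-specific path, as a minimal sketch (the environment variable name and default cache location are invented for illustration, not from the repository):

```python
import os
from pathlib import Path


def artifact_dir() -> Path:
    """Resolve the test artifact directory without a hardcoded machine path.

    SHARKTANK_ARTIFACT_DIR is a hypothetical override; the per-user cache
    default lets any developer with suitable hardware run the tests.
    """
    default = Path.home() / ".cache" / "sharktank" / "artifacts"
    return Path(os.environ.get("SHARKTANK_ARTIFACT_DIR", str(default)))
```

CI can then point the variable at its local artifact cache while developer machines fall back to the default.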
@aviator19941 is it possible to have the 8b files in huggingface?
Fully agree, but it looks like there are too many moving parts/files/configurations right now (model size, data type, batch size, TP sharding, decomposed vs decodeposed attention), which makes it hard to move to huggingface quickly. Let's go with separate workflows for now (just so it doesn't halt testing coverage for llama), but the end goal here should be moving to huggingface and the ci-sharktank.yml.
```diff
 @pytest.mark.xfail(
-    reason="Test not yet implemented", strict=True, raises=AttributeError
+    reason="Test not yet implemented", strict=True, raises=ExportMlirException
```
Can you update the reasons appropriately for all the tests?
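For reference, a strict xfail marker of the kind shown in the diff looks like this (the test body and exception here are illustrative, not from the repository):

```python
import pytest


# strict=True turns an unexpected pass into a hard failure, and raises=...
# restricts which exception type counts as the expected failure.
@pytest.mark.xfail(
    reason="Test not yet implemented", strict=True, raises=AttributeError
)
def test_unimplemented_feature():
    raise AttributeError("not implemented yet")  # the expected failure
```

With `raises=` pinned, only that exception type counts as the expected failure; any other exception fails the test outright, which is why keeping the reason and exception accurate per test matters.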
LGTM
…n nightly Signed-off-by: aviator19941 <[email protected]>
…mpiles very slowly Signed-off-by: aviator19941 <[email protected]>
In the future, when making substantial changes to a pull request after obtaining approval, please re-request review. I wasn't paying attention here because there were no new comments on the review thread and the commit history is confusing (32 commits with multiple rounds of force pushing over many days). A PR with +539 −384 lines changed should, at a minimum, have a longer summary / commit message and go through closer review.
Updates yml file to run 8b tests on each pull_request and 70b and 405b tests nightly.