[sharktank] Split Perplexity CI #452
base: main
Changes from all commits: b64ed03, b75c6bb, 258ce89, a36f078, dcc39bc, 0313396, 2e0bfd5
@@ -0,0 +1,68 @@
# Copyright 2024 Advanced Micro Devices, Inc.
#
# Licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

name: CI - Perplexity Torch

on:
  workflow_dispatch:
  schedule:
    # Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
    - cron: "0 7 * * 1-5"

concurrency:
  # A PR number if a pull request and otherwise the commit hash. This cancels
  # queued and in-progress runs for the same PR (presubmit) or commit
  # (postsubmit). The workflow name is prepended to avoid conflicts between
  # different workflows.
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  test_perplexity_torch:
    timeout-minutes: 1000
    name: "Perplexity Torch"
    strategy:
      matrix:
        version: [3.11]
        runs-on: [llama-mi300x-3]
      fail-fast: false
    runs-on: ${{matrix.runs-on}}
    defaults:
      run:
        shell: bash
    env:
      PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
      SHARK_PLATFORM_REPO_ROOT: ${{ github.workspace }}
    steps:
      - name: "Setting up Python"
        id: setup_python
        uses: actions/setup-python@v3
        with:
          python-version: ${{matrix.version}}

      - name: "Checkout Code"
        uses: actions/checkout@v3

      - name: Cache Pip Packages
        uses: actions/cache@v4
        id: cache-pip
        with:
          path: ${{ env.PIP_CACHE_DIR }}
          key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements.txt') }}

      - name: Install sharktank deps
        run: |
          python -m pip install --no-compile --upgrade pip
          # Note: We install in three steps in order to satisfy requirements
          # from non default locations first. Installing the PyTorch CPU
          # wheels saves multiple minutes and a lot of bandwidth on runner setup.
          pip install --no-compile -r pytorch-cpu-requirements.txt
          pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
            -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
          pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/

      - name: Run perplexity test with Torch
        run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
Comment on lines +67 to +68
Can you clarify what you mean by "which repo regressed"? We should generally only be testing things we control here. What about the logs at https://github.com/nod-ai/SHARK-Platform/actions/runs/11659531084 isn't clear?

I want to trend towards a smaller number of workflow files, not a larger one. I'm already confused enough by the list at https://github.com/nod-ai/SHARK-Platform/actions and https://github.com/nod-ai/SHARK-Platform/tree/main/.github/workflows. We have a mix of workflows defined by subproject (e.g. sharktank, shortfin, tuner), by model (e.g. llama, sdxl), or by test category (e.g. perplexity, eval). There is quite a bit of overlap there, and as long as it isn't obvious which workflow a given test belongs in, people will just add a new workflow uniquely suited to that purpose. I've been refactoring workflows lately as part of rolling out packaging, and we have a substantial amount of copy/paste and eventual drift between workflows that I'm having to navigate.

I agree that there is a better way to categorize the workflows. The sole reason to split is this: I understand the logs are clear, but for our devs, workflow badges offer an easier way to stay informed about the CIs and any regressions, instead of having to dig through the workflow logs. Like you said, there are quite a few of them. Let me know if there is an alternate way to have two workflow badges, one for each of the two jobs running in a single workflow yaml, without splitting it. I couldn't find any.

Thanks. I'll respond in more detail tomorrow. Re-requested review to keep it in my queue.

Ah, so https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_torch.py stays in native PyTorch to test our model implementations, while https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_vmfb.py exports to IREE, then compiles and runs, right?
I'm not sure I buy this argument. Taken to the extreme, we could have one workflow per test, so we would know exactly which test failed based on the workflow badges. The possible technical reasons to split the workflow are monitoring granularity (a separate status badge per workflow) and run time (long-running jobs can be scheduled independently).

For the monitoring side, regressions should be rare enough, and addressed quickly enough, that a bit of clicking through to see which sub-job failed is pretty reasonable in my opinion. As these are long-running jobs, I do actually like splitting for the other reason.

That's correct about the two perplexity scripts. A lot of fast-moving things across IREE, codegen, and sharktank is why we feel the need to have separate badges to track regressions faster. If you noticed, yesterday's nightly broke from an iree-turbine regression.
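On the copy/paste-and-drift concern: if the workflows do stay split, one way to keep duplication down is a reusable workflow called from thin per-test wrappers. The file names and the `test_file` input below are hypothetical, sketched only to show the `workflow_call` mechanism.

```yaml
# Hypothetical .github/workflows/reusable_perplexity.yml
name: Reusable Perplexity Job

on:
  workflow_call:
    inputs:
      test_file:
        required: true
        type: string

jobs:
  perplexity:
    runs-on: llama-mi300x-3
    timeout-minutes: 1000
    steps:
      - uses: actions/checkout@v3
      # ... shared Python setup, pip cache, and dependency install ...
      - name: Run perplexity test
        run: pytest -n 8 -v -s ${{ inputs.test_file }} --longrun
```

Each thin caller then keeps its own file, and therefore its own badge:

```yaml
# Hypothetical thin caller, e.g. .github/workflows/ci_eval_torch.yml
name: CI - Perplexity Torch

on:
  workflow_dispatch:
  schedule:
    - cron: "0 7 * * 1-5"

jobs:
  torch:
    uses: ./.github/workflows/reusable_perplexity.yml
    with:
      test_file: sharktank/tests/evaluate/perplexity_torch_test.py
```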
As this workflow runs nightly, we could also switch from an explicit full-project build with the latest deps of all packages to a nightly release build: https://github.com/nod-ai/SHARK-Platform/blob/main/docs/nightly_releases.md#quickstart---sharktank. For a number of these workflows I think we should be testing with both the stable versions of the dependencies (iree-base-compiler, iree-base-runtime, iree-turbine) and the latest nightly / source versions of each.
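For concreteness, an install step along the lines of that suggestion might look like the sketch below. The package names are real, but the exact commands are assumptions; the sharktank nightly index URL is deliberately left as a placeholder, with the linked nightly_releases.md quickstart as the authoritative source.

```yaml
# Hypothetical install step using released wheels rather than a from-source
# build. The sharktank nightly find-links URL is a placeholder on purpose;
# take the real command from docs/nightly_releases.md.
- name: Install packages
  run: |
    python -m pip install --upgrade pip
    # Stable IREE dependencies from PyPI:
    pip install iree-base-compiler iree-base-runtime iree-turbine
    # Or, in a second configuration, the IREE nightlies:
    # pip install --pre -f https://iree.dev/pip-release-links.html \
    #   iree-base-compiler iree-base-runtime iree-turbine
    # Nightly sharktank wheel (index URL from the nightly releases doc):
    # pip install --pre sharktank -f <nightly-wheel-index>
    pip install -r sharktank/requirements-tests.txt
```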
I feel this needs to be a separate PR where all CIs are switched to the latest nightly release build. If not, I can update it here.

About testing both stable and nightly IREE, this CI might not be the right candidate. Currently the 8b fp16 run takes 3 hrs, and we plan to add more models, quantizations, and decomposed/non-decomposed variants, stretching it to more than 12 hrs for one set of models to finish.
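If the test set does grow that way, one option is to push each model/quantization combination into its own matrix entry so each configuration is reported separately. The model names, paths, and the fp8 flag below are purely illustrative, and this is only a fragment of the job definition.

```yaml
# Hypothetical matrix over model configurations (fragment of a job).
# Entries and pytest flags are illustrative, not actual sharktank options.
strategy:
  fail-fast: false
  matrix:
    include:
      - name: llama3.1-8b-f16
        pytest_args: "--llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa"
      - name: llama3.1-8b-fp8
        pytest_args: "--llama3-8b-fp8-model-path=/data/llama3.1/8b/llama8b_fp8.irpa"
steps:
  - name: Run perplexity test (${{ matrix.name }})
    run: >
      pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py
      --longrun ${{ matrix.pytest_args }}
```

With a single llama-mi300x-3 runner these entries would queue rather than run in parallel, but each still shows up as its own check, which gives per-configuration visibility without adding another workflow file.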