[sharktank] Split Perplexity CI #452
Conversation
- name: Run perplexity test with Torch
  run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
Split the Perplexity CI nightly workflow into Torch and IREE workflows to be able to fetch/read their statuses separately, providing more clarity on which repo regressed.
Can you clarify what you mean by "which repo regressed"? We should generally only be testing things we control here.
What about the logs at https://github.com/nod-ai/SHARK-Platform/actions/runs/11659531084 isn't clear?
I want to trend towards a smaller number of workflow files, not a larger one. I'm already confused enough by the list at https://github.com/nod-ai/SHARK-Platform/actions and https://github.com/nod-ai/SHARK-Platform/tree/main/.github/workflows. We have a mix of workflows defined by subproject (e.g. sharktank, shortfin, tuner), model (e.g. llama, sdxl), or by test category (e.g. perplexity, eval). There is quite a bit of overlap there, and as long as it isn't obvious which workflow a given test belongs in, people will just add a new workflow uniquely suited to that purpose. I've been refactoring workflows lately as part of rolling out packaging, and we have a substantial amount of copy/paste and eventual drift between workflows that I'm having to navigate.
I agree that there is a better way to categorize the workflows.
The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md for Torch and IREE. So when IREE fails, we know it's an IREE regression and not a sharktank one.
I understand the logs are clear, but for our devs, workflow badges offer an easier way to stay informed about the CIs and any regressions, instead of having to dig through the workflow logs. Like you said, there are quite a few of them.
Let me know if there is an alternate way to have 2 workflow badges, one for each job running in a single workflow yaml, without splitting it. I couldn't find any.
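For context on the badge mechanics (not stated in the thread, but standard GitHub behavior): status badges are generated per workflow file, at `https://github.com/nod-ai/SHARK-Platform/actions/workflows/<file>/badge.svg`, optionally filtered by branch or event, never per job. So two independent badges do effectively require two workflow files. A minimal sketch of one half of such a split follows; the file name, workflow name, schedule, and runner label are placeholders, not necessarily what this PR uses.

```yaml
# .github/workflows/ci_eval_torch.yaml -- illustrative file name; a matching
# ci_eval_iree.yaml would hold the IREE job and get its own badge.
name: CI - Perplexity (Torch)

on:
  workflow_dispatch:
  schedule:
    - cron: "0 7 * * *"  # nightly; the real schedule may differ

jobs:
  test_perplexity_torch:
    runs-on: ubuntu-latest  # placeholder; the real job needs a runner with /data/llama3.1
    steps:
      - uses: actions/checkout@v4
      # (dependency install steps omitted for brevity)
      - name: Run perplexity test with Torch
        run: |
          pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun \
            --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa \
            --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
```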
Thanks. I'll respond in more detail tomorrow. Re-requested review to keep it in my queue.
Ah, so https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_torch.py stays in native PyTorch to test our model implementations, while https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_vmfb.py exports to IREE then compiles and runs, right?
The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md for Torch and IREE. So when IREE fails, we know it's an IREE regression and not a sharktank one.
I'm not sure I buy this argument. Taken to the extreme, we could have one workflow per test, so we know exactly which test failed based on the workflow badges.
The possible technical reasons to split the workflow are:
- finer control over workflow triggers (e.g. run IREE eval on every commit and PyTorch eval nightly, or use workflow_dispatch to run one job but not the other; see the sketch below)
- monitoring via workflow badges
For the monitoring side, regressions should be rare enough and addressed quickly enough that a bit of clicking through to see which sub-job failed is pretty reasonable in my opinion.
As these are long-running jobs, I do actually like splitting for the other reason.
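As an illustration of the trigger-control option (a sketch under made-up input and job names, not something proposed in this PR), a single ci_eval.yaml could expose a workflow_dispatch input and gate each job on it, while nightly schedule runs still execute both jobs. Note this still yields only one badge for the whole workflow file.

```yaml
# Sketch only, not from this PR: keeping a single ci_eval.yaml but letting
# workflow_dispatch pick which job runs.
on:
  schedule:
    - cron: "0 7 * * *"  # nightly runs execute both jobs
  workflow_dispatch:
    inputs:
      suite:
        description: Which evaluation suite to run
        type: choice
        options: [all, torch, iree]
        default: all

jobs:
  test_perplexity_torch:
    if: github.event_name == 'schedule' || inputs.suite == 'all' || inputs.suite == 'torch'
    runs-on: ubuntu-latest  # placeholder
    steps:
      - run: echo "Torch perplexity suite goes here"

  test_perplexity_iree:
    if: github.event_name == 'schedule' || inputs.suite == 'all' || inputs.suite == 'iree'
    runs-on: ubuntu-latest  # placeholder
    steps:
      - run: echo "IREE perplexity suite goes here"
```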
That's correct about the 2 perplexity scripts.
There are a lot of fast-moving pieces across IREE codegen and sharktank, which is why we feel the need for separate badges to catch regressions faster. As you may have noticed, yesterday's nightly broke from an iree-turbine regression.
- name: Install sharktank deps
  run: |
    python -m pip install --no-compile --upgrade pip

    # Note: We install in three steps in order to satisfy requirements
    # from non default locations first. Installing the PyTorch CPU
    # wheels saves multiple minutes and a lot of bandwidth on runner setup.
    pip install --no-compile -r pytorch-cpu-requirements.txt
    pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
      -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
    pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/
As this workflow runs nightly, we could also switch from an explicit full project build with the latest deps of all packages to using a nightly release build: https://github.com/nod-ai/SHARK-Platform/blob/main/docs/nightly_releases.md#quickstart---sharktank. For a number of these workflows I think we should be testing with both the stable versions of dependencies (iree-base-compiler, iree-base-runtime, iree-turbine) and the latest nightly / source versions of each.
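A rough sketch of what that nightly-release install step might look like; the release index URL is an assumption based on the general shape of the quickstart, so defer to docs/nightly_releases.md for the exact command.

```yaml
# Sketch only, not part of this PR: install sharktank from the nightly release
# instead of building from source. The release index URL is an assumption;
# see docs/nightly_releases.md#quickstart---sharktank for the exact command.
- name: Install sharktank nightly release
  run: |
    python -m pip install --upgrade pip
    pip install --pre --upgrade sharktank \
      -f https://github.com/nod-ai/SHARK-Platform/releases/expanded_assets/dev-wheels
    # Test-only requirements still come from the repo checkout.
    pip install -r sharktank/requirements-tests.txt
```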
I feel this should be a separate PR where all CIs are switched to the latest nightly release build. If not, I can update it here.
About testing both stable and nightly versions of IREE, this CI might not be the right candidate. Currently 8b fp16 takes 3 hrs, and we plan to add more models, quantizations, and decomposed/non-decomposed variants, stretching it to > 12 hrs for one set of models to finish.