[sharktank] Split Perplexity CI #452

Open · wants to merge 7 commits into main
Conversation

archana-ramalingam (Collaborator)

Split the Perplexity CI nightly workflow into separate Torch and IREE workflows so their statuses can be fetched/read separately, providing more clarity on which repo regressed.
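
A minimal sketch of what one of the split nightly workflows might look like (the file name, cron schedule, and runner label are placeholders/assumptions, not taken from this PR's diff; the test step mirrors the one quoted below):

```yaml
# ci_eval_torch.yaml (hypothetical file name)
name: Evaluation Tests - Perplexity (Torch)

on:
  workflow_dispatch:
  schedule:
    - cron: "0 8 * * *"  # placeholder nightly schedule

jobs:
  test_perplexity_torch:
    runs-on: self-hosted  # placeholder; the real runner label is not shown in this thread
    steps:
      - uses: actions/checkout@v4
      # (dependency install steps omitted for brevity)
      - name: Run perplexity test with Torch
        run: |
          pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun \
            --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa \
            --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
```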

Comment on lines +67 to +68
- name: Run perplexity test with Torch
run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
Member

> Split the Perplexity CI nightly workflow into separate Torch and IREE workflows so their statuses can be fetched/read separately, providing more clarity on which repo regressed.

Can you clarify what you mean by "which repo regressed"? We should generally only be testing things we control here.

What about the logs at https://github.com/nod-ai/SHARK-Platform/actions/runs/11659531084 isn't clear?

I want to trend towards a smaller number of workflow files, not a larger one. I'm already confused enough by the list at https://github.com/nod-ai/SHARK-Platform/actions and https://github.com/nod-ai/SHARK-Platform/tree/main/.github/workflows. We have a mix of workflows defined by subproject (e.g. sharktank, shortfin, tuner), model (e.g. llama, sdxl), or by test category (e.g. perplexity, eval). There is quite a bit of overlap there, and as long as it isn't obvious which workflow a given test belongs in, people will just add a new workflow uniquely suited to that purpose. I've been refactoring workflows lately as part of rolling out packaging, and we have a substantial amount of copy/paste and eventual drift between workflows that I'm having to navigate.

Collaborator (Author)

I agree that there is a better way to categorize the workflows.

The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md, one for Torch and one for IREE. So when the IREE job fails we know it's an IREE regression and not a sharktank one.

I understand the logs are clear, but for our devs, workflow badges offer an easier way to stay informed about the CIs and any regressions, instead of having to dig through the workflow logs. Like you said, there are quite a few of them.

Let me know if there is an alternate way to get 2 workflow badges, one for each job running in a single workflow YAML, without splitting it. I couldn't find any.
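
An illustrative sketch of the two README badges this would enable (the split workflow file names are hypothetical, not taken from this PR's diff):

```markdown
[![CI - Perplexity (Torch)](https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci_eval_torch.yaml/badge.svg)](https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci_eval_torch.yaml)
[![CI - Perplexity (IREE)](https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci_eval_iree.yaml/badge.svg)](https://github.com/nod-ai/SHARK-Platform/actions/workflows/ci_eval_iree.yaml)
```

As far as I know, GitHub only generates status badges per workflow file, not per job, which is why splitting the YAML is the straightforward way to get two badges.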

Member

Thanks. I'll respond in more detail tomorrow. Re-requested review to keep it in my queue.

Member

Ah, so https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_torch.py stays in native PyTorch to test our model implementations, while https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_vmfb.py exports to IREE then compiles and runs, right?

> The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md, one for Torch and one for IREE. So when the IREE job fails we know it's an IREE regression and not a sharktank one.

I'm not sure I buy this argument. Taken to the extreme, we could have one workflow per test, so we know exactly which test failed based on the workflow badges.

The possible technical reasons to split the workflow are:

  • finer control over workflow triggers (e.g. run IREE eval on every commit and PyTorch eval nightly, or use workflow_dispatch to run one job but not the other)
  • monitoring via workflow badges

For the monitoring side, regressions should be rare enough and addressed quickly enough that a bit of clicking through to see which sub-job failed is pretty reasonable in my opinion.

As these are long-running jobs, I do actually like splitting for the other reason (finer control over triggers).
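
To illustrate the trigger-control point, a hedged sketch of how the triggers of the two split files could diverge (file names and schedules are assumptions, not from this PR):

```yaml
# ci_eval_iree.yaml (hypothetical): run on every push to main, plus manual dispatch.
on:
  push:
    branches: [main]
  workflow_dispatch:
---
# ci_eval_torch.yaml (hypothetical): run nightly only, plus manual dispatch.
on:
  schedule:
    - cron: "0 8 * * *"  # placeholder schedule
  workflow_dispatch:
```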

Collaborator (Author)

That's correct about the 2 perplexity scripts.

A lot of fast-moving work across IREE, codegen, and sharktank is why we feel the need for separate badges to track regressions faster. As you may have noticed, yesterday's nightly broke due to an iree-turbine regression.

@ScottTodd self-requested a review · November 8, 2024 00:43
Comment on lines +56 to +65
      - name: Install sharktank deps
        run: |
          python -m pip install --no-compile --upgrade pip
          # Note: We install in three steps in order to satisfy requirements
          # from non default locations first. Installing the PyTorch CPU
          # wheels saves multiple minutes and a lot of bandwidth on runner setup.
          pip install --no-compile -r pytorch-cpu-requirements.txt
          pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
            -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
          pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/
Member

As this workflow runs nightly, we could also switch from an explicit full project build with the latest deps of all packages to a nightly release build: https://github.com/nod-ai/SHARK-Platform/blob/main/docs/nightly_releases.md#quickstart---sharktank. For a number of these workflows I think we should be testing with both the stable versions of dependencies (iree-base-compiler, iree-base-runtime, iree-turbine) and the latest nightly / source versions of each.
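
As a rough sketch of that idea (package names are the ones listed above; whether each is published to a given index at a given point in time is an assumption), the dependency step could pick either channel:

```yaml
# Stable releases from PyPI:
- name: Install stable IREE deps
  run: pip install iree-base-compiler iree-base-runtime iree-turbine

# Latest nightly pre-releases from the IREE release index (the same index the
# source install quoted above already points at):
- name: Install nightly IREE deps
  run: pip install --pre -f https://iree.dev/pip-release-links.html iree-base-compiler iree-base-runtime
```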

Collaborator (Author)

I feel this needs to be a separate PR where all CIs are switched to the latest nightly release build. If not, I can update it here.

About testing both stable and nightly IREE, this CI might not be the right candidate. Currently the 8b fp16 run takes 3 hrs, and we plan to add more models, quantizations, and decomposed/non-decomposed variants, stretching it to > 12 hrs for one set of models to finish.
