[sharktank] Split Perplexity CI #452
Conversation
- name: Run perplexity test with Torch
  run: pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
Split the Perplexity CI nightly workflow into Torch and IREE workflows to be able to fetch/read their statuses separately, providing more clarity on which repo regressed.
Can you clarify what you mean by "which repo regressed"? We should generally only be testing things we control here.
What about the logs at https://github.com/nod-ai/SHARK-Platform/actions/runs/11659531084 isn't clear?
I want to trend towards a smaller number of workflow files, not a larger one. I'm already confused enough by the list at https://github.com/nod-ai/SHARK-Platform/actions and https://github.com/nod-ai/SHARK-Platform/tree/main/.github/workflows. We have a mix of workflows defined by subproject (e.g. sharktank, shortfin, tuner), model (e.g. llama, sdxl), or by test category (e.g. perplexity, eval). There is quite a bit of overlap there, and as long as it isn't obvious which workflow a given test belongs in, people will just add a new workflow uniquely suited to that purpose. I've been refactoring workflows lately as part of rolling out packaging, and we have a substantial amount of copy/paste and eventual drift between workflows that I'm having to navigate.
I agree that there is a better way to categorize the workflows.
The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md for Torch and IREE. So when IREE fails, we know it's an IREE regression and not a sharktank one.
I understand the logs are clear, but for our devs, workflow badges offer an easier way to stay informed about the CIs and any regressions, instead of having to dig through the workflow logs. Like you said, there are quite a few of them.
Let me know if there is an alternate way to have 2 workflow badges, one for each job running in a single workflow yaml, without splitting it. I couldn't find any.
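For context on the badge mechanics (not stated in the thread, but standard GitHub behavior): status badges are generated per workflow file, at `https://github.com/nod-ai/SHARK-Platform/actions/workflows/<file>/badge.svg`, optionally filtered by branch or event, never per job. So two independent badges do effectively require two workflow files. A minimal sketch of one half of such a split follows; the file name, workflow name, schedule, and runner label are placeholders, not necessarily what this PR uses.

```yaml
# .github/workflows/ci_eval_torch.yaml -- illustrative file name; a matching
# ci_eval_iree.yaml would hold the IREE job and get its own badge.
name: CI - Perplexity (Torch)

on:
  workflow_dispatch:
  schedule:
    - cron: "0 7 * * *"  # nightly; the real schedule may differ

jobs:
  test_perplexity_torch:
    runs-on: ubuntu-latest  # placeholder; the real job needs a runner with /data/llama3.1
    steps:
      - uses: actions/checkout@v4
      # (dependency install steps omitted for brevity)
      - name: Run perplexity test with Torch
        run: |
          pytest -n 8 -v -s sharktank/tests/evaluate/perplexity_torch_test.py --longrun \
            --llama3-8b-f16-model-path=/data/llama3.1/8b/llama8b_f16.irpa \
            --llama3-8b-tokenizer-path=/data/llama3.1/8b/tokenizer_config.json
```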
Thanks. I'll respond in more detail tomorrow. Re-requested review to keep it in my queue.
Ah, so https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_torch.py stays in native PyTorch to test our model implementations, while https://github.com/nod-ai/SHARK-Platform/blob/main/sharktank/sharktank/evaluate/perplexity_vmfb.py exports to IREE then compiles and runs, right?
The sole reason to split ci_eval.yaml is to have 2 workflow badges at sharktank/README.md for Torch and IREE. So when IREE fails, we know it's an IREE regression and not a sharktank one.
I'm not sure I buy this argument. Taken to the extreme, we could have one workflow per test, so we know exactly which test failed based on the workflow badges.
The possible technical reasons to split the workflow are:
- finer control over workflow triggers (e.g. run IREE eval on every commit and PyTorch eval nightly, or use workflow_dispatch to run one job but not the other; see the sketch below)
- monitoring via workflow badges
For the monitoring side, regressions should be rare enough and addressed quickly enough that a bit of clicking through to see which sub-job failed is pretty reasonable in my opinion.
As these are long-running jobs, I do actually like splitting for the other reason.
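As an illustration of the trigger-control option (a sketch under made-up input and job names, not something proposed in this PR), a single ci_eval.yaml could expose a workflow_dispatch input and gate each job on it, while nightly schedule runs still execute both jobs. Note this still yields only one badge for the whole workflow file.

```yaml
# Sketch only, not from this PR: keeping a single ci_eval.yaml but letting
# workflow_dispatch pick which job runs.
on:
  schedule:
    - cron: "0 7 * * *"  # nightly runs execute both jobs
  workflow_dispatch:
    inputs:
      suite:
        description: Which evaluation suite to run
        type: choice
        options: [all, torch, iree]
        default: all

jobs:
  test_perplexity_torch:
    if: github.event_name == 'schedule' || inputs.suite == 'all' || inputs.suite == 'torch'
    runs-on: ubuntu-latest  # placeholder
    steps:
      - run: echo "Torch perplexity suite goes here"

  test_perplexity_iree:
    if: github.event_name == 'schedule' || inputs.suite == 'all' || inputs.suite == 'iree'
    runs-on: ubuntu-latest  # placeholder
    steps:
      - run: echo "IREE perplexity suite goes here"
```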
That's correct about the 2 perplexity scripts.
There are a lot of fast-moving pieces across IREE codegen and sharktank, which is why we feel the need for separate badges to catch regressions faster. As you may have noticed, yesterday's nightly broke from an iree-turbine regression.
- name: Install sharktank deps
  run: |
    python -m pip install --no-compile --upgrade pip

    # Note: We install in three steps in order to satisfy requirements
    # from non default locations first. Installing the PyTorch CPU
    # wheels saves multiple minutes and a lot of bandwidth on runner setup.
    pip install --no-compile -r pytorch-cpu-requirements.txt
    pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
      -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
    pip install --no-compile -r requirements.txt -r sharktank/requirements-tests.txt -e sharktank/
As this workflow runs nightly, we could also switch from an explicit full project build with the latest deps of all packages to using a nightly release build: https://github.com/nod-ai/SHARK-Platform/blob/main/docs/nightly_releases.md#quickstart---sharktank. For a number of these workflows I think we should be testing with both the stable versions of dependencies (iree-base-compiler, iree-base-runtime, iree-turbine) and the latest nightly / source versions of each.
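A rough sketch of what that nightly-release install step might look like; the release index URL is an assumption based on the general shape of the quickstart, so defer to docs/nightly_releases.md for the exact command.

```yaml
# Sketch only, not part of this PR: install sharktank from the nightly release
# instead of building from source. The release index URL is an assumption;
# see docs/nightly_releases.md#quickstart---sharktank for the exact command.
- name: Install sharktank nightly release
  run: |
    python -m pip install --upgrade pip
    pip install --pre --upgrade sharktank \
      -f https://github.com/nod-ai/SHARK-Platform/releases/expanded_assets/dev-wheels
    # Test-only requirements still come from the repo checkout.
    pip install -r sharktank/requirements-tests.txt
```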
I feel this should be a separate PR where all CIs are switched to the latest nightly release build. If not, I can update it here.
About testing both stable and nightly versions of IREE, this CI might not be the right candidate. Currently 8b fp16 takes 3 hrs, and we plan to add more models, quantizations, and decomposed/non-decomposed variants, stretching it to > 12 hrs for one set of models to finish.