Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: enables ollama integration tests #23

Merged
merged 5 commits into from
Sep 23, 2024

Conversation

codefromthecrypt
Copy link
Contributor

@codefromthecrypt codefromthecrypt commented Sep 2, 2024

This adds CI for the existing ollama tests.

The model we use, currently mistral-nemo is somewhat large, so not helpful to run in docker for a couple reasons. Instead, we nohup ollama directly.

  • slower execution through containerization
  • more complicated cache setup, as you need to manage the model, which is pulled in docker

To reduce problems, pulling and smoke testing the model are done in a separate step, logging on failure. This gives us a high confidence that when the tests actually run, failures are related to tests, not slow first-time model startup hitting some timeout. All that said, the tests themselves do take time as it is using real inference now.

@codefromthecrypt codefromthecrypt marked this pull request as ready for review September 2, 2024 07:35
@codefromthecrypt
Copy link
Contributor Author

Lemme know if we are brave enough to run this in CI. One reason not to is that it could be flakey.

@michaelneale
Copy link
Collaborator

always brave enough for CI.

@michaelneale michaelneale requested a review from zakiali September 2, 2024 08:03
@codefromthecrypt codefromthecrypt changed the title Enables ollama integration tests with llama3.1:8b-instruct-q8_0 Enables ollama integration tests with llama3.1:8b-instruct-q4_0 Sep 3, 2024
@codefromthecrypt
Copy link
Contributor Author

Made tests conditional on OLLAMA_HOST and pull the model we need.

Note: docker/testcontainers is not a viable option as the models are slow to start (especially in docker which disallows GPU access), and also eat a lot of memory. I think if we want CI, we should just start ollama in github action. If sounds good, 👍 and I'll give it a go.

@codefromthecrypt
Copy link
Contributor Author

oops sorry about red I'll sort it later.

@codefromthecrypt
Copy link
Contributor Author

🤞 on CI, I am only running it in latest python, after normal unit tests.

@codefromthecrypt
Copy link
Contributor Author

last try on CI today 🤞

@codefromthecrypt
Copy link
Contributor Author

notes for tomorrow:

  • 1 of 4 passed. so look more closely at why that could be the case.
  • double check the cache actually works. i used the key based on ollama defaults

@codefromthecrypt codefromthecrypt marked this pull request as draft September 4, 2024 02:27
@codefromthecrypt
Copy link
Contributor Author

put back to draft until I sort out what's going on in CI. Feel free to edit this PR if someone wants to finish before I respond again

@codefromthecrypt
Copy link
Contributor Author

@k33g has some interesting options in https://github.com/parakeet-nest/awesome-slms trying again with testcontainers may not be a terrible idea. I'm out of time for the moment so anyone can give a go if they like.

@codefromthecrypt codefromthecrypt force-pushed the ollama-integration branch 2 times, most recently from 58b3bcb to a6e043c Compare September 19, 2024 09:45
@codefromthecrypt codefromthecrypt changed the title Enables ollama integration tests with llama3.1:8b-instruct-q4_0 ci: enables ollama integration tests Sep 19, 2024
@codefromthecrypt codefromthecrypt marked this pull request as ready for review September 19, 2024 09:58
@codefromthecrypt
Copy link
Contributor Author

@anuraaga if you want to do another drive-by review, more than welcome!

key: ollama-${{ hashFiles('./src/exchange/providers/ollama.py') }}

- name: Install Ollama
run: curl -fsSL https://ollama.com/install.sh | sh
Copy link
Contributor Author

@codefromthecrypt codefromthecrypt Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusing part is the log here ends up saying "hey I'm running" when well it isn't ;) I used act locally to figure it out.

act pull_request -P ubuntu-latest=ghcr.io/catthehacker/ubuntu:act-latest --container-architecture linux/amd64

@codefromthecrypt
Copy link
Contributor Author

been pretty consistently 5m all in

Screenshot 2024-09-19 at 6 04 45 PM

.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
# this prior to running tests. This also reduces the chance of flakiness.
- name: Pull and Test Ollama model
run: | # get the OLLAMA_MODEL from ./src/exchange/providers/ollama.py
OLLAMA_MODEL=$(uv run python -c "from src.exchange.providers.ollama import OLLAMA_MODEL; print(OLLAMA_MODEL)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty awesome

.github/workflows/ci.yaml Outdated Show resolved Hide resolved
@codefromthecrypt
Copy link
Contributor Author

thanks again for the feedback @anuraaga this is better

@anuraaga
Copy link
Contributor

LGTM (seems possible for repo to disable grey check drive by approvals, just learned that 😀)

Copy link
Collaborator

@michaelneale michaelneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this as focused and self contained (and means can use the check as a matrix for ollama support)

@codefromthecrypt
Copy link
Contributor Author

PTAL, I added a commit I can revert that makes the test a lot faster by using qwen2.5's smallest model. I did this in a way that leaves the default mistral. Idea being that we don't spend 5mins for the few tests we have now. Let's see.

@codefromthecrypt
Copy link
Contributor Author

Screenshot 2024-09-21 at 8 02 05 AM

down to 1m20s

As the default model is for goose anyway, my thinking is that we only test with a model needed to pass tests here. In goose, when we make an integration test, we can use the larger model as well. The main tradeoff is someone could add a typo to the default model, and the PR check wouldn't know about it, but that's the case in the other providers anyway.

@@ -6,7 +7,7 @@
@pytest.mark.integration
def test_ollama_integration():
provider = OllamaProvider.from_env()
model = OLLAMA_MODEL
model = os.getenv("OLLAMA_MODEL", OLLAMA_MODEL)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also happy to make a constant in ollama.py OLLAMA_TEST_MODEL, re-used for these two tests. That way we don't need to define it in CI. OTOH, I didn't do that so far because I think ad-hoc can still use the large model, and that's a good thing.

baxen and others added 4 commits September 23, 2024 14:16
Co-authored-by: Anuraag (Rag) Agrawal <[email protected]>
Signed-off-by: Adrian Cole <[email protected]>
Signed-off-by: Adrian Cole <[email protected]>
Signed-off-by: Adrian Cole <[email protected]>
@michaelneale michaelneale merged commit 5b34bc5 into square:main Sep 23, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants