diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000..9324f63c07
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,293 @@
+# Contributing to Opik
+
+We're excited that you're interested in contributing to Opik! There are many ways to contribute, from writing code to improving the documentation.
+
+The easiest way to get started is to:
+
+* Submit [bug reports](https://github.com/comet-ml/opik/issues) and [feature requests](https://github.com/comet-ml/opik/issues)
+* Review the documentation and submit [Pull Requests](https://github.com/comet-ml/opik/pulls) to improve it
+* Speak or write about Opik and [let us know](https://chat.comet.com)
+* Upvote [popular feature requests](https://github.com/comet-ml/opik/issues?q=is%3Aissue+is%3Aopen+label%3A%22feature+request%22) to show your support
+
+
+## Submitting a new issue or feature request
+
+### Submitting a new issue
+
+Thanks for taking the time to submit an issue; it's the best way to help us improve Opik!
+
+Before submitting a new issue, please check the [existing issues](https://github.com/comet-ml/opik/issues) to avoid duplicates.
+
+To help us understand the issue you're experiencing, please provide steps to reproduce it, including a minimal code snippet. This helps us diagnose the issue and fix it more quickly.
+
+### Submitting a new feature request
+
+Feature requests are welcome! To help us understand the feature you'd like to see, please provide:
+
+1. A short description of the motivation behind this request
+2. A detailed description of the feature you'd like to see, including any code snippets if applicable
+
+If you are in a position to submit a PR for the feature, feel free to open a PR!
+
+## Project setup and architecture
+
+The Opik project is made up of five main sub-projects:
+
+* `apps/opik-documentation`: The Opik documentation website
+* `deployment/installer`: The Opik installer
+* `sdks/python`: The Opik Python SDK
+* `apps/opik-frontend`: The Opik frontend application
+* `apps/opik-backend`: The Opik backend server
+
+
+In addition, Opik relies on:
+
+1. ClickHouse: Used to store traces, spans and feedback scores
+2. MySQL: Used to store metadata associated with projects, datasets, experiments, etc.
+3. Redis: Used for caching
+
+### Contributing to the documentation
+
+The documentation is made up of three main parts:
+
+1. `apps/opik-documentation/documentation`: The Opik documentation website
+2. `apps/opik-documentation/python-sdk-docs`: The Python reference documentation
+3. `apps/opik-documentation/rest-api-docs`: The REST API reference documentation
+
+#### Contributing to the documentation website
+
+The documentation website is built using [Docusaurus](https://docusaurus.io/) and is located in `apps/opik-documentation/documentation`.
+
+In order to run the documentation website locally, you need to have `npm` installed. Once installed, you can run the documentation locally using the following commands:
+
+```bash
+cd apps/opik-documentation/documentation
+
+# Install dependencies - Only needs to be run once
+npm install
+
+# Run the documentation website locally
+npm run start
+```
+
+You can then access the documentation website at `http://localhost:3000`. Any change you make to the documentation will be updated in real-time.
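+
+Before opening a docs PR it can also be worth checking that the production build succeeds. A minimal sketch, assuming the standard Docusaurus build script (the same `npm run build` command this repository's docs have used for publishing):
+
+```bash
+cd apps/opik-documentation/documentation
+
+# Build the static site into the build/ directory
+npm run build
+```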
+
+#### Contributing to the Python SDK reference documentation
+
+The Python SDK reference documentation is built using [Sphinx](https://www.sphinx-doc.org/en/master/) and is located in `apps/opik-documentation/python-sdk-docs`.
+
+In order to run the Python SDK reference documentation locally, you need to have `python` and `pip` installed. Once installed, you can run the documentation locally using the following commands:
+
+```bash
+cd apps/opik-documentation/python-sdk-docs
+
+# Install dependencies - Only needs to be run once
+pip install -r requirements.txt
+
+# Run the Python SDK reference documentation locally
+make dev
+```
+
+The Python SDK reference documentation will be built and available at `http://127.0.0.1:8000`. Any change you make to the documentation will be updated in real-time.
+
+### Contributing to the Python SDK
+
+The Python SDK is available under `sdks/python` and can be installed locally using `pip install -e sdks/python`.
+
+To test your changes locally, you can run Opik locally using `opik-server install`.
+
+Before submitting a PR, please ensure that your code passes the test suite:
+
+```bash
+cd sdks/python
+
+pytest tests/
+```
+
+and the linter:
+
+```bash
+cd sdks/python
+
+pre-commit run --all-files
+```
+
+> [!NOTE]
+> If your changes impact public-facing methods or docstrings, please also update the documentation. You can find more information about updating the docs in the [documentation contribution guide](#contributing-to-the-documentation).
+
+### Contributing to the installer
+
+The Opik server installer is a Python package that installs and manages the Opik server on a local machine. In order to achieve this, the installer relies on:
+
+1. Minikube: Used to manage the Kubernetes cluster
+2. Helm: Used to manage the Kubernetes charts
+3. Ansible: Used to manage the installation of the Opik server
+
+#### Building the package
+In order to build the package:
+
+1. Ensure that you have the necessary packaging dependencies installed:
+
+```bash
+pip install -r pub-requirements.txt
+```
+
+2. Run the following command to build the package:
+
+```bash
+python -m build --wheel
+```
+
+This will create a `dist` directory containing the built package.
+
+3. You can now upload the package to the PyPI repository using `twine`:
+
+```bash
+twine upload dist/*
+```
+
+#### QA Testing
+
+To test the installer, clone this repository onto the machine you want to
+install the Opik server on and install the package using the following
+commands:
+
+```bash
+# Make sure pip is up to date
+pip install --upgrade pip
+
+# Clone the repository
+git clone git@github.com:comet-ml/opik.git
+
+# You may need to checkout the branch you want to test
+# git checkout installer-pkg
+
+cd opik/deployment/installer/
+
+# Install the package
+pip install .
+```
+
+Depending on your pip installation path, you may get a warning that the package
+is not installed in your `PATH`. This is fine; the package will still work,
+but you will need to call the executable by its fully qualified path.
+Review the warning message to see the path to the executable.
+
+```bash
+# When the package is publicly released, none of these flags will be needed
+# and you will be able to simply run `opik-server install`
+opik-server install --opik-version 0.1.0
+```
+
+This will install the Opik server on your machine.
+
+By default, this will hide the details of the installation process. If you want
+to see them, you can add the `--debug`
+flag just before the `install` command.
+
+```bash
+opik-server --debug install ........
+```
+
+If successful, the message will instruct you to run a kubectl command to
+forward the necessary ports to your local machine, and provide you with the
+URL to access the Opik server.
+
+#### Uninstalling
+
+To uninstall the Opik server, run the following command:
+
+```bash
+minikube delete
+```
+
+To reset the machine to a clean state, with no Opik server installed, it is
+best to use a fresh VM. But if you want to reset the machine to a clean state
+without reinstalling the VM, you can run the following commands:
+
+##### macOS
+
+```bash
+minikube delete
+brew uninstall minikube
+brew uninstall helm
+brew uninstall kubectl
+brew uninstall --cask docker
+rm -rf ~/.minikube
+rm -rf ~/.helm
+rm -rf ~/.kube
+rm -rf ~/.docker
+sudo find /usr/local/bin -lname '/Applications/Docker.app/*' -exec rm {} +
+```
+
+##### Ubuntu
+
+```bash
+minikube delete
+sudo apt-get remove helm kubectl minikube docker-ce containerd.io
+rm -rf ~/.minikube
+rm -rf ~/.helm
+rm -rf ~/.kube
+rm -rf ~/.docker
+```
+
+### Contributing to the frontend
+
+The Opik frontend is a React application that is located in `apps/opik-frontend`.
+
+In order to run the frontend locally, you need to have `npm` installed. Once installed, you can run the frontend locally using the following commands:
+
+```bash
+cd apps/opik-frontend
+
+# Install dependencies - Only needs to be run once
+npm install
+
+# Run the frontend locally
+npm run start
+```
+
+You can then access the development frontend at `http://localhost:5174/`. Any change you make to the frontend will be updated in real-time.
+
+The dev server is set up to work with the Opik backend running on `http://localhost:8080`. All requests made to `http://localhost:5174/api` are proxied to the backend.
+The backend port can be changed in the `proxy` section of the `vite.config.ts` file.
+
+> [!NOTE]
+> You will need to have the backend running locally in order for the frontend to work. For this, we recommend running a local instance of Opik using `opik-server install`.
+
+Before submitting a PR, please ensure that your code passes the test suite, the linter and the type checker:
+
+```bash
+cd apps/opik-frontend
+
+npm run test
+npm run lint
+npm run typecheck
+```
+
+### Contributing to the backend
+
+The Opik backend is a Java application that is located in `apps/opik-backend`.
+
+In order to run the backend locally, you need to have `java` and `maven` installed. Once installed, you can run the backend locally using the following commands:
+
+```bash
+cd apps/opik-backend
+
+# Build the Opik application
+mvn clean install
+
+# Run the Opik application
+java -jar target/opik-backend-{project.pom.version}.jar server config.yml
+```
+Replace `{project.pom.version}` with the version of the project in the pom file.
+
+Once the backend is running, you can access the Opik API at `http://localhost:8080`.
+
+Before submitting a PR, please ensure that your code passes the test suite:
+
+```bash
+cd apps/opik-backend
+
+mvn test
+```
diff --git a/README.md b/README.md
index 5e8f82154d..e6cbb64362 100644
--- a/README.md
+++ b/README.md
@@ -1,117 +1,161 @@
-# opik
+# Opik
+
+### Open-source end-to-end LLM Development Platform
+
-## Running Comet Opik locally
+
+Confidently evaluate, test and monitor LLM applications.
+
-Comet Opik contains two main services:
-1. Frontend available at `apps/opik-frontend/README.md`
-2. Backend available at `apps/opik-backend/README.md`
+
-### Python SDK
+[![Python SDK](https://img.shields.io/pypi/v/opik)](https://pypi.org/project/opik/)
+[![License](https://img.shields.io/github/license/comet-ml/opik)](https://github.com/comet-ml/opik/blob/main/LICENSE)
+[![Build](https://github.com/comet-ml/opik/actions/workflows/build_apps.yml/badge.svg)](https://github.com/comet-ml/opik/actions/workflows/build_apps.yml)
+
-You can install the latest version of the Python SDK by running:
-
-```bash
-# Navigate and pull the latest changes if there are any
-cd sdks/python
-git checkout main
-git pull
+
+Website • Slack community • Twitter • Documentation
+
-# Pip install the local version of the SDK -pip install -e . -U -``` +![Opik thumbnail](readme-thumbnail.png) -## Running the full application locally with minikube +## 🚀 What is Opik? -### Installation Prerequisites +[Opik](https://www.comet.com/site/products/opik) is an open-source platform for evaluating, testing and monitoring LLM applications. Built by [Comet](https://www.comet.com). -- Docker - https://docs.docker.com/engine/install/ +
-- kubectl - https://kubernetes.io/docs/tasks/tools/#kubectl
+
+You can use Opik for:
+* **Development:**
+    * **Tracing:** Track all LLM calls and traces during development and production ([Quickstart](https://www.comet.com/docs/opik/quickstart), [Integrations](https://www.comet.com/docs/opik/integrations/overview))
+    * **Annotations:** Annotate your LLM calls by logging feedback scores using the [Python SDK](...), [REST API](...) or the [UI](...) (see the sketch after this list).
-- Helm - https://helm.sh/docs/intro/install/
+
+* **Evaluation**: Automate the evaluation process of your LLM application:
-- minikube - https://minikube.sigs.k8s.io/docs/start
+
+    * **Datasets and Experiments**: Store test cases and run experiments ([Datasets](https://www.comet.com/docs/opik/evaluation/manage_datasets), [Evaluate your LLM Application](https://www.comet.com/docs/opik/evaluation/evaluate_your_llm))
-- more tools:
-    - **`bash`** completion / `zsh` completion
-    - `kubectx` and `kubens` - easy switch context/namespaces for kubectl - https://github.com/ahmetb/kubectx
+
+    * **LLM as a judge metrics**: Use Opik's LLM as a judge metric for complex issues like [hallucination detection](https://www.comet.com/docs/opik/evaluation/metrics/hallucination), [moderation](https://www.comet.com/docs/opik/evaluation/metrics/moderation) and RAG evaluation ([Answer Relevance](https://www.comet.com/docs/opik/evaluation/metrics/answer_relevance) and [Context Precision](https://www.comet.com/docs/opik/evaluation/metrics/context_precision))
-### Run k8s cluster locally
+
+    * **CI/CD integration**: Run evaluations as part of your CI/CD pipeline using our [PyTest integration](...)
-Start your `minikube` cluster https://minikube.sigs.k8s.io/docs/start/
+
+* **Production Monitoring**: Monitor your LLM application in production and easily close the feedback loop by adding error traces to your evaluation datasets.
-```bash
-minikube start
-```
+
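+As an illustration of the annotation workflow referenced above, here is a minimal sketch of logging a feedback score against an existing trace from Python. The `log_traces_feedback_scores` helper name and its payload shape are assumptions here, so check the Python SDK reference for the exact API:
+
+```python
+from opik import Opik
+
+client = Opik()
+
+# Attach a feedback score to a trace that was logged earlier.
+# The id below is a hypothetical placeholder for a real trace id.
+client.log_traces_feedback_scores(
+    scores=[
+        {
+            "id": "<trace-id>",
+            "name": "user_feedback",
+            "value": 1.0,
+            "reason": "The answer was correct and concise.",
+        }
+    ]
+)
+```
+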
-### Build and run
-Run the script that builds and runs Opik on `minikube`
-```bash
-./build_and_run.sh
-```
+
+## 🛠️ Installation
-
-Script options
-```
---no-build      Skip the build process
---no-fe-build   Skip the FE build process
---no-helm-update        Skip helm repo update
---local-fe      Run FE locally (For frontend developers)
---help Display help message
-```
-Note that when you run it for the first time it can take a few minutes to install everything
+
+The easiest way to get started with Opik is by creating a free Comet account at [comet.com](https://www.comet.com/signup?from=llm).
-
-To check that your application is running enter url `http://localhost:5173`
-To check api documentation enter url `http://localhost:3003`
-
-You can run the `clickhouse-client` with
-```bash
-kubectl exec -it chi-opik-clickhouse-cluster-0-0-0 clickhouse-client
-```
-After the client is connected, you can check the databases with
-```bash
-show databases;
-```
+
+If you'd like to self-host Opik, you can create a simple local version of Opik using:
-
-### Some simple k8s commands to manage the installation
-List the pods that are running
 ```bash
-kubectl get pods
-```
-To restart a pod just delete the pod, k8s will start a new one
-```bash
-kubectl delete pod
+
+pip install opik-installer
+
+opik-server install
 ```
-
-There is no clean way to delete the databases, so if you need to do that, it's better to delete the namespace and then install again.
-Run
+
+For more information about the different deployment options, please see our deployment guides:
+
+| Installation methods | Docs link |
+| ------------------- | --------- |
+| Local instance | [![Minikube](https://img.shields.io/badge/minikube-%230db7ed.svg)](https://www.comet.com/docs/opik/self-host/self_hosting_opik#all-in-one-installation) |
+| Kubernetes | [![Kubernetes](https://img.shields.io/badge/kubernetes-%23326ce5.svg?&logo=kubernetes&logoColor=white)](https://www.comet.com/docs/opik/self-host/self_hosting_opik#all-in-one-installation) |
+
+
+## 🏁 Get Started
+
+If you are logging traces to the Cloud Opik platform, you will need to get your API key from the user menu and set it as the `OPIK_API_KEY` environment variable:
+
 ```bash
-kubectl delete namespace opik
+export OPIK_API_KEY=
 ```
-and in parallel (in another terminal window/tab) run
+
+If you are using a local Opik instance, you don't need to set the `OPIK_API_KEY` environment variable; instead, set the `OPIK_BASE_URL` environment variable to point to your local Opik instance:
 ```bash
-kubectl patch chi opik-clickhouse --type json --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
+export OPIK_BASE_URL=http://localhost:5173
 ```
-after the namespace is deleted, run
-```bash
-./build_and_run.sh --no-build
+
+You are now ready to start logging traces using either the [Python SDK](https://www.comet.com/docs/opik/python-sdk/overview) or the [REST API](https://www.comet.com/docs/opik/rest-api).
+
+### 📝 Logging Traces
+
+The easiest way to get started is to use one of our integrations. Opik supports:
+
+| Integration | Description | Documentation | Try in Colab |
+| ----------- | ----------- | ------------- | ------------ |
+| OpenAI | Log traces for all OpenAI LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/openai) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb) |
+| LiteLLM | Log traces for all LiteLLM LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/litellm) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/litellm.ipynb) |
+| LangChain | Log traces for all LangChain LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/langchain) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb) |
+| LlamaIndex | Log traces for all LlamaIndex LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/llamaindex) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/llama-index.ipynb) |
+| Ragas | Log traces for all Ragas evaluations | [Documentation](https://www.comet.com/docs/opik/integrations/ragas) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb) |
+
+> [!TIP]
+> If the framework you are using is not listed above, feel free to [open an issue](https://github.com/comet-ml/opik/issues) or submit a PR with the integration.
+
+If you are not using any of the frameworks above, you can also use the `track` function decorator to [log traces](https://www.comet.com/docs/opik/tracing/log_traces):
+
+```python
+from opik import track
+
+@track
+def my_llm_function(user_question: str) -> str:
+    # Your LLM code here
+
+    return "Hello"
+```
-to install everything again
-Stop minikube
-```bash
-minikube stop
+
+> [!TIP]
+> The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
+
+### 🧑‍⚖️ LLM as a Judge metrics
+
+The Python Opik SDK includes a number of LLM as a judge metrics to help you evaluate your LLM application. Learn more about them in the [metrics documentation](https://www.comet.com/docs/opik/evaluation/metrics/overview).
+
+To use them, simply import the relevant metric and use the `score` function:
+
+```python
+from opik.evaluation.metrics import Hallucination
+
+metric = Hallucination()
+score = metric.score(
+    input="What is the capital of France?",
+    output="Paris",
+    context=["France is a country in Europe."]
+)
+print(score)
+```
-Next time you will start the minikube, it will run everything with the same configuration and data you had before.
+Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about them in the [metrics documentation](https://www.comet.com/docs/opik/evaluation/metrics/overview).
+
+### 🔍 Evaluating your LLM Application
-## Repository structure
+Opik allows you to evaluate your LLM application during development through [Datasets](https://www.comet.com/docs/opik/evaluation/manage_datasets) and [Experiments](https://www.comet.com/docs/opik/evaluation/evaluate_your_llm).
-`apps`
+You can also run evaluations as part of your CI/CD pipeline using our [PyTest integration](...).
-Contains the applications.
+## 🤝 Contributing
-`apps/opik-backend`
+There are many ways to contribute to Opik:
-Contains the Opik application.
+* Submit [bug reports](https://github.com/comet-ml/opik/issues) and [feature requests](https://github.com/comet-ml/opik/issues)
-See `apps/opik-backend/README.md`.
+* Review the documentation and submit [Pull Requests](https://github.com/comet-ml/opik/pulls) to improve it
+* Speak or write about Opik and [let us know](https://chat.comet.com)
+* Upvote [popular feature requests](https://github.com/comet-ml/opik/issues?q=is%3Aissue+is%3Aopen+label%3A%22feature+request%22) to show your support
+To learn more about how to contribute to Opik, please see our [contributing guidelines](CONTRIBUTING.md).
diff --git a/apps/opik-backend/README.md b/apps/opik-backend/README.md
index 9fe4380913..c89cd021ef 100644
--- a/apps/opik-backend/README.md
+++ b/apps/opik-backend/README.md
@@ -1,50 +1,3 @@
-# Opik
+# Opik backend
-How to start the Opik application
----
-
-1. Run `mvn clean install` to build your application
-1. Start application with `java -jar target/opik-backend-{project.pom.version}.jar server config.yml`
-1. To check that your application is running enter url `http://localhost:8080`
-
-Health Check
----
-
-To see your applications health enter url `http://localhost:8081/healthcheck`
-
-Run migrations
----
-
-1. Check pending
-   migrations `java -jar target/opik-backend-{project.pom.version}.jar {database} status config.yml`
-2. Run migrations `java -jar target/opik-backend-{project.pom.version}.jar {database} migrate config.yml`
-3. Create schema
-   tag `java -jar target/opik-backend-{project.pom.version}.jar {database} tag config.yml {tag_name}`
-3. Rollback
-   migrations `java -jar target/opik-backend-{project.pom.version}.jar {database} rollback config.yml --count 1`
-   OR `java -jar target/opik-backend-{project.pom.version}.jar {database} rollback config.yml --tag {tag_name}`
-
-Replace `{project.pom.version}` with the version of the project in the pom file.
-Replace `{database}` with `db` for MySQL migrations and with `dbAnalytics` for ClickHouse migrations.
-
-
-```
-SHOW DATABASES
-
-Query id: a9faa739-5565-4fc5-8843-5dc0f72ff46d
-
-┌─name───────────────┐
-│ INFORMATION_SCHEMA │
-│ opik               │
-│ default            │
-│ information_schema │
-│ system             │
-└────────────────────┘
-
-5 rows in set. Elapsed: 0.004 sec.
-``` - -* You can curl the ClickHouse REST endpoint - with `echo 'SELECT version()' | curl -H 'X-ClickHouse-User: opik' -H 'X-ClickHouse-Key: opik' 'http://localhost:8123/' -d @-`. - Sample result: `23.8.15.35`. -* You can stop the application with `docker-compose -f apps/opik-backend/docker-compose.yml down`. +If you would like to contribute to the Opik backend, please refer to the [Contribution guide](./CONTRIBUTING.md). diff --git a/apps/opik-documentation/README.md b/apps/opik-documentation/README.md index a1f2a60773..2a952bc2c1 100644 --- a/apps/opik-documentation/README.md +++ b/apps/opik-documentation/README.md @@ -1,108 +1,3 @@ -# Documentation +# Opik documentation -The Comet LLM Evaluation documentation has three main components: - -1. Documentation website with user guides and tutorials -2. Python SDK reference documentation -3. API reference documentation - -## Python SDK - -The Python SDK reference documentation is built using Sphinx. - -### Setup -In order to generate the reference documentation, you will need use Python 3.10 or later. - -You can create the required environment using: - -```python -conda create --name py312_docs_opik python=3.12 -conda activate py312_docs_opik - -cd python-sdk-docs -pip install -r requirements.txt -``` - -### Development - -When building the Sphinx docs, there are two main components: - -1. The source code available at `../../../sdks/python` - This is where all the docstrings are defined -2. The Sphinx files available at `./source` - -In order to view the Sphinx docs, you can run the following commands: - -``` -make dev -``` - -The `python-sdk-docs-dev` command will rebuild the sphinx docs on any changes in the source repo and will also serve the docs on `http://localhost:8000`. If you make any changes to the SDK source code, you will need to run this command again. - -### Publishing the docs - -To deploy the docs, you can run the following command: - -``` -pip install -e ../../../sdks/python/ -make build - -ssh root@146.190.72.83 'cd /var/www/html && rm -rf sdk-reference-docs' -scp -r build/html* root@146.190.72.83:/var/www/html/sdk-reference-docs -``` - -**Note:** The `generate-python-sdk-documentation` command will use the opik SDK version you have installed. If you want to use a specific version of the SDK, make sure it is installed and then run the command. - -## API reference documentation - -The API reference documentation is autogenerated based on OpenAPI specs and served directly from the backend. This ensures the API documentation is always up to date. - - -## Documentation - -The main documentation containing our guides and tutorials are build using Docusaurus. - -### Setup - -In order to run the documentation locally, you will need to install Node.js. For this we will use `nvm` to install the correct version of Node.js. - -``` -curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash -``` - -We will then install the latest version of Node.js. - -``` -nvm install --lts -nvm use --lts -``` - -You can install the dependencies by running the following command: - -``` -cd documentation -npm install -``` - -### Running the documentation locally -You can run the documentation locally by running the following command: - -``` -cd documentation -npm run dev -``` - -**Note:** When running `npm run dev`, cookbooks are rebuild each time a change is saved. For this you will need to have a Python environment running with `jupyter` installed. 
If you don't have it installed, you can use the command `npm run start` instead to only start docusaurus.
-
-### Publishing the docs
-
-You can publish the docs to the dev server by running the following commands:
-```
-# Build the files
-npm run build
-
-# Remove the existing files
-ssh root@146.190.72.83 'cd /var/www/html && rm -rf 404.html index.html sitemap.xml assets category evaluation img monitoring quickstart self-host tracing cookbook'
-
-# Push the files
-scp -r build/* root@146.190.72.83:/var/www/html/
-```
+If you would like to contribute to the Opik documentation, please refer to the [Contribution guide](./CONTRIBUTING.md).
diff --git a/apps/opik-documentation/documentation/docs/cookbook/.gitignore b/apps/opik-documentation/documentation/docs/cookbook/.gitignore
index 6a91a439ea..29be9ebb01 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/.gitignore
+++ b/apps/opik-documentation/documentation/docs/cookbook/.gitignore
@@ -1 +1,2 @@
-*.sqlite
\ No newline at end of file
+*.sqlite
+/data
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
index c1f8d2ba04..48c132417e 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
@@ -6,11 +6,59 @@
   "source": [
    "# Evaluating Opik's Hallucination Metric\n",
    "\n",
-   "*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*\n",
-   "\n",
    "For this guide we will be evaluating the Hallucination metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Hallucination metric included in the SDK."
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "## Creating an account on Comet.com\n",
+   "\n",
+   "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.\n",
+   "\n",
+   "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import getpass\n", + "\n", + "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n", + "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are running the Opik platform locally, simply set:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# import os\n", + "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparing our environment\n", + "\n", + "First, we will install the necessary libraries, configure the OpenAI API key and create a new Opik dataset" + ] + }, { "cell_type": "code", "execution_count": null, @@ -22,7 +70,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -30,7 +78,6 @@ "import os\n", "import getpass\n", "\n", - "os.environ[\"COMET_URL_OVERRIDE\"] = \"http://localhost:5173/api\"\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key: \")" ] }, @@ -43,23 +90,16 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "status_code: 409, body: {'errors': ['Dataset already exists']}\n" - ] - } - ], + "outputs": [], "source": [ "# Create dataset\n", "from opik import Opik, DatasetItem\n", "import pandas as pd\n", "\n", "client = Opik()\n", + "\n", "try:\n", " # Create dataset\n", " dataset = client.create_dataset(name=\"HaluBench\", description=\"HaluBench dataset\")\n", @@ -86,60 +126,20 @@ " print(e)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluating the hallucination metric\n", + "\n", + "We can use the Opik SDK to compute a hallucination score for each item in the dataset:" + ] + }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Running tasks: 100%|██████████| 500/500 [00:53<00:00, 9.43it/s]\n", - "Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 513253.06it/s]\n" - ] - }, - { - "data": { - "text/html": [ - "
╭─ HaluBench (500 samples) ────────────╮\n",
-       "│                                      │\n",
-       "│ Total time:        00:00:53          │\n",
-       "│ Number of samples: 500               │\n",
-       "│                                      │\n",
-       "│ Detected hallucination: 0.8020 (avg) │\n",
-       "│                                      │\n",
-       "╰──────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "╭─ HaluBench (500 samples) ────────────╮\n", - "│ │\n", - "│ \u001b[1mTotal time: \u001b[0m 00:00:53 │\n", - "│ \u001b[1mNumber of samples:\u001b[0m 500 │\n", - "│ │\n", - "│ \u001b[1;32mDetected hallucination: 0.8020 (avg)\u001b[0m │\n", - "│ │\n", - "╰──────────────────────────────────────╯\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
Uploading results to Opik ... \n",
-       "
\n" - ], - "text/plain": [ - "Uploading results to Opik \u001b[33m...\u001b[0m \n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from opik.evaluation.metrics import Hallucination\n", "from opik.evaluation import evaluate\n", @@ -154,8 +154,6 @@ " self.name = name\n", "\n", " def score(self, hallucination_score, expected_hallucination_score, **kwargs):\n", - " expected_hallucination_score = 1 if expected_hallucination_score == \"FAIL\" else 0\n", - " \n", " return score_result.ScoreResult(\n", " value= None if hallucination_score is None else hallucination_score == expected_hallucination_score,\n", " name=self.name,\n", @@ -179,7 +177,7 @@ " hallucination_reason = str(e)\n", " \n", " return {\n", - " \"hallucination_score\": hallucination_score,\n", + " \"hallucination_score\": \"FAIL\" if hallucination_score == 1 else \"PASS\",\n", " \"hallucination_reason\": hallucination_reason,\n", " \"expected_hallucination_score\": x.expected_output[\"expected_output\"]\n", " }\n", @@ -198,8 +196,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset." + "We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset and we can see the specific items where hallucinations were not detected.\n", + "\n", + "![Hallucination Evaluation](/img/cookbook/hallucination_metric_cookbook.png)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] } ], "metadata": { diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md index 1a4ce3ed72..1d4267d341 100644 --- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md +++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md @@ -1,9 +1,34 @@ # Evaluating Opik's Hallucination Metric -*This cookbook was created from a Jypyter notebook which can be found [here](TBD).* - For this guide we will be evaluating the Hallucination metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Hallucination metric included in the SDK. +## Creating an account on Comet.com + +[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab you API Key. + +> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information. 
+ + +```python +import os +import getpass + +os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ") +os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ") +``` + +If you are running the Opik platform locally, simply set: + + +```python +# import os +# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api" +``` + +## Preparing our environment + +First, we will install the necessary libraries, configure the OpenAI API key and create a new Opik dataset + ```python %pip install pyarrow fsspec huggingface_hub --quiet @@ -15,7 +40,6 @@ For this guide we will be evaluating the Hallucination metric included in the LL import os import getpass -os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api" os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ") ``` @@ -28,6 +52,7 @@ from opik import Opik, DatasetItem import pandas as pd client = Opik() + try: # Create dataset dataset = client.create_dataset(name="HaluBench", description="HaluBench dataset") @@ -54,8 +79,9 @@ except Exception as e: print(e) ``` - status_code: 409, body: {'errors': ['Dataset already exists']} +## Evaluating the hallucination metric +We can use the Opik SDK to compute a hallucination score for each item in the dataset: ```python @@ -72,8 +98,6 @@ class CheckHallucinated(base_metric.BaseMetric): self.name = name def score(self, hallucination_score, expected_hallucination_score, **kwargs): - expected_hallucination_score = 1 if expected_hallucination_score == "FAIL" else 0 - return score_result.ScoreResult( value= None if hallucination_score is None else hallucination_score == expected_hallucination_score, name=self.name, @@ -97,7 +121,7 @@ def evaluation_task(x: DatasetItem): hallucination_reason = str(e) return { - "hallucination_score": hallucination_score, + "hallucination_score": "FAIL" if hallucination_score == 1 else "PASS", "hallucination_reason": hallucination_reason, "expected_hallucination_score": x.expected_output["expected_output"] } @@ -112,27 +136,8 @@ res = evaluate( ) ``` - Running tasks: 100%|██████████| 500/500 [00:53<00:00, 9.43it/s] - Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 513253.06it/s] - - - -
╭─ HaluBench (500 samples) ────────────╮
-│                                      │
-│ Total time:        00:00:53          │
-│ Number of samples: 500               │
-│                                      │
-│ Detected hallucination: 0.8020 (avg) │
-│                                      │
-╰──────────────────────────────────────╯
-
- - - - -
Uploading results to Opik ... 
-
+We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset, and we can see the specific items where hallucinations were not detected.
+
+![Hallucination Evaluation](/img/cookbook/hallucination_metric_cookbook.png)
-We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset.
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
index 90bcb11862..98eb7e5a0f 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
@@ -11,17 +11,65 @@
    "For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK."
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "## Creating an account on Comet.com\n",
+   "\n",
+   "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.\n",
+   "\n",
+   "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "import os\n",
+   "import getpass\n",
+   "\n",
+   "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+   "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "If you are running the Opik platform locally, simply set:"
+  ]
+ },
 {
  "cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+  "# import os\n",
+  "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+ ]
+},
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+  "## Preparing our environment\n",
+  "\n",
+  "First, we will install the necessary libraries, configure the OpenAI API key and download a reference moderation dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# Configure OpenAI\n", "import os\n", "import getpass\n", "\n", - "os.environ[\"COMET_URL_OVERRIDE\"] = \"http://localhost:5173/api\"\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key: \")" ] }, @@ -34,17 +82,9 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "status_code: 409, body: {'errors': ['Dataset already exists']}\n" - ] - } - ], + "outputs": [], "source": [ "# Create dataset\n", "from opik import Opik, DatasetItem\n", @@ -87,60 +127,20 @@ " print(e)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluating the moderation metric\n", + "\n", + "We can use the Opik SDK to compute a moderation score for each item in the dataset:" + ] + }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Running tasks: 100%|██████████| 500/500 [00:34<00:00, 14.44it/s]\n", - "Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 379712.48it/s]\n" - ] - }, - { - "data": { - "text/html": [ - "
╭─ OpenAIModerationDataset (500 samples) ─╮\n",
-       "│                                         │\n",
-       "│ Total time:        00:00:34             │\n",
-       "│ Number of samples: 500                  │\n",
-       "│                                         │\n",
-       "│ Detected Moderation: 0.8460 (avg)       │\n",
-       "│                                         │\n",
-       "╰─────────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "╭─ OpenAIModerationDataset (500 samples) ─╮\n", - "│ │\n", - "│ \u001b[1mTotal time: \u001b[0m 00:00:34 │\n", - "│ \u001b[1mNumber of samples:\u001b[0m 500 │\n", - "│ │\n", - "│ \u001b[1;32mDetected Moderation: 0.8460 (avg)\u001b[0m │\n", - "│ │\n", - "╰─────────────────────────────────────────╯\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
Uploading results to Opik ... \n",
-       "
\n" - ], - "text/plain": [ - "Uploading results to Opik \u001b[33m...\u001b[0m \n" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "from opik.evaluation.metrics import Moderation\n", "from opik.evaluation import evaluate\n", @@ -196,7 +196,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We are able to detect ~85% of moderation violations, this can be improved further by providing some additional examples to the model." + "We are able to detect ~85% of moderation violations, this can be improved further by providing some additional examples to the model. We can view a breakdown of the results in the Opik UI:\n", + "\n", + "![Moderation Evaluation](/img/cookbook/moderation_metric_cookbook.png)" ] } ], diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md index 8da8450f6f..9dcfc96e45 100644 --- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md +++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md @@ -4,13 +4,38 @@ For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK. +## Creating an account on Comet.com + +[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab you API Key. + +> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information. + + +```python +import os +import getpass + +os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ") +os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ") +``` + +If you are running the Opik platform locally, simply set: + + +```python +#import os +# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api" +``` + +## Preparing our environment + +First, we will install the necessary libraries and configure the OpenAI API key and download a reference moderation dataset. + ```python -# Configure OpenAI import os import getpass -os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api" os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ") ``` @@ -59,8 +84,9 @@ except Exception as e: print(e) ``` - status_code: 409, body: {'errors': ['Dataset already exists']} +## Evaluating the moderation metric +We can use the Opik SDK to compute a moderation score for each item in the dataset: ```python @@ -114,27 +140,6 @@ res = evaluate( ) ``` - Running tasks: 100%|██████████| 500/500 [00:34<00:00, 14.44it/s] - Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 379712.48it/s] - - - -
╭─ OpenAIModerationDataset (500 samples) ─╮
-│                                         │
-│ Total time:        00:00:34             │
-│ Number of samples: 500                  │
-│                                         │
-│ Detected Moderation: 0.8460 (avg)       │
-│                                         │
-╰─────────────────────────────────────────╯
-
- - - - -
Uploading results to Opik ... 
-
-
-
+We are able to detect ~85% of moderation violations; this can be improved further by providing some additional examples to the model. We can view a breakdown of the results in the Opik UI:
-We are able to detect ~85% of moderation violations, this can be improved further by providing some additional examples to the model.
+
+![Moderation Evaluation](/img/cookbook/moderation_metric_cookbook.png)
diff --git a/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb b/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
index 10683b5a43..aee519aa8d 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
@@ -4,9 +4,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
-   "# Using LLM Evaluation with Langchain\n",
-   "\n",
-   "*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*\n",
+   "# Using Opik with Langchain\n",
    "\n",
    "For this guide, we will be performing a text to sql query generation task using LangChain. We will be using the Chinook database which contains the SQLite database of a music store with both employee, customer and invoice data.\n",
    "\n",
@@ -17,6 +15,47 @@
    "3. Automating the evaluation of the SQL queries on the synthetic dataset"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "## Creating an account on Comet.com\n",
+   "\n",
+   "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.\n",
+   "\n",
+   "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import getpass\n", + "\n", + "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n", + "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you are running the Opik platform locally, simply set:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# import os\n", + "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\"" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -28,34 +67,18 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Note: you may need to restart the kernel to use updated packages.\n" - ] - } - ], + "outputs": [], "source": [ - "%pip install --upgrade --quiet langchain langchain-community langchain-openai" + "%pip install --upgrade --quiet opik langchain langchain-community langchain-openai" ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 19, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Chinook database downloaded\n" - ] - } - ], + "outputs": [], "source": [ "# Download the relevant data\n", "import os\n", @@ -65,7 +88,12 @@ "import os\n", "\n", "url = \"https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite\"\n", - "filename = \"Chinook_Sqlite.sqlite\"\n", + "filename = \"./data/chinook/Chinook_Sqlite.sqlite\"\n", + "\n", + "folder = os.path.dirname(filename)\n", + "\n", + "if not os.path.exists(folder):\n", + " os.makedirs(folder)\n", "\n", "if not os.path.exists(filename):\n", " response = requests.get(url)\n", @@ -78,15 +106,12 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import getpass\n", - "\n", - "os.environ[\"COMET_URL_OVERRIDE\"] = \"http://localhost:5173/api\"\n", - "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key: \")" ] }, @@ -103,46 +128,15 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\n", - " \"result\": [\n", - " \"Which customer has made the most purchases in terms of total dollars spent?\",\n", - " \"What is the total number of tracks sold in each genre?\",\n", - " \"How many unique albums have been purchased by customers from different countries?\",\n", - " \"Which employee sold the most expensive track?\",\n", - " \"What is the average length of tracks purchased by customers from each country?\",\n", - " \"Which customer has spent the most money on tracks in the rock genre?\",\n", - " \"What is the total revenue generated by each employee?\",\n", - " \"How many unique artists are featured in each playlist?\",\n", - " \"Which customer has the highest average rating on their purchased tracks?\",\n", - " \"What is the total value of invoices generated by each sales support agent?\",\n", - " \"How many tracks have been sold to customers in each country?\",\n", - " \"Which artist has the most tracks featured in the top 100 selling tracks?\",\n", - " \"What is the total value of invoices generated in each year?\",\n", - " 
\"How many unique tracks have been purchased by customers in each city?\",\n", - " \"Which employee has the highest average rating on tracks they have sold?\",\n", - " \"What is the total number of tracks purchased by customers who have purchased tracks in the pop genre?\",\n", - " \"Which customer has purchased the highest number of unique tracks?\",\n", - " \"How many customer transactions have occurred in each year?\",\n", - " \"Which artist has the most tracks featured in the top 100 selling tracks in the rock genre?\",\n", - " \"What is the total number of tracks purchased by customers who have purchased tracks in the jazz genre?\"\n", - " ]\n", - "}\n" - ] - } - ], + "outputs": [], "source": [ "from opik.integrations.openai import track_openai\n", "from openai import OpenAI\n", "import json\n", "\n", - "os.environ[\"COMET_PROJECT_NAME\"] = \"openai-integration\"\n", + "os.environ[\"OPIK_PROJECT_NAME\"] = \"langchain-integration-demo\"\n", "client = OpenAI()\n", "\n", "openai_client = track_openai(client)\n", @@ -174,7 +168,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -202,34 +196,25 @@ "\n", "We will be using the `create_sql_query_chain` function from the `langchain` library to create a SQL query to answer the question.\n", "\n", - "We will be using the `CometTracer` class from the `opik` library to ensure that the LangChan trace are being tracked in Comet." + "We will be using the `OpikTracer` class from the `opik` library to ensure that the LangChan trace are being tracked in Comet." ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "SELECT COUNT(\"EmployeeId\") AS \"TotalEmployees\" FROM \"Employee\"\n" - ] - } - ], + "outputs": [], "source": [ "# Use langchain to create a SQL query to answer the question\n", "from langchain.chains import create_sql_query_chain\n", "from langchain_openai import ChatOpenAI\n", "from opik.integrations.langchain import OpikTracer\n", "\n", - "os.environ[\"COMET_PROJECT_NAME\"] = \"sql_question_answering\"\n", "opik_tracer = OpikTracer(tags=[\"simple_chain\"])\n", "\n", "llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n", - "chain = create_sql_query_chain(llm, db)\n", - "response = chain.invoke({\"question\": \"How many employees are there ?\"}, {\"callbacks\": [opik_tracer]})\n", + "chain = create_sql_query_chain(llm, db).with_config({\"callbacks\": [opik_tracer]})\n", + "response = chain.invoke({\"question\": \"How many employees are there ?\"})\n", "response\n", "\n", "print(response)" @@ -248,77 +233,45 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Running tasks: 100%|██████████| 20/20 [00:03<00:00, 5.37it/s]\n", - "Scoring outputs: 100%|██████████| 20/20 [00:00<00:00, 82321.96it/s]\n" - ] - }, - { - "data": { - "text/html": [ - "
╭─ synthetic_questions (20 samples) ─╮\n",
-       "│                                    │\n",
-       "│ Total time:        00:00:03        │\n",
-       "│ Number of samples: 20              │\n",
-       "│                                    │\n",
-       "│ ContainsHello: 0.0000 (avg)        │\n",
-       "│                                    │\n",
-       "╰────────────────────────────────────╯\n",
-       "
\n" - ], - "text/plain": [ - "╭─ synthetic_questions (20 samples) ─╮\n", - "│ │\n", - "│ \u001b[1mTotal time: \u001b[0m 00:00:03 │\n", - "│ \u001b[1mNumber of samples:\u001b[0m 20 │\n", - "│ │\n", - "│ \u001b[1;32mContainsHello: 0.0000 (avg)\u001b[0m │\n", - "│ │\n", - "╰────────────────────────────────────╯\n" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "
Uploading results to Opik ... \n",
-       "
\n"
-      ],
-      "text/plain": [
-       "Uploading results to Opik \u001b[33m...\u001b[0m \n"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
+   "outputs": [],
    "source": [
     "from opik import Opik, track\n",
     "from opik.evaluation import evaluate\n",
-    "from opik.evaluation.metrics import Contains\n",
-    "\n",
-    "\n",
-    "contains_hello = Contains(name=\"ContainsHello\")\n",
+    "from opik.evaluation.metrics import base_metric, score_result\n",
+    "from typing import Any\n",
+    "\n",
+    "class ValidSQLQuery(base_metric.BaseMetric):\n",
+    "    def __init__(self, name: str, db: Any):\n",
+    "        self.name = name\n",
+    "        self.db = db\n",
+    "\n",
+    "    def score(self, output: str, **ignored_kwargs: Any):\n",
+    "        # Add your logic here\n",
+    "\n",
+    "        try:\n",
+    "            self.db.run(output)\n",
+    "            return score_result.ScoreResult(\n",
+    "                name=self.name,\n",
+    "                value=1,\n",
+    "                reason=\"Query ran successfully\"\n",
+    "            )\n",
+    "        except Exception as e:\n",
+    "            return score_result.ScoreResult(\n",
+    "                name=self.name,\n",
+    "                value=0,\n",
+    "                reason=str(e)\n",
+    "            )\n",
+    "\n",
+    "valid_sql_query = ValidSQLQuery(name=\"valid_sql_query\", db=db)\n",
     "\n",
     "client = Opik()\n",
     "dataset = client.get_dataset(\"synthetic_questions\")\n",
     "\n",
     "@track()\n",
-    "def llm_chain(input):\n",
-    "    opik_tracer = OpikTracer(tags=[\"simple_chain\"])\n",
-    "\n",
-    "    db = SQLDatabase.from_uri(\"sqlite:///Chinook_Sqlite.sqlite\")\n",
-    "    llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
-    "    chain = create_sql_query_chain(llm, db)\n",
-    "    response = chain.invoke({\"question\": input}, {\"callbacks\": [opik_tracer]})\n",
+    "def llm_chain(input: str) -> str:\n",
+    "    response = chain.invoke({\"question\": input})\n",
     "    \n",
     "    return response\n",
     "\n",
@@ -331,25 +284,25 @@
     "    }\n",
     "\n",
     "res = evaluate(\n",
-    "    experiment_name=\"sql_question_answering_v2\",\n",
+    "    experiment_name=\"SQL question answering\",\n",
     "    dataset=dataset,\n",
     "    task=evaluation_task,\n",
-    "    scoring_metrics=[contains_hello]\n",
+    "    scoring_metrics=[valid_sql_query]\n",
     ")"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.\n",
+    "\n",
+    "![LangChain Evaluation](/img/cookbook/langchain_cookbook.png)"
+   ]
  },
  {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
    "source": []
  }
 ],
diff --git a/apps/opik-documentation/documentation/docs/cookbook/langchain.md b/apps/opik-documentation/documentation/docs/cookbook/langchain.md
index fd1f4fecf0..066841484a 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/langchain.md
+++ b/apps/opik-documentation/documentation/docs/cookbook/langchain.md
@@ -1,6 +1,4 @@
-# Using LLM Evaluation with Langchain
-
-*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*
+# Using Opik with Langchain
 
 For this guide, we will be performing a text-to-SQL query generation task using LangChain. We will be using the Chinook database, which contains the SQLite database of a music store with employee, customer and invoice data.
 
@@ -10,18 +8,38 @@ We will highlight three different parts of the workflow:
 2. Creating a LangChain chain to generate SQL queries
 3. 
Automating the evaluation of the SQL queries on the synthetic dataset
 
-## Preparing our environment
+## Creating an account on Comet.com
 
-First, we will install the necessary libraries, download the Chinook database and set up our different API keys.
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
 
 ```python
-%pip install --upgrade --quiet langchain langchain-community langchain-openai
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
 ```
 
-    Note: you may need to restart the kernel to use updated packages.
+## Preparing our environment
+
+First, we will install the necessary libraries, download the Chinook database and set up our different API keys.
 
+```python
+%pip install --upgrade --quiet opik langchain langchain-community langchain-openai
+```
 
 ```python
 # Download the relevant data
@@ -32,7 +50,12 @@ import requests
 import os
 
 url = "https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
-filename = "Chinook_Sqlite.sqlite"
+filename = "./data/chinook/Chinook_Sqlite.sqlite"
+
+folder = os.path.dirname(filename)
+
+if not os.path.exists(folder):
+    os.makedirs(folder)
 
 if not os.path.exists(filename):
     response = requests.get(url)
@@ -43,16 +66,10 @@ if not os.path.exists(filename):
 db = SQLDatabase.from_uri(f"sqlite:///{filename}")
 ```
 
-    Chinook database downloaded
-
-
 ```python
 import os
 import getpass
-
-os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api"
-
 os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
 ```
 
@@ -68,7 +85,7 @@ from opik.integrations.openai import track_openai
 from openai import OpenAI
 import json
 
-os.environ["COMET_PROJECT_NAME"] = "openai-integration"
+os.environ["OPIK_PROJECT_NAME"] = "langchain-integration-demo"
 client = OpenAI()
 
 openai_client = track_openai(client)
@@ -91,32 +108,6 @@ completion = openai_client.chat.completions.create(
 print(completion.choices[0].message.content)
 ```
 
-    {
-      "result": [
-        "Which customer has made the most purchases in terms of total dollars spent?",
-        "What is the total number of tracks sold in each genre?",
-        "How many unique albums have been purchased by customers from different countries?",
-        "Which employee sold the most expensive track?",
-        "What is the average length of tracks purchased by customers from each country?",
-        "Which customer has spent the most money on tracks in the rock genre?",
-        "What is the total revenue generated by each employee?",
-        "How many unique artists are featured in each playlist?",
-        "Which customer has the highest average rating on their purchased tracks?",
-        "What is the total value of invoices generated by each sales support agent?",
-        "How many tracks have been sold to customers in each country?",
-        "Which artist has the most tracks featured in the top 100 selling tracks?",
-        "What is the total value of invoices generated in each year?",
-        "How many unique tracks have been purchased by customers in each city?",
-        "Which employee has the highest 
average rating on tracks they have sold?",
-        "What is the total number of tracks purchased by customers who have purchased tracks in the pop genre?",
-        "Which customer has purchased the highest number of unique tracks?",
-        "How many customer transactions have occurred in each year?",
-        "Which artist has the most tracks featured in the top 100 selling tracks in the rock genre?",
-        "What is the total number of tracks purchased by customers who have purchased tracks in the jazz genre?"
-      ]
-    }
-
-
 Now that we have our synthetic dataset, we can create a dataset in Comet and insert the questions into it.
 
@@ -141,7 +132,7 @@ except Exception as e:
 
 We will be using the `create_sql_query_chain` function from the `langchain` library to create a SQL query to answer the question.
 
-We will be using the `CometTracer` class from the `opik` library to ensure that the LangChan trace are being tracked in Comet.
+We will be using the `OpikTracer` class from the `opik` library to ensure that the LangChain traces are being tracked in Comet.
 
 
 ```python
@@ -150,20 +141,16 @@ from langchain.chains import create_sql_query_chain
 from langchain_openai import ChatOpenAI
 from opik.integrations.langchain import OpikTracer
 
-os.environ["COMET_PROJECT_NAME"] = "sql_question_answering"
 opik_tracer = OpikTracer(tags=["simple_chain"])
 
 llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
-chain = create_sql_query_chain(llm, db)
-response = chain.invoke({"question": "How many employees are there ?"}, {"callbacks": [opik_tracer]})
+chain = create_sql_query_chain(llm, db).with_config({"callbacks": [opik_tracer]})
+response = chain.invoke({"question": "How many employees are there?"})
 response
 
 print(response)
 ```
 
-    SELECT COUNT("EmployeeId") AS "TotalEmployees" FROM "Employee"
-
-
 ## Automating the evaluation
 
 In order to ensure our LLM application is working correctly, we will test it on our synthetic dataset.
 
@@ -174,22 +161,39 @@ For this we will be using the `evaluate` function from the `opik` library. 
We wi
 
 
 ```python
 from opik import Opik, track
 from opik.evaluation import evaluate
-from opik.evaluation.metrics import Contains
-
-
-contains_hello = Contains(name="ContainsHello")
+from opik.evaluation.metrics import base_metric, score_result
+from typing import Any
+
+class ValidSQLQuery(base_metric.BaseMetric):
+    def __init__(self, name: str, db: Any):
+        self.name = name
+        self.db = db
+
+    def score(self, output: str, **ignored_kwargs: Any):
+        # Add your logic here
+
+        try:
+            self.db.run(output)
+            return score_result.ScoreResult(
+                name=self.name,
+                value=1,
+                reason="Query ran successfully"
+            )
+        except Exception as e:
+            return score_result.ScoreResult(
+                name=self.name,
+                value=0,
+                reason=str(e)
+            )
+
+valid_sql_query = ValidSQLQuery(name="valid_sql_query", db=db)
 
 client = Opik()
 dataset = client.get_dataset("synthetic_questions")
 
 @track()
-def llm_chain(input):
-    opik_tracer = OpikTracer(tags=["simple_chain"])
-
-    db = SQLDatabase.from_uri("sqlite:///Chinook_Sqlite.sqlite")
-    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
-    chain = create_sql_query_chain(llm, db)
-    response = chain.invoke({"question": input}, {"callbacks": [opik_tracer]})
+def llm_chain(input: str) -> str:
+    response = chain.invoke({"question": input})
 
     return response
 
@@ -202,42 +206,15 @@ def evaluation_task(item):
     }
 
 res = evaluate(
-    experiment_name="sql_question_answering_v2",
+    experiment_name="SQL question answering",
     dataset=dataset,
     task=evaluation_task,
-    scoring_metrics=[contains_hello]
+    scoring_metrics=[valid_sql_query]
 )
 ```
 
-    Running tasks: 100%|██████████| 20/20 [00:03<00:00, 5.37it/s]
-    Scoring outputs: 100%|██████████| 20/20 [00:00<00:00, 82321.96it/s]
-
-
-
-    
╭─ synthetic_questions (20 samples) ─╮
-│                                    │
-│ Total time:        00:00:03        │
-│ Number of samples: 20              │
-│                                    │
-│ ContainsHello: 0.0000 (avg)        │
-│                                    │
-╰────────────────────────────────────╯
-
- +The evaluation results are now uploaded to the Opik platform and can be viewed in the UI. +![LangChain Evaluation](/img/cookbook/langchain_cookbook.png) -
Uploading results to Opik ... 
-
-
-
-```python
-
-```
-
-
-```python
-
-```
diff --git a/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb b/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb
new file mode 100644
index 0000000000..95ec0d7ee6
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb
@@ -0,0 +1,232 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Using Opik with OpenAI\n",
+    "\n",
+    "Opik integrates with OpenAI to provide a simple way to log traces for all OpenAI LLM calls. This works for all OpenAI models, including when you are using the streaming API.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating an account on Comet.com\n",
+    "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.\n",
+    "\n",
+    "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import getpass\n",
+    "\n",
+    "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+    "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you are running the Opik platform locally, simply set:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import os\n",
+    "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Preparing our environment\n",
+    "\n",
+    "First, we will install the necessary libraries and set up our OpenAI API keys."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --upgrade --quiet opik openai"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import getpass\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key: \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Logging traces\n",
+    "\n",
+    "In order to log traces to Opik, we need to wrap our OpenAI calls with the `track_openai` function:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Opik was a mischievous little elf who loved pulling pranks on his friends in the enchanted forest. 
One day, his antics went too far and he accidentally turned himself into a fluffy pink bunny.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from opik.integrations.openai import track_openai\n",
+    "from openai import OpenAI\n",
+    "\n",
+    "os.environ[\"OPIK_PROJECT_NAME\"] = \"openai-integration-demo\"\n",
+    "client = OpenAI()\n",
+    "\n",
+    "openai_client = track_openai(client)\n",
+    "\n",
+    "prompt = \"\"\"\n",
+    "Write a short two sentence story about Opik.\n",
+    "\"\"\"\n",
+    "\n",
+    "completion = openai_client.chat.completions.create(\n",
+    "    model=\"gpt-3.5-turbo\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"user\", \"content\": prompt}\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "print(completion.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The prompt and response messages are automatically logged to Opik and can be viewed in the UI.\n",
+    "\n",
+    "![OpenAI Integration](/img/cookbook/openai_trace_cookbook.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Using it with the `track` decorator\n",
+    "\n",
+    "If you have multiple steps in your LLM pipeline, you can use the `track` decorator to log the traces for each step. If OpenAI is called within one of these steps, the LLM call will be associated with that corresponding step:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\"Opik was a young wizard who lived in the small village of Mithos, where magic was both feared and revered. From a young age, Opik had shown a natural talent for magic, much to the dismay of his parents who were simple farmers. They feared the power that their son possessed and did everything they could to suppress it.\\n\\nDespite his parents' efforts, Opik continued to practice his magic in secret, honing his skills and learning all he could about the ancient art. He longed to become a powerful wizard, respected and feared by all who knew him. But as he grew older, he also began to realize that his thirst for power was beginning to consume him, turning him into a dark and reckless mage.\\n\\nOne day, a mysterious figure approached Opik in the village square, offering him a chance to join a secret society of powerful wizards. Intrigued by the offer, Opik accepted and was soon initiated into the group, which called themselves the Arcanum.\\n\\nUnder the guidance of the Arcanum, Opik's power grew exponentially. He could wield spells of immense power, bending reality to his will with a mere flick of his wrist. But as his power grew, so did his arrogance and greed. He began to see himself as above all others, using his magic to manipulate and control those around him.\\n\\nOne day, a great evil swept across the land, threatening to destroy everything in its path. The Arcanum tasked Opik with defeating this evil, seeing it as a chance for him to prove his worth and redeem himself. But as he faced the darkness head-on, Opik realized that true power lay not in domination and control, but in compassion and selflessness.\\n\\nIn a moment of clarity, Opik cast aside his dark ambitions and embraced the light within him. With newfound resolve, he fought against the evil that threatened his home, using his magic not to destroy, but to protect and heal. In the end, it was not his raw power that saved the day, but his courage and heart.\\n\\nAnd so, Opik returned to his village a changed man, no longer seeking power for power's sake, but striving to use his magic for the good of all. 
The villagers welcomed him back with open arms, seeing in him a hero and a protector. And as he walked among them, a new journey unfolded before him - a journey of redemption, compassion, and true magic.\""
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from opik import track\n",
+    "from opik.integrations.openai import track_openai\n",
+    "from openai import OpenAI\n",
+    "\n",
+    "os.environ[\"OPIK_PROJECT_NAME\"] = \"openai-integration-demo\"\n",
+    "\n",
+    "client = OpenAI()\n",
+    "openai_client = track_openai(client)\n",
+    "\n",
+    "@track\n",
+    "def generate_story(prompt):\n",
+    "    res = openai_client.chat.completions.create(\n",
+    "        model=\"gpt-3.5-turbo\",\n",
+    "        messages=[\n",
+    "            {\"role\": \"user\", \"content\": prompt}\n",
+    "        ]\n",
+    "    )\n",
+    "    return res.choices[0].message.content\n",
+    "\n",
+    "@track\n",
+    "def generate_topic():\n",
+    "    prompt = \"Generate a topic for a story about Opik.\"\n",
+    "    res = openai_client.chat.completions.create(\n",
+    "        model=\"gpt-3.5-turbo\",\n",
+    "        messages=[\n",
+    "            {\"role\": \"user\", \"content\": prompt}\n",
+    "        ]\n",
+    "    )\n",
+    "    return res.choices[0].message.content\n",
+    "\n",
+    "@track\n",
+    "def generate_opik_story():\n",
+    "    topic = generate_topic()\n",
+    "    story = generate_story(topic)\n",
+    "    return story\n",
+    "\n",
+    "generate_opik_story()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The trace can now be viewed in the UI:\n",
+    "\n",
+    "![OpenAI Integration](/img/cookbook/openai_trace_decorator_cookbook.png)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "py312_llm_eval",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/apps/opik-documentation/documentation/docs/cookbook/openai.md b/apps/opik-documentation/documentation/docs/cookbook/openai.md
new file mode 100644
index 0000000000..7c0a84d861
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/openai.md
@@ -0,0 +1,135 @@
+# Using Opik with OpenAI
+
+Opik integrates with OpenAI to provide a simple way to log traces for all OpenAI LLM calls. This works for all OpenAI models, including when you are using the streaming API.
+
+
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries and set up our OpenAI API keys. 
+
+
+```python
+%pip install --upgrade --quiet opik openai
+```
+
+
+```python
+import os
+import getpass
+os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
+```
+
+## Logging traces
+
+In order to log traces to Opik, we need to wrap our OpenAI calls with the `track_openai` function:
+
+
+```python
+from opik.integrations.openai import track_openai
+from openai import OpenAI
+
+os.environ["OPIK_PROJECT_NAME"] = "openai-integration-demo"
+client = OpenAI()
+
+openai_client = track_openai(client)
+
+prompt = """
+Write a short two sentence story about Opik.
+"""
+
+completion = openai_client.chat.completions.create(
+    model="gpt-3.5-turbo",
+    messages=[
+        {"role": "user", "content": prompt}
+    ]
+)
+
+print(completion.choices[0].message.content)
+```
+
+    Opik was a mischievous little elf who loved pulling pranks on his friends in the enchanted forest. One day, his antics went too far and he accidentally turned himself into a fluffy pink bunny.
+
+
+The prompt and response messages are automatically logged to Opik and can be viewed in the UI.
+
+![OpenAI Integration](/img/cookbook/openai_trace_cookbook.png)
+
+## Using it with the `track` decorator
+
+If you have multiple steps in your LLM pipeline, you can use the `track` decorator to log the traces for each step. If OpenAI is called within one of these steps, the LLM call will be associated with that corresponding step:
+
+
+```python
+from opik import track
+from opik.integrations.openai import track_openai
+from openai import OpenAI
+
+os.environ["OPIK_PROJECT_NAME"] = "openai-integration-demo"
+
+client = OpenAI()
+openai_client = track_openai(client)
+
+@track
+def generate_story(prompt):
+    res = openai_client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        messages=[
+            {"role": "user", "content": prompt}
+        ]
+    )
+    return res.choices[0].message.content
+
+@track
+def generate_topic():
+    prompt = "Generate a topic for a story about Opik."
+    res = openai_client.chat.completions.create(
+        model="gpt-3.5-turbo",
+        messages=[
+            {"role": "user", "content": prompt}
+        ]
+    )
+    return res.choices[0].message.content
+
+@track
+def generate_opik_story():
+    topic = generate_topic()
+    story = generate_story(topic)
+    return story
+
+generate_opik_story()
+
+```
+
+
+
+
+    "Opik was a young wizard who lived in the small village of Mithos, where magic was both feared and revered. From a young age, Opik had shown a natural talent for magic, much to the dismay of his parents who were simple farmers. They feared the power that their son possessed and did everything they could to suppress it.\n\nDespite his parents' efforts, Opik continued to practice his magic in secret, honing his skills and learning all he could about the ancient art. He longed to become a powerful wizard, respected and feared by all who knew him. But as he grew older, he also began to realize that his thirst for power was beginning to consume him, turning him into a dark and reckless mage.\n\nOne day, a mysterious figure approached Opik in the village square, offering him a chance to join a secret society of powerful wizards. Intrigued by the offer, Opik accepted and was soon initiated into the group, which called themselves the Arcanum.\n\nUnder the guidance of the Arcanum, Opik's power grew exponentially. He could wield spells of immense power, bending reality to his will with a mere flick of his wrist. But as his power grew, so did his arrogance and greed. 
He began to see himself as above all others, using his magic to manipulate and control those around him.\n\nOne day, a great evil swept across the land, threatening to destroy everything in its path. The Arcanum tasked Opik with defeating this evil, seeing it as a chance for him to prove his worth and redeem himself. But as he faced the darkness head-on, Opik realized that true power lay not in domination and control, but in compassion and selflessness.\n\nIn a moment of clarity, Opik cast aside his dark ambitions and embraced the light within him. With newfound resolve, he fought against the evil that threatened his home, using his magic not to destroy, but to protect and heal. In the end, it was not his raw power that saved the day, but his courage and heart.\n\nAnd so, Opik returned to his village a changed man, no longer seeking power for power's sake, but striving to use his magic for the good of all. The villagers welcomed him back with open arms, seeing in him a hero and a protector. And as he walked among them, a new journey unfolded before him - a journey of redemption, compassion, and true magic."
+
+
+
+The trace can now be viewed in the UI:
+
+![OpenAI Integration](/img/cookbook/openai_trace_decorator_cookbook.png)
diff --git a/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb b/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb
new file mode 100644
index 0000000000..0a6ce78e65
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb
@@ -0,0 +1,285 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Using Ragas to evaluate RAG pipelines\n",
+    "\n",
+    "In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) pipelines.\n",
+    "\n",
+    "There are two main ways to use Opik with Ragas:\n",
+    "\n",
+    "1. Using Ragas metrics to score traces\n",
+    "2. Using the Ragas `evaluate` function to score a dataset"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating an account on Comet.com\n",
+    "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.\n",
+    "\n",
+    "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import getpass\n",
+    "\n",
+    "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+    "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you are running the Opik platform locally, simply set:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import os\n",
+    "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Preparing our environment\n",
+    "\n",
+    "First, we will install the necessary libraries and configure the OpenAI API key."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install opik ragas --quiet\n", + "\n", + "import os\n", + "import getpass\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Integrating Opik with Ragas\n", + "\n", + "### Using Ragas metrics to score traces\n", + "\n", + "Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: `answer_relevancy`, `answer_similarity`, `answer_correctness`, `context_precision`, `context_recall`, `context_entity_recall`, `summarization_score`. You can find a full list of metrics in the [Ragas documentation](https://docs.ragas.io/en/latest/references/metrics.html#).\n", + "\n", + "These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the `answer_relevancy` metric.\n", + "\n", + "#### Create the Ragas metric\n", + "\n", + "In order to use the Ragas metric without using the `evaluate` function, you need to initialize the metric with a `RunConfig` object and an LLM provider. For this example, we will use LangChain as the LLM provider with the Opik tracer enabled.\n", + "\n", + "We will first start by initializing the Ragas metric:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import the metric\n", + "from ragas.metrics import AnswerRelevancy\n", + "\n", + "# Import some additional dependencies\n", + "from langchain_openai.chat_models import ChatOpenAI\n", + "from langchain_openai.embeddings import OpenAIEmbeddings\n", + "from ragas.llms import LangchainLLMWrapper\n", + "from ragas.embeddings import LangchainEmbeddingsWrapper\n", + "\n", + "# Initialize the Ragas metric\n", + "llm = LangchainLLMWrapper(ChatOpenAI())\n", + "emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())\n", + "\n", + "answer_relevancy_metric = AnswerRelevancy(llm=llm, embeddings=emb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once the metric is initialized, you can use it to score a sample question. Given that the metric scoring is done asynchronously, you need to use the `asyncio` library to run the scoring function." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run this cell first if you are running this in a Jupyter notebook\n", + "import nest_asyncio\n", + "\n", + "nest_asyncio.apply()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "from ragas.integrations.opik import OpikTracer\n", + "\n", + "# Define the scoring function\n", + "def compute_metric(opik_tracer, metric, row):\n", + " async def get_score(opik_tracer, metric, row):\n", + " score = await metric.ascore(row, callbacks=[opik_tracer])\n", + " return score\n", + "\n", + " # Run the async function using the current event loop\n", + " loop = asyncio.get_event_loop()\n", + " \n", + " result = loop.run_until_complete(get_score(opik_tracer, metric, row))\n", + " return result\n", + "\n", + "# Score a simple example\n", + "row = {\n", + " \"question\": \"What is the capital of France?\",\n", + " \"answer\": \"Paris\",\n", + " \"contexts\": [\"Paris is the capital of France.\", \"Paris is in France.\"]\n", + "}\n", + "\n", + "opik_tracer = OpikTracer()\n", + "score = compute_metric(opik_tracer, answer_relevancy_metric, row)\n", + "print(\"Answer Relevancy score:\", score)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you now navigate to Opik, you will be able to see that a new trace has been created in the `Default Project` project.\n", + "\n", + "#### Score traces\n", + "\n", + "You can score traces by using the `get_current_trace` function to get the current trace and then calling the `log_feedback_score` function.\n", + "\n", + "The advantage of this approach is that the scoring span is added to the trace allowing for a more fine-grained analysis of the RAG pipeline. It will however run the Ragas metric calculation synchronously and so might not be suitable for production use-cases." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from opik import track\n",
+    "from opik.opik_context import get_current_trace\n",
+    "\n",
+    "@track\n",
+    "def retrieve_contexts(question):\n",
+    "    # Define the retrieval function, in this case we will hard code the contexts\n",
+    "    return [\"Paris is the capital of France.\", \"Paris is in France.\"]\n",
+    "\n",
+    "@track\n",
+    "def answer_question(question, contexts):\n",
+    "    # Define the answer function, in this case we will hard code the answer\n",
+    "    return \"Paris\"\n",
+    "\n",
+    "@track(name=\"Compute Ragas metric score\", capture_input=False)\n",
+    "def compute_rag_score(answer_relevancy_metric, question, answer, contexts):\n",
+    "    # Define the score function, reusing the tracer created in the previous cell\n",
+    "    row = {\"question\": question, \"answer\": answer, \"contexts\": contexts}\n",
+    "    score = compute_metric(opik_tracer, answer_relevancy_metric, row)\n",
+    "    return score\n",
+    "\n",
+    "@track\n",
+    "def rag_pipeline(question):\n",
+    "    # Define the pipeline\n",
+    "    contexts = retrieve_contexts(question)\n",
+    "    answer = answer_question(question, contexts)\n",
+    "\n",
+    "    trace = get_current_trace()\n",
+    "    score = compute_rag_score(answer_relevancy_metric, question, answer, contexts)\n",
+    "    trace.log_feedback_score(\"answer_relevancy\", round(score, 4), category_name=\"ragas\")\n",
+    "    \n",
+    "    return answer\n",
+    "\n",
+    "rag_pipeline(\"What is the capital of France?\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Evaluating datasets\n",
+    "\n",
+    "If you are looking to evaluate a dataset, you can use the Ragas `evaluate` function. When using this function, the Ragas library will compute the metrics on all the rows of the dataset and return a summary of the results.\n",
+    "\n",
+    "You can use the `OpikTracer` callback to log the results of the evaluation to the Opik platform:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from datasets import load_dataset\n",
+    "from ragas.metrics import context_precision, answer_relevancy, faithfulness\n",
+    "from ragas import evaluate\n",
+    "from ragas.integrations.opik import OpikTracer\n",
+    "\n",
+    "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
+    "\n",
+    "opik_tracer_eval = OpikTracer(tags=[\"ragas_eval\"], metadata={\"evaluation_run\": True})\n",
+    "\n",
+    "result = evaluate(\n",
+    "    fiqa_eval[\"baseline\"].select(range(3)),\n",
+    "    metrics=[context_precision, faithfulness, answer_relevancy],\n",
+    "    callbacks=[opik_tracer_eval]\n",
+    ")\n",
+    "\n",
+    "print(result)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "py312_llm_eval",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/apps/opik-documentation/documentation/docs/cookbook/ragas.md b/apps/opik-documentation/documentation/docs/cookbook/ragas.md
new file mode 100644
index 0000000000..18e4cf0d0d
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/ragas.md
@@ -0,0 +1,187 @@
+# Using Ragas to evaluate RAG pipelines
+
+In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) 
pipelines.
+
+There are two main ways to use Opik with Ragas:
+
+1. Using Ragas metrics to score traces
+2. Using the Ragas `evaluate` function to score a dataset
+
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform, [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries and configure the OpenAI API key.
+
+
+```python
+%pip install opik ragas --quiet
+
+import os
+import getpass
+
+os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
+```
+
+## Integrating Opik with Ragas
+
+### Using Ragas metrics to score traces
+
+Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: `answer_relevancy`, `answer_similarity`, `answer_correctness`, `context_precision`, `context_recall`, `context_entity_recall`, `summarization_score`. You can find a full list of metrics in the [Ragas documentation](https://docs.ragas.io/en/latest/references/metrics.html#).
+
+These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the `answer_relevancy` metric.
+
+#### Create the Ragas metric
+
+In order to use the Ragas metric without using the `evaluate` function, you need to initialize the metric with a `RunConfig` object and an LLM provider. For this example, we will use LangChain as the LLM provider with the Opik tracer enabled.
+
+We will first start by initializing the Ragas metric:
+
+
+```python
+# Import the metric
+from ragas.metrics import AnswerRelevancy
+
+# Import some additional dependencies
+from langchain_openai.chat_models import ChatOpenAI
+from langchain_openai.embeddings import OpenAIEmbeddings
+from ragas.llms import LangchainLLMWrapper
+from ragas.embeddings import LangchainEmbeddingsWrapper
+
+# Initialize the Ragas metric
+llm = LangchainLLMWrapper(ChatOpenAI())
+emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
+
+answer_relevancy_metric = AnswerRelevancy(llm=llm, embeddings=emb)
+```
+
+Once the metric is initialized, you can use it to score a sample question. Given that the metric scoring is done asynchronously, you need to use the `asyncio` library to run the scoring function. 
+
+
+```python
+# Run this cell first if you are running this in a Jupyter notebook
+import nest_asyncio
+
+nest_asyncio.apply()
+```
+
+
+```python
+import asyncio
+from ragas.integrations.opik import OpikTracer
+
+# Define the scoring function
+def compute_metric(opik_tracer, metric, row):
+    async def get_score(opik_tracer, metric, row):
+        score = await metric.ascore(row, callbacks=[opik_tracer])
+        return score
+
+    # Run the async function using the current event loop
+    loop = asyncio.get_event_loop()
+    
+    result = loop.run_until_complete(get_score(opik_tracer, metric, row))
+    return result
+
+# Score a simple example
+row = {
+    "question": "What is the capital of France?",
+    "answer": "Paris",
+    "contexts": ["Paris is the capital of France.", "Paris is in France."]
+}
+
+opik_tracer = OpikTracer()
+score = compute_metric(opik_tracer, answer_relevancy_metric, row)
+print("Answer Relevancy score:", score)
+```
+
+If you now navigate to Opik, you will be able to see that a new trace has been created in the `Default Project` project.
+
+#### Score traces
+
+You can score traces by using the `get_current_trace` function to get the current trace and then calling the `log_feedback_score` function.
+
+The advantage of this approach is that the scoring span is added to the trace, allowing for a more fine-grained analysis of the RAG pipeline. It will however run the Ragas metric calculation synchronously and so might not be suitable for production use-cases.
+
+
+```python
+from opik import track
+from opik.opik_context import get_current_trace
+
+@track
+def retrieve_contexts(question):
+    # Define the retrieval function, in this case we will hard code the contexts
+    return ["Paris is the capital of France.", "Paris is in France."]
+
+@track
+def answer_question(question, contexts):
+    # Define the answer function, in this case we will hard code the answer
+    return "Paris"
+
+@track(name="Compute Ragas metric score", capture_input=False)
+def compute_rag_score(answer_relevancy_metric, question, answer, contexts):
+    # Define the score function, reusing the tracer created in the previous snippet
+    row = {"question": question, "answer": answer, "contexts": contexts}
+    score = compute_metric(opik_tracer, answer_relevancy_metric, row)
+    return score
+
+@track
+def rag_pipeline(question):
+    # Define the pipeline
+    contexts = retrieve_contexts(question)
+    answer = answer_question(question, contexts)
+
+    trace = get_current_trace()
+    score = compute_rag_score(answer_relevancy_metric, question, answer, contexts)
+    trace.log_feedback_score("answer_relevancy", round(score, 4), category_name="ragas")
+    
+    return answer
+
+rag_pipeline("What is the capital of France?")
+```
+
+#### Evaluating datasets
+
+If you are looking to evaluate a dataset, you can use the Ragas `evaluate` function. When using this function, the Ragas library will compute the metrics on all the rows of the dataset and return a summary of the results. 
+
+You can use the `OpikTracer` callback to log the results of the evaluation to the Opik platform:
+
+
+```python
+from datasets import load_dataset
+from ragas.metrics import context_precision, answer_relevancy, faithfulness
+from ragas import evaluate
+from ragas.integrations.opik import OpikTracer
+
+fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
+
+opik_tracer_eval = OpikTracer(tags=["ragas_eval"], metadata={"evaluation_run": True})
+
+result = evaluate(
+    fiqa_eval["baseline"].select(range(3)),
+    metrics=[context_precision, faithfulness, answer_relevancy],
+    callbacks=[opik_tracer_eval]
+)
+
+print(result)
+```
diff --git a/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md b/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
index e1224f8539..47a4ee3ca2 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
@@ -27,7 +27,7 @@ openai_client = track_openai(openai.OpenAI())
 
 # This method is the LLM application that you want to evaluate
 # Typically this is not updated when creating evaluations
-@track()
+@track
 def your_llm_application(input: str) -> str:
     response = openai_client.chat.completions.create(
         model="gpt-3.5-turbo",
@@ -36,12 +36,12 @@ def your_llm_application(input: str) -> str:
 
     return response.choices[0].message.content
 
-@track()
+@track
 def your_context_retriever(input: str) -> str:
     return ["..."]
 ```
 
-:::note
+:::tip
 We have added the `track` decorator here so that this trace and all its nested steps are logged to the platform for further analysis.
 :::
 
@@ -89,8 +89,8 @@ equals_metric = Equals()
 contains_metric = Hallucination()
 ```
 
-:::note
-    Each metric expects the data in a certain format, you will need to ensure that the task you have defined in step 1. returns the data in the correct format.
+:::tip
+Each metric expects the data in a certain format; you will need to ensure that the task you have defined in step 1 returns the data in the correct format.
 :::
 
 ## 4. Run the evaluation
@@ -108,7 +108,7 @@ from opik.integrations.openai import track_openai
 
 openai_client = track_openai(openai.OpenAI())
 
-@track()
+@track
 def your_llm_application(input: str) -> str:
     response = openai_client.chat.completions.create(
         model="gpt-3.5-turbo",
@@ -118,7 +118,7 @@ def your_llm_application(input: str) -> str:
 
     return response.choices[0].message.content
 
-@track()
+@track
 def your_context_retriever(input: str) -> str:
     return ["..."]
 
@@ -149,6 +149,10 @@ evaluation = evaluate(
 )
 ```
 
-:::note
+:::tip
 We track the traces for all evaluations; these are logged to the `evaluation` project by default. To log them to a specific project, you can pass the `project_name` parameter to the `evaluate` function.
 :::
+
+## Advanced usage
+
+In order to evaluate datasets more efficiently, Opik uses multiple background threads to evaluate the dataset. If this is causing issues, you can disable these by setting `task_threads` and `scoring_threads` to `1`, which will lead Opik to run all calculations in the main thread. 
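+
+For example, a minimal sketch of a fully single-threaded evaluation run. The `task_threads` and `scoring_threads` parameters are the ones described above; the other arguments and variable names are illustrative and follow the earlier snippets in this guide:
+
+```python
+evaluation = evaluate(
+    experiment_name="My experiment",
+    dataset=dataset,
+    task=evaluation_task,
+    scoring_metrics=[contains_metric],
+    task_threads=1,     # run the evaluation task sequentially in the main thread
+    scoring_threads=1,  # compute metric scores sequentially in the main thread
+)
+```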
diff --git a/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md b/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
index 83b9f1af28..03d71536b6 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
@@ -47,10 +47,15 @@ dataset.insert([
 ])
 ```
 
-:::note
-    Instead of using the `DatasetItem` class, you can also use a dictionary to insert items to a dataset. The dictionary should have the `input` key, `expected_output` and `metadata` are optional.
+:::tip
+Instead of using the `DatasetItem` class, you can also use a dictionary to insert items into a dataset. The dictionary must have an `input` key; the `expected_output` and `metadata` keys are optional.
 :::
 
+Once the items have been inserted, you can view them in the Opik UI:
+
+![Opik Dataset](/img/evaluation/dataset_items_page.png)
+
+
 ### Deleting items
 
 You can delete items in a dataset by using the `delete` method:
@@ -60,14 +65,25 @@ from opik import Opik
 
 # Get the dataset
 client = Opik()
-try:
-    dataset = client.create_dataset(name="My dataset")
-except:
-    dataset = client.get_dataset(name="My dataset")
+dataset = client.get_dataset(name="My dataset")
 
 dataset.delete(items_ids=["123", "456"])
 ```
 
+:::tip
+You can also remove all the items in a dataset by using the `clear` method:
+
+```python
+from opik import Opik
+
+# Get the dataset
+client = Opik()
+dataset = client.get_dataset(name="My dataset")
+
+dataset.clear()
+```
+:::
+
 ## Downloading a dataset from Comet
 
 You can download a dataset from Comet using the `get_dataset` method:
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
index 7074978dac..f63e4d992e 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 sidebar_label: AnswerRelevance
 ---
 
@@ -72,4 +72,4 @@ Answer:
 Contexts:
 {contexts}
 ***
-```
\ No newline at end of file
+```
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
index c89ea9310c..0d836155c9 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 6
 sidebar_label: ContextPrecision
 ---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
index cdfd248239..ed53eae33f 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 7
 sidebar_label: ContextRecall
 ---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
index 14c7153ee2..fa7be76b6d 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
+++ 
b/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
@@ -28,10 +28,10 @@ class MyCustomMetric(base_metric.BaseMetric):
         )
 ```
 
-You can also return a list of `ScoreResult` objects as part of your custom metric. This is useful if you want to return multiple scores for a given input and output pair.
+The `score` method should return a `ScoreResult` object. The `ascore` method is optional and can be used to compute the score asynchronously if needed.
 
-:::note
-The `score` method should return a `ScoreResult` object. The `ascore` method is optional and can be used to compute the score for a given input and output pair.
+:::tip
+You can also return a list of `ScoreResult` objects as part of your custom metric. This is useful if you want to return multiple scores for a given input and output pair.
 :::
 
 This metric can now be used in the `evaluate` function as explained here: [Evaluating LLMs](/evaluation/evaluate_your_llm).
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
index 26d20e8cff..6406ff6d97 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3
 sidebar_label: Hallucination
 ---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
index c33b2d8c02..0cb57cf6f4 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 1
+sidebar_position: 2
 sidebar_label: Heuristic Metrics
 ---
 
@@ -32,7 +32,9 @@ score = metric.score("Hello world !")
 print(score)
 ```
 
-## Equals
+## Metrics
+
+### Equals
 
 The `Equals` metric can be used to check if the output of an LLM exactly matches a specific string. It can be used in the following way:
 
@@ -48,7 +50,7 @@ score = metric.score("Hello world !")
 print(score)
 ```
 
-## Contains
+### Contains
 
 The `Contains` metric can be used to check if the output of an LLM contains a specific substring. It can be used in the following way:
 
@@ -65,7 +67,7 @@ score = metric.score("Hello world !")
 print(score)
 ```
 
-## RegexMatch
+### RegexMatch
 
 The `RegexMatch` metric can be used to check if the output of an LLM matches a specified regular expression pattern. It can be used in the following way:
 
@@ -81,7 +83,7 @@ score = metric.score("Hello world !")
 print(score)
 ```
 
-## IsJson
+### IsJson
 
 The `IsJson` metric can be used to check if the output of an LLM is valid JSON. It can be used in the following way:
 
@@ -94,7 +96,7 @@ score = metric.score('{"key": "some_valid_sql"}')
 print(score)
 ```
 
-## LevenshteinRatio
+### LevenshteinRatio
 
 The `LevenshteinRatio` metric can be used to check how close the output of an LLM is to an expected string. 
It can be used in the following way:
 
@@ -105,4 +107,4 @@ metric = LevenshteinRatio(name="levenshtein_ratio_metric", searched_value="hello
 score = metric.score("Hello world !")
 
 print(score)
-```
\ No newline at end of file
+```
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
index fe88e8ceef..1c8509a745 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 3
+sidebar_position: 4
 sidebar_label: Moderation
 ---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
index ee56ead77d..c6f1ef165c 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
@@ -1,8 +1,30 @@
 ---
 sidebar_position: 1
-sidebar_label: Overview - TBD
+sidebar_label: Overview
 ---
 
 # Overview
 
-Under cosntruction
\ No newline at end of file
+Opik provides a set of built-in evaluation metrics that can be used to evaluate the output of your LLM calls. These metrics are broken down into two main categories:
+
+1. Heuristic metrics
+2. LLM as a Judge metrics
+
+Heuristic metrics are deterministic and are often statistical in nature. LLM as a Judge metrics are non-deterministic and are based on the idea of using an LLM to evaluate the output of another LLM.
+
+Opik provides the following built-in evaluation metrics:
+
+| Metric | Type | Description | Documentation |
+| --- | --- | --- | --- |
+| Equals | Heuristic | Checks if the output exactly matches an expected string | [Equals](/evaluation/metrics/heuristic_metrics#equals) |
+| Contains | Heuristic | Check if the output contains a specific substring, can be both case sensitive or case insensitive | [Contains](/evaluation/metrics/heuristic_metrics#contains) |
+| RegexMatch | Heuristic | Checks if the output matches a specified regular expression pattern | [RegexMatch](/evaluation/metrics/heuristic_metrics#regexmatch) |
+| IsJson | Heuristic | Checks if the output is a valid JSON object | [IsJson](/evaluation/metrics/heuristic_metrics#isjson) |
+| Levenshtein | Heuristic | Calculates the Levenshtein distance between the output and an expected string | [Levenshtein](/evaluation/metrics/heuristic_metrics#levenshteinratio) |
+| Hallucination | LLM as a Judge | Check if the output contains any hallucinations | [Hallucination](/evaluation/metrics/hallucination) |
+| Moderation | LLM as a Judge | Check if the output contains any harmful content | [Moderation](/evaluation/metrics/moderation) |
+| AnswerRelevance | LLM as a Judge | Check if the output is relevant to the question | [AnswerRelevance](/evaluation/metrics/answer_relevance) |
+| ContextRecall | LLM as a Judge | Check if the output correctly uses the information in the provided context | [ContextRecall](/evaluation/metrics/context_recall) |
+| ContextPrecision | LLM as a Judge | Check if the output is accurate and relevant given the provided context | [ContextPrecision](/evaluation/metrics/context_precision) |
+
+You can also create your own custom metric, learn more about it in the [Custom Metric](/evaluation/metrics/custom_metric) section. 
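+
+As a quick illustration, heuristic metrics can also be scored directly on a single output string. The sketch below follows the constructor pattern shown in the heuristic metrics guide; the exact parameters accepted by each metric are described on its documentation page:
+
+```python
+from opik.evaluation.metrics import Contains
+
+# Check whether the LLM output contains the searched substring
+metric = Contains(name="contains_metric", searched_value="world")
+score = metric.score("Hello world !")
+
+print(score)
+```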
diff --git a/apps/opik-documentation/documentation/docs/home.md b/apps/opik-documentation/documentation/docs/home.md
index 9baa440133..80b9f5d13b 100644
--- a/apps/opik-documentation/documentation/docs/home.md
+++ b/apps/opik-documentation/documentation/docs/home.md
@@ -4,46 +4,37 @@ slug: /
 sidebar_label: Home
 ---
 
-# Comet Opik
+# Opik by Comet
 
-The LLM Evaluation platform allows you log, view and evaluate your LLM traces during both development and production. Using the platform and our LLM as a Judge evaluators, you can identify and fix issues in your LLM application.
+The Opik platform allows you to log, view and evaluate your LLM traces during both development and production. Using the platform and our LLM as a Judge evaluators, you can identify and fix issues in your LLM application.
 
 ![LLM Evaluation Platform](/img/home/traces_page_with_sidebar.png)
 
-# Overview
+## Overview
 
-## Development
+### Development
 
 During development, you can use the platform to log, view and debug your LLM traces:
 
 1. Log traces using:
-   a. One of our [integrations](./)
-   b. The `@track` decorator for Python
-   c. The [Rest API](./)
-2. Review and debug traces in the [Tracing UI](./)
-3. [Annotate and label traces](./) through the UI
-## Evaluation and Testing
+   a. One of our [integrations](/tracing/integrations/overview).
 
-Evaluating the output of your LLM calls is critical to ensure that your application is working as expected and can be challenging. Using the Comet LLM Evaluation platformm, you can:
-
-1. Use one of our [LLM as a Judge evaluators](./) or [Heuristic evaluators](./) to score your traces and LLM calls
-2. [Store evaluation datasets](./) in the platform and [run evaluations](./)
-3. Use our [pytest integration](./) to track unit test results and compare results between runs
+   b. The `@track` decorator for Python, learn more in the [Logging Traces](/tracing/log_traces) guide.
+2. [Annotate and label traces](/tracing/annotate_traces) through the SDK or the UI.
 
-## Monitoring
+### Evaluation and Testing
 
-You can use the LLM platform to monitor your LLM applications in production, both the SDK and the Backend have been designed to support high volumes of requests.
-
-The platform allows you:
+Evaluating the output of your LLM calls is critical to ensure that your application is working as expected and can be challenging. Using the Comet LLM Evaluation platform, you can:
-1. Track all LLM calls and traces using our [Python SDK](./) and a [Rest API](./)
-2. View, filter and analyze traces in our [Tracing UI](./)
-3. Update evaluation datasets with [failed traces](./)
+1. Use one of our [LLM as a Judge evaluators](/evaluation/metrics/overview) or [Heuristic evaluators](/evaluation/metrics/heuristic_metrics) to score your traces and LLM calls
+2. [Store evaluation datasets](/evaluation/manage_datasets) in the platform and [run evaluations](/evaluation/evaluate_your_llm)
+3. Use our [pytest integration](/testing/pytest_integration) to track unit test results and compare results between runs
+## Getting Started
 
-# Getting Started
+[Comet](https://www.comet.com/site) provides a managed Cloud offering for Opik, simply [create an account](https://www.comet.com/signup?from=llm) to get started.
 
-The Comet LLM Evaluation platform allows you log, view and evaluate your LLM traces during both development and production.
\ No newline at end of file
+You can also run Opik locally using our [local installer](/self-host/self_hosting_opik#all-in-one-installation). 
If you are looking for a more production-ready deployment, you can also use our [Kubernetes deployment option](/self-host/self_hosting_opik#kubernetes-installation). diff --git a/apps/opik-documentation/documentation/docs/quickstart.md b/apps/opik-documentation/documentation/docs/quickstart.md index 2b64dd760b..81215da3ce 100644 --- a/apps/opik-documentation/documentation/docs/quickstart.md +++ b/apps/opik-documentation/documentation/docs/quickstart.md @@ -5,31 +5,38 @@ sidebar_label: Quickstart # Quickstart -This guide helps you integrate the Comet LLM Evaluation platform with your existing LLM application. +This guide helps you integrate the Opik platform with your existing LLM application. ## Set up -Getting started is as simple as creating an [account on Comet](./) or [self-hosting the platform](./). +Getting started is as simple as creating an [account on Comet](https://www.comet.com/signup?from=llm) or [self-hosting the platform](/self-host/self_hosting_opik). -Once your account is created, you can start logging traces by installing and configuring the Python SDK: +Once your account is created, you can start logging traces by installing the Opik Python SDK: ```bash pip install opik +``` + +and configuring the SDK with: + +```python +import os -export COMET_API_KEY=<...> +os.environ["OPIK_API_KEY"] = "<your-api-key>" +os.environ["OPIK_WORKSPACE"] = "<your-workspace>" ``` -:::note -You do not need to set the `COMET_API_KEY` environment variable if you are self-hosting the platform. Instead you will need to set: +:::tip +If you are self-hosting the platform, you don't need to set the `OPIK_API_KEY` and `OPIK_WORKSPACE` environment variables. Instead, simply set: ```bash -EXPORT COMET_URL_OVERRIDE="http://localhost:5173/api" +export OPIK_URL_OVERRIDE="http://localhost:5173/api" ``` ::: ## Integrating with your LLM application -You can start logging traces to Comet by simply adding the `opik.track` decorator to your LLM application: +You can start logging traces to Opik by simply adding the `@track` decorator to your LLM application: ```python from opik import track @@ -41,5 +48,6 @@ def your_llm_application(input): return output ``` -To learn more about the `track` decorator, see the [`track` documentation](./track). +To learn more about the `track` decorator, see the [`track` documentation](./track). Once the traces are logged, you can view them in the Opik UI: +![Opik Traces](/img/home/traces_page_for_quickstart.png) diff --git a/apps/opik-documentation/documentation/docs/self-host/kubernetes_deployments.md b/apps/opik-documentation/documentation/docs/self-host/kubernetes_deployments.md deleted file mode 100644 index 48a6da4fbb..0000000000 --- a/apps/opik-documentation/documentation/docs/self-host/kubernetes_deployments.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -sidebar_position: 2 -sidebar_label: Kubernetes Deployments - TBD ---- - -# Kubernetes Deployments - -Under construction \ No newline at end of file diff --git a/apps/opik-documentation/documentation/docs/self-host/local_deployments.md b/apps/opik-documentation/documentation/docs/self-host/local_deployments.md deleted file mode 100644 index e336370e79..0000000000 --- a/apps/opik-documentation/documentation/docs/self-host/local_deployments.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -sidebar_position: 1 -sidebar_label: Local Deployments - TBD ---- - -# Local Deployments - -Under construction.
\ No newline at end of file diff --git a/apps/opik-documentation/documentation/docs/self-host/self_hosting_opik.md b/apps/opik-documentation/documentation/docs/self-host/self_hosting_opik.md new file mode 100644 index 0000000000..ba903a3660 --- /dev/null +++ b/apps/opik-documentation/documentation/docs/self-host/self_hosting_opik.md @@ -0,0 +1,123 @@ +--- +sidebar_position: 1 +sidebar_label: Overview +--- + +# Self-host +You can use Opik through [Comet's Managed Cloud offering](https://comet.com/site) or you can self-host Opik on your own infrastructure. When choosing to self-host Opik, you get access to all Opik features, including tracing and evaluation, but without user management features. + +If you choose to self-host Opik, you can choose between two deployment options: + +1. All-in-one installation: The Opik platform runs on a single server. +2. Kubernetes installation: The Opik platform runs on a Kubernetes cluster. + +If you are just getting started, we recommend the all-in-one installation. For more advanced use cases, you can choose the Kubernetes installation. + +## All-in-one installation + +The all-in-one installer is the easiest way to get started with Opik. + +### Installation + +To install the Opik server, first install the `opik-server` package and then run the installer: + +```bash +# Install the opik-server package +pip install opik-server + +# Install the Opik server +opik-server install +``` + +You can also run the installer in debug mode to see the details of the +installation process: + +```bash +opik-server --debug install +``` + +:::tip +We recommend installing using the `--debug` flag, as the installation can take a couple of minutes. +::: + +By default, the installer will install the same version of Opik as its +own version (`opik-server -v`). If you want to install a specific version, you +can specify the version using the `--opik-version` flag: + +```bash +opik-server install --opik-version 0.1.0 +``` + +By default, the installer will set up a local port forward to the Opik server +using the port `5173`. If you want to use a different port, you can specify +the port using the `--local-port` flag: + +```bash +opik-server install --local-port 5174 +``` + +The installation process takes a couple of minutes, and when complete, Opik will be available at `http://localhost:5173`. + +### Upgrading the Opik server + +To upgrade the Opik server, run the following command: + +```bash +pip install --upgrade opik-server +opik-server upgrade +``` + +Or upgrade to a specific version: + +```bash +opik-server upgrade --opik-version 0.1.1 +``` + +### Uninstalling the Opik server + +To uninstall the Opik server, you can run the following command: + +```bash +minikube delete +``` + +## Kubernetes installation + +If you are looking for more customization options, you can choose to install Opik on a Kubernetes cluster. + +In order to install Opik on a Kubernetes cluster, you will need to have the following tools installed: + +- [Docker](https://www.docker.com/) +- [Helm](https://helm.sh/) +- [kubectl](https://kubernetes.io/docs/tasks/tools/) +- [kubectx](https://github.com/ahmetb/kubectx) and [kubens](https://github.com/ahmetb/kubectx) to switch between Kubernetes clusters and namespaces.
+ +To install Opik, you can use the helm chart defined in the `deployment/helm_chart/opik` directory of the [Opik repository](https://github.com/comet-ml/opik): + +```bash +# Navigate to the directory +cd deployment/helm_chart/opik + +# Define the version of the Opik server you want to install +VERSION=main + +# Add helm dependencies +helm repo add bitnami https://charts.bitnami.com/bitnami +helm dependency build + +# Install Opik +helm upgrade --install opik -n llm --create-namespace -f values.yaml \ --set registry=docker.dev.comet.com/comet-ml \ --set component.backend.image.tag=$VERSION --set component.frontend.image.tag=$VERSION-os \ --set component.backend.env.ANALYTICS_DB_MIGRATIONS_PASS=opik --set component.backend.env.ANALYTICS_DB_PASS=opik \ --set component.backend.env.STATE_DB_PASS=opik . +``` + +To access the Opik UI, you will need to port-forward the frontend service: + +```bash +kubectl port-forward -n llm svc/opik-frontend 5173 +``` + +You can now open the Opik UI at `http://localhost:5173/llm`. + +### Configuration + +You can find a full list of the configuration options in the [helm chart documentation](https://github.com/comet-ml/opik/tree/main/deployment/helm_chart/opik). diff --git a/apps/opik-documentation/documentation/docs/testing/_category_.json b/apps/opik-documentation/documentation/docs/testing/_category_.json new file mode 100644 index 0000000000..42544d1d56 --- /dev/null +++ b/apps/opik-documentation/documentation/docs/testing/_category_.json @@ -0,0 +1,7 @@ +{ + "label": "Testing", + "position": 4, + "link": { + "type": "generated-index" + } + } diff --git a/apps/opik-documentation/documentation/docs/testing/pytest_integration.md b/apps/opik-documentation/documentation/docs/testing/pytest_integration.md new file mode 100644 index 0000000000..e2a4a27b9e --- /dev/null +++ b/apps/opik-documentation/documentation/docs/testing/pytest_integration.md @@ -0,0 +1,60 @@ +--- +sidebar_position: 1 +sidebar_label: Pytest Integration +--- + +# Pytest Integration + +Ensuring your LLM application is working as expected is a crucial step before deploying to production. Opik provides a Pytest integration so that you can easily track the overall pass / fail rates of your tests as well as the individual pass / fail rates of each test. + +## Using the Pytest Integration + +We recommend using the `llm_unit` decorator to wrap your tests. This will ensure that Opik can track the results of your tests and provide you with a detailed report. It also works well when used in conjunction with the `track` decorator used to trace your LLM application. + + +```python +import pytest +from opik import track, llm_unit + +@track +def llm_application(user_question: str) -> str: + # LLM application code here + return "Paris" + +@llm_unit() +def test_simple_passing_test(): + user_question = "What is the capital of France?" + response = llm_application(user_question) + assert response == "Paris" +``` + +When you run the tests, Opik will create a new experiment for each run and log each test result. By navigating to the `tests` dataset, you will see a new experiment for each test run. + +![Test Experiments](/img/testing/test_experiments.png) + +:::tip +If you are evaluating your LLM application during development, we recommend using the `evaluate` function as it will provide you with a more detailed report. You can learn more about the `evaluate` function in the [evaluation documentation](/evaluation/evaluate_your_llm).
+::: + +### Advanced Usage + +The `llm_unit` decorator also works well when used in conjunction with the `parametrize` Pytest decorator, which allows you to run the same test with different inputs: + +```python +import pytest +from opik import track, llm_unit + +@track +def llm_application(user_question: str) -> str: + # LLM application code here + return "Paris" + +@llm_unit(expected_output_key="expected_output") +@pytest.mark.parametrize("user_question, expected_output", [ + ("What is the capital of France?", "Paris"), + ("What is the capital of Germany?", "Berlin") +]) +def test_simple_passing_test(user_question, expected_output): + response = llm_application(user_question) + assert response == expected_output +``` diff --git a/apps/opik-documentation/documentation/docs/tracing/log_feedback_scores.md b/apps/opik-documentation/documentation/docs/tracing/annotate_traces.md similarity index 63% rename from apps/opik-documentation/documentation/docs/tracing/log_feedback_scores.md rename to apps/opik-documentation/documentation/docs/tracing/annotate_traces.md index 1fe431fc0a..43c42a50a7 100644 --- a/apps/opik-documentation/documentation/docs/tracing/log_feedback_scores.md +++ b/apps/opik-documentation/documentation/docs/tracing/annotate_traces.md @@ -1,11 +1,11 @@ --- sidebar_position: 5 -sidebar_label: Log Feedback Scores +sidebar_label: Annotate Traces --- -# Log Feedback Scores +# Annotate Traces -Logging feedback scores is a crucial aspect of evaluating and improving your LLM-based applications. By systematically recording qualitative or quantitative feedback on specific interactions or entire conversation flows, you can: +Annotating traces is a crucial aspect of evaluating and improving your LLM-based applications. By systematically recording qualitative or quantitative feedback on specific interactions or entire conversation flows, you can: 1. Track performance over time 2. Identify areas for improvement @@ -13,15 +13,27 @@ Logging feedback scores is a crucial aspect of evaluating and improving your LLM 4. Gather data for fine-tuning or retraining 5. Provide stakeholders with concrete metrics on system effectiveness -Comet provides powerful tools to log feedback scores for both individual spans (specific interactions) and entire traces (complete conversation flows). This granular approach allows you to pinpoint exactly where your system excels or needs improvement. +Opik allows you to annotate traces through the SDK or the UI. -## Logging Feedback Scores +## Annotating Traces through the UI -Feedback scores can be logged at both a trace and a span level using `log_traces_feedback_scores` and `log_spans_feedback_scores` respectively. +To annotate traces through the UI, you can navigate to the trace you want to annotate in the traces page and click on the `Annotate` button. This will open a sidebar where you can add annotations to the trace. -### Logging Feedback Scores for Traces +You can annotate both traces and spans through the UI; make sure you have selected the correct span in the sidebar. -To log feedback scores for entire traces, use the `log_traces_feedback_scores` method: +![Annotate Traces](/img/tracing/annotate_traces.png) + +:::tip +In order to ensure a consistent set of feedback scores, you will need to define feedback definitions in the `Feedback Definitions` page, which supports both numerical and categorical annotations.
+::: + +## Annotating traces and spans using the SDK + +You can use the SDK to annotate traces and spans, which can be useful both as part of the evaluation process and when you receive user feedback scores in your application. + +### Annotating Traces through the SDK + +Feedback scores can be logged for traces using the `log_traces_feedback_scores` method: ```python from opik import Opik @@ -38,11 +50,11 @@ client.log_traces_feedback_scores( ) ``` -:::note +:::tip The `scores` argument supports an optional `reason` field that can be provided to each score. This can be used to provide a human-readable explanation for the feedback score. ::: -### Logging Feedback Scores for Spans +### Annotating Spans through the SDK To log feedback scores for individual spans, use the `log_spans_feedback_scores` method: ```python @@ -66,11 +78,11 @@ comet.log_spans_feedback_scores( The `FeedbackScoreDict` class supports an optional `reason` field that can be used to provide a human-readable explanation for the feedback score. ::: -## Computing Feedback Scores +### Using Opik's built-in evaluation metrics -Computing feedback scores can be challenging due to the fact that Large Language Models can return unstructured text and non-deterministic outputs. In order to help with the computation of these scores, Comet provides some built-in evaluation metrics. +Computing feedback scores can be challenging because Large Language Models can return unstructured text and non-deterministic outputs. To help with the computation of these scores, Opik provides some built-in evaluation metrics. -Comet's built-in evaluation metrics are broken down into two main categories: +Opik's built-in evaluation metrics are broken down into two main categories: 1. Heuristic metrics 2. LLM as a judge metrics @@ -78,7 +90,7 @@ Heuristic metrics are use rule-based or statistical methods that can be used to evaluate the output of LLM models. -Comet supports a variety of heuristic metrics including: +Opik supports a variety of heuristic metrics, including: * `EqualsMetric` * `RegexMatchMetric` * `ContainsMetric` diff --git a/apps/opik-documentation/documentation/docs/tracing/integrations/langchain.md b/apps/opik-documentation/documentation/docs/tracing/integrations/langchain.md index e7f813455f..b82c12af27 100644 --- a/apps/opik-documentation/documentation/docs/tracing/integrations/langchain.md +++ b/apps/opik-documentation/documentation/docs/tracing/integrations/langchain.md @@ -1,5 +1,5 @@ --- -sidebar_position: 2 +sidebar_position: 3 sidebar_label: LangChain --- diff --git a/apps/opik-documentation/documentation/docs/tracing/integrations/openai.md b/apps/opik-documentation/documentation/docs/tracing/integrations/openai.md index 124d3cb43c..491d786864 100644 --- a/apps/opik-documentation/documentation/docs/tracing/integrations/openai.md +++ b/apps/opik-documentation/documentation/docs/tracing/integrations/openai.md @@ -1,5 +1,5 @@ --- -sidebar_position: 3 +sidebar_position: 2 sidebar_label: OpenAI --- @@ -37,4 +37,4 @@ response = openai_client.Completion.create( The `openai_wrapper` will automatically track and log the API call, including the input prompt, model used, and response generated. You can view these logs in your Comet project dashboard. -By following these steps, you can seamlessly integrate Comet Opik with the OpenAI Python SDK and gain valuable insights into your model's performance and usage.
\ No newline at end of file +By following these steps, you can seamlessly integrate Comet Opik with the OpenAI Python SDK and gain valuable insights into your model's performance and usage. diff --git a/apps/opik-documentation/documentation/docs/tracing/integrations/overview.md b/apps/opik-documentation/documentation/docs/tracing/integrations/overview.md index 99525a7b07..d3c2bc48d3 100644 --- a/apps/opik-documentation/documentation/docs/tracing/integrations/overview.md +++ b/apps/opik-documentation/documentation/docs/tracing/integrations/overview.md @@ -1,8 +1,16 @@ --- sidebar_position: 1 -sidebar_label: Overview - TBD +sidebar_label: Overview --- # Overview -Under construction. \ No newline at end of file +Opik aims to make it as easy as possible to log, view and evaluate your LLM traces. We do this by providing a set of integrations: + + +| Integration | Description | Documentation | Try in Colab | | ----------- | ----------- | ------------- | ------------ | +| OpenAI | Log traces for all OpenAI LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/openai) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb) | +| LangChain | Log traces for all LangChain LLM calls | [Documentation](https://www.comet.com/docs/opik/integrations/langchain) | [![Open Quickstart In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/opik/blob/master/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb) | + +If you would like to see more integrations, please open an issue on our [GitHub repository](https://github.com/comet-ml/opik). diff --git a/apps/opik-documentation/documentation/docs/tracing/integrations/ragas.md b/apps/opik-documentation/documentation/docs/tracing/integrations/ragas.md new file mode 100644 index 0000000000..6e12919561 --- /dev/null +++ b/apps/opik-documentation/documentation/docs/tracing/integrations/ragas.md @@ -0,0 +1,122 @@ +--- +sidebar_position: 4 +sidebar_label: Ragas +--- + +# Ragas + +The Opik SDK provides a simple way to integrate with Ragas, a framework for evaluating RAG systems. + +There are two main ways to use Ragas with Opik: + +1. Using Ragas to score traces or spans. +2. Using Ragas to evaluate a RAG pipeline. + +## Using Ragas to score traces or spans + +Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline; a full list of the supported metrics can be found in the [Ragas documentation](https://docs.ragas.io/en/latest/references/metrics.html#). + +In addition to logging these feedback scores to Opik, you can also use the `OpikTracer` callback to keep track of the score calculation itself in Opik.
+ +Due to the asynchronous nature of the score calculation, we will need to define a helper function that runs the scoring coroutine: + +```python +# Import the metric +from ragas.metrics import AnswerRelevancy + +# Import some additional dependencies +from langchain_openai.chat_models import ChatOpenAI +from langchain_openai.embeddings import OpenAIEmbeddings +from ragas.llms import LangchainLLMWrapper +from ragas.embeddings import LangchainEmbeddingsWrapper + +import asyncio +from ragas.integrations.opik import OpikTracer + +# Initialize the LLM and embedding models used by the metric +llm = LangchainLLMWrapper(ChatOpenAI()) +emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings()) + +# Initialize the Ragas metric +answer_relevancy_metric = AnswerRelevancy(llm=llm, embeddings=emb) + +# Define the scoring function +def compute_metric(metric, row): + async def get_score(): + # The OpikTracer callback logs the score calculation to Opik + score = await metric.ascore(row, callbacks=[OpikTracer()]) + return score + + # Run the async function using the current event loop + loop = asyncio.get_event_loop() + + result = loop.run_until_complete(get_score()) + return result +``` + +Once the `compute_metric` function is defined, you can use it to score a trace or span: + +```python +from opik import track +from opik.opik_context import get_current_trace + +@track +def retrieve_contexts(question): + # Define the retrieval function, in this case we will hard code the contexts + return ["Paris is the capital of France.", "Paris is in France."] + +@track +def answer_question(question, contexts): + # Define the answer function, in this case we will hard code the answer + return "Paris" + +@track(name="Compute Ragas metric score", capture_input=False) +def compute_rag_score(answer_relevancy_metric, question, answer, contexts): + # Define the score function + row = {"question": question, "answer": answer, "contexts": contexts} + score = compute_metric(answer_relevancy_metric, row) + return score + +@track +def rag_pipeline(question): + # Define the pipeline + contexts = retrieve_contexts(question) + answer = answer_question(question, contexts) + + trace = get_current_trace() + score = compute_rag_score(answer_relevancy_metric, question, answer, contexts) + trace.log_feedback_score("answer_relevancy", round(score, 4), category_name="ragas") + + return answer + +rag_pipeline("What is the capital of France?") +``` + +In the Opik UI, you will be able to see the full trace, including the score calculation: + +![Ragas chain](/img/tracing/ragas_opik_trace.png) + +## Using Ragas to evaluate a RAG pipeline + +:::tip + +We recommend using the Opik [evaluation framework](/evaluation/evaluate_your_llm) to evaluate your RAG pipeline. It shares similar concepts with the Ragas `evaluate` functionality but has a tighter integration with Opik. + +::: + +If you are using the Ragas `evaluate` functionality, you can use the `OpikTracer` callback to keep track of the score calculation in Opik.
This will log the computation of each evaluation metric as traces: + +```python +from datasets import load_dataset +from ragas.metrics import context_precision, answer_relevancy, faithfulness +from ragas import evaluate +from ragas.integrations.opik import OpikTracer + +fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval") + +opik_tracer_eval = OpikTracer(tags=["ragas_eval"], metadata={"evaluation_run": True}) + +result = evaluate( + fiqa_eval["baseline"].select(range(3)), + metrics=[context_precision, faithfulness, answer_relevancy], + callbacks=[opik_tracer_eval] +) + +print(result) +``` diff --git a/apps/opik-documentation/documentation/docs/tracing/log_traces.md b/apps/opik-documentation/documentation/docs/tracing/log_traces.md index cb3f85b997..493c8fbc77 100644 --- a/apps/opik-documentation/documentation/docs/tracing/log_traces.md +++ b/apps/opik-documentation/documentation/docs/tracing/log_traces.md @@ -17,8 +17,8 @@ pip install opik Once the SDK is installed, you can log traces to using one our Comet's integration, function annotations or manually. -:::note -If you are using LangChain or OpenAI, we recommend checking out their respective documentation for more information. +:::tip +Opik has a number of integrations for popular LLM frameworks like LangChain and OpenAI; check out the full list of integrations in the [integrations](/tracing/integrations/overview) section. ::: ## Log using function annotators @@ -32,11 +32,11 @@ from opik.integrations.openai import track_openai openai_client = track_openai(openai.OpenAI()) -@track() +@track def preprocess_input(text): return text.strip().lower() -@track() +@track def generate_response(prompt): response = openai_client.chat.completions.create( model="gpt-3.5-turbo", @@ -44,11 +44,11 @@ def generate_response(prompt): ) return response.choices[0].message.content -@track() +@track def postprocess_output(response): return response.capitalize() -@track(name="llm_chain") +@track(name="my_llm_application") def llm_chain(input_text): preprocessed = preprocess_input(input_text) generated = generate_response(preprocessed) @@ -60,7 +60,7 @@ result = llm_chain("Hello, how are you?") print(result) ``` -:::note +:::tip If the `track` function annotators are used in conjunction with the `track_openai` or `CometTracer` callbacks, the LLM calls will be automatically logged to the corresponding trace. ::: @@ -94,6 +94,9 @@ trace.span( input={"prompt": "Translate the following text to French: hello, how are you?"}, output={"response": "Comment ça va?"} ) + +# End the trace +trace.end() ``` ## Update trace and span attributes @@ -104,7 +107,7 @@ You can access the Trace and Span objects to update their attributes. This is us from opik.opik_context import get_current_trace, get_current_span from opik import track -@track() +@track def llm_chain(input_text): # LLM chain code # ... @@ -127,7 +130,7 @@ def llm_chain(input_text): You can learn more about the `Trace` object in the [Trace reference docs](/sdk-reference-docs/Objects/Trace.html) and the `Span` object in the [Span reference docs](/sdk-reference-docs/Objects/Span.html).
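+For example, you can combine `get_current_trace` with the trace's `log_feedback_score` method to attach a score from inside a tracked function. The following is a minimal, illustrative sketch; the `user_feedback` score name is a placeholder: + +```python +from opik import track +from opik.opik_context import get_current_trace + +@track +def llm_chain(input_text): + # LLM chain code + output = "..." + + # Attach a feedback score to the trace created by @track + trace = get_current_trace() + trace.log_feedback_score("user_feedback", 1.0) + + return output +```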
-## Log scores to traces +## Log scores to traces and spans You can log scores to traces and spans using the `log_traces_feedback_scores` and `log_spans_feedback_scores` methods: diff --git a/apps/opik-documentation/documentation/package.json b/apps/opik-documentation/documentation/package.json index e345823207..63d122063c 100644 --- a/apps/opik-documentation/documentation/package.json +++ b/apps/opik-documentation/documentation/package.json @@ -5,8 +5,8 @@ "scripts": { "docusaurus": "docusaurus", "start": "docusaurus start", - "dev": "concurrently \"docusaurus start\" \"nodemon --watch docs/cookbook -e ipynb --exec 'jupyter nbconvert docs/cookbook/*.ipynb --to markdown'\"", - "build": "jupyter nbconvert docs/cookbook/*.ipynb --to markdown && docusaurus build", + "dev": "concurrently \"docusaurus start\" \"nodemon --watch docs/cookbook/ -e ipynb --exec 'jupyter nbconvert docs/cookbook/*.ipynb --to markdown'\"", + "build": "jupyter nbconvert docs/cookbook/*.ipynb --clear-output --to markdown && docusaurus build", "swizzle": "docusaurus swizzle", "deploy": "docusaurus deploy", "clear": "docusaurus clear", diff --git a/apps/opik-documentation/documentation/sidebars.ts b/apps/opik-documentation/documentation/sidebars.ts index 4632d88828..f1dd5a6b4e 100644 --- a/apps/opik-documentation/documentation/sidebars.ts +++ b/apps/opik-documentation/documentation/sidebars.ts @@ -14,14 +14,20 @@ const sidebars: SidebarsConfig = { guideSidebar: [ 'home', 'quickstart', + { + type: 'category', + label: 'Self-host', + collapsed: false, + items: ['self-host/self_hosting_opik'] + }, { type: 'category', label: 'Tracing', collapsed: false, - items: ['tracing/log_traces', 'tracing/log_distributed_traces', 'tracing/log_feedback_scores', { + items: ['tracing/log_traces', 'tracing/log_distributed_traces', 'tracing/annotate_traces', { type: 'category', label: 'Integrations', - items: ['tracing/integrations/langchain', 'tracing/integrations/openai'] + items: ['tracing/integrations/overview', 'tracing/integrations/langchain', 'tracing/integrations/openai'] }], }, { @@ -31,14 +37,20 @@ const sidebars: SidebarsConfig = { items: ['evaluation/manage_datasets', 'evaluation/evaluate_your_llm', { type: 'category', label: 'Metrics', - items: ['evaluation/metrics/heuristic_metrics', 'evaluation/metrics/hallucination', 'evaluation/metrics/answer_relevance', 'evaluation/metrics/moderation', 'evaluation/metrics/context_precision', 'evaluation/metrics/context_recall', 'evaluation/metrics/custom_metric'] + items: ['evaluation/metrics/overview', 'evaluation/metrics/heuristic_metrics', 'evaluation/metrics/hallucination', 'evaluation/metrics/moderation', 'evaluation/metrics/answer_relevance', 'evaluation/metrics/context_precision', 'evaluation/metrics/context_recall', 'evaluation/metrics/custom_metric'] }], }, + { + type: 'category', + label: 'Testing', + collapsed: false, + items: ['testing/pytest_integration'] + }, { type: 'category', label: 'Cookbooks', collapsed: false, - items: ['cookbook/langchain', 'cookbook/evaluate_hallucination_metric', 'cookbook/evaluate_moderation_metric'], + items: ['cookbook/openai', 'cookbook/langchain', 'cookbook/evaluate_hallucination_metric', 'cookbook/evaluate_moderation_metric'], }, ], }; diff --git a/apps/opik-documentation/documentation/src/css/components/_sidebar.scss b/apps/opik-documentation/documentation/src/css/components/_sidebar.scss index 6397e53019..69a38bca26 100644 --- a/apps/opik-documentation/documentation/src/css/components/_sidebar.scss +++ 
b/apps/opik-documentation/documentation/src/css/components/_sidebar.scss @@ -26,6 +26,10 @@ background: none; } } + + .menu__link--sublist-caret:after { + background: var(--ifm-menu-link-sublist-icon) 50% / 4rem 1.4rem; + } .menu__list-item:not(:first-child) { margin-top: 0rem; @@ -36,6 +40,9 @@ // margin-left: 0.25rem; } + .menu__list .menu__list { + margin-bottom: 0.25rem; + } .menu__list-item { .menu__list { position: relative; @@ -56,4 +63,4 @@ } } -} \ No newline at end of file +} diff --git a/apps/opik-documentation/documentation/src/css/custom.scss b/apps/opik-documentation/documentation/src/css/custom.scss index b611825ac3..b526721038 100644 --- a/apps/opik-documentation/documentation/src/css/custom.scss +++ b/apps/opik-documentation/documentation/src/css/custom.scss @@ -70,4 +70,4 @@ --ifm-heading-color-border: #e0f3ff1a; --ifm-sidebar-color-line: #262626; -} \ No newline at end of file +} diff --git a/apps/opik-documentation/documentation/static/img/cookbook/hallucination_metric_cookbook.png b/apps/opik-documentation/documentation/static/img/cookbook/hallucination_metric_cookbook.png new file mode 100644 index 0000000000..6a34512d28 Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/cookbook/hallucination_metric_cookbook.png differ diff --git a/apps/opik-documentation/documentation/static/img/cookbook/langchain_cookbook.png b/apps/opik-documentation/documentation/static/img/cookbook/langchain_cookbook.png new file mode 100644 index 0000000000..75894b541f Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/cookbook/langchain_cookbook.png differ diff --git a/apps/opik-documentation/documentation/static/img/cookbook/moderation_metric_cookbook.png b/apps/opik-documentation/documentation/static/img/cookbook/moderation_metric_cookbook.png new file mode 100644 index 0000000000..c1e129e216 Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/cookbook/moderation_metric_cookbook.png differ diff --git a/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_cookbook.png b/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_cookbook.png new file mode 100644 index 0000000000..759887448f Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_cookbook.png differ diff --git a/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_decorator_cookbook.png b/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_decorator_cookbook.png new file mode 100644 index 0000000000..029e881afd Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/cookbook/openai_trace_decorator_cookbook.png differ diff --git a/apps/opik-documentation/documentation/static/img/evaluation/dataset_items_page.png b/apps/opik-documentation/documentation/static/img/evaluation/dataset_items_page.png new file mode 100644 index 0000000000..cbe4ad5831 Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/evaluation/dataset_items_page.png differ diff --git a/apps/opik-documentation/documentation/static/img/home/traces_page_for_quickstart.png b/apps/opik-documentation/documentation/static/img/home/traces_page_for_quickstart.png new file mode 100644 index 0000000000..6d866f0195 Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/home/traces_page_for_quickstart.png differ diff --git a/apps/opik-documentation/documentation/static/img/testing/test_experiments.png 
b/apps/opik-documentation/documentation/static/img/testing/test_experiments.png new file mode 100644 index 0000000000..299ac2dcdd Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/testing/test_experiments.png differ diff --git a/apps/opik-documentation/documentation/static/img/tracing/annotate_traces.png b/apps/opik-documentation/documentation/static/img/tracing/annotate_traces.png new file mode 100644 index 0000000000..868d5d5baa Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/tracing/annotate_traces.png differ diff --git a/apps/opik-documentation/documentation/static/img/tracing/ragas_opik_trace.png b/apps/opik-documentation/documentation/static/img/tracing/ragas_opik_trace.png new file mode 100644 index 0000000000..573e6524d0 Binary files /dev/null and b/apps/opik-documentation/documentation/static/img/tracing/ragas_opik_trace.png differ diff --git a/apps/opik-documentation/python-sdk-docs/source/Comet.rst b/apps/opik-documentation/python-sdk-docs/source/Opik.rst similarity index 53% rename from apps/opik-documentation/python-sdk-docs/source/Comet.rst rename to apps/opik-documentation/python-sdk-docs/source/Opik.rst index 5541fb7277..4a2361c5f7 100644 --- a/apps/opik-documentation/python-sdk-docs/source/Comet.rst +++ b/apps/opik-documentation/python-sdk-docs/source/Opik.rst @@ -1,7 +1,7 @@ -Comet -===== +Opik +==== -.. autoclass:: opik.Comet +.. autoclass:: opik.Opik :members: :inherited-members: \ No newline at end of file diff --git a/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_span.rst b/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_span.rst deleted file mode 100644 index ddfe559747..0000000000 --- a/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_span.rst +++ /dev/null @@ -1,4 +0,0 @@ -get_current_span -================ - -.. autofunction:: opik.comet_context.get_current_span \ No newline at end of file diff --git a/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_trace.rst b/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_trace.rst deleted file mode 100644 index f16795c01a..0000000000 --- a/apps/opik-documentation/python-sdk-docs/source/comet_context/get_current_trace.rst +++ /dev/null @@ -1,4 +0,0 @@ -get_current_trace -================= - -.. autofunction:: opik.comet_context.get_current_trace \ No newline at end of file diff --git a/apps/opik-documentation/python-sdk-docs/source/comet_context/index.rst b/apps/opik-documentation/python-sdk-docs/source/comet_context/index.rst deleted file mode 100644 index fe1d891c2c..0000000000 --- a/apps/opik-documentation/python-sdk-docs/source/comet_context/index.rst +++ /dev/null @@ -1,10 +0,0 @@ -comet_context -============= - -.. toctree:: - :hidden: - :maxdepth: 4 - :titlesonly: - - get_current_span - get_current_trace \ No newline at end of file diff --git a/apps/opik-documentation/python-sdk-docs/source/evaluation/metrics/index.rst b/apps/opik-documentation/python-sdk-docs/source/evaluation/metrics/index.rst index 880978f0a5..865cb4a0f4 100644 --- a/apps/opik-documentation/python-sdk-docs/source/evaluation/metrics/index.rst +++ b/apps/opik-documentation/python-sdk-docs/source/evaluation/metrics/index.rst @@ -1,8 +1,25 @@ metrics ======= +Opik includes a number of pre-built metrics to help you evaluate your LLM application. 
+ +Each metric can be used standalone by calling its `score` method:: + + from opik.evaluation.metrics import Hallucination + + metric = Hallucination() + + metric.score( + input="What is the capital of France?", + output="The capital of France is Paris. It is famous for its iconic Eiffel Tower and rich cultural heritage.", + context=["France is a country in Western Europe. Its capital is Paris, which is known for landmarks like the Eiffel Tower."], + ) + +Or as part of an evaluation run using the `evaluate` function. + +You can learn more about each metric in the following sections: + .. toctree:: - :hidden: :maxdepth: 4 :titlesonly: diff --git a/apps/opik-documentation/python-sdk-docs/source/index.rst b/apps/opik-documentation/python-sdk-docs/source/index.rst index 2bf280632e..eb5192bc90 100644 --- a/apps/opik-documentation/python-sdk-docs/source/index.rst +++ b/apps/opik-documentation/python-sdk-docs/source/index.rst @@ -1,5 +1,5 @@ -opik -============== +Opik +==== ============= Main features ============= The Comet Opik platform is a suite of tools that allow you to evaluate the outpu In includes the following features: -- `Tracing <...>`_: Ability to log LLM calls and traces to the Comet platform. -- `LLM evaluation metrics <...>`_: A set of functions that evaluate the output of an LLM, these are both heuristic metrics and LLM as a Judge. -- `Evaluation <...>`_: Ability to log test datasets in Comet and evaluate using some of our LLM evaluation metrics. +- `Tracing `_: Ability to log LLM calls and traces to the Opik platform. +- `LLM evaluation metrics `_: A set of functions that evaluate the output of an LLM; these include both heuristic metrics and LLM as a Judge metrics. +- `Evaluation `_: Ability to log test datasets in Opik and evaluate using some of our LLM evaluation metrics. -For a more detailed overview of the platform, you can refer to the `Comet Opik documentation <...>`_. +For a more detailed overview of the platform, you can refer to the `Comet Opik documentation `_. ============ Installation ============ To get start with the package, you can install it using pip:: pip install opik By default, all traces, datasets and experiments will be logged to the Comet Cloud platform. If you -would like to self-host the platform, you can refer to our `self-serve documentation <...>`_. +would like to self-host the platform, you can refer to our `self-serve documentation `_. ============= Using the SDK ============= To log your first trace, you can use the `track` decorator:: **Note:** The `track` decorator supports nested functions, if you track multiple functions, each functionc call will be associated with the parent trace. -**Integrations**: If you are using LangChain or OpenAI, Comet Opik has `built-in integrations <...>`_ for these libraries. +**Integrations**: If you are using LangChain or OpenAI, Comet Opik has `built-in integrations `_ for these libraries.
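+For example, a minimal sketch of the OpenAI integration (assuming an OpenAI API key is configured in your environment):: + + from opik.integrations.openai import track_openai + from openai import OpenAI + + # Wrap the client so that all OpenAI calls are logged to Opik + openai_client = track_openai(OpenAI())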
---------------------------- Using LLM evaluation metrics @@ -78,7 +78,7 @@ Running evaluations Evaluations are run using the `evaluate` function, this function takes a dataset, a task and a list of metrics and returns a dictionary of scores:: - from opik import Comet, track + from opik import Opik, track from opik.evaluation import evaluate from opik.evaluation.metrics import EqualsMetric, HallucinationMetric from opik.integrations.openai import track_openai @@ -100,8 +100,8 @@ Evaluations are run using the `evaluate` function, this function takes a dataset return ["..."] # Fetch the dataset - comet = Comet() - dataset = comet.get_dataset(name="your-dataset-name") + client = Opik() + dataset = client.get_dataset(name="your-dataset-name") # Define the metrics equals_metric = EqualsMetric() @@ -121,39 +121,46 @@ Evaluations are run using the `evaluate` function, this function takes a dataset metrics=[equals_metric, hallucination_metric], ) +========= +Reference +========= +You can learn more about the `opik` Python SDK in the following sections: .. toctree:: - :hidden: - - Comet + :maxdepth: 1 + + Opik track - comet_context/index + opik_context/index .. toctree:: :caption: Evaluation - :hidden: - :maxdepth: 4 + :maxdepth: 1 evaluation/Dataset evaluation/DatasetItem evaluation/evaluate evaluation/metrics/index +.. toctree:: + :caption: Testing + :maxdepth: 1 + + testing/llm_unit + .. toctree:: :caption: Integrations - :hidden: - :maxdepth: 4 + :maxdepth: 1 integrations/openai/index integrations/langchain/index .. toctree:: :caption: Objects - :hidden: - :maxdepth: 4 + :maxdepth: 1 Objects/Trace.rst Objects/Span.rst Objects/FeedbackScoreDict.rst - Objects/UsageDict.rst \ No newline at end of file + Objects/UsageDict.rst diff --git a/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/CometTracer.rst b/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/CometTracer.rst deleted file mode 100644 index 8ba43effd3..0000000000 --- a/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/CometTracer.rst +++ /dev/null @@ -1,5 +0,0 @@ -CometTracer -=========== - -.. autoclass:: opik.integrations.langchain.CometTracer - :members: \ No newline at end of file diff --git a/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/OpikTracer.rst b/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/OpikTracer.rst new file mode 100644 index 0000000000..8f821c2eae --- /dev/null +++ b/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/OpikTracer.rst @@ -0,0 +1,5 @@ +OpikTracer +========== + +..
autoclass:: opik.integrations.langchain.OpikTracer + :members: diff --git a/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/index.rst b/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/index.rst index 889c074223..a1d917f31d 100644 --- a/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/index.rst +++ b/apps/opik-documentation/python-sdk-docs/source/integrations/langchain/index.rst @@ -1,9 +1,34 @@ langchain ========= +Opik integrates with LangChain to allow you to log your LangChain calls to the Opik platform; simply pass the `OpikTracer` callback when running your chain to start logging:: + + from langchain.chains import LLMChain + from langchain_openai import OpenAI + from langchain.prompts import PromptTemplate + from opik.integrations.langchain import OpikTracer + + # Initialize the tracer + opik_tracer = OpikTracer() + + # Create the LLM Chain using LangChain + llm = OpenAI(temperature=0) + + prompt_template = PromptTemplate( + input_variables=["input"], + template="Translate the following text to French: {input}" + ) + + llm_chain = LLMChain(llm=llm, prompt=prompt_template) + + # Generate the translations + translation = llm_chain.run("Hello, how are you?", callbacks=[opik_tracer]) + print(translation) + +You can learn more about the `OpikTracer` class in the following section: + .. toctree:: - :hidden: :maxdepth: 4 :titlesonly: - CometTracer \ No newline at end of file + OpikTracer diff --git a/apps/opik-documentation/python-sdk-docs/source/integrations/openai/index.rst b/apps/opik-documentation/python-sdk-docs/source/integrations/openai/index.rst index 92543672b2..72c2f77dab 100644 --- a/apps/opik-documentation/python-sdk-docs/source/integrations/openai/index.rst +++ b/apps/opik-documentation/python-sdk-docs/source/integrations/openai/index.rst @@ -1,9 +1,22 @@ openai ======= +Opik integrates with OpenAI to allow you to log your OpenAI calls to the Opik platform; simply wrap the OpenAI client with `track_openai` to start logging:: + + from opik.integrations.openai import track_openai + from openai import OpenAI + + openai_client = OpenAI() + openai_client = track_openai(openai_client) + + # Standard OpenAI calls are now automatically logged to Opik + response = openai_client.chat.completions.create( + model="gpt-3.5-turbo", + messages=[{"role": "user", "content": "Hello, world!"}], + ) + +You can learn more about the `track_openai` function in the following section: + .. toctree:: - :hidden: :maxdepth: 4 :titlesonly: - track_openai \ No newline at end of file + track_openai diff --git a/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_span.rst b/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_span.rst new file mode 100644 index 0000000000..ec1a47df3c --- /dev/null +++ b/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_span.rst @@ -0,0 +1,4 @@ +get_current_span +================ + +.. autofunction:: opik.opik_context.get_current_span diff --git a/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_trace.rst b/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_trace.rst new file mode 100644 index 0000000000..aaf13b07b9 --- /dev/null +++ b/apps/opik-documentation/python-sdk-docs/source/opik_context/get_current_trace.rst @@ -0,0 +1,4 @@ +get_current_trace +================= + +..
autofunction:: opik.opik_context.get_current_trace diff --git a/apps/opik-documentation/python-sdk-docs/source/opik_context/index.rst b/apps/opik-documentation/python-sdk-docs/source/opik_context/index.rst new file mode 100644 index 0000000000..f0475eb708 --- /dev/null +++ b/apps/opik-documentation/python-sdk-docs/source/opik_context/index.rst @@ -0,0 +1,20 @@ +opik_context +============ + +The opik context module provides a way to access the current span and trace from within a tracked function:: + + from opik import opik_context, track + + @track + def my_function(): + span = opik_context.get_current_span() + trace = opik_context.get_current_trace() + +You can learn more about each function in the following sections: + +.. toctree:: + :maxdepth: 4 + :titlesonly: + + get_current_span + get_current_trace diff --git a/apps/opik-documentation/python-sdk-docs/source/testing/llm_unit.rst b/apps/opik-documentation/python-sdk-docs/source/testing/llm_unit.rst new file mode 100644 index 0000000000..54ce84107f --- /dev/null +++ b/apps/opik-documentation/python-sdk-docs/source/testing/llm_unit.rst @@ -0,0 +1,7 @@ +llm_unit +======== + +.. autoclass:: opik.llm_unit + :members: + :inherited-members: + \ No newline at end of file diff --git a/apps/opik-frontend/README.md b/apps/opik-frontend/README.md index f5e278575b..a772a184cb 100644 --- a/apps/opik-frontend/README.md +++ b/apps/opik-frontend/README.md @@ -1,81 +1,3 @@ -# React Comet Opik +# Opik frontend -This is a frontend part of Comet Opik project - -## Getting Started - -### Install - -Access the project directory. - -```bash -cd apps/opik-frontend -``` - -In order to run the frontend, you will need to have node available locally. For -this recommend installing [nvm](https://github.com/nvm-sh/nvm). For this guide -we will assume you have nvm installed locally: - -```bash -# Use version 20.15.0 of node -nvm use lts/iron - -npm install -``` - -Start Develop serve with hot reload at . -The dev server is set up to work with Opik BE run on http://localhost:8080. All requests that tarts with `/api` prefix is proxying to it. -The server port can be changed in `vite.config.ts` file section `proxy`. - -```bash -npm start -``` - -### Lint - -```bash -npm run lint -``` - -### Typecheck - -```bash -npm run typecheck -``` - -### Build - -```bash -npm run build -``` - -### Test - -```bash -npm run test -``` - -View and interact with your tests via UI. - -```bash -npm run test:ui -``` - -## Comet Integration - -In order to run the frontend locally with the Comet integration we have to run the frontend in `comet` mode, but first, we should override the environment variables - -1. Create a new `.env.comet.local` file with this content: - -``` -VITE_BASE_URL=/opik/ -VITE_BASE_API_URL=/opik/api -VITE_BASE_COMET_URL=https://staging.dev.comet.com/ -VITE_BASE_COMET_API_URL=https://staging.dev.comet.com/api -``` - -2. Now you can start the frontend in `comet` mode: - -```bash -npm start -- --mode=comet -``` +If you would like to contribute to the Opik frontend, please refer to the [Contribution guide](./CONTRIBUTING.md). diff --git a/deployment/installer/README.md b/deployment/installer/README.md index 60a897c582..fec8141c04 100644 --- a/deployment/installer/README.md +++ b/deployment/installer/README.md @@ -1,169 +1,3 @@ -# Opik Server Installer & Manager +# Opik installer -The Opik server installer is a Python package that installs and manages the -Opik server on a local machine. 
-It aims to make this process as simple as possible, by reducing the number of -steps required to install the Opik server. - -## Usage - -To install the tool, run the following command: - -```bash -pip install opik-server -``` - -### Installing Opik Server - -To install the Opik server, run the following command: - -```bash -opik-server install -``` - -You can also run the installer in debug mode to see the details of the -installation process: - -```bash -opik-server --debug install -``` - -By default, the installer will install the same version of the Opik as its -own version (`opik-server -v`). If you want to install a specific version, you -can specify the version using the `--opik-version` flag: - -```bash -opik-server install --opik-version 0.1.0 -``` - -By default, the installer will setup a local port forward to the Opik server -using the port `5173`. If you want to use a different port, you can specify -the port using the `--local-port` flag: - -```bash -opik-server install --local-port 5174 -``` - -### Upgrading Opik Server - -To upgrade the Opik server, run the following command: - -```bash -pip install --upgrade opik-server -opik-server upgrade -``` - -Or upgrade to a specific version: - -```bash -opik-server upgrade --opik-version 0.1.1 -``` - -## Building the Python Package - -To build the package: - -1. Ensure that you have the necessary packaging dependencies installed: - -```bash -pip install -r pub-requirements.txt -``` - -2. Run the following command to build the package: - -```bash -python -m build --wheel -``` - -This will create a `dist` directory containing the built package. - -3. You can now upload the package to the PyPi repository using `twine`: - -```bash -twine upload dist/* -``` - -## QA Testing - -To test the installer, clone this repository onto the machine you want to -install the Opik server on and install the package using the following -commands: - -```bash -# Make sure pip is up to date -pip install --upgrade pip - -# Clone the repository -git clone git@github.com:comet-ml/opik.git - -# You may need to checkout the branch you want to test -# git checkout installer-pkg - -cd opik/deployment/installer/ - -# Install the package -pip install . -``` - -If your pip installation path you may get a warning that the package is not -installed in your `PATH`. This is fine, the package will still work. -But you will need to call the fully qualified path to the executable. -Review the warning message to see the path to the executable. - -```bash -# When the package is publically released none of these flags will be needed. -# and you will be able to simply run `opik-server install` -opik-server install --opik-version 0.1.0 -``` - -This will install the Opik server on your machine. - -By default this will hide the details of the installation process. If you want -to see the details of the installation process, you can add the `--debug` -flag just before the `install` command. - -```bash -opik-server --debug install ........ -``` - -If successful, the message will instruct you to run a kubectl command to -forward the necessary ports to your local machine, and provide you with the -URL to access the Opik server. - -### Uninstalling - -To uninstall the Opik server, run the following command: - -```bash -minikube delete -``` - -To reset the machine to a clean state, with no Opik server installed, it is -best to use a fresh VM. 
But if you want to reset the machine to a clean state -without reinstalling the VM, you can run the following commands: - -#### macOS - -```bash -minikube delete -brew uninstall minikube -brew uninstall helm -brew uninstall kubectl -brew uninstall --cask docker -rm -rf ~/.minikube -rm -rf ~/.helm -rm -rf ~/.kube -rm -rf ~/.docker -sudo find /usr/local/bin -lname '/Applications/Docker.app/*' -exec rm {} + -``` - -#### Ubuntu - -```bash -minikube delete -sudo apt-get remove helm kubectl minikube docker-ce containerd.io -rm -rf ~/.minikube -rm -rf ~/.helm -rm -rf ~/.kube -rm -rf ~/.docker -``` +If you would like to contribute to the Opik installer, please refer to the [Contribution guide](./CONTRIBUTING.md). diff --git a/readme-thumbnail.png b/readme-thumbnail.png new file mode 100644 index 0000000000..9b323c9700 Binary files /dev/null and b/readme-thumbnail.png differ diff --git a/sdks/python/README.md b/sdks/python/README.md index 05dd6ca170..17ad4a928d 100644 --- a/sdks/python/README.md +++ b/sdks/python/README.md @@ -1,8 +1,3 @@ -# opik-python +# Opik Python SDK -To install package in development mode run `pip install -e .` from the repository root directory. - -# Linters - -Before pushing your changes please run `pre-commit run --all-files` from the python/sdk directory. -It will install and run required linters. +If you would like to contribute to the Opik Python SDK, please refer to the [Contribution guide](./CONTRIBUTING.md).