diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000000..9324f63c07
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,293 @@
+# Contributing to Opik
+
+We're excited that you're interested in contributing to Opik! There are many ways to contribute, from writing code to improving the documentation.
+
+The easiest way to get started is to:
+
+* Submit [bug reports](https://github.com/comet-ml/opik/issues) and [feature requests](https://github.com/comet-ml/opik/issues)
+* Review the documentation and submit [Pull Requests](https://github.com/comet-ml/opik/pulls) to improve it
+* Speak or write about Opik and [let us know](https://chat.comet.com)
+* Upvote [popular feature requests](https://github.com/comet-ml/opik/issues?q=is%3Aissue+is%3Aopen+label%3A%22feature+request%22) to show your support
+
+
+## Submitting a new issue or feature request
+
+### Submitting a new issue
+
+Thanks for taking the time to submit an issue; it's the best way to help us improve Opik!
+
+Before submitting a new issue, please check the [existing issues](https://github.com/comet-ml/opik/issues) to avoid duplicates.
+
+To help us understand the issue you're experiencing, please provide steps to reproduce it, including a minimal code snippet. This helps us diagnose the issue and fix it more quickly.
+
+### Submitting a new feature request
+
+Feature requests are welcome! To help us understand the feature you'd like to see, please provide:
+
+1. A short description of the motivation behind this request
+2. A detailed description of the feature you'd like to see, including any code snippets if applicable
+
+If you are in a position to submit a PR for the feature, feel free to open a PR!
+
+## Project setup and architecture
+
+The Opik project is made up of five main sub-projects:
+
+* `apps/opik-documentation`: The Opik documentation website
+* `deployment/installer`: The Opik installer
+* `sdks/python`: The Opik Python SDK
+* `apps/opik-frontend`: The Opik frontend application
+* `apps/opik-backend`: The Opik backend server
+
+
+In addition, Opik relies on:
+
+1. ClickHouse: Used to store traces, spans and feedback scores
+2. MySQL: Used to store metadata associated with projects, datasets, experiments, etc.
+3. Redis: Used for caching
+
+### Contributing to the documentation
+
+The documentation is made up of three main parts:
+
+1. `apps/opik-documentation/documentation`: The Opik documentation website
+2. `apps/opik-documentation/python-sdk-docs`: The Python reference documentation
+3. `apps/opik-documentation/rest-api-docs`: The REST API reference documentation
+
+#### Contributing to the documentation website
+
+The documentation website is built using [Docusaurus](https://docusaurus.io/) and is located in `apps/opik-documentation/documentation`.
+
+In order to run the documentation website locally, you need to have `npm` installed. Once installed, you can run the documentation locally using the following command:
+
+```bash
+cd apps/opik-documentation/documentation
+
+# Install dependencies - Only needs to be run once
+npm install
+
+# Run the documentation website locally
+npm run start
+```
+
+You can then access the documentation website at `http://localhost:3000`. Any change you make to the documentation will be updated in real-time.
+
+#### Contributing to the Python SDK reference documentation
+
+The Python SDK reference documentation is built using [Sphinx](https://www.sphinx-doc.org/en/master/) and is located in `apps/opik-documentation/python-sdk-docs`.
+
+In order to run the Python SDK reference documentation locally, you need to have `python` and `pip` installed. Once installed, you can run the documentation locally using the following command:
+
+```bash
+cd apps/opik-documentation/python-sdk-docs
+
+# Install dependencies - Only needs to be run once
+pip install -r requirements.txt
+
+# Run the python sdk reference documentation locally
+make dev
+```
+
+The Python SDK reference documentation will be built and available at `http://127.0.0.1:8000`. Any change you make to the documentation will be updated in real-time.
+
+### Contributing to the Python SDK
+
+The Python SDK is available under `sdks/python` and can be installed locally using `pip install -e sdks/python`.
+
+To test your changes locally, you can run Opik locally using `opik server install`.
+
+Before submitting a PR, please ensure that your code passes the test suite:
+
+```bash
+cd sdks/python
+
+pytest tests/
+```
+
+and the linter:
+
+```bash
+cd sdks/python
+
+pre-commit run --all-files
+```
+
+> [!NOTE]
+> If your changes impact public-facing methods or docstrings, please also update the documentation. You can find more information about updating the docs in the [documentation contribution guide](#contributing-to-the-documentation).
+
+### Contributing to the installer
+
+The Opik server installer is a Python package that installs and manages the Opik server on a local machine. In order to achieve this, the installer relies on:
+
+1. Minikube: Used to manage the Kubernetes cluster
+2. Helm: Used to manage the Kubernetes charts
+3. Ansible: Used to manage the installation of the Opik server
+
+#### Building the package
+
+In order to build the package:
+
+1. Ensure that you have the necessary packaging dependencies installed:
+
+```bash
+pip install -r pub-requirements.txt
+```
+
+2. Run the following command to build the package:
+
+```bash
+python -m build --wheel
+```
+
+This will create a `dist` directory containing the built package.
+
+3. You can now upload the package to the PyPI repository using `twine`:
+
+```bash
+twine upload dist/*
+```
+
+#### QA Testing
+
+To test the installer, clone this repository onto the machine you want to
+install the Opik server on and install the package using the following
+commands:
+
+```bash
+# Make sure pip is up to date
+pip install --upgrade pip
+
+# Clone the repository
+git clone git@github.com:comet-ml/opik.git
+
+# You may need to checkout the branch you want to test
+# git checkout installer-pkg
+
+cd opik/deployment/installer/
+
+# Install the package
+pip install .
+```
+
+Depending on your pip installation path, you may get a warning that the package is not
+installed in your `PATH`. This is fine; the package will still work,
+but you will need to call the executable by its fully qualified path.
+Review the warning message to see the path to the executable.
+
+```bash
+# When the package is publicly released none of these flags will be needed,
+# and you will be able to simply run `opik-server install`
+opik-server install --opik-version 0.1.0
+```
+
+This will install the Opik server on your machine.
+
+By default, this will hide the details of the installation process. If you want
+to see the details of the installation process, you can add the `--debug`
+flag just before the `install` command.
+
+```bash
+opik-server --debug install ........
+```
+
+If successful, the message will instruct you to run a kubectl command to
+forward the necessary ports to your local machine, and provide you with the
+URL to access the Opik server.
+
+#### Uninstalling
+
+To uninstall the Opik server, run the following command:
+
+```bash
+minikube delete
+```
+
+To reset the machine to a clean state, with no Opik server installed, it is
+best to use a fresh VM. But if you want to reset the machine to a clean state
+without reinstalling the VM, you can run the following commands:
+
+##### macOS
+
+```bash
+minikube delete
+brew uninstall minikube
+brew uninstall helm
+brew uninstall kubectl
+brew uninstall --cask docker
+rm -rf ~/.minikube
+rm -rf ~/.helm
+rm -rf ~/.kube
+rm -rf ~/.docker
+sudo find /usr/local/bin -lname '/Applications/Docker.app/*' -exec rm {} +
+```
+
+##### Ubuntu
+
+```bash
+minikube delete
+sudo apt-get remove helm kubectl minikube docker-ce containerd.io
+rm -rf ~/.minikube
+rm -rf ~/.helm
+rm -rf ~/.kube
+rm -rf ~/.docker
+```
+
+### Contributing to the frontend
+
+The Opik frontend is a React application that is located in `apps/opik-frontend`.
+
+In order to run the frontend locally, you need to have `npm` installed. Once installed, you can run the frontend locally using the following command:
+
+```bash
+cd apps/opik-frontend
+
+# Install dependencies - Only needs to be run once
+npm install
+
+# Run the frontend locally
+npm run start
+```
+
+You can then access the development frontend at `http://localhost:5174/`. Any change you make to the frontend will be updated in real-time.
+
+The dev server is set up to work with the Opik backend running on `http://localhost:8080`. All requests made to `http://localhost:5174/api` are proxied to the backend.
+The server port can be changed in the `proxy` section of the `vite.config.ts` file.
+
+> [!NOTE]
+> You will need to have the backend running locally in order for the frontend to work. For this, we recommend running a local instance of Opik using `opik server install`.
+
+Before submitting a PR, please ensure that your code passes the test suite, the linter and the type checker:
+
+```bash
+cd apps/opik-frontend
+
+npm run test
+npm run lint
+npm run typecheck
+```
+
+### Contributing to the backend
+
+The Opik backend is a Java application that is located in `apps/opik-backend`.
+
+In order to run the backend locally, you need to have `java` and `maven` installed. Once installed, you can run the backend locally using the following command:
+
+```bash
+cd apps/opik-backend
+
+# Build the Opik application
+mvn clean install
+
+# Run the Opik application
+java -jar target/opik-backend-{project.pom.version}.jar server config.yml
+```
+Replace `{project.pom.version}` with the version of the project in the pom file.
+
+Once the backend is running, you can access the Opik API at `http://localhost:8080`.
+
+Before submitting a PR, please ensure that your code passes the test suite:
+
+```bash
+cd apps/opik-backend
+
+mvn test
+```
diff --git a/README.md b/README.md
index 5e8f82154d..e6cbb64362 100644
--- a/README.md
+++ b/README.md
@@ -1,117 +1,161 @@
-# opik
+Confidently evaluate, test and monitor LLM applications.
+
-Comet Opik contains two main services:
-1. Frontend available at `apps/opik-frontend/README.md`
-2. Backend available at `apps/opik-backend/README.md`
+
+Website • Slack community • Twitter • Documentation
+
-# Pip install the local version of the SDK
-pip install -e . -U
-```
+![Opik thumbnail](readme-thumbnail.png)

-## Running the full application locally with minikube
+## 🚀 What is Opik?

-### Installation Prerequisites
+[Opik](https://www.comet.com/site/products/opik) is an open-source platform for evaluating, testing and monitoring LLM applications. Built by [Comet](https://www.comet.com).

-- Docker - https://docs.docker.com/engine/install/
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.ipynb
-        "╭─ HaluBench (500 samples) ────────────╮\n",
-        "│                                      │\n",
-        "│ Total time:        00:00:53          │\n",
-        "│ Number of samples: 500               │\n",
-        "│                                      │\n",
-        "│ Detected hallucination: 0.8020 (avg) │\n",
-        "│                                      │\n",
-        "╰──────────────────────────────────────╯\n",
-        "\n"
-       ],
-       "text/plain": [
-        "╭─ HaluBench (500 samples) ────────────╮\n",
-        "│                                      │\n",
-        "│ \u001b[1mTotal time:       \u001b[0m 00:00:53          │\n",
-        "│ \u001b[1mNumber of samples:\u001b[0m 500               │\n",
-        "│                                      │\n",
-        "│ \u001b[1;32mDetected hallucination: 0.8020 (avg)\u001b[0m │\n",
-        "│                                      │\n",
-        "╰──────────────────────────────────────╯\n"
-       ]
-      },
-      "metadata": {},
-      "output_type": "display_data"
-     },
-     {
-      "data": {
-       "text/html": [
-        "Uploading results to Opik ... \n",
-        "\n"
-       ],
-       "text/plain": [
-        "Uploading results to Opik \u001b[33m...\u001b[0m \n"
-       ]
-      },
-      "metadata": {},
-      "output_type": "display_data"
-     }
-    ],
+ "outputs": [],
"source": [
"from opik.evaluation.metrics import Hallucination\n",
"from opik.evaluation import evaluate\n",
@@ -154,8 +154,6 @@
" self.name = name\n",
"\n",
" def score(self, hallucination_score, expected_hallucination_score, **kwargs):\n",
- " expected_hallucination_score = 1 if expected_hallucination_score == \"FAIL\" else 0\n",
- " \n",
" return score_result.ScoreResult(\n",
" value= None if hallucination_score is None else hallucination_score == expected_hallucination_score,\n",
" name=self.name,\n",
@@ -179,7 +177,7 @@
" hallucination_reason = str(e)\n",
" \n",
" return {\n",
- " \"hallucination_score\": hallucination_score,\n",
+ " \"hallucination_score\": \"FAIL\" if hallucination_score == 1 else \"PASS\",\n",
" \"hallucination_reason\": hallucination_reason,\n",
" \"expected_hallucination_score\": x.expected_output[\"expected_output\"]\n",
" }\n",
@@ -198,8 +196,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset."
+    "We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset, and we can inspect the specific items where hallucinations were not detected.\n",
+ "\n",
+ "![Hallucination Evaluation](/img/cookbook/hallucination_metric_cookbook.png)"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
}
],
"metadata": {
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md
index 1a4ce3ed72..1d4267d341 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_hallucination_metric.md
@@ -1,9 +1,34 @@
# Evaluating Opik's Hallucination Metric
-*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*
-
For this guide we will be evaluating the Hallucination metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Hallucination metric included in the SDK.
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries, configure the OpenAI API key and create a new Opik dataset.
+
```python
%pip install pyarrow fsspec huggingface_hub --quiet
@@ -15,7 +40,6 @@ For this guide we will be evaluating the Hallucination metric included in the LL
import os
import getpass
-os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api"
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
```
@@ -28,6 +52,7 @@ from opik import Opik, DatasetItem
import pandas as pd
client = Opik()
+
try:
# Create dataset
dataset = client.create_dataset(name="HaluBench", description="HaluBench dataset")
@@ -54,8 +79,9 @@ except Exception as e:
print(e)
```
- status_code: 409, body: {'errors': ['Dataset already exists']}
+## Evaluating the hallucination metric
+We can use the Opik SDK to compute a hallucination score for each item in the dataset:
```python
@@ -72,8 +98,6 @@ class CheckHallucinated(base_metric.BaseMetric):
self.name = name
def score(self, hallucination_score, expected_hallucination_score, **kwargs):
- expected_hallucination_score = 1 if expected_hallucination_score == "FAIL" else 0
-
return score_result.ScoreResult(
value= None if hallucination_score is None else hallucination_score == expected_hallucination_score,
name=self.name,
@@ -97,7 +121,7 @@ def evaluation_task(x: DatasetItem):
hallucination_reason = str(e)
return {
- "hallucination_score": hallucination_score,
+ "hallucination_score": "FAIL" if hallucination_score == 1 else "PASS",
"hallucination_reason": hallucination_reason,
"expected_hallucination_score": x.expected_output["expected_output"]
}
@@ -112,27 +136,8 @@ res = evaluate(
)
```
- Running tasks: 100%|██████████| 500/500 [00:53<00:00, 9.43it/s]
- Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 513253.06it/s]
-
-
-
-    ╭─ HaluBench (500 samples) ────────────╮
-    │                                      │
-    │ Total time:        00:00:53          │
-    │ Number of samples: 500               │
-    │                                      │
-    │ Detected hallucination: 0.8020 (avg) │
-    │                                      │
-    ╰──────────────────────────────────────╯
-
-
-
-
-    Uploading results to Opik ...
-
+We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset, and we can inspect the specific items where hallucinations were not detected.
+![Hallucination Evaluation](/img/cookbook/hallucination_metric_cookbook.png)
-We can see that the hallucination metric is able to detect ~80% of the hallucinations contained in the dataset.
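+
+As a quick sanity check, the `Hallucination` metric can also be called directly on a single sample outside of `evaluate`. The snippet below is a minimal sketch; the example values are invented and the exact keyword arguments may differ between SDK versions:
+
+```python
+from opik.evaluation.metrics import Hallucination
+
+metric = Hallucination()
+
+# Score a single response against the provided context (hypothetical values)
+score = metric.score(
+    input="What is the capital of France?",
+    output="The capital of France is London.",
+    context=["Paris is the capital of France."],
+)
+print(score.value, score.reason)
+```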
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
index 90bcb11862..98eb7e5a0f 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.ipynb
@@ -11,17 +11,65 @@
"For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK."
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Creating an account on Comet.com\n",
+ "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.\n",
+    "\n",
+    "> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import getpass\n",
+ "\n",
+ "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+ "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are running the Opik platform locally, simply set:"
+ ]
+ },
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+    "# import os\n",
+ "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparing our environment\n",
+ "\n",
+    "First, we will install the necessary libraries, configure the OpenAI API key and download a reference moderation dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
- "# Configure OpenAI\n",
"import os\n",
"import getpass\n",
"\n",
- "os.environ[\"COMET_URL_OVERRIDE\"] = \"http://localhost:5173/api\"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key: \")"
]
},
@@ -34,17 +82,9 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "status_code: 409, body: {'errors': ['Dataset already exists']}\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Create dataset\n",
"from opik import Opik, DatasetItem\n",
@@ -87,60 +127,20 @@
" print(e)"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Evaluating the moderation metric\n",
+ "\n",
+ "We can use the Opik SDK to compute a moderation score for each item in the dataset:"
+ ]
+ },
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Running tasks: 100%|██████████| 500/500 [00:34<00:00, 14.44it/s]\n",
- "Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 379712.48it/s]\n"
- ]
- },
- {
- "data": {
-      "text/html": [
-       "╭─ OpenAIModerationDataset (500 samples) ─╮\n",
-       "│                                         │\n",
-       "│ Total time:        00:00:34             │\n",
-       "│ Number of samples: 500                  │\n",
-       "│                                         │\n",
-       "│ Detected Moderation: 0.8460 (avg)       │\n",
-       "│                                         │\n",
-       "╰─────────────────────────────────────────╯\n",
-       "\n"
-      ],
-      "text/plain": [
-       "╭─ OpenAIModerationDataset (500 samples) ─╮\n",
-       "│                                         │\n",
-       "│ \u001b[1mTotal time:       \u001b[0m 00:00:34             │\n",
-       "│ \u001b[1mNumber of samples:\u001b[0m 500                  │\n",
-       "│                                         │\n",
-       "│ \u001b[1;32mDetected Moderation: 0.8460 (avg)\u001b[0m       │\n",
-       "│                                         │\n",
-       "╰─────────────────────────────────────────╯\n"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    },
-    {
-     "data": {
-      "text/html": [
-       "Uploading results to Opik ... \n",
-       "\n"
- ],
- "text/plain": [
- "Uploading results to Opik \u001b[33m...\u001b[0m \n"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
+ "outputs": [],
"source": [
"from opik.evaluation.metrics import Moderation\n",
"from opik.evaluation import evaluate\n",
@@ -196,7 +196,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "We are able to detect ~85% of moderation violations, this can be improved further by providing some additional examples to the model."
+    "We are able to detect ~85% of moderation violations; this can be improved further by providing some additional examples to the model. We can view a breakdown of the results in the Opik UI:\n",
+ "\n",
+ "![Moderation Evaluation](/img/cookbook/moderation_metric_cookbook.png)"
]
}
],
diff --git a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md
index 8da8450f6f..9dcfc96e45 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md
+++ b/apps/opik-documentation/documentation/docs/cookbook/evaluate_moderation_metric.md
@@ -4,13 +4,38 @@
For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK.
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries, configure the OpenAI API key and download a reference moderation dataset.
+
```python
-# Configure OpenAI
import os
import getpass
-os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api"
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
```
@@ -59,8 +84,9 @@ except Exception as e:
print(e)
```
- status_code: 409, body: {'errors': ['Dataset already exists']}
+## Evaluating the moderation metric
+We can use the Opik SDK to compute a moderation score for each item in the dataset:
```python
@@ -114,27 +140,6 @@ res = evaluate(
)
```
- Running tasks: 100%|██████████| 500/500 [00:34<00:00, 14.44it/s]
- Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 379712.48it/s]
-
-
-
-    ╭─ OpenAIModerationDataset (500 samples) ─╮
-    │                                         │
-    │ Total time:        00:00:34             │
-    │ Number of samples: 500                  │
-    │                                         │
-    │ Detected Moderation: 0.8460 (avg)       │
-    │                                         │
-    ╰─────────────────────────────────────────╯
-
-
-
-
-    Uploading results to Opik ...
-
-
-
+We are able to detect ~85% of moderation violations; this can be improved further by providing some additional examples to the model. We can view a breakdown of the results in the Opik UI:
-We are able to detect ~85% of moderation violations, this can be improved further by providing some additional examples to the model.
+![Moderation Evaluation](/img/cookbook/moderation_metric_cookbook.png)
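+
+As with the hallucination cookbook, the `Moderation` metric can also be scored on a single output outside of `evaluate`. This is a minimal sketch that assumes the metric accepts a single `output` string; the example text is invented:
+
+```python
+from opik.evaluation.metrics import Moderation
+
+metric = Moderation()
+
+# Score a single piece of text for moderation violations
+score = metric.score(output="How do I bake a chocolate cake?")
+print(score.value, score.reason)
+```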
diff --git a/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb b/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
index 10683b5a43..aee519aa8d 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
+++ b/apps/opik-documentation/documentation/docs/cookbook/langchain.ipynb
@@ -4,9 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Using LLM Evaluation with Langchain\n",
- "\n",
- "*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*\n",
+    "# Using Opik with LangChain\n",
"\n",
"For this guide, we will be performing a text to sql query generation task using LangChain. We will be using the Chinook database which contains the SQLite database of a music store with both employee, customer and invoice data.\n",
"\n",
@@ -17,6 +15,47 @@
"3. Automating the evaluation of the SQL queries on the synthetic dataset"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Creating an account on Comet.com\n",
+ "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.\n",
+    "\n",
+    "> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import getpass\n",
+ "\n",
+ "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+ "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are running the Opik platform locally, simply set:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import os\n",
+ "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {},
@@ -28,34 +67,18 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Note: you may need to restart the kernel to use updated packages.\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "%pip install --upgrade --quiet langchain langchain-community langchain-openai"
+ "%pip install --upgrade --quiet opik langchain langchain-community langchain-openai"
]
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 19,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Chinook database downloaded\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Download the relevant data\n",
"import os\n",
@@ -65,7 +88,12 @@
"import os\n",
"\n",
"url = \"https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite\"\n",
- "filename = \"Chinook_Sqlite.sqlite\"\n",
+ "filename = \"./data/chinook/Chinook_Sqlite.sqlite\"\n",
+ "\n",
+ "folder = os.path.dirname(filename)\n",
+ "\n",
+ "if not os.path.exists(folder):\n",
+ " os.makedirs(folder)\n",
"\n",
"if not os.path.exists(filename):\n",
" response = requests.get(url)\n",
@@ -78,15 +106,12 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
- "\n",
- "os.environ[\"COMET_URL_OVERRIDE\"] = \"http://localhost:5173/api\"\n",
- "\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key: \")"
]
},
@@ -103,46 +128,15 @@
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{\n",
- " \"result\": [\n",
- " \"Which customer has made the most purchases in terms of total dollars spent?\",\n",
- " \"What is the total number of tracks sold in each genre?\",\n",
- " \"How many unique albums have been purchased by customers from different countries?\",\n",
- " \"Which employee sold the most expensive track?\",\n",
- " \"What is the average length of tracks purchased by customers from each country?\",\n",
- " \"Which customer has spent the most money on tracks in the rock genre?\",\n",
- " \"What is the total revenue generated by each employee?\",\n",
- " \"How many unique artists are featured in each playlist?\",\n",
- " \"Which customer has the highest average rating on their purchased tracks?\",\n",
- " \"What is the total value of invoices generated by each sales support agent?\",\n",
- " \"How many tracks have been sold to customers in each country?\",\n",
- " \"Which artist has the most tracks featured in the top 100 selling tracks?\",\n",
- " \"What is the total value of invoices generated in each year?\",\n",
- " \"How many unique tracks have been purchased by customers in each city?\",\n",
- " \"Which employee has the highest average rating on tracks they have sold?\",\n",
- " \"What is the total number of tracks purchased by customers who have purchased tracks in the pop genre?\",\n",
- " \"Which customer has purchased the highest number of unique tracks?\",\n",
- " \"How many customer transactions have occurred in each year?\",\n",
- " \"Which artist has the most tracks featured in the top 100 selling tracks in the rock genre?\",\n",
- " \"What is the total number of tracks purchased by customers who have purchased tracks in the jazz genre?\"\n",
- " ]\n",
- "}\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"from opik.integrations.openai import track_openai\n",
"from openai import OpenAI\n",
"import json\n",
"\n",
- "os.environ[\"COMET_PROJECT_NAME\"] = \"openai-integration\"\n",
+ "os.environ[\"OPIK_PROJECT_NAME\"] = \"langchain-integration-demo\"\n",
"client = OpenAI()\n",
"\n",
"openai_client = track_openai(client)\n",
@@ -174,7 +168,7 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -202,34 +196,25 @@
"\n",
"We will be using the `create_sql_query_chain` function from the `langchain` library to create a SQL query to answer the question.\n",
"\n",
- "We will be using the `CometTracer` class from the `opik` library to ensure that the LangChan trace are being tracked in Comet."
+    "We will be using the `OpikTracer` class from the `opik` library to ensure that the LangChain traces are being tracked in Comet."
]
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "SELECT COUNT(\"EmployeeId\") AS \"TotalEmployees\" FROM \"Employee\"\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"# Use langchain to create a SQL query to answer the question\n",
"from langchain.chains import create_sql_query_chain\n",
"from langchain_openai import ChatOpenAI\n",
"from opik.integrations.langchain import OpikTracer\n",
"\n",
- "os.environ[\"COMET_PROJECT_NAME\"] = \"sql_question_answering\"\n",
"opik_tracer = OpikTracer(tags=[\"simple_chain\"])\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
- "chain = create_sql_query_chain(llm, db)\n",
- "response = chain.invoke({\"question\": \"How many employees are there ?\"}, {\"callbacks\": [opik_tracer]})\n",
+ "chain = create_sql_query_chain(llm, db).with_config({\"callbacks\": [opik_tracer]})\n",
+ "response = chain.invoke({\"question\": \"How many employees are there ?\"})\n",
"response\n",
"\n",
"print(response)"
@@ -248,77 +233,45 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "Running tasks: 100%|██████████| 20/20 [00:03<00:00, 5.37it/s]\n",
- "Scoring outputs: 100%|██████████| 20/20 [00:00<00:00, 82321.96it/s]\n"
- ]
- },
- {
- "data": {
- "text/html": [
-      "╭─ synthetic_questions (20 samples) ─╮\n",
-      "│                                    │\n",
-      "│ Total time:        00:00:03        │\n",
-      "│ Number of samples: 20              │\n",
-      "│                                    │\n",
-      "│ ContainsHello: 0.0000 (avg)        │\n",
-      "│                                    │\n",
-      "╰────────────────────────────────────╯\n",
-      "\n"
-     ],
-     "text/plain": [
-      "╭─ synthetic_questions (20 samples) ─╮\n",
-      "│                                    │\n",
-      "│ \u001b[1mTotal time:       \u001b[0m 00:00:03        │\n",
-      "│ \u001b[1mNumber of samples:\u001b[0m 20              │\n",
-      "│                                    │\n",
-      "│ \u001b[1;32mContainsHello: 0.0000 (avg)\u001b[0m        │\n",
-      "│                                    │\n",
-      "╰────────────────────────────────────╯\n"
-     ]
-    },
-    "metadata": {},
-    "output_type": "display_data"
-   },
-   {
-    "data": {
-     "text/html": [
-      "Uploading results to Opik ... \n",
-      "\n"
- ],
- "text/plain": [
- "Uploading results to Opik \u001b[33m...\u001b[0m \n"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
+ "outputs": [],
"source": [
"from opik import Opik, track\n",
"from opik.evaluation import evaluate\n",
- "from opik.evaluation.metrics import Contains\n",
- "\n",
- "\n",
- "contains_hello = Contains(name=\"ContainsHello\")\n",
+ "from opik.evaluation.metrics import base_metric, score_result\n",
+ "from typing import Any\n",
+ "\n",
+ "class ValidSQLQuery(base_metric.BaseMetric):\n",
+ " def __init__(self, name: str, db: Any):\n",
+ " self.name = name\n",
+ " self.db = db\n",
+ "\n",
+ " def score(self, output: str, **ignored_kwargs: Any):\n",
+    "        # Add your logic here\n",
+ "\n",
+ " try:\n",
+    "            self.db.run(output)\n",
+ " return score_result.ScoreResult(\n",
+ " name=self.name,\n",
+ " value=1,\n",
+ " reason=\"Query ran successfully\"\n",
+ " )\n",
+ " except Exception as e:\n",
+ " return score_result.ScoreResult(\n",
+ " name=self.name,\n",
+ " value=0,\n",
+ " reason=str(e)\n",
+ " )\n",
+ "\n",
+ "valid_sql_query = ValidSQLQuery(name=\"valid_sql_query\", db=db)\n",
"\n",
"client = Opik()\n",
"dataset = client.get_dataset(\"synthetic_questions\")\n",
"\n",
"@track()\n",
- "def llm_chain(input):\n",
- " opik_tracer = OpikTracer(tags=[\"simple_chain\"])\n",
- "\n",
- " db = SQLDatabase.from_uri(\"sqlite:///Chinook_Sqlite.sqlite\")\n",
- " llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0)\n",
- " chain = create_sql_query_chain(llm, db)\n",
- " response = chain.invoke({\"question\": input}, {\"callbacks\": [opik_tracer]})\n",
+ "def llm_chain(input: str) -> str:\n",
+ " response = chain.invoke({\"question\": input})\n",
" \n",
" return response\n",
"\n",
@@ -331,25 +284,25 @@
" }\n",
"\n",
"res = evaluate(\n",
- " experiment_name=\"sql_question_answering_v2\",\n",
+ " experiment_name=\"SQL question answering\",\n",
" dataset=dataset,\n",
" task=evaluation_task,\n",
- " scoring_metrics=[contains_hello]\n",
+ " scoring_metrics=[valid_sql_query]\n",
")"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [],
- "source": []
+ "source": [
+ "The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.\n",
+ "\n",
+ "![LangChain Evaluation](/img/cookbook/langchain_cookbook.png)"
+ ]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {},
- "outputs": [],
"source": []
}
],
diff --git a/apps/opik-documentation/documentation/docs/cookbook/langchain.md b/apps/opik-documentation/documentation/docs/cookbook/langchain.md
index fd1f4fecf0..066841484a 100644
--- a/apps/opik-documentation/documentation/docs/cookbook/langchain.md
+++ b/apps/opik-documentation/documentation/docs/cookbook/langchain.md
@@ -1,6 +1,4 @@
-# Using LLM Evaluation with Langchain
-
-*This cookbook was created from a Jypyter notebook which can be found [here](TBD).*
+# Using Opik with LangChain
For this guide, we will be performing a text to sql query generation task using LangChain. We will be using the Chinook database which contains the SQLite database of a music store with both employee, customer and invoice data.
@@ -10,18 +8,38 @@ We will highlight three different parts of the workflow:
2. Creating a LangChain chain to generate SQL queries
3. Automating the evaluation of the SQL queries on the synthetic dataset
-## Preparing our environment
+## Creating an account on Comet.com
-First, we will install the necessary libraries, download the Chinook database and set up our different API keys.
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
```python
-%pip install --upgrade --quiet langchain langchain-community langchain-openai
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
```
- Note: you may need to restart the kernel to use updated packages.
+## Preparing our environment
+
+First, we will install the necessary libraries, download the Chinook database and set up our different API keys.
+```python
+%pip install --upgrade --quiet opik langchain langchain-community langchain-openai
+```
+
```python
# Download the relevant data
@@ -32,7 +50,12 @@ import requests
import os
url = "https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
-filename = "Chinook_Sqlite.sqlite"
+filename = "./data/chinook/Chinook_Sqlite.sqlite"
+
+folder = os.path.dirname(filename)
+
+if not os.path.exists(folder):
+ os.makedirs(folder)
if not os.path.exists(filename):
response = requests.get(url)
@@ -43,16 +66,10 @@ if not os.path.exists(filename):
db = SQLDatabase.from_uri(f"sqlite:///{filename}")
```
- Chinook database downloaded
-
-
```python
import os
import getpass
-
-os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api"
-
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
```
@@ -68,7 +85,7 @@ from opik.integrations.openai import track_openai
from openai import OpenAI
import json
-os.environ["COMET_PROJECT_NAME"] = "openai-integration"
+os.environ["OPIK_PROJECT_NAME"] = "langchain-integration-demo"
client = OpenAI()
openai_client = track_openai(client)
@@ -91,32 +108,6 @@ completion = openai_client.chat.completions.create(
print(completion.choices[0].message.content)
```
- {
- "result": [
- "Which customer has made the most purchases in terms of total dollars spent?",
- "What is the total number of tracks sold in each genre?",
- "How many unique albums have been purchased by customers from different countries?",
- "Which employee sold the most expensive track?",
- "What is the average length of tracks purchased by customers from each country?",
- "Which customer has spent the most money on tracks in the rock genre?",
- "What is the total revenue generated by each employee?",
- "How many unique artists are featured in each playlist?",
- "Which customer has the highest average rating on their purchased tracks?",
- "What is the total value of invoices generated by each sales support agent?",
- "How many tracks have been sold to customers in each country?",
- "Which artist has the most tracks featured in the top 100 selling tracks?",
- "What is the total value of invoices generated in each year?",
- "How many unique tracks have been purchased by customers in each city?",
- "Which employee has the highest average rating on tracks they have sold?",
- "What is the total number of tracks purchased by customers who have purchased tracks in the pop genre?",
- "Which customer has purchased the highest number of unique tracks?",
- "How many customer transactions have occurred in each year?",
- "Which artist has the most tracks featured in the top 100 selling tracks in the rock genre?",
- "What is the total number of tracks purchased by customers who have purchased tracks in the jazz genre?"
- ]
- }
-
-
Now that we have our synthetic dataset, we can create a dataset in Comet and insert the questions into it.
@@ -141,7 +132,7 @@ except Exception as e:
We will be using the `create_sql_query_chain` function from the `langchain` library to create a SQL query to answer the question.
-We will be using the `CometTracer` class from the `opik` library to ensure that the LangChan trace are being tracked in Comet.
+We will be using the `OpikTracer` class from the `opik` library to ensure that the LangChain traces are being tracked in Comet.
```python
@@ -150,20 +141,16 @@ from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI
from opik.integrations.langchain import OpikTracer
-os.environ["COMET_PROJECT_NAME"] = "sql_question_answering"
opik_tracer = OpikTracer(tags=["simple_chain"])
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
-chain = create_sql_query_chain(llm, db)
-response = chain.invoke({"question": "How many employees are there ?"}, {"callbacks": [opik_tracer]})
+chain = create_sql_query_chain(llm, db).with_config({"callbacks": [opik_tracer]})
+response = chain.invoke({"question": "How many employees are there ?"})
response
print(response)
```
- SELECT COUNT("EmployeeId") AS "TotalEmployees" FROM "Employee"
-
-
## Automating the evaluation
In order to ensure our LLM application is working correctly, we will test it on our synthetic dataset.
@@ -174,22 +161,39 @@ For this we will be using the `evaluate` function from the `opik` library. We wi
```python
from opik import Opik, track
from opik.evaluation import evaluate
-from opik.evaluation.metrics import Contains
-
-
-contains_hello = Contains(name="ContainsHello")
+from opik.evaluation.metrics import base_metric, score_result
+from typing import Any
+
+class ValidSQLQuery(base_metric.BaseMetric):
+ def __init__(self, name: str, db: Any):
+ self.name = name
+ self.db = db
+
+ def score(self, output: str, **ignored_kwargs: Any):
+        # Add your logic here
+
+ try:
+            self.db.run(output)
+ return score_result.ScoreResult(
+ name=self.name,
+ value=1,
+ reason="Query ran successfully"
+ )
+ except Exception as e:
+ return score_result.ScoreResult(
+ name=self.name,
+ value=0,
+ reason=str(e)
+ )
+
+valid_sql_query = ValidSQLQuery(name="valid_sql_query", db=db)
client = Opik()
dataset = client.get_dataset("synthetic_questions")
@track()
-def llm_chain(input):
- opik_tracer = OpikTracer(tags=["simple_chain"])
-
- db = SQLDatabase.from_uri("sqlite:///Chinook_Sqlite.sqlite")
- llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
- chain = create_sql_query_chain(llm, db)
- response = chain.invoke({"question": input}, {"callbacks": [opik_tracer]})
+def llm_chain(input: str) -> str:
+ response = chain.invoke({"question": input})
return response
@@ -202,42 +206,15 @@ def evaluation_task(item):
}
res = evaluate(
- experiment_name="sql_question_answering_v2",
+ experiment_name="SQL question answering",
dataset=dataset,
task=evaluation_task,
- scoring_metrics=[contains_hello]
+ scoring_metrics=[valid_sql_query]
)
```
- Running tasks: 100%|██████████| 20/20 [00:03<00:00, 5.37it/s]
- Scoring outputs: 100%|██████████| 20/20 [00:00<00:00, 82321.96it/s]
-
-
-
-    ╭─ synthetic_questions (20 samples) ─╮
-    │                                    │
-    │ Total time:        00:00:03        │
-    │ Number of samples: 20              │
-    │                                    │
-    │ ContainsHello: 0.0000 (avg)        │
-    │                                    │
-    ╰────────────────────────────────────╯
-
+The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.
+
+![LangChain Evaluation](/img/cookbook/langchain_cookbook.png)
-
-    Uploading results to Opik ...
-
-
-
-
-
-```python
-
-```
-
-
-```python
-
-```
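+
+Since `ValidSQLQuery` is a plain Python class, it can also be sanity-checked on a single query before running the full evaluation. A short sketch using the metric defined above (the query string is just an example, and `valid_sql_query` is assumed to be in scope):
+
+```python
+result = valid_sql_query.score(output='SELECT COUNT("EmployeeId") FROM "Employee"')
+print(result.value, result.reason)
+```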
diff --git a/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb b/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb
new file mode 100644
index 0000000000..95ec0d7ee6
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/openai.ipynb
@@ -0,0 +1,232 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Using Opik with OpenAI\n",
+ "\n",
+    "Opik integrates with OpenAI to provide a simple way to log traces for all OpenAI LLM calls. This works for all OpenAI models, including when you are using the streaming API.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Creating an account on Comet.com\n",
+ "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.\n",
+    "\n",
+    "> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import getpass\n",
+ "\n",
+ "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+ "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are running the Opik platform locally, simply set:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import os\n",
+ "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparing our environment\n",
+ "\n",
+    "First, we will install the necessary libraries and set up our OpenAI API key."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install --upgrade --quiet opik openai"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import getpass\n",
+ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key: \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Logging traces\n",
+ "\n",
+ "In order to log traces to Opik, we need to wrap our OpenAI calls with the `track_openai` function:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Opik was a mischievous little elf who loved pulling pranks on his friends in the enchanted forest. One day, his antics went too far and he accidentally turned himself into a fluffy pink bunny.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from opik.integrations.openai import track_openai\n",
+ "from openai import OpenAI\n",
+ "\n",
+ "os.environ[\"OPIK_PROJECT_NAME\"] = \"openai-integration-demo\"\n",
+ "client = OpenAI()\n",
+ "\n",
+ "openai_client = track_openai(client)\n",
+ "\n",
+ "prompt = \"\"\"\n",
+ "Write a short two sentence story about Opik.\n",
+ "\"\"\"\n",
+ "\n",
+ "completion = openai_client.chat.completions.create(\n",
+ " model=\"gpt-3.5-turbo\",\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": prompt}\n",
+ " ]\n",
+ ")\n",
+ "\n",
+ "print(completion.choices[0].message.content)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The prompt and response messages are automatically logged to Opik and can be viewed in the UI.\n",
+ "\n",
+ "![OpenAI Integration](/img/cookbook/openai_trace_cookbook.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Using it with the `track` decorator\n",
+ "\n",
+    "If you have multiple steps in your LLM pipeline, you can use the `track` decorator to log the traces for each step. If OpenAI is called within one of these steps, the LLM call will be associated with the corresponding step:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "\"Opik was a young wizard who lived in the small village of Mithos, where magic was both feared and revered. From a young age, Opik had shown a natural talent for magic, much to the dismay of his parents who were simple farmers. They feared the power that their son possessed and did everything they could to suppress it.\\n\\nDespite his parents' efforts, Opik continued to practice his magic in secret, honing his skills and learning all he could about the ancient art. He longed to become a powerful wizard, respected and feared by all who knew him. But as he grew older, he also began to realize that his thirst for power was beginning to consume him, turning him into a dark and reckless mage.\\n\\nOne day, a mysterious figure approached Opik in the village square, offering him a chance to join a secret society of powerful wizards. Intrigued by the offer, Opik accepted and was soon initiated into the group, which called themselves the Arcanum.\\n\\nUnder the guidance of the Arcanum, Opik's power grew exponentially. He could wield spells of immense power, bending reality to his will with a mere flick of his wrist. But as his power grew, so did his arrogance and greed. He began to see himself as above all others, using his magic to manipulate and control those around him.\\n\\nOne day, a great evil swept across the land, threatening to destroy everything in its path. The Arcanum tasked Opik with defeating this evil, seeing it as a chance for him to prove his worth and redeem himself. But as he faced the darkness head-on, Opik realized that true power lay not in domination and control, but in compassion and selflessness.\\n\\nIn a moment of clarity, Opik cast aside his dark ambitions and embraced the light within him. With newfound resolve, he fought against the evil that threatened his home, using his magic not to destroy, but to protect and heal. In the end, it was not his raw power that saved the day, but his courage and heart.\\n\\nAnd so, Opik returned to his village a changed man, no longer seeking power for power's sake, but striving to use his magic for the good of all. The villagers welcomed him back with open arms, seeing in him a hero and a protector. And as he walked among them, a new journey unfolded before him - a journey of redemption, compassion, and true magic.\""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from opik import track\n",
+ "from opik.integrations.openai import track_openai\n",
+ "from openai import OpenAI\n",
+ "\n",
+ "os.environ[\"OPIK_PROJECT_NAME\"] = \"openai-integration-demo\"\n",
+ "\n",
+ "client = OpenAI()\n",
+ "openai_client = track_openai(client)\n",
+ "\n",
+ "@track\n",
+ "def generate_story(prompt):\n",
+ " res = openai_client.chat.completions.create(\n",
+ " model=\"gpt-3.5-turbo\",\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": prompt}\n",
+ " ]\n",
+ " )\n",
+ " return res.choices[0].message.content\n",
+ "\n",
+ "@track\n",
+ "def generate_topic():\n",
+ " prompt = \"Generate a topic for a story about Opik.\"\n",
+ " res = openai_client.chat.completions.create(\n",
+ " model=\"gpt-3.5-turbo\",\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": prompt}\n",
+ " ]\n",
+ " )\n",
+ " return res.choices[0].message.content\n",
+ "\n",
+ "@track\n",
+ "def generate_opik_story():\n",
+ " topic = generate_topic()\n",
+ " story = generate_story(topic)\n",
+ " return story\n",
+ "\n",
+ "generate_opik_story()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The trace can now be viewed in the UI:\n",
+ "\n",
+ "![OpenAI Integration](/img/cookbook/openai_trace_decorator_cookbook.png)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "py312_llm_eval",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/apps/opik-documentation/documentation/docs/cookbook/openai.md b/apps/opik-documentation/documentation/docs/cookbook/openai.md
new file mode 100644
index 0000000000..7c0a84d861
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/openai.md
@@ -0,0 +1,135 @@
+# Using Opik with OpenAI
+
+Opik integrates with OpenAI to provide a simple way to log traces for all OpenAI LLM calls. This works for all OpenAI models, including when you are using the streaming API.
+
+
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API Key.
+
+> You can also run the Opik platform locally; see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries and set up our OpenAI API key.
+
+
+```python
+%pip install --upgrade --quiet opik openai
+```
+
+
+```python
+import os
+import getpass
+os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
+```
+
+## Logging traces
+
+In order to log traces to Opik, we need to wrap our OpenAI calls with the `track_openai` function:
+
+
+```python
+from opik.integrations.openai import track_openai
+from openai import OpenAI
+
+os.environ["OPIK_PROJECT_NAME"] = "openai-integration-demo"
+client = OpenAI()
+
+openai_client = track_openai(client)
+
+prompt = """
+Write a short two sentence story about Opik.
+"""
+
+completion = openai_client.chat.completions.create(
+ model="gpt-3.5-turbo",
+ messages=[
+ {"role": "user", "content": prompt}
+ ]
+)
+
+print(completion.choices[0].message.content)
+```
+
+ Opik was a mischievous little elf who loved pulling pranks on his friends in the enchanted forest. One day, his antics went too far and he accidentally turned himself into a fluffy pink bunny.
+
+
+The prompt and response messages are automatically logged to Opik and can be viewed in the UI.
+
+![OpenAI Integration](/img/cookbook/openai_trace_cookbook.png)
+
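+Streaming calls are logged in the same way. The snippet below is a minimal sketch using the same wrapped client and the standard OpenAI streaming interface; the prompt is just an example:
+
+```python
+stream = openai_client.chat.completions.create(
+    model="gpt-3.5-turbo",
+    messages=[{"role": "user", "content": "Write one sentence about Opik."}],
+    stream=True,
+)
+
+# Print streamed tokens as they arrive; the full call is still traced by Opik
+for chunk in stream:
+    if chunk.choices[0].delta.content is not None:
+        print(chunk.choices[0].delta.content, end="")
+```
+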
+## Using it with the `track` decorator
+
+If you have multiple steps in your LLM pipeline, you can use the `track` decorator to log the traces for each step. If OpenAI is called within one of these steps, the LLM call will be associated with the corresponding step:
+
+
+```python
+from opik import track
+from opik.integrations.openai import track_openai
+from openai import OpenAI
+
+os.environ["OPIK_PROJECT_NAME"] = "openai-integration-demo"
+
+client = OpenAI()
+openai_client = track_openai(client)
+
+@track
+def generate_story(prompt):
+ res = openai_client.chat.completions.create(
+ model="gpt-3.5-turbo",
+ messages=[
+ {"role": "user", "content": prompt}
+ ]
+ )
+ return res.choices[0].message.content
+
+@track
+def generate_topic():
+ prompt = "Generate a topic for a story about Opik."
+ res = openai_client.chat.completions.create(
+ model="gpt-3.5-turbo",
+ messages=[
+ {"role": "user", "content": prompt}
+ ]
+ )
+ return res.choices[0].message.content
+
+@track
+def generate_opik_story():
+ topic = generate_topic()
+ story = generate_story(topic)
+ return story
+
+generate_opik_story()
+
+```
+
+
+
+
+ "Opik was a young wizard who lived in the small village of Mithos, where magic was both feared and revered. From a young age, Opik had shown a natural talent for magic, much to the dismay of his parents who were simple farmers. They feared the power that their son possessed and did everything they could to suppress it.\n\nDespite his parents' efforts, Opik continued to practice his magic in secret, honing his skills and learning all he could about the ancient art. He longed to become a powerful wizard, respected and feared by all who knew him. But as he grew older, he also began to realize that his thirst for power was beginning to consume him, turning him into a dark and reckless mage.\n\nOne day, a mysterious figure approached Opik in the village square, offering him a chance to join a secret society of powerful wizards. Intrigued by the offer, Opik accepted and was soon initiated into the group, which called themselves the Arcanum.\n\nUnder the guidance of the Arcanum, Opik's power grew exponentially. He could wield spells of immense power, bending reality to his will with a mere flick of his wrist. But as his power grew, so did his arrogance and greed. He began to see himself as above all others, using his magic to manipulate and control those around him.\n\nOne day, a great evil swept across the land, threatening to destroy everything in its path. The Arcanum tasked Opik with defeating this evil, seeing it as a chance for him to prove his worth and redeem himself. But as he faced the darkness head-on, Opik realized that true power lay not in domination and control, but in compassion and selflessness.\n\nIn a moment of clarity, Opik cast aside his dark ambitions and embraced the light within him. With newfound resolve, he fought against the evil that threatened his home, using his magic not to destroy, but to protect and heal. In the end, it was not his raw power that saved the day, but his courage and heart.\n\nAnd so, Opik returned to his village a changed man, no longer seeking power for power's sake, but striving to use his magic for the good of all. The villagers welcomed him back with open arms, seeing in him a hero and a protector. And as he walked among them, a new journey unfolded before him - a journey of redemption, compassion, and true magic."
+
+
+
+The trace can now be viewed in the UI:
+
+![OpenAI Integration](/img/cookbook/openai_trace_decorator_cookbook.png)
diff --git a/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb b/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb
new file mode 100644
index 0000000000..0a6ce78e65
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/ragas.ipynb
@@ -0,0 +1,285 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Using Ragas to evaluate RAG pipelines\n",
+ "\n",
+ "In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) pipelines.\n",
+ "\n",
+ "There are two main ways to use Opik with Ragas:\n",
+ "\n",
+ "1. Using Ragas metrics to score traces\n",
+ "2. Using the Ragas `evaluate` function to score a dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Creating an account on Comet.com\n",
+ "\n",
+    "[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.\n",
+ "\n",
+ "> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import getpass\n",
+ "\n",
+ "os.environ[\"OPIK_API_KEY\"] = getpass.getpass(\"Opik API Key: \")\n",
+ "os.environ[\"OPIK_WORKSPACE\"] = input(\"Comet workspace (often the same as your username): \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are running the Opik platform locally, simply set:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import os\n",
+ "# os.environ[\"OPIK_URL_OVERRIDE\"] = \"http://localhost:5173/api\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preparing our environment\n",
+ "\n",
+ "First, we will install the necessary libraries and configure the OpenAI API key."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install opik ragas --quiet\n",
+ "\n",
+ "import os\n",
+ "import getpass\n",
+ "\n",
+ "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"Enter your OpenAI API key: \")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Integrating Opik with Ragas\n",
+ "\n",
+ "### Using Ragas metrics to score traces\n",
+ "\n",
+ "Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: `answer_relevancy`, `answer_similarity`, `answer_correctness`, `context_precision`, `context_recall`, `context_entity_recall`, `summarization_score`. You can find a full list of metrics in the [Ragas documentation](https://docs.ragas.io/en/latest/references/metrics.html#).\n",
+ "\n",
+ "These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the `answer_relevancy` metric.\n",
+ "\n",
+ "#### Create the Ragas metric\n",
+ "\n",
+ "In order to use the Ragas metric without using the `evaluate` function, you need to initialize the metric with a `RunConfig` object and an LLM provider. For this example, we will use LangChain as the LLM provider with the Opik tracer enabled.\n",
+ "\n",
+ "We will first start by initializing the Ragas metric:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the metric\n",
+ "from ragas.metrics import AnswerRelevancy\n",
+ "\n",
+ "# Import some additional dependencies\n",
+ "from langchain_openai.chat_models import ChatOpenAI\n",
+ "from langchain_openai.embeddings import OpenAIEmbeddings\n",
+ "from ragas.llms import LangchainLLMWrapper\n",
+ "from ragas.embeddings import LangchainEmbeddingsWrapper\n",
+ "\n",
+ "# Initialize the Ragas metric\n",
+ "llm = LangchainLLMWrapper(ChatOpenAI())\n",
+ "emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())\n",
+ "\n",
+ "answer_relevancy_metric = AnswerRelevancy(llm=llm, embeddings=emb)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once the metric is initialized, you can use it to score a sample question. Given that the metric scoring is done asynchronously, you need to use the `asyncio` library to run the scoring function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Run this cell first if you are running this in a Jupyter notebook\n",
+ "import nest_asyncio\n",
+ "\n",
+ "nest_asyncio.apply()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import asyncio\n",
+ "from ragas.integrations.opik import OpikTracer\n",
+ "\n",
+ "# Define the scoring function\n",
+ "def compute_metric(opik_tracer, metric, row):\n",
+ " async def get_score(opik_tracer, metric, row):\n",
+ " score = await metric.ascore(row, callbacks=[opik_tracer])\n",
+ " return score\n",
+ "\n",
+ " # Run the async function using the current event loop\n",
+ " loop = asyncio.get_event_loop()\n",
+ " \n",
+ " result = loop.run_until_complete(get_score(opik_tracer, metric, row))\n",
+ " return result\n",
+ "\n",
+ "# Score a simple example\n",
+ "row = {\n",
+ " \"question\": \"What is the capital of France?\",\n",
+ " \"answer\": \"Paris\",\n",
+ " \"contexts\": [\"Paris is the capital of France.\", \"Paris is in France.\"]\n",
+ "}\n",
+ "\n",
+ "opik_tracer = OpikTracer()\n",
+ "score = compute_metric(opik_tracer, answer_relevancy_metric, row)\n",
+ "print(\"Answer Relevancy score:\", score)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+    "If you now navigate to Opik, you will be able to see that a new trace has been created in the `Default Project`.\n",
+ "\n",
+ "#### Score traces\n",
+ "\n",
+ "You can score traces by using the `get_current_trace` function to get the current trace and then calling the `log_feedback_score` function.\n",
+ "\n",
+    "The advantage of this approach is that the scoring span is added to the trace, allowing for a more fine-grained analysis of the RAG pipeline. It will, however, run the Ragas metric calculation synchronously, so it might not be suitable for production use cases."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from opik import track\n",
+ "from opik.opik_context import get_current_trace\n",
+ "\n",
+ "@track\n",
+ "def retrieve_contexts(question):\n",
+ " # Define the retrieval function, in this case we will hard code the contexts\n",
+ " return [\"Paris is the capital of France.\", \"Paris is in France.\"]\n",
+ "\n",
+ "@track\n",
+ "def answer_question(question, contexts):\n",
+ " # Define the answer function, in this case we will hard code the answer\n",
+ " return \"Paris\"\n",
+ "\n",
+ "@track(name=\"Compute Ragas metric score\", capture_input=False)\n",
+ "def compute_rag_score(answer_relevancy_metric, question, answer, contexts):\n",
+ " # Define the score function\n",
+ " row = {\"question\": question, \"answer\": answer, \"contexts\": contexts}\n",
+    "    # Reuse the OpikTracer defined earlier so the Ragas call is traced\n",
+    "    score = compute_metric(opik_tracer, answer_relevancy_metric, row)\n",
+ " return score\n",
+ "\n",
+ "@track\n",
+ "def rag_pipeline(question):\n",
+ " # Define the pipeline\n",
+ " contexts = retrieve_contexts(question)\n",
+ " answer = answer_question(question, contexts)\n",
+ "\n",
+ " trace = get_current_trace()\n",
+ " score = compute_rag_score(answer_relevancy_metric, question, answer, contexts)\n",
+ " trace.log_feedback_score(\"answer_relevancy\", round(score, 4), category_name=\"ragas\")\n",
+ " \n",
+ " return answer\n",
+ "\n",
+ "rag_pipeline(\"What is the capital of France?\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Evaluating datasets\n",
+ "\n",
+    "If you are looking to evaluate a dataset, you can use the Ragas `evaluate` function. When using this function, the Ragas library will compute the metrics on all the rows of the dataset and return a summary of the results.\n",
+ "\n",
+ "You can use the `OpikTracer` callback to log the results of the evaluation to the Opik platform:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset\n",
+ "from ragas.metrics import context_precision, answer_relevancy, faithfulness\n",
+ "from ragas import evaluate\n",
+ "from ragas.integrations.opik import OpikTracer\n",
+ "\n",
+ "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n",
+ "\n",
+ "opik_tracer_eval = OpikTracer(tags=[\"ragas_eval\"], metadata={\"evaluation_run\": True})\n",
+ "\n",
+ "result = evaluate(\n",
+ " fiqa_eval[\"baseline\"].select(range(3)),\n",
+ " metrics=[context_precision, faithfulness, answer_relevancy],\n",
+ " callbacks=[opik_tracer_eval]\n",
+ ")\n",
+ "\n",
+ "print(result)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "py312_llm_eval",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/apps/opik-documentation/documentation/docs/cookbook/ragas.md b/apps/opik-documentation/documentation/docs/cookbook/ragas.md
new file mode 100644
index 0000000000..18e4cf0d0d
--- /dev/null
+++ b/apps/opik-documentation/documentation/docs/cookbook/ragas.md
@@ -0,0 +1,187 @@
+# Using Ragas to evaluate RAG pipelines
+
+In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) pipelines.
+
+There are two main ways to use Opik with Ragas:
+
+1. Using Ragas metrics to score traces
+2. Using the Ragas `evaluate` function to score a dataset
+
+## Creating an account on Comet.com
+
+[Comet](https://www.comet.com/site) provides a hosted version of the Opik platform; [simply create an account](https://www.comet.com/signup?from=llm) and grab your API key.
+
+> You can also run the Opik platform locally, see the [installation guide](https://www.comet.com/docs/opik/self-host/self_hosting_opik) for more information.
+
+
+```python
+import os
+import getpass
+
+os.environ["OPIK_API_KEY"] = getpass.getpass("Opik API Key: ")
+os.environ["OPIK_WORKSPACE"] = input("Comet workspace (often the same as your username): ")
+```
+
+If you are running the Opik platform locally, simply set:
+
+
+```python
+# import os
+# os.environ["OPIK_URL_OVERRIDE"] = "http://localhost:5173/api"
+```
+
+## Preparing our environment
+
+First, we will install the necessary libraries and configure the OpenAI API key.
+
+
+```python
+%pip install opik ragas --quiet
+
+import os
+import getpass
+
+os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
+```
+
+## Integrating Opik with Ragas
+
+### Using Ragas metrics to score traces
+
+Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: `answer_relevancy`, `answer_similarity`, `answer_correctness`, `context_precision`, `context_recall`, `context_entity_recall`, `summarization_score`. You can find a full list of metrics in the [Ragas documentation](https://docs.ragas.io/en/latest/references/metrics.html#).
+
+These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the `answer_relevancy` metric.
+
+#### Create the Ragas metric
+
+In order to use the Ragas metric without using the `evaluate` function, you need to initialize the metric with a `RunConfig` object and an LLM provider. For this example, we will use LangChain as the LLM provider with the Opik tracer enabled.
+
+We will first start by initializing the Ragas metric:
+
+
+```python
+# Import the metric
+from ragas.metrics import AnswerRelevancy
+
+# Import some additional dependencies
+from langchain_openai.chat_models import ChatOpenAI
+from langchain_openai.embeddings import OpenAIEmbeddings
+from ragas.llms import LangchainLLMWrapper
+from ragas.embeddings import LangchainEmbeddingsWrapper
+
+# Initialize the Ragas metric
+llm = LangchainLLMWrapper(ChatOpenAI())
+emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
+
+answer_relevancy_metric = AnswerRelevancy(llm=llm, embeddings=emb)
+```
+
+Once the metric is initialized, you can use it to score a sample question. Given that the metric scoring is done asynchronously, you need to use the `asyncio` library to run the scoring function.
+
+
+```python
+# Run this cell first if you are running this in a Jupyter notebook
+import nest_asyncio
+
+nest_asyncio.apply()
+```
+
+
+```python
+import asyncio
+from ragas.integrations.opik import OpikTracer
+
+# Define the scoring function
+def compute_metric(opik_tracer, metric, row):
+ async def get_score(opik_tracer, metric, row):
+ score = await metric.ascore(row, callbacks=[opik_tracer])
+ return score
+
+ # Run the async function using the current event loop
+ loop = asyncio.get_event_loop()
+
+ result = loop.run_until_complete(get_score(opik_tracer, metric, row))
+ return result
+
+# Score a simple example
+row = {
+ "question": "What is the capital of France?",
+ "answer": "Paris",
+ "contexts": ["Paris is the capital of France.", "Paris is in France."]
+}
+
+opik_tracer = OpikTracer()
+score = compute_metric(opik_tracer, answer_relevancy_metric, row)
+print("Answer Relevancy score:", score)
+```
+
+If you now navigate to Opik, you will be able to see that a new trace has been created in the `Default Project`.
+
+#### Score traces
+
+You can score traces by using the `get_current_trace` function to get the current trace and then calling the `log_feedback_score` function.
+
+The advantage of this approach is that the scoring span is added to the trace, allowing for a more fine-grained analysis of the RAG pipeline. It will, however, run the Ragas metric calculation synchronously, so it might not be suitable for production use cases.
+
+
+```python
+from opik import track
+from opik.opik_context import get_current_trace
+
+@track
+def retrieve_contexts(question):
+ # Define the retrieval function, in this case we will hard code the contexts
+ return ["Paris is the capital of France.", "Paris is in France."]
+
+@track
+def answer_question(question, contexts):
+ # Define the answer function, in this case we will hard code the answer
+ return "Paris"
+
+@track(name="Compute Ragas metric score", capture_input=False)
+def compute_rag_score(answer_relevancy_metric, question, answer, contexts):
+ # Define the score function
+ row = {"question": question, "answer": answer, "contexts": contexts}
+    # Reuse the OpikTracer defined earlier so the Ragas call is traced
+    score = compute_metric(opik_tracer, answer_relevancy_metric, row)
+ return score
+
+@track
+def rag_pipeline(question):
+ # Define the pipeline
+ contexts = retrieve_contexts(question)
+ answer = answer_question(question, contexts)
+
+ trace = get_current_trace()
+ score = compute_rag_score(answer_relevancy_metric, question, answer, contexts)
+ trace.log_feedback_score("answer_relevancy", round(score, 4), category_name="ragas")
+
+ return answer
+
+rag_pipeline("What is the capital of France?")
+```
+
+#### Evaluating datasets
+
+If you are looking to evaluate a dataset, you can use the Ragas `evaluate` function. When using this function, the Ragas library will compute the metrics on all the rows of the dataset and return a summary of the results.
+
+You can use the `OpikTracer` callback to log the results of the evaluation to the Opik platform:
+
+
+```python
+from datasets import load_dataset
+from ragas.metrics import context_precision, answer_relevancy, faithfulness
+from ragas import evaluate
+from ragas.integrations.opik import OpikTracer
+
+fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
+
+opik_tracer_eval = OpikTracer(tags=["ragas_eval"], metadata={"evaluation_run": True})
+
+result = evaluate(
+ fiqa_eval["baseline"].select(range(3)),
+ metrics=[context_precision, faithfulness, answer_relevancy],
+ callbacks=[opik_tracer_eval]
+)
+
+print(result)
+```
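+
+If you want to inspect the scores for each row rather than just the summary, the result object returned by Ragas can typically be converted to a pandas DataFrame (this assumes the Ragas version you are using exposes a `to_pandas` method):
+
+```python
+# Convert the evaluation result to a DataFrame for per-row analysis
+# (assumes Ragas' result object exposes `to_pandas`)
+df = result.to_pandas()
+print(df.head())
+```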
diff --git a/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md b/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
index e1224f8539..47a4ee3ca2 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/evaluate_your_llm.md
@@ -27,7 +27,7 @@ openai_client = track_openai(openai.OpenAI())
# This method is the LLM application that you want to evaluate
# Typically this is not updated when creating evaluations
-@track()
+@track
def your_llm_application(input: str) -> str:
response = openai_client.chat.completions.create(
model="gpt-3.5-turbo",
@@ -36,12 +36,12 @@ def your_llm_application(input: str) -> str:
return response.choices[0].message.content
-@track()
+@track
def your_context_retriever(input: str) -> str:
return ["..."]
```
-:::note
+:::tip
We have added here the `track` decorator so that this traces and all it's nested steps are logged to the platform for further analysis.
:::
@@ -89,8 +89,8 @@ equals_metric = Equals()
contains_metric = Hallucination()
```
-:::note
- Each metric expects the data in a certain format, you will need to ensure that the task you have defined in step 1. returns the data in the correct format.
+:::tip
+Each metric expects the data in a certain format, you will need to ensure that the task you have defined in step 1. returns the data in the correct format.
:::
## 4. Run the evaluation
@@ -108,7 +108,7 @@ from opik.integrations.openai import track_openai
openai_client = track_openai(openai.OpenAI())
-@track()
+@track
def your_llm_application(input: str) -> str:
response = openai_client.chat.completions.create(
model="gpt-3.5-turbo",
@@ -118,7 +118,7 @@ def your_llm_application(input: str) -> str:
return response.choices[0].message.content
-@track()
+@track
def your_context_retriever(input: str) -> str:
return ["..."]
@@ -149,6 +149,10 @@ evaluation = evaluate(
)
```
-:::note
+:::tip
We will track the traces for all evaluations and will be logged to the `evaluation` project by default. To log it to a specific project, you can pass the `project_name` parameter to the `evaluate` function.
:::
+
+## Advanced usage
+
+In order to evaluate datasets more efficiently, Opik evaluates them using multiple background threads. If this is causing issues, you can disable these threads by setting the `task_threads` and `scoring_threads` parameters to `1`, which will lead Opik to run all calculations in the main thread, as shown in the sketch below.
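+
+For example, a minimal sketch (this assumes `evaluate` accepts the `task_threads` and `scoring_threads` parameters described above; the dataset, task and metric variables are placeholders following the pattern used earlier in this guide):
+
+```python
+evaluation = evaluate(
+    experiment_name="My experiment",
+    dataset=dataset,
+    task=evaluation_task,
+    scoring_metrics=[hallucination_metric],
+    # Disable background threads so all calculations run in the main thread
+    task_threads=1,
+    scoring_threads=1,
+)
+```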
diff --git a/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md b/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
index 83b9f1af28..03d71536b6 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/manage_datasets.md
@@ -47,10 +47,15 @@ dataset.insert([
])
```
-:::note
- Instead of using the `DatasetItem` class, you can also use a dictionary to insert items to a dataset. The dictionary should have the `input` key, `expected_output` and `metadata` are optional.
+:::tip
+Instead of using the `DatasetItem` class, you can also use a dictionary to insert items into a dataset. The dictionary must have the `input` key; `expected_output` and `metadata` are optional, as shown in the sketch below this tip.
:::
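+
+For example, a dictionary-based insert might look like this (a sketch: the `input`, `expected_output` and `metadata` keys follow the tip above, while the value shapes are illustrative):
+
+```python
+dataset.insert([
+    {"input": {"user_question": "What is the capital of France?"}},
+    {
+        "input": {"user_question": "What is the capital of Germany?"},
+        "expected_output": {"assistant_answer": "Berlin"},  # optional
+        "metadata": {"source": "geography-quiz"},  # optional
+    },
+])
+```
+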
+Once the items have been inserted, you can view them in the Opik UI:
+
+![Opik Dataset](/img/evaluation/dataset_items_page.png)
+
+
### Deleting items
You can delete items in a dataset by using the `delete` method:
@@ -60,14 +65,25 @@ from opik import Opik
# Get or create a dataset
client = Opik()
-try:
- dataset = client.create_dataset(name="My dataset")
-except:
- dataset = client.get_dataset(name="My dataset")
+dataset = client.get_dataset(name="My dataset")
dataset.delete(items_ids=["123", "456"])
```
+:::tip
+You can also remove all the items in a dataset by using the `clear` method:
+
+```python
+from opik import Opik
+
+# Get or create a dataset
+client = Opik()
+dataset = client.get_dataset(name="My dataset")
+
+dataset.clear()
+```
+:::
+
## Downloading a dataset from Comet
You can download a dataset from Comet using the `get_dataset` method:
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
index 7074978dac..f63e4d992e 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/answer_relevance.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 5
sidebar_label: AnswerRelevance
---
@@ -72,4 +72,4 @@ Answer:
Contexts:
{contexts}
***
-```
\ No newline at end of file
+```
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
index c89ea9310c..0d836155c9 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_precision.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 6
sidebar_label: ContextPrecision
---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
index cdfd248239..ed53eae33f 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/context_recall.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 5
+sidebar_position: 7
sidebar_label: ContextRecall
---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
index 14c7153ee2..fa7be76b6d 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/custom_metric.md
@@ -28,10 +28,10 @@ class MyCustomMetric(base_metric.BaseMetric):
)
```
-You can also return a list of `ScoreResult` objects as part of your custom metric. This is useful if you want to return multiple scores for a given input and output pair.
+The `score` method should return a `ScoreResult` object. The `ascore` method is optional and can be used to compute the score asynchronously if needed.
-:::note
-The `score` method should return a `ScoreResult` object. The `ascore` method is optional and can be used to compute the score for a given input and output pair.
+:::tip
+You can also return a list of `ScoreResult` objects as part of your custom metric. This is useful if you want to return multiple scores for a given input and output pair.
:::
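+
+As an illustration, an async variant might look like the following sketch (it assumes the `base_metric` and `score_result` modules used by the class shown above; the scoring logic itself is a placeholder):
+
+```python
+from typing import Any
+
+from opik.evaluation.metrics import base_metric, score_result
+
+class MyCustomMetric(base_metric.BaseMetric):
+    def score(self, input: str, output: str, **ignored_kwargs: Any):
+        # Synchronous scoring logic
+        return score_result.ScoreResult(value=1.0, name=self.name)
+
+    async def ascore(self, input: str, output: str, **ignored_kwargs: Any):
+        # Optional asynchronous scoring, useful when the metric awaits an LLM call
+        return self.score(input, output)
+```
+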
This metric can now be used in the `evaluate` function as explained here: [Evaluating LLMs](/evaluation/evaluate_your_llm).
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
index 26d20e8cff..6406ff6d97 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/hallucination.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
sidebar_label: Hallucination
---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
index c33b2d8c02..0cb57cf6f4 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/heuristic_metrics.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 2
sidebar_label: Heuristic Metrics
---
@@ -32,7 +32,9 @@ score = metric.score("Hello world !")
print(score)
```
-## Equals
+## Metrics
+
+### Equals
The `Equals` metric can be used to check if the output of an LLM exactly matches a specific string. It can be used in the following way:
@@ -48,7 +50,7 @@ score = metric.score("Hello world !")
print(score)
```
-## Contains
+### Contains
The `Contains` metric can be used to check if the output of an LLM contains a specific substring. It can be used in the following way:
@@ -65,7 +67,7 @@ score = metric.score("Hello world !")
print(score)
```
-## RegexMatch
+### RegexMatch
The `RegexMatch` metric can be used to check if the output of an LLM matches a specified regular expression pattern. It can be used in the following way:
@@ -81,7 +83,7 @@ score = metric.score("Hello world !")
print(score)
```
-## IsJson
+### IsJson
The `IsJson` metric can be used to check if the output of an LLM is valid. It can be used in the following way:
@@ -94,7 +96,7 @@ score = metric.score('{"key": "some_valid_sql"}')
print(score)
```
-## LevenshteinRatio
+### LevenshteinRatio
The `LevenshteinRatio` metric can be used to check if the output of an LLM is valid. It can be used in the following way:
@@ -105,4 +107,4 @@ metric = LevenshteinRatio(name="levenshtein_ratio_metric", searched_value="hello
score = metric.score("Hello world !")
print(score)
-```
\ No newline at end of file
+```
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
index fe88e8ceef..1c8509a745 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/moderation.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 3
+sidebar_position: 4
sidebar_label: Moderation
---
diff --git a/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md b/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
index ee56ead77d..c6f1ef165c 100644
--- a/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
+++ b/apps/opik-documentation/documentation/docs/evaluation/metrics/overview.md
@@ -1,8 +1,30 @@
---
sidebar_position: 1
-sidebar_label: Overview - TBD
+sidebar_label: Overview
---
# Overview
-Under cosntruction
\ No newline at end of file
+Opik provides a set of built-in evaluation metrics that can be used to evaluate the output of your LLM calls. These metrics are broken down into two main categories:
+
+1. Heuristic metrics
+2. LLM as a Judge metrics
+
+Heuristic metrics are deterministic and are often statistical in nature. LLM as a Judge metrics are non-deterministic and are based on the idea of using an LLM to evaluate the output of another LLM.
+
+Opik provides the following built-in evaluation metrics:
+
+| Metric | Type | Description | Documentation |
+| --- | --- | --- | --- |
+| Equals | Heuristic | Checks if the output exactly matches an expected string | [Equals](/evaluation/metrics/heuristic_metrics#equals) |
+| Contains | Heuristic | Checks if the output contains a specific substring; matching can be either case sensitive or case insensitive | [Contains](/evaluation/metrics/heuristic_metrics#contains) |
+| RegexMatch | Heuristic | Checks if the output matches a specified regular expression pattern | [RegexMatch](/evaluation/metrics/heuristic_metrics#regexmatch) |
+| IsJson | Heuristic | Checks if the output is a valid JSON object | [IsJson](/evaluation/metrics/heuristic_metrics#isjson) |
+| LevenshteinRatio | Heuristic | Calculates the Levenshtein ratio between the output and an expected string | [LevenshteinRatio](/evaluation/metrics/heuristic_metrics#levenshteinratio) |
+| Hallucination | LLM as a Judge | Checks if the output contains any hallucinations | [Hallucination](/evaluation/metrics/hallucination) |
+| Moderation | LLM as a Judge | Checks if the output contains any harmful content | [Moderation](/evaluation/metrics/moderation) |
+| AnswerRelevance | LLM as a Judge | Checks if the output is relevant to the question | [AnswerRelevance](/evaluation/metrics/answer_relevance) |
+| ContextRecall | LLM as a Judge | Checks if the output covers the relevant information from the provided context | [ContextRecall](/evaluation/metrics/context_recall) |
+| ContextPrecision | LLM as a Judge | Checks if the provided context is precise and relevant to the expected output | [ContextPrecision](/evaluation/metrics/context_precision) |
+
+You can also create your own custom metric, learn more about it in the [Custom Metric](/evaluation/metrics/custom_metric) section.
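+
+As a quick taste, a heuristic metric is typically used as follows (a sketch: the import path and the `searched_value` parameter mirror the `LevenshteinRatio` example on the heuristic metrics page and may differ slightly for other metrics):
+
+```python
+from opik.evaluation.metrics import Equals
+
+# Check whether the LLM output exactly matches the searched value
+metric = Equals(name="equals_metric", searched_value="Hello world !")
+score = metric.score("Hello world !")
+print(score)
+```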
diff --git a/apps/opik-documentation/documentation/docs/home.md b/apps/opik-documentation/documentation/docs/home.md
index 9baa440133..80b9f5d13b 100644
--- a/apps/opik-documentation/documentation/docs/home.md
+++ b/apps/opik-documentation/documentation/docs/home.md
@@ -4,46 +4,37 @@ slug: /
sidebar_label: Home
---
-# Comet Opik
+# Opik by Comet
-The LLM Evaluation platform allows you log, view and evaluate your LLM traces during both development and production. Using the platform and our LLM as a Judge evaluators, you can identify and fix issues in your LLM application.
+The Opik platform allows you to log, view and evaluate your LLM traces during both development and production. Using the platform and our LLM as a Judge evaluators, you can identify and fix issues in your LLM application.
![LLM Evaluation Platform](/img/home/traces_page_with_sidebar.png)
-# Overview
+## Overview
-## Development
+### Development
During development, you can use the platform to log, view and debug your LLM traces:
1. Log traces using:
- a. One of our [integrations](./)
- b. The `@track` decorator for Python
- c. The [Rest API](./)
-2. Review and debug traces in the [Tracing UI](./)
-3. [Annotate and label traces](./) through the UI
-## Evaluation and Testing
+ a. One of our [integrations](/tracing/integrations/overview).
-Evaluating the output of your LLM calls is critical to ensure that your application is working as expected and can be challenging. Using the Comet LLM Evaluation platformm, you can:
-
-1. Use one of our [LLM as a Judge evaluators](./) or [Heuristic evaluators](./) to score your traces and LLM calls
-2. [Store evaluation datasets](./) in the platform and [run evaluations](./)
-3. Use our [pytest integration](./) to track unit test results and compare results between runs
+ b. The `@track` decorator for Python, learn more in the [Logging Traces](/tracing/log_traces) guide.
+2. [Annotate and label traces](/tracing/annotate_traces) through the SDK or the UI.
-## Monitoring
+### Evaluation and Testing
-You can use the LLM platform to monitor your LLM applications in production, both the SDK and the Backend have been designed to support high volumes of requests.
-
-The platform allows you:
+Evaluating the output of your LLM calls is critical to ensure that your application is working as expected, and it can be challenging. Using the Opik platform, you can:
-1. Track all LLM calls and traces using our [Python SDK](./) and a [Rest API](./)
-2. View, filter and analyze traces in our [Tracing UI](./)
-3. Update evaluation datasets with [failed traces](./)
+1. Use one of our [LLM as a Judge evaluators](/evaluation/metrics/overview) or [Heuristic evaluators](/evaluation/metrics/heuristic_metrics) to score your traces and LLM calls
+2. [Store evaluation datasets](/evaluation/manage_datasets) in the platform and [run evaluations](/evaluation/evaluate_your_llm)
+3. Use our [pytest integration](/testing/pytest_integration) to track unit test results and compare results between runs
+## Getting Started
-# Getting Started
+[Comet](https://www.comet.com/site) provides a managed Cloud offering for Opik; simply [create an account](https://www.comet.com/signup?from=llm) to get started.
-The Comet LLM Evaluation platform allows you log, view and evaluate your LLM traces during both development and production.
\ No newline at end of file
+You can also run Opik locally using our [local installer](/self-host/self_hosting_opik#all-in-one-installation). If you are looking for a more production-ready deployment, you can also use our [Kubernetes deployment option](/self-host/self_hosting_opik#kubernetes-installation).
diff --git a/apps/opik-documentation/documentation/docs/quickstart.md b/apps/opik-documentation/documentation/docs/quickstart.md
index 2b64dd760b..81215da3ce 100644
--- a/apps/opik-documentation/documentation/docs/quickstart.md
+++ b/apps/opik-documentation/documentation/docs/quickstart.md
@@ -5,31 +5,38 @@ sidebar_label: Quickstart
# Quickstart
-This guide helps you integrate the Comet LLM Evaluation platform with your existing LLM application.
+This guide helps you integrate the Opik platform with your existing LLM application.
## Set up
-Getting started is as simple as creating an [account on Comet](./) or [self-hosting the platform](./).
+Getting started is as simple as creating an [account on Comet](https://www.comet.com/signup?from=llm) or [self-hosting the platform](/self-host/self_hosting_opik).
-Once your account is created, you can start logging traces by installing and configuring the Python SDK:
+Once your account is created, you can start logging traces by installing the Opik Python SDK:
```bash
pip install opik
+```
+
+and configuring the SDK with:
+
+```python
+import os
-export COMET_API_KEY=<...>
+os.environ["OPIK_API_KEY"] = "