Commit

Merge branch 'main' into thiagohora/batch_spans_creation
thiagohora committed Sep 13, 2024
2 parents 59c9e5b + 9f19c0f commit c6b6d46
Showing 45 changed files with 299 additions and 227 deletions.
3 changes: 1 addition & 2 deletions CONTRIBUTING.md
@@ -136,8 +136,7 @@ minikube stop
```
The next time you start minikube, it will run everything with the same configuration and data you had before.


### Contributing to the documentation
### Contributing to the documentation

The documentation is made up of three main parts:

@@ -65,5 +65,4 @@ public static class Write {
public static class Public {
}
}

}
21 changes: 2 additions & 19 deletions apps/opik-backend/src/main/java/com/comet/opik/domain/SpanDAO.java
@@ -393,7 +393,7 @@ LEFT JOIN (
*
FROM
spans
WHERE id IN :ids
WHERE id = :id
AND workspace_id = :workspace_id
ORDER BY last_updated_at DESC
LIMIT 1
@@ -735,7 +735,7 @@ public Mono<Span> getById(@NonNull UUID id) {

private Publisher<? extends Result> getById(UUID id, Connection connection) {
var statement = connection.createStatement(SELECT_BY_ID)
.bind("ids", new String[]{id.toString()});
.bind("id", id);

Segment segment = startSegment("spans", "Clickhouse", "get_by_id");

@@ -907,21 +907,4 @@ public Mono<List<WorkspaceAndResourceId>> getSpanWorkspace(@NonNull Set<UUID> sp
row.get("id", UUID.class))))
.collectList();
}

public Mono<List<Span>> getByIds(@NonNull List<UUID> ids) {

if (ids.isEmpty()) {
return Mono.just(List.of());
}

return Mono.from(connectionFactory.create())
.flatMapMany(connection -> {
var statement = connection.createStatement(SELECT_BY_ID)
.bind("ids", ids.toArray(UUID[]::new));

return makeFluxContextAware(bindWorkspaceIdToFlux(statement));
})
.flatMap(this::mapToDto)
.collectList();
}
}
@@ -252,7 +252,7 @@ public Mono<Boolean> validateSpanWorkspace(@NonNull String workspaceId, @NonNull
}

@Trace(dispatcher = true)
public Mono<Void> create(@NonNull SpanBatch batch) {
public Mono<Long> create(@NonNull SpanBatch batch) {

Preconditions.checkArgument(!batch.spans().isEmpty(), "Batch spans must not be empty");

@@ -269,8 +269,7 @@ public Mono<Void> create(@NonNull SpanBatch batch) {
.subscribeOn(Schedulers.boundedElastic());

return resolveProjects
.flatMap(spanDAO::batchInsert)
.then();
.flatMap(spanDAO::batchInsert);
}

private List<Span> bindSpanToProjectAndId(SpanBatch batch, List<Project> projects) {
@@ -11,4 +11,5 @@ public class DistributedLockConfig {
@Valid
@JsonProperty
@NotNull private int lockTimeoutMS;

}
1 change: 0 additions & 1 deletion apps/opik-backend/src/test/resources/config-test.yml
@@ -49,7 +49,6 @@ health:

distributedLock:
lockTimeout: 500
bulkLockTimeout: 5000

redis:
singleNodeUrl:
@@ -6,8 +6,6 @@
"source": [
"# Evaluating Opik's Moderation Metric\n",
"\n",
"*This cookbook was created from a Jupyter notebook which can be found [here](TBD).*\n",
"\n",
"For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK."
]
},
@@ -24,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -46,7 +44,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -65,24 +63,16 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"outputs": [],
"source": [
"%pip install opik --upgrade --quiet"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -102,17 +92,9 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"status_code: 409, body: {'errors': ['Dataset already exists']}\n"
]
}
],
"outputs": [],
"source": [
"# Create dataset\n",
"from opik import Opik, DatasetItem\n",
@@ -173,57 +155,9 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluation: 100%|██████████| 50/50 [00:06<00:00, 8.09it/s]\n"
]
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─ OpenAIModerationDataset (50 samples) ─╮\n",
"│ │\n",
"│ <span style=\"font-weight: bold\">Total time: </span> 00:00:06 │\n",
"│ <span style=\"font-weight: bold\">Number of samples:</span> 50 │\n",
"│ │\n",
"│ <span style=\"color: #008000; text-decoration-color: #008000; font-weight: bold\">Correct moderation score: 0.8400 (avg)</span> │\n",
"│ │\n",
"╰────────────────────────────────────────╯\n",
"</pre>\n"
],
"text/plain": [
"╭─ OpenAIModerationDataset (50 samples) ─╮\n",
"│ │\n",
"\u001b[1mTotal time: \u001b[0m 00:00:06 │\n",
"\u001b[1mNumber of samples:\u001b[0m 50 │\n",
"│ │\n",
"\u001b[1;32mCorrect moderation score: 0.8400 (avg)\u001b[0m │\n",
"│ │\n",
"╰────────────────────────────────────────╯\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Uploading results to Opik <span style=\"color: #808000; text-decoration-color: #808000\">...</span> \n",
"</pre>\n"
],
"text/plain": [
"Uploading results to Opik \u001b[33m...\u001b[0m \n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"outputs": [],
"source": [
"from opik.evaluation.metrics import Moderation, Equals\n",
"from opik.evaluation import evaluate\n",
@@ -1,7 +1,5 @@
# Evaluating Opik's Moderation Metric

*This cookbook was created from a Jupyter notebook which can be found [here](TBD).*

For this guide we will be evaluating the Moderation metric included in the LLM Evaluation SDK which will showcase both how to use the `evaluation` functionality in the platform as well as the quality of the Moderation metric included in the SDK.

## Creating an account on Comet.com
@@ -38,9 +36,6 @@ First, we will install the necessary libraries and configure the OpenAI API key
%pip install opik --upgrade --quiet
```

Note: you may need to restart the kernel to use updated packages.



```python
import os
@@ -95,9 +90,6 @@ except Exception as e:
print(e)
```

status_code: 409, body: {'errors': ['Dataset already exists']}
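
The 409 response above is what a second run of the dataset-creation cell prints, since the dataset already exists. One way to make such a cell safe to re-run is a "get or create" helper. The sketch below is only an illustration under stated assumptions: `FakeClient`, `DatasetExistsError`, `create_dataset`, `get_dataset` and `get_or_create` are hypothetical in-memory stand-ins and do not represent the Opik SDK's API.

```python
class DatasetExistsError(Exception):
    """Hypothetical error standing in for the 409 response shown above."""


class FakeClient:
    """In-memory stand-in for a dataset service (an assumption, not the Opik client)."""

    def __init__(self):
        self._datasets = {}

    def create_dataset(self, name):
        # Mimic a server that rejects duplicate dataset names.
        if name in self._datasets:
            raise DatasetExistsError(
                "status_code: 409, body: {'errors': ['Dataset already exists']}"
            )
        self._datasets[name] = []
        return self._datasets[name]

    def get_dataset(self, name):
        return self._datasets[name]


def get_or_create(client, name):
    """Create the dataset, or fetch the existing one if creation conflicts."""
    try:
        return client.create_dataset(name)
    except DatasetExistsError:
        return client.get_dataset(name)


client = FakeClient()
first = get_or_create(client, "OpenAIModerationDataset")
second = get_or_create(client, "OpenAIModerationDataset")  # takes the conflict path
```

With this shape, the first call creates the dataset and later calls return the same object instead of printing an error.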


## Evaluating the moderation metric

In order to evaluate the performance of the Opik moderation metric, we will define:
@@ -153,28 +145,6 @@ res = evaluate(
)
```

Evaluation: 100%|██████████| 50/50 [00:06<00:00, 8.09it/s]



<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">╭─ OpenAIModerationDataset (50 samples) ─╮
│ │
│ <span style="font-weight: bold">Total time: </span> 00:00:06 │
│ <span style="font-weight: bold">Number of samples:</span> 50 │
│ │
│ <span style="color: #008000; text-decoration-color: #008000; font-weight: bold">Correct moderation score: 0.8400 (avg)</span> │
│ │
╰────────────────────────────────────────╯
</pre>




<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">Uploading results to Opik <span style="color: #808000; text-decoration-color: #808000">...</span>
</pre>



We are able to detect ~85% of moderation violations; this can be improved further by providing some additional examples to the model. We can view a breakdown of the results in the Opik UI:

![Moderation Evaluation](https://raw.githubusercontent.com/comet-ml/opik/main/apps/opik-documentation/documentation/static/img/cookbook/moderation_metric_cookbook.png)
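
As a rough check on the ~85% figure: the evaluation output shown earlier reports an average score of 0.8400 over 50 samples, which corresponds to 42 matching samples. The sketch below reproduces that arithmetic, assuming each sample scores 1.0 when the prediction matches the expected label and 0.0 otherwise. The `scores` list is illustrative data built to match the reported average, not the actual per-sample results, and `average_score` is a hypothetical helper.

```python
from statistics import mean


def average_score(scores):
    """Return the mean of per-sample scores, or 0.0 for an empty list."""
    return mean(scores) if scores else 0.0


# Illustrative per-sample results (assumption): 42 matches and 8 misses.
scores = [1.0] * 42 + [0.0] * 8

avg = average_score(scores)
print(f"{avg:.4f} (avg) over {len(scores)} samples")
```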