Merge pull request #229 from vespa-engine/tgm/add-text-image-use-case
Update documentation
Showing 9 changed files with 372 additions and 31 deletions.
Binary file added (+26.3 KB): docs/sphinx/source/use_cases/image_search/clip-evaluation-boxplot.png
365 additions, 0 deletions: docs/sphinx/source/use_cases/image_search/image-search-scratch.ipynb
@@ -0,0 +1,365 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "unauthorized-sentence",
   "metadata": {},
   "source": [
    "# Image search\n",
    "> Define a text-to-image search application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "initial-height",
   "metadata": {},
   "source": [
    "This page walks through the pyvespa code used to create the [text-to-image search sample application](https://github.com/vespa-engine/sample-apps/tree/master/text-image-search/src/python)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "trying-jacksonville",
   "metadata": {},
   "source": [
    "![SegmentLocal](demo.gif \"segment\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "median-brown",
   "metadata": {},
   "source": [
    "## Create the application package"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "interracial-scientist",
   "metadata": {},
   "source": [
    "The first step is to create an application package instance named `image_search`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "inner-minimum",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import ApplicationPackage\n",
    "\n",
    "app_package = ApplicationPackage(name=\"image_search\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "gentle-mountain",
   "metadata": {},
   "source": [
    "Add a field to hold the name of the image file. The sample app uses this field to load the final images displayed to the end user.\n",
    "\n",
    "The `summary` indexing ensures this field is returned as part of the query response. The `attribute` indexing stores the field in memory as an attribute for sorting, querying, and grouping."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "micro-oxygen",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import Field\n",
    "\n",
    "app_package.schema.add_fields(\n",
    "    Field(name=\"image_file_name\", type=\"string\", indexing=[\"summary\", \"attribute\"]),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "historic-explorer",
   "metadata": {},
   "source": [
    "Add a field to hold an image embedding. The embeddings are usually generated by an ML model. We can add multiple embedding fields to our application, which is useful when running experiments. For example, the sample app adds six image embedding fields, one for each of the six pre-trained CLIP models available at the time.\n",
    "\n",
    "In the example below, the embedding vector has size `512` and is of type `float`. The `index` is required to enable [approximate matching](https://docs.vespa.ai/en/approximate-nn-hnsw.html), and the `HNSW` instance configures the HNSW index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "twenty-montgomery",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import HNSW\n",
    "\n",
    "app_package.schema.add_fields(\n",
    "    Field(\n",
    "        name=\"embedding_image\",\n",
    "        type=\"tensor<float>(x[512])\",\n",
    "        indexing=[\"attribute\", \"index\"],\n",
    "        ann=HNSW(\n",
    "            distance_metric=\"angular\",\n",
    "            max_links_per_node=16,\n",
    "            neighbors_to_explore_at_insert=500,\n",
    "        ),\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "instant-fluid",
   "metadata": {},
   "source": [
    "Add a rank profile that ranks images by how close their image embedding vector is to the query embedding vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "systematic-manitoba",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import RankProfile\n",
    "\n",
    "app_package.schema.add_rank_profile(\n",
    "    RankProfile(\n",
    "        name=\"embedding_similarity\",\n",
    "        inherits=\"default\",\n",
    "        first_phase=\"closeness(embedding_image)\",\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "engaging-return",
   "metadata": {},
   "source": [
    "The tensors used in queries must have their type declared in a query profile in the application package. The code below declares the text embedding that will be sent through the Vespa query. It has the same size and type as the image embedding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "divine-legislature",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.package import QueryTypeField\n",
    "\n",
    "app_package.query_profile_type.add_fields(\n",
    "    QueryTypeField(\n",
    "        name=\"ranking.features.query(embedding_text)\",\n",
    "        type=\"tensor<float>(x[512])\",\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "controlled-playing",
   "metadata": {},
   "source": [
    "## Deploy the application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "liked-market",
   "metadata": {},
   "source": [
    "The application package created above can be deployed using [Docker](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-docker.html) or [Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html). Follow the instructions for the desired deployment mode. Either option creates a Vespa connection instance, which we store in a variable denoted here as `app`.\n",
    "\n",
    "We can then use `app` to interact with the deployed application. A minimal Docker sketch follows."
   ]
  },
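  {
   "cell_type": "markdown",
   "id": "docker-deploy-note",
   "metadata": {},
   "source": [
    "As a minimal sketch, a Docker deployment could look like the cell below. This assumes a local Docker daemon with enough memory for Vespa; the `VespaDocker` import path and arguments have changed across pyvespa versions, so check the deployment guide linked above for your version."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "docker-deploy-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.deployment import VespaDocker\n",
    "\n",
    "# Start a local Vespa container and deploy the application package to it.\n",
    "vespa_docker = VespaDocker()\n",
    "app = vespa_docker.deploy(application_package=app_package)"
   ]
  },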
  {
   "cell_type": "markdown",
   "id": "overhead-record",
   "metadata": {},
   "source": [
    "## Feed the image data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "recovered-championship",
   "metadata": {},
   "source": [
    "To feed the image data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "under-paste",
   "metadata": {},
   "outputs": [],
   "source": [
    "responses = app.feed_batch(batch)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adequate-edinburgh",
   "metadata": {},
   "source": [
    "where `batch` is a list of dictionaries like the one below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "experimental-heritage",
   "metadata": {},
   "outputs": [],
   "source": [
    "{\n",
    "    \"id\": \"dog1\",\n",
    "    \"fields\": {\n",
    "        \"image_file_name\": \"dog1.jpg\",\n",
    "        \"embedding_image\": {\"values\": [0.884, -0.345, ..., 0.326]},\n",
    "    }\n",
    "}"
   ]
  },
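  {
   "cell_type": "markdown",
   "id": "batch-build-note",
   "metadata": {},
   "source": [
    "As a hypothetical sketch of assembling such a batch: `compute_embedding` below is a placeholder for any function that maps an image file to its 512-dimensional embedding; it is not part of pyvespa."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "batch-build-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build one feed dictionary per image file (compute_embedding is hypothetical).\n",
    "batch = [\n",
    "    {\n",
    "        \"id\": file_name.rsplit(\".\", 1)[0],\n",
    "        \"fields\": {\n",
    "            \"image_file_name\": file_name,\n",
    "            \"embedding_image\": {\"values\": compute_embedding(file_name)},\n",
    "        },\n",
    "    }\n",
    "    for file_name in [\"dog1.jpg\", \"dog2.jpg\"]\n",
    "]\n",
    "responses = app.feed_batch(batch)"
   ]
  },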
  {
   "cell_type": "markdown",
   "id": "sharing-health",
   "metadata": {},
   "source": [
    "One of the advantages of having a Python API is that it can integrate with commonly used ML frameworks. The sample application [shows how to create a PyTorch DataLoader](https://github.com/vespa-engine/sample-apps/blob/master/text-image-search/src/python/embedding.py#L85-L113) that generates batches of image data, using CLIP models to compute the image embeddings. The next cells sketch the idea."
   ]
  },
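  {
   "cell_type": "markdown",
   "id": "dataloader-note",
   "metadata": {},
   "source": [
    "The sketch below is a simplified stand-in for the sample app's code, not a copy of it. It assumes a CLIP model and its matching preprocessing transform are already loaded (for example via `clip.load(\"ViT-B/32\")` from the [CLIP package](https://github.com/openai/CLIP)); the class and variable names are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dataloader-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "import os\n",
    "\n",
    "import torch\n",
    "from PIL import Image\n",
    "from torch.utils.data import DataLoader, Dataset\n",
    "\n",
    "\n",
    "class ImageFeedDataset(Dataset):\n",
    "    # Pairs each image file with its CLIP embedding, as a Vespa feed dict.\n",
    "    def __init__(self, image_dir, model, preprocess):\n",
    "        self.image_files = glob.glob(os.path.join(image_dir, \"*.jpg\"))\n",
    "        self.model = model  # a CLIP model exposing encode_image\n",
    "        self.preprocess = preprocess  # the matching image transform\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.image_files)\n",
    "\n",
    "    def __getitem__(self, idx):\n",
    "        file_path = self.image_files[idx]\n",
    "        image = self.preprocess(Image.open(file_path)).unsqueeze(0)\n",
    "        with torch.no_grad():\n",
    "            embedding = self.model.encode_image(image).squeeze(0)\n",
    "        file_name = os.path.basename(file_path)\n",
    "        return {\n",
    "            \"id\": os.path.splitext(file_name)[0],\n",
    "            \"fields\": {\n",
    "                \"image_file_name\": file_name,\n",
    "                \"embedding_image\": {\"values\": embedding.tolist()},\n",
    "            },\n",
    "        }\n",
    "\n",
    "\n",
    "# collate_fn=list keeps each batch as a plain list of feed dicts.\n",
    "loader = DataLoader(\n",
    "    ImageFeedDataset(\"images\", model, preprocess), batch_size=128, collate_fn=list\n",
    ")\n",
    "for batch in loader:\n",
    "    app.feed_batch(batch)"
   ]
  },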
  {
   "cell_type": "markdown",
   "id": "educational-danish",
   "metadata": {},
   "source": [
    "## Query the application"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "continental-student",
   "metadata": {},
   "source": [
    "The following query uses approximate nearest neighbor search to retrieve the images closest to the query text and ranks them by that distance. The sample application uses CLIP models to generate both the image and the query embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "large-switch",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = app.query(body={\n",
    "    \"yql\": 'select * from sources * where ([{\"targetNumHits\":100}]nearestNeighbor(embedding_image,embedding_text));',\n",
    "    \"hits\": 100,\n",
    "    \"ranking.features.query(embedding_text)\": [0.632, -0.987, ..., 0.534],\n",
    "    \"ranking.profile\": \"embedding_similarity\"\n",
    "})"
   ]
  },
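  {
   "cell_type": "markdown",
   "id": "response-hits-note",
   "metadata": {},
   "source": [
    "The returned `response` wraps the Vespa HTTP response. As a minimal sketch, assuming the schema defined above, the file names of the matched images can be collected from the hits:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "response-hits-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Each hit carries the document fields marked with summary indexing.\n",
    "image_file_names = [hit[\"fields\"][\"image_file_name\"] for hit in response.hits]"
   ]
  },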
  {
   "cell_type": "markdown",
   "id": "incoming-hollywood",
   "metadata": {},
   "source": [
    "## Evaluate different query models"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "governing-morgan",
   "metadata": {},
   "source": [
    "Define the metrics to evaluate:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "elder-tower",
   "metadata": {},
   "outputs": [],
   "source": [
    "from vespa.evaluation import MatchRatio, Recall, ReciprocalRank\n",
    "\n",
    "eval_metrics = [\n",
    "    MatchRatio(),\n",
    "    Recall(at=100),\n",
    "    ReciprocalRank(at=100)\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caroline-devices",
   "metadata": {},
   "source": [
    "The sample application illustrates how to evaluate different CLIP models through the `evaluate` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "functional-stand",
   "metadata": {},
   "outputs": [],
   "source": [
    "result = app.evaluate(\n",
    "    labeled_data=labeled_data,  # Labeled data defining which images should be returned for a given query\n",
    "    eval_metrics=eval_metrics,  # The metrics defined above\n",
    "    query_model=query_models,  # Each query model uses a different CLIP model version\n",
    "    id_field=\"image_file_name\",  # The field the labeled data uses to identify an image\n",
    "    per_query=True  # Return results per query rather than aggregated\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "level-colors",
   "metadata": {},
   "source": [
    "The figure below shows the reciprocal rank at 100, computed from the output of the `evaluate` method."
   ]
  },
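  {
   "cell_type": "markdown",
   "id": "boxplot-note",
   "metadata": {},
   "source": [
    "A boxplot like the one below could be produced from the per-query results. This is a hypothetical sketch: the column names `model` and `reciprocal_rank_100` are assumptions about the output layout, which varies across pyvespa versions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "boxplot-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# One box per query model, over the per-query reciprocal rank values\n",
    "# (column names are assumed, not the exact pyvespa output schema).\n",
    "result.boxplot(column=\"reciprocal_rank_100\", by=\"model\")\n",
    "plt.show()"
   ]
  },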
  {
   "cell_type": "markdown",
   "id": "canadian-gambling",
   "metadata": {},
   "source": [
    "![evaluation](clip-evaluation-boxplot.png)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}