Merge pull request #1416 from vespa-engine/tht-update-text-video-search
Updates to text-video-search sample app
kkraune authored Apr 17, 2024
2 parents 574530c + 6ace5ca commit c89b2cf
Showing 3 changed files with 55 additions and 44 deletions.
@@ -6,15 +6,16 @@
"metadata": {},
"source": [
"# Create a text-video search app with Vespa\n",
"> Create, deploy, feed and query the Vespa app using the Vespa python API "
"\n",
"> Create, deploy, feed and query the Vespa app using the Vespa python API\n"
]
},
{
"cell_type": "markdown",
"id": "bearing-adelaide",
"metadata": {},
"source": [
"## Install required packages"
"## Install required packages\n"
]
},
{
@@ -32,15 +33,15 @@
"id": "recreational-characterization",
"metadata": {},
"source": [
"## CLIP model"
"## CLIP model\n"
]
},
{
"cell_type": "markdown",
"id": "operational-transcript",
"metadata": {},
"source": [
"There are multiple CLIP model variations"
"There are multiple CLIP model variations\n"
]
},
{
Expand Down Expand Up @@ -71,7 +72,7 @@
"id": "adolescent-freedom",
"metadata": {},
"source": [
"Each CLIP model might have a different embedding size. We need this information when creating the schema of the text-video search application."
"Each CLIP model might have a different embedding size. We need this information when creating the schema of the text-video search application.\n"
]
},
{
@@ -97,7 +98,9 @@
}
],
"source": [
"embedding_info = {name: clip.load(name)[0].visual.output_dim for name in clip.available_models()}\n",
"embedding_info = {\n",
" name: clip.load(name)[0].visual.output_dim for name in clip.available_models()\n",
"}\n",
"embedding_info"
]
},
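Review note: the embedding sizes gathered in this cell are what the schema needs for its tensor fields. A minimal sketch of that mapping follows, under stated assumptions: `tensor_type_for` and `field_name_for` are hypothetical helpers, and the dimension name `x` and the `_image` suffix are illustrative choices that do not come from `embedding.py`. The two sizes are hard-coded so the sketch runs without downloading CLIP weights.

```python
import re


def tensor_type_for(embedding_size, dim_name="x"):
    """Build a dense Vespa tensor type string for one embedding size."""
    if embedding_size <= 0:
        raise ValueError("embedding size must be positive")
    return f"tensor<float>({dim_name}[{embedding_size}])"


def field_name_for(model_name):
    """Turn a CLIP model name such as 'ViT-B/32' into an identifier-safe name.

    The '_image' suffix is a hypothetical naming convention.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", model_name.lower()).strip("_")
    return slug + "_image"


# Stand-in for the notebook's embedding_info dict (output sizes of two CLIP models).
embedding_info = {"ViT-B/32": 512, "RN50": 1024}

schema_fields = {
    field_name_for(name): tensor_type_for(size)
    for name, size in embedding_info.items()
}
print(schema_fields)
```

Keeping the size lookup separate from the schema construction means a new CLIP variant only needs a new dictionary entry.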
@@ -106,31 +109,31 @@
"id": "different-remainder",
"metadata": {},
"source": [
"## Create and deploy a text-video search app"
"## Create and deploy a text-video search app\n"
]
},
{
"cell_type": "markdown",
"id": "thirty-territory",
"metadata": {},
"source": [
"### Create the Vespa application package"
"### Create the Vespa application package\n"
]
},
{
"cell_type": "markdown",
"id": "international-question",
"metadata": {},
"source": [
"The function `create_text_video_app` below uses [the Vespa python API](https://pyvespa.readthedocs.io/en/latest/) to create an application package with fields to store image embeddings extracted from the videos that we want to search based on the selected CLIP models. It also declares the types of the text embeddings that we are going to send along with the query when searching for images, and creates one ranking profile for each (text, image) embedding model."
"The function `create_text_video_app` below uses [the Vespa python API](https://pyvespa.readthedocs.io/en/latest/) to create an application package with fields to store image embeddings extracted from the videos that we want to search based on the selected CLIP models. It also declares the types of the text embeddings that we are going to send along with the query when searching for images, and creates one ranking profile for each (text, image) embedding model.\n"
]
},
{
"cell_type": "markdown",
"id": "38f995b3",
"metadata": {},
"source": [
"For this demonstration we are going to use only one CLIP model but we could very well index all the available models for comparison, just as we did for [the text-image sample app](https://github.com/vespa-engine/sample-apps/blob/master/text-image-search/src/python/compare-pre-trained-clip-for-text-image-search.ipynb)."
"For this demonstration we are going to use only one CLIP model but we could very well index all the available models for comparison, just as we did for [the text-image sample app](https://github.com/vespa-engine/sample-apps/blob/master/text-image-search/src/python/compare-pre-trained-clip-for-text-image-search.ipynb).\n"
]
},
{
@@ -150,7 +153,7 @@
"id": "neutral-fence",
"metadata": {},
"source": [
"We can inspect what the `schema` of the resulting application package looks like:"
"We can inspect what the `schema` of the resulting application package looks like:\n"
]
},
{
Expand Down Expand Up @@ -199,15 +202,15 @@
"id": "meaning-report",
"metadata": {},
"source": [
"### Deploy to Vespa Cloud"
"### Deploy to Vespa Cloud\n"
]
},
{
"cell_type": "markdown",
"id": "assured-possible",
"metadata": {},
"source": [
"Follow [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html) to learn how to set the environment variables below before deploying to Vespa Cloud."
"Follow [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html) to learn how to set the environment variables below before deploying to Vespa Cloud.\n"
]
},
{
Expand Down Expand Up @@ -235,37 +238,37 @@
"id": "foreign-complaint",
"metadata": {},
"source": [
"Alternatively, check [this guide](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-docker.html) to deploy locally in a Docker container."
"Alternatively, check [this guide](https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html) to deploy locally in a Docker container.\n"
]
},
{
"cell_type": "markdown",
"id": "large-institution",
"metadata": {},
"source": [
"## Feed data"
"## Feed data\n"
]
},
{
"cell_type": "markdown",
"id": "7230b645",
"metadata": {},
"source": [
"### Download the data"
"### Download the data\n"
]
},
{
"cell_type": "markdown",
"id": "8cc37439",
"metadata": {},
"source": [
"We are going to use the UCF101 dataset to allow users to follow along from \n",
"their laptop. We downloaded a [zipped file](http://storage.googleapis.com/thumos14_files/UCF101_videos.zip) \n",
"containing 13320 trimmed videos, each including one action, \n",
"and a [text file](http://crcv.ucf.edu/THUMOS14/Class%20Index.txt) containing the list of action \n",
"We are going to use the UCF101 dataset to allow users to follow along from\n",
"their laptop. We downloaded a [zipped file](http://storage.googleapis.com/thumos14_files/UCF101_videos.zip)\n",
"containing 13320 trimmed videos, each including one action,\n",
"and a [text file](http://crcv.ucf.edu/THUMOS14/Class%20Index.txt) containing the list of action\n",
"classes and their numerical index.\n",
"\n",
"After downloading and unzipping the data, set the `VIDEO_DIR` environment variable to the folder containing the video \n",
"After downloading and unzipping the data, set the `VIDEO_DIR` environment variable to the folder containing the video\n",
"files.\n"
]
},
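Review note: later cells read `VIDEO_DIR` with `os.environ[...]`, which raises a bare `KeyError` when the variable is missing. A small sketch of a friendlier check follows, assuming the videos may sit in subfolders; `video_dir_from_env` and `list_videos` are hypothetical helpers, not part of the sample app.

```python
import os
from pathlib import Path


def video_dir_from_env(var_name="VIDEO_DIR"):
    """Return the video folder from the environment, with a readable error."""
    try:
        return os.environ[var_name]
    except KeyError:
        raise RuntimeError(
            f"Set the {var_name} environment variable to the folder "
            "containing the unzipped video files."
        ) from None


def list_videos(video_dir, suffix=".avi"):
    """List video files under video_dir (recursively), matching suffix case-insensitively."""
    root = Path(video_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{video_dir} is not a directory")
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == suffix
    )
```

Calling `list_videos(video_dir_from_env())` before the conversion step gives an early count of the files that will be processed.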
@@ -274,15 +277,15 @@
"id": "f6b34733",
"metadata": {},
"source": [
"### Convert .avi files to .mp4"
"### Convert .avi files to .mp4\n"
]
},
{
"cell_type": "markdown",
"id": "7d1a2759",
"metadata": {},
"source": [
"There is better support for `.mp4` files, so we will convert the `.avi` files to `.mp4` using `ffmpeg`. The code below requires that your machine have `ffmpeg` installed."
"There is better support for `.mp4` files, so we will convert the `.avi` files to `.mp4` using `ffmpeg`. The code below requires that your machine have `ffmpeg` installed.\n"
]
},
{
@@ -294,17 +297,18 @@
"source": [
"import subprocess\n",
"\n",
"\n",
"def convert_from_avi_to_mp4(file_name):\n",
" outputfile = file_name.lower().replace(\".avi\", \".mp4\")\n",
" subprocess.call(['ffmpeg', '-i', file_name, outputfile])"
" subprocess.call([\"ffmpeg\", \"-i\", file_name, outputfile])"
]
},
{
"cell_type": "markdown",
"id": "73d68bcc",
"metadata": {},
"source": [
"The code below takes quite a while and could be sped up by using multi-processing:"
"The code below takes quite a while and could be sped up by using multi-processing:\n"
]
},
{
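Review note: the notebook says the conversion could be sped up with multi-processing. One hedged way to do that is sketched below. It swaps in a thread pool rather than Python's `multiprocessing`, since each `ffmpeg` call is already a separate OS process and the threads only wait on it. `mp4_name`, `convert_one`, and `convert_many` are hypothetical names; the `run` parameter exists only so the sketch can be exercised without `ffmpeg` installed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def mp4_name(file_name):
    """Same naming rule as the notebook's convert_from_avi_to_mp4."""
    return file_name.lower().replace(".avi", ".mp4")


def convert_one(file_name, run=subprocess.call):
    """Convert a single file and return the ffmpeg exit code."""
    return run(["ffmpeg", "-i", file_name, mp4_name(file_name)])


def convert_many(file_names, max_workers=4, run=subprocess.call):
    """Run several conversions concurrently; map each input to its exit code."""
    file_names = list(file_names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs names with their codes.
        codes = list(pool.map(lambda name: convert_one(name, run=run), file_names))
    return dict(zip(file_names, codes))
```

A non-zero exit code in the returned dictionary marks a file that needs a retry; `max_workers` is best kept near the number of CPU cores, since `ffmpeg` itself is CPU-bound.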
@@ -326,15 +330,15 @@
"id": "4c1c4259",
"metadata": {},
"source": [
"### Compute and send embeddings"
"### Compute and send embeddings\n"
]
},
{
"cell_type": "markdown",
"id": "suspended-supervision",
"metadata": {},
"source": [
"The function below assumes you have downloaded the UCF101 dataset, converted it to .mp4 and stored the resulting files in the `VIDEO_DIR` folder. It extracts frames from each video, computes image embeddings with a CLIP model and sends them to the Vespa app."
"The function below assumes you have downloaded the UCF101 dataset, converted it to .mp4 and stored the resulting files in the `VIDEO_DIR` folder. It extracts frames from each video, computes image embeddings with a CLIP model and sends them to the Vespa app.\n"
]
},
{
@@ -347,11 +351,11 @@
"from embedding import compute_and_send_video_embeddings\n",
"\n",
"compute_and_send_video_embeddings(\n",
" app=app, \n",
" batch_size=32, \n",
" clip_model_names=[\"ViT-B/32\"], \n",
" app=app,\n",
" batch_size=32,\n",
" clip_model_names=[\"ViT-B/32\"],\n",
" number_frames_per_video=4,\n",
" video_dir=os.environ[\"VIDEO_DIR\"]\n",
" video_dir=os.environ[\"VIDEO_DIR\"],\n",
")"
]
},
@@ -360,7 +364,7 @@
"id": "organizational-delta",
"metadata": {},
"source": [
"The function `compute_and_send_video_embeddings` is a more robust version of the following loop:"
"The function `compute_and_send_video_embeddings` is a more robust version of the following loop:\n"
]
},
{
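Review note on the `collate_fn` in the loop of the next hunk: judging from its comment, `turn list of list into flat list`, each dataset item appears to be a list of per-frame records, so a batch arrives as a list of lists. The sketch below shows the flattening on plain dictionaries; the record contents are made up for illustration and the real `VideoFeedDataset` output may differ.

```python
from itertools import chain


def flatten(batch_of_lists):
    """Same expression as the notebook's collate_fn lambda."""
    return [item for sublist in batch_of_lists for item in sublist]


# Hypothetical batch: two videos, with two and one frame records respectively.
per_video = [
    [{"id": "v1_f0"}, {"id": "v1_f1"}],
    [{"id": "v2_f0"}],
]

flat = flatten(per_video)
print([record["id"] for record in flat])

# itertools.chain.from_iterable is an equivalent standard-library spelling.
same = list(chain.from_iterable(per_video))
```

Without a custom `collate_fn`, PyTorch's default collation would try to combine the nested records into tensors, which is not the shape a feed operation expects.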
@@ -371,15 +375,19 @@
"outputs": [],
"source": [
"for model_name in clip_model_names:\n",
" video_dataset = VideoFeedDataset( ## PyTorch Dataset that outputs pyvespa-compatible data \n",
" video_dir=os.environ[\"VIDEO_DIR\"], # Folder containing video files\n",
" model_name=model_name, # CLIP model name used to convert image into vector\n",
" number_frames_per_video=4 # Number of image frames to use per video\n",
" video_dataset = (\n",
" VideoFeedDataset( ## PyTorch Dataset that outputs pyvespa-compatible data\n",
" video_dir=os.environ[\"VIDEO_DIR\"], # Folder containing video files\n",
" model_name=model_name, # CLIP model name used to convert image into vector\n",
" number_frames_per_video=4, # Number of image frames to use per video\n",
" )\n",
" )\n",
" dataloader = DataLoader( ## PyTorch Dataloader to loop through the dataset\n",
" video_dataset, \n",
" dataloader = DataLoader( ## PyTorch Dataloader to loop through the dataset\n",
" video_dataset,\n",
" batch_size=batch_size,\n",
" collate_fn=lambda x: [item for sublist in x for item in sublist], # turn list of list into flat list\n",
" collate_fn=lambda x: [\n",
" item for sublist in x for item in sublist\n",
" ], # turn list of list into flat list\n",
" )\n",
" for idx, batch in enumerate(dataloader):\n",
" app.update_batch(batch=batch)"
@@ -390,15 +398,15 @@
"id": "eea78554",
"metadata": {},
"source": [
"## Query the application"
"## Query the application\n"
]
},
{
"cell_type": "markdown",
"id": "eff562f6",
"metadata": {},
"source": [
"We created a custom class `VideoSearchApp` that implements a `query` method specific to the text-video use case that we are demonstrating here."
"We created a custom class `VideoSearchApp` that implements a `query` method specific to the text-video use case that we are demonstrating here.\n"
]
},
{
@@ -418,7 +426,7 @@
"id": "191275fb",
"metadata": {},
"source": [
"It takes a `text` query, transforms it into an embedding with the CLIP model, and for each video it takes the score of the frame of that video that is closest to the text in the joint embedding space to represent the score of the video. We can also select the number of videos that we want to retrieve."
"It takes a `text` query, transforms it into an embedding with the CLIP model, and for each video it takes the score of the frame of that video that is closest to the text in the joint embedding space to represent the score of the video. We can also select the number of videos that we want to retrieve.\n"
]
},
{
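Review note: the scoring rule described in this cell (a video is scored by its best-matching frame) can be sketched in plain Python. This is an illustration only, not the `VideoSearchApp` implementation; `top_videos` is a hypothetical helper and the `relevance` key is an assumed name for the per-frame score. `video_file_name` is the field used elsewhere in this notebook.

```python
def top_videos(frame_hits, number_videos):
    """Score each video by its best frame and return the highest-scoring videos."""
    best = {}
    for hit in frame_hits:
        name = hit["video_file_name"]
        score = hit["relevance"]
        # Keep only the maximum frame score seen for each video.
        if name not in best or score > best[name]:
            best[name] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return [
        {"video_file_name": name, "relevance": score}
        for name, score in ranked[:number_videos]
    ]


# Made-up frame-level hits for three videos.
frame_hits = [
    {"video_file_name": "a.mp4", "relevance": 0.21},
    {"video_file_name": "b.mp4", "relevance": 0.35},
    {"video_file_name": "a.mp4", "relevance": 0.40},
    {"video_file_name": "c.mp4", "relevance": 0.10},
]
result = top_videos(frame_hits, number_videos=2)
```

Taking the maximum rather than the mean means one strongly matching frame is enough to surface a video, which suits short clips that contain a single action.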
Expand Down Expand Up @@ -521,7 +529,9 @@
"from IPython.display import Video, display\n",
"\n",
"for hit in result:\n",
" display(Video(os.path.join(os.environ[\"VIDEO_DIR\"], hit[\"video_file_name\"]), embed=True))"
" display(\n",
" Video(os.path.join(os.environ[\"VIDEO_DIR\"], hit[\"video_file_name\"]), embed=True)\n",
" )"
]
}
],
2 changes: 1 addition & 1 deletion text-video-search/src/python/embedding.py
@@ -197,7 +197,7 @@ def create_text_video_app(model_info):
:return: A Vespa application package.
"""
app_package = ApplicationPackage(name="video_search")
app_package = ApplicationPackage(name="videosearch")

app_package.schema.add_fields(
Field(name="video_file_name", type="string", indexing=["summary", "attribute"]),
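Review note: the commit does not state why the package name loses its underscore. One plausible reason, and it is only an assumption here, is that the application name is restricted to letters and digits. Under that assumption, a hypothetical pre-check like the one below would catch such names before deployment; `is_plain_app_name` and `to_plain_app_name` are not pyvespa functions.

```python
import re

# Assumed rule (not confirmed by the commit): letters and digits only.
_PLAIN_NAME = re.compile(r"[a-zA-Z0-9]+")


def is_plain_app_name(name):
    """True if the name consists solely of ASCII letters and digits."""
    return _PLAIN_NAME.fullmatch(name) is not None


def to_plain_app_name(name):
    """Drop every character that is not an ASCII letter or digit."""
    cleaned = re.sub(r"[^a-zA-Z0-9]", "", name)
    if not cleaned:
        raise ValueError(f"no usable characters in application name {name!r}")
    return cleaned
```

With this rule, `to_plain_app_name("video_search")` yields the name used after this change, `videosearch`.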
1 change: 1 addition & 0 deletions text-video-search/src/python/requirements.txt
@@ -4,4 +4,5 @@ torch
torchvision
pyvespa
streamlit
setuptools
git+https://github.com/openai/CLIP.git
