Commit bfa14bc

Merge pull request #995 from vespa-engine/kkraune/language
fix typos
kkraune authored Dec 19, 2024
2 parents 7695bcf + 7603fb1 commit bfa14bc
Showing 1 changed file with 10 additions and 10 deletions.
@@ -17,9 +17,9 @@
"\n",
"This notebook demonstrates how to reproduce the ColPali results on [DocVQA](https://huggingface.co/datasets/vidore/docvqa_test_subsampled) with Vespa. The dataset consists of PDF documents with questions and answers. \n",
"\n",
"We demonstrate how we can binarize the patch embeddings and replace the float float MaxSim scoring with a `hamming` based MaxSim without much loss in ranking accuracy but with a significant speedup (close to 4x) and reduce the memory (and storage) requirements by 32x.\n",
"We demonstrate how we can binarize the patch embeddings and replace the float MaxSim scoring with a `hamming` based MaxSim without much loss in ranking accuracy but with a significant speedup (close to 4x) and reducing the memory (and storage) requirements by 32x.\n",
"\n",
"In this notebook we represent one PDF page as one vespa document. See other notebooks for more information about using ColPali with Vespa:\n",
"In this notebook, we represent one PDF page as one vespa document. See other notebooks for more information about using ColPali with Vespa:\n",
"\n",
"- [Scaling ColPALI (VLM) Retrieval](simplified-retrieval-with-colpali-vlm_Vespa-cloud.ipynb)\n",
"- [Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models](colpali-document-retrieval-vision-language-models-cloud.ipynb)\n",
@@ -405,7 +405,7 @@
"metadata": {},
"source": [
"Now we have all the embeddings. We'll define two helper functions to perform binarization (BQ) and also packing float values\n",
"to shorter hex representation in JSON. Both saves bandwidth and improves feed performance. "
"to shorter hex representation in JSON. Both save bandwidth and improve feed performance. "
]
},
{
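The two helper functions referred to above are not shown in this hunk. As a hedged sketch of the idea (function names and the exact byte layout are assumptions, not the notebook's code): binarized vectors are packed into int8 bytes and hex-encoded, and float vectors are hex-encoded as raw big-endian float32 bytes, following the hex cell-value format that Vespa's document JSON accepts for tensors.

```python
import numpy as np
from binascii import hexlify

def binarize_to_hex(patch_vectors: np.ndarray) -> dict:
    """Binarize (num_patches, 128) float vectors and hex-encode the packed int8 bytes,
    one hex string per patch, for the mixed-tensor cells of the feed document."""
    packed = np.packbits(np.where(patch_vectors > 0, 1, 0), axis=1).astype(np.int8)  # (num_patches, 16)
    return {str(patch): hexlify(row.tobytes()).decode("ascii") for patch, row in enumerate(packed)}

def floats_to_hex(patch_vectors: np.ndarray) -> dict:
    """Hex-encode float vectors as big-endian float32 bytes (layout assumed here)
    instead of a JSON list of numbers - a much shorter payload, hence faster feeding."""
    return {
        str(patch): hexlify(row.astype(">f4").tobytes()).decode("ascii")
        for patch, row in enumerate(patch_vectors)
    }
```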
@@ -456,7 +456,7 @@
"### Patch Vector pooling\n",
"\n",
"This reduces the number of patch embeddings by a factor of 3, meaning that we go from 1030 patch vectors to 343 patch vectors. This reduces\n",
"both the memory and the number of dotproducts that we need to calculate. This function is not in use in this notebook, but it is included for reference."
"both the memory and the number of dotproducts we need to calculate. This function is not in use in this notebook, but it is included for reference."
]
},
{
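The pooling function itself sits outside this hunk. As a purely illustrative sketch of the idea (the notebook's actual function may group patches differently, e.g. spatially): mean-pool consecutive groups of three patch vectors, cutting both memory and the number of dot products by roughly 3x.

```python
import numpy as np

def pool_patch_vectors(patch_vectors: np.ndarray, factor: int = 3) -> np.ndarray:
    """Mean-pool consecutive groups of `factor` patch vectors.
    (num_patches, dim) -> (ceil(num_patches / factor), dim)."""
    num_patches, _ = patch_vectors.shape
    split_points = list(range(factor, num_patches, factor))
    groups = np.split(patch_vectors, split_points)             # groups of `factor`, last may be smaller
    return np.stack([group.mean(axis=0) for group in groups])
```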
@@ -515,7 +515,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Create the Vespa feed format, we use hex formats for mixed tensors [doc](https://docs.vespa.ai/en/reference/document-json-format.html#tensor).\n"
"Create the Vespa feed format. We use hex formats for mixed tensors [doc](https://docs.vespa.ai/en/reference/document-json-format.html#tensor).\n"
]
},
{
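As a hedged illustration of that feed format (field names and values are assumptions; see the linked Vespa documentation for the authoritative hex cell-value syntax), one page document with a mixed int8 tensor could look roughly like this:

```python
# One Vespa document per PDF page; `embedding` holds a mixed tensor such as
# tensor<int8>(patch{}, v[16]), each dense block given as a hex string.
vespa_feed_document = {
    "id": "0",                                   # page id, illustrative
    "fields": {
        "url": "https://example.com/some.pdf",   # assumed metadata fields
        "page_number": 0,
        "embedding": {
            "blocks": {
                "0": "7f3a...",                  # patch 0: hex of 16 packed int8 values (truncated here)
                "1": "01c4...",
            }
        },
    },
}
```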
@@ -551,7 +551,7 @@
"A Vespa application package consists of configuration files, schemas, models, and code (plugins).\n",
"\n",
"First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their type. This is a simple\n",
"schema which is all we need to evaluate effectiveness of the model."
"schema, which is all we need to evaluate the effectiveness of the model."
]
},
{
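A hedged pyvespa sketch of the kind of schema described here (field names and types are assumptions; the notebook's own definition follows in the next code cell):

```python
from vespa.package import Schema, Document, Field

pdf_page_schema = Schema(
    name="pdf_page",
    document=Document(
        fields=[
            Field(name="id", type="string", indexing=["summary", "index"]),
            Field(name="url", type="string", indexing=["summary", "index"]),
            Field(name="page_number", type="int", indexing=["summary", "attribute"]),
            # Binarized ColPali patch embeddings: 128 bits packed into 16 int8 values per patch
            Field(name="embedding", type="tensor<int8>(patch{}, v[16])", indexing=["attribute"]),
        ]
    ),
)
```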
@@ -619,7 +619,7 @@
"\n",
"colpali_profile = RankProfile(\n",
" name=\"float-float\",\n",
" # We define both the float and binary query inputs here, the rest of the profiles inherits these inputs\n",
" # We define both the float and binary query inputs here; the rest of the profiles inherit these inputs\n",
" inputs=[\n",
" (\"query(qtb)\", \"tensor<int8>(querytoken{}, v[16])\"),\n",
" (\"query(qt)\", \"tensor<float>(querytoken{}, v[128])\"),\n",
@@ -863,7 +863,7 @@
"metadata": {},
"source": [
"A simple routine for querying Vespa. Note that we send both vector representations in the query independently\n",
"of the ranking method used, this for simplicity. Not all the ranking models we evaluate needs both representations. "
"of the ranking method used, this for simplicity. Not all the ranking models we evaluate need both representations. "
]
},
{
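A hedged sketch of such a routine with pyvespa (the YQL, schema name, and profile names are assumptions): both query tensors are always passed as `input.query(...)` parameters, and the `ranking` parameter picks the profile that actually uses them.

```python
from vespa.application import Vespa

def query_vespa(app: Vespa, float_query_embedding: dict, binary_query_embedding: dict,
                ranking: str = "binary-binary", hits: int = 20):
    """Send both query-tensor representations regardless of the ranking profile chosen."""
    response = app.query(
        body={
            "yql": "select * from pdf_page where true",   # brute-force over all pages, fine for evaluation
            "ranking": ranking,
            "hits": hits,
            "input.query(qt)": float_query_embedding,     # float ColPali query-token embeddings
            "input.query(qtb)": binary_query_embedding,   # binarized query-token embeddings
            "timeout": "5s",
        }
    )
    assert response.is_successful()
    return response.hits
```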
@@ -1009,7 +1009,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is encouraging as the binary-binary representation is 4x faster than the float-float representation and saves 32x space. We can also largely retain the effectiveness of the float-binary representation by using the phased approach where we re-rank the top 20 pages from the hamming (binary-binary) version using the float-binary representation. Now we can explore the ranking depth and see how the phased approach performs with different ranking depths."
"This is encouraging as the binary-binary representation is 4x faster than the float-float representation and saves 32x space. We can also largely retain the effectiveness of the float-binary representation by using the phased approach, where we re-rank the top 20 pages from the hamming (binary-binary) version using the float-binary representation. Now we can explore the ranking depth and see how the phased approach performs with different ranking depths."
]
},
{
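One way to express that phased profile with pyvespa, as a hedged sketch (the function expressions follow Vespa's documented binary and float-against-unpacked-bits MaxSim patterns, `unpack_bits` availability is assumed, and `rerank_count` is the ranking depth being swept):

```python
from vespa.package import RankProfile, Function, SecondPhaseRanking

phased_profile = RankProfile(
    name="binary-binary-float-binary",
    inherits="float-float",              # reuses the query(qt) / query(qtb) inputs
    functions=[
        Function(
            name="max_sim_binary",       # cheap hamming-based MaxSim, evaluated for all pages
            expression="sum(reduce(1 / (1 + sum(hamming(query(qtb), attribute(embedding)), v)), max, patch), querytoken)",
        ),
        Function(
            name="max_sim_float_binary", # float query tokens against unpacked document bits
            expression="sum(reduce(sum(query(qt) * unpack_bits(attribute(embedding)), v), max, patch), querytoken)",
        ),
    ],
    first_phase="max_sim_binary",
    second_phase=SecondPhaseRanking(expression="max_sim_float_binary", rerank_count=20),
)
```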
@@ -1072,7 +1072,7 @@
"metadata": {},
"source": [
"### Conclusion\n",
"The binary representation of the patch embeddings reduces the storage by 32x, and using hamming distance instead of dotproduc saves us about 4x in computation compared to the float-float model or the float-binary model (which only saves storage). Using a re-ranking step with only depth 10, we can improve the effectiveness of the binary-binary model to almost match the float-float MaxSim model. The additional re-ranking step only requires that we pass also the float query embedding version without any additional storage overhead. \n",
"The binary representation of the patch embeddings reduces the storage by 32x, and using hamming distance instead of dotproduct saves us about 4x in computation compared to the float-float model or the float-binary model (which only saves storage). Using a re-ranking step with only depth 10, we can improve the effectiveness of the binary-binary model to almost match the float-float MaxSim model. The additional re-ranking step only requires that we pass also the float query embedding version without any additional storage overhead. \n",
" "
]
},
