From 7603fb11a1b3e2d58b1fa4a98d78539e44fc4eb1 Mon Sep 17 00:00:00 2001
From: Kristian Aune
Date: Thu, 19 Dec 2024 11:36:03 +0100
Subject: [PATCH] fix typos

---
 ...olpali-benchmark-vqa-vlm_Vespa-cloud.ipynb | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/sphinx/source/examples/colpali-benchmark-vqa-vlm_Vespa-cloud.ipynb b/docs/sphinx/source/examples/colpali-benchmark-vqa-vlm_Vespa-cloud.ipynb
index 483229b8..c793823c 100644
--- a/docs/sphinx/source/examples/colpali-benchmark-vqa-vlm_Vespa-cloud.ipynb
+++ b/docs/sphinx/source/examples/colpali-benchmark-vqa-vlm_Vespa-cloud.ipynb
@@ -17,9 +17,9 @@
     "\n",
     "This notebook demonstrates how to reproduce the ColPali results on [DocVQA](https://huggingface.co/datasets/vidore/docvqa_test_subsampled) with Vespa. The dataset consists of PDF documents with questions and answers. \n",
     "\n",
-    "We demonstrate how we can binarize the patch embeddings and replace the float float MaxSim scoring with a `hamming` based MaxSim without much loss in ranking accuracy but with a significant speedup (close to 4x) and reduce the memory (and storage) requirements by 32x.\n",
+    "We demonstrate how we can binarize the patch embeddings and replace the float MaxSim scoring with a `hamming`-based MaxSim without much loss in ranking accuracy, but with a significant speedup (close to 4x), while reducing the memory (and storage) requirements by 32x.\n",
     "\n",
-    "In this notebook we represent one PDF page as one vespa document. See other notebooks for more information about using ColPali with Vespa:\n",
+    "In this notebook, we represent one PDF page as one Vespa document. See other notebooks for more information about using ColPali with Vespa:\n",
     "\n",
     "- [Scaling ColPALI (VLM) Retrieval](simplified-retrieval-with-colpali-vlm_Vespa-cloud.ipynb)\n",
     "- [Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models](colpali-document-retrieval-vision-language-models-cloud.ipynb)\n",
@@ -405,7 +405,7 @@
    "metadata": {},
    "source": [
     "Now we have all the embeddings. We'll define two helper functions to perform binarization (BQ) and also packing float values\n",
-    "to shorter hex representation in JSON. Both saves bandwidth and improves feed performance. "
+    "to a shorter hex representation in JSON. Both save bandwidth and improve feed performance. "
    ]
   },
   {
@@ -456,7 +456,7 @@
     "### Patch Vector pooling\n",
     "\n",
     "This reduces the number of patch embeddings by a factor of 3, meaning that we go from 1030 patch vectors to 343 patch vectors. This reduces\n",
-    "both the memory and the number of dotproducts that we need to calculate. This function is not in use in this notebook, but it is included for reference."
+    "both the memory and the number of dotproducts we need to calculate. This function is not used in this notebook, but it is included for reference."
    ]
   },
   {
@@ -515,7 +515,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Create the Vespa feed format, we use hex formats for mixed tensors [doc](https://docs.vespa.ai/en/reference/document-json-format.html#tensor).\n"
+    "Create the Vespa feed format. We use hex formats for mixed tensors [doc](https://docs.vespa.ai/en/reference/document-json-format.html#tensor).\n"
    ]
   },
   {
@@ -551,7 +551,7 @@
     "A Vespa application package consists of configuration files, schemas, models, and code (plugins).\n",
     "\n",
     "First, we define a [Vespa schema](https://docs.vespa.ai/en/schemas.html) with the fields we want to store and their type. This is a simple\n",
-    "schema which is all we need to evaluate effectiveness of the model."
+    "schema, which is all we need to evaluate the effectiveness of the model."
    ]
   },
   {
@@ -619,7 +619,7 @@
     "\n",
     "colpali_profile = RankProfile(\n",
     "    name=\"float-float\",\n",
-    "    # We define both the float and binary query inputs here, the rest of the profiles inherits these inputs\n",
+    "    # We define both the float and binary query inputs here; the rest of the profiles inherit these inputs\n",
     "    inputs=[\n",
     "        (\"query(qtb)\", \"tensor(querytoken{}, v[16])\"),\n",
     "        (\"query(qt)\", \"tensor(querytoken{}, v[128])\"),\n",
@@ -863,7 +863,7 @@
    "metadata": {},
    "source": [
     "A simple routine for querying Vespa. Note that we send both vector representations in the query independently\n",
-    "of the ranking method used, this for simplicity. Not all the ranking models we evaluate needs both representations. "
+    "of the ranking method used, for simplicity. Not all the ranking models we evaluate need both representations. "
    ]
   },
   {
@@ -1009,7 +1009,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This is encouraging as the binary-binary representation is 4x faster than the float-float representation and saves 32x space. We can also largely retain the effectiveness of the float-binary representation by using the phased approach where we re-rank the top 20 pages from the hamming (binary-binary) version using the float-binary representation. Now we can explore the ranking depth and see how the phased approach performs with different ranking depths."
+    "This is encouraging, as the binary-binary representation is 4x faster than the float-float representation and saves 32x space. We can also largely retain the effectiveness of the float-binary representation by using the phased approach, where we re-rank the top 20 pages from the hamming (binary-binary) version using the float-binary representation. Now we can explore the ranking depth and see how the phased approach performs with different ranking depths."
    ]
   },
   {
@@ -1072,7 +1072,7 @@
    "metadata": {},
    "source": [
     "### Conclusion\n",
-    "The binary representation of the patch embeddings reduces the storage by 32x, and using hamming distance instead of dotproduc saves us about 4x in computation compared to the float-float model or the float-binary model (which only saves storage). Using a re-ranking step with only depth 10, we can improve the effectiveness of the binary-binary model to almost match the float-float MaxSim model. The additional re-ranking step only requires that we pass also the float query embedding version without any additional storage overhead. \n",
+    "The binary representation of the patch embeddings reduces the storage by 32x, and using hamming distance instead of dotproduct saves us about 4x in computation compared to the float-float model or the float-binary model (which only saves storage). Using a re-ranking step with a depth of only 10, we can improve the effectiveness of the binary-binary model to almost match the float-float MaxSim model. The additional re-ranking step only requires that we also pass the float query embedding version, without any additional storage overhead. \n",
     "    "
    ]
  },