diff --git a/docs/sphinx/source/advanced-configuration.ipynb b/docs/sphinx/source/advanced-configuration.ipynb
new file mode 100644
index 00000000..4e7886ac
--- /dev/null
+++ b/docs/sphinx/source/advanced-configuration.ipynb
@@ -0,0 +1,458 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "given-adoption",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "# Advanced Configuration\n",
+ "\n",
+ "Vespa supports a wide range of configuration options to customize the behavior of the system through the [`services.xml` file](https://docs.vespa.ai/en/reference/services.html). Until pyvespa version 0.50.0, only a limited subset of these configurations was available in pyvespa.\n",
+ "\n",
+ "Now, we have added support for passing a `ServicesConfiguration` object to your `ApplicationPackage`, which allows you to define any configuration you want. This notebook demonstrates how to use this feature if you need more advanced configurations.\n",
+ "\n",
+ "Note that providing a `ServicesConfiguration` object is not required; if none is passed, the default configuration will still be created for you.\n",
+ "\n",
+ "There are some slight differences in which configuration options are available when running self-hosted (Docker) and when running on the cloud (Vespa Cloud). For details, see the [Vespa Cloud services.xml-reference](https://cloud.vespa.ai/en/reference/services). This notebook demonstrates how to use the `ServicesConfiguration` object to configure a Vespa application for some common use cases, with options that are available in both environments.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8c967bd2",
+ "metadata": {},
+ "source": [
+ "Refer to troubleshooting\n",
+ "for any problem when running this guide.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8345b2fe",
+ "metadata": {},
+ "source": [
+ "[Install pyvespa](https://pyvespa.readthedocs.io/) and start the Docker daemon. Verify that at least 6 GB of memory is available:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "03f3d0f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip3 install pyvespa\n",
+ "!docker info | grep \"Total Memory\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db637322",
+ "metadata": {},
+ "source": [
+ "## Setting up document-expiry\n",
+ "\n",
+ "As an example of a common use case for advanced configuration, we will configure document-expiry. This feature allows you to set a time-to-live for documents in your Vespa application. This is useful when you have documents that are only relevant for a certain period of time, and you want to avoid serving stale data.\n",
+ "\n",
+ "For reference, see the [docs on document-expiry](https://docs.vespa.ai/en/documents.html#document-expiry).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a3619ad1",
+ "metadata": {},
+ "source": [
+ "### Define a schema\n",
+ "\n",
+ "We define a simple schema, with a timestamp field that we will use in the document selection expression to set the document-expiry.\n",
+ "\n",
+ "Note that the fields referenced in the selection expression should be attributes (in-memory).\n",
+ "\n",
+ "Also, either the fields should be set with `fast-access` or the number of searchable copies in the content cluster should be the same as the redundancy. Otherwise, the document selection maintenance will be slow and have a major performance impact on the system.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "bd5c2629",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.package import Document, Field, Schema, ApplicationPackage\n",
+ "\n",
+ "application_name = \"music\"\n",
+ "music_schema = Schema(\n",
+ " name=application_name,\n",
+ " document=Document(\n",
+ " fields=[\n",
+ " Field(\n",
+ " name=\"artist\",\n",
+ " type=\"string\",\n",
+ " indexing=[\"attribute\", \"summary\"],\n",
+ " ),\n",
+ " Field(\n",
+ " name=\"title\",\n",
+ " type=\"string\",\n",
+ " indexing=[\"attribute\", \"summary\"],\n",
+ " ),\n",
+ " Field(\n",
+ " name=\"timestamp\",\n",
+ " type=\"long\",\n",
+ " indexing=[\"attribute\", \"summary\"],\n",
+ " attribute=[\"fast-access\"],\n",
+ " ),\n",
+ " ]\n",
+ " ),\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5be18383",
+ "metadata": {},
+ "source": [
+ "## The `ServicesConfiguration` object\n",
+ "\n",
+ "The `ServicesConfiguration` object allows you to define any configuration you want in the `services.xml` file.\n",
+ "\n",
+ "The syntax is as follows:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "05318c34",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.package import ServicesConfiguration\n",
+ "from vespa.configuration.services import (\n",
+ " services,\n",
+ " container,\n",
+ " search,\n",
+ " document_api,\n",
+ " document_processing,\n",
+ " content,\n",
+ " redundancy,\n",
+ " documents,\n",
+ " document,\n",
+ " node,\n",
+ " nodes,\n",
+ ")\n",
+ "\n",
+ "# Create a ServicesConfiguration with document-expiry set to 1 day (timestamp > now() - 86400)\n",
+ "services_config = ServicesConfiguration(\n",
+ " application_name=application_name,\n",
+ " services_config=services(\n",
+ " container(\n",
+ " search(),\n",
+ " document_api(),\n",
+ " document_processing(),\n",
+ " id=f\"{application_name}_container\",\n",
+ " version=\"1.0\",\n",
+ " ),\n",
+ " content(\n",
+ " redundancy(\"1\"),\n",
+ " documents(\n",
+ " document(\n",
+ " type=application_name,\n",
+ " mode=\"index\",\n",
+ " # Note that the selection-expression does not need to be escaped, as it will be automatically escaped during xml-serialization\n",
+ " selection=\"music.timestamp > now() - 86400\",\n",
+ " ),\n",
+ " garbage_collection=\"true\",\n",
+ " ),\n",
+ " nodes(node(distribution_key=\"0\", hostalias=\"node1\")),\n",
+ " id=f\"{application_name}_content\",\n",
+ " version=\"1.0\",\n",
+ " ),\n",
+ " ),\n",
+ ")\n",
+ "application_package = ApplicationPackage(\n",
+ " name=application_name,\n",
+ " schema=[music_schema],\n",
+ " services_config=services_config,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aa40c0ce",
+ "metadata": {},
+ "source": [
+ "There are a few gotchas to keep in mind when constructing the `ServicesConfiguration` object.\n",
+ "\n",
+ "First, let's establish a common vocabulary through an example. Consider the following `services.xml` file, which is what we are actually representing with the `ServicesConfiguration` object from the previous cell:\n",
+ "\n",
+ "```xml\n",
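+ "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n",
+ "<services>\n",
+ "  <container id=\"music_container\" version=\"1.0\">\n",
+ "    <search></search>\n",
+ "    <document-api></document-api>\n",
+ "    <document-processing></document-processing>\n",
+ "  </container>\n",
+ "  <content id=\"music_content\" version=\"1.0\">\n",
+ "    <redundancy>1</redundancy>\n",
+ "    <documents garbage-collection=\"true\">\n",
+ "      <document type=\"music\" mode=\"index\" selection=\"music.timestamp &gt; now() - 86400\"></document>\n",
+ "    </documents>\n",
+ "    <nodes>\n",
+ "      <node distribution-key=\"0\" hostalias=\"node1\"></node>\n",
+ "    </nodes>\n",
+ "  </content>\n",
+ "</services>\n",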
+ "```\n",
+ "\n",
+ "In this example, `services`, `container`, `search`, `document-api`, `document-processing`, `content`, `redundancy`, `documents`, `document`, `nodes`, and `node` are _tags_. The `id`, `version`, `type`, `mode`, `selection`, `distribution-key`, `hostalias`, and `garbage-collection` are _attributes_, each with a corresponding _value_.\n",
+ "\n",
+ "### Tag names\n",
+ "\n",
+ "All tags as referenced in the [Vespa documentation](https://docs.vespa.ai/en/reference/services.html) are available in `vespa.configuration.services` module with the following modifications:\n",
+ "\n",
+ "- All `-` in the tag names are replaced by `_` to avoid conflicts with Python syntax.\n",
+ "- Some tags that are Python reserved words (or commonly used objects) are constructed by adding a `_` at the end of the tag name. These are:\n",
+ " - `type_`\n",
+ " - `class_`\n",
+ " - `for_`\n",
+ " - `time_`\n",
+ " - `io_`\n",
+ "\n",
+ "Only valid tags are exported by the `vespa.configuration.services` module.\n",
+ "\n",
+ "### Attributes\n",
+ "\n",
+ "- _any_ attribute can be passed to the tag constructor (no validation at construction time).\n",
+ "- The attribute name should be the same as in the Vespa documentation, but with `-` replaced by `_`. For example, the `garbage-collection` attribute in the `documents` tag should be passed as `garbage_collection`.\n",
+ "- In case the attribute name is a Python reserved word, the same rule as for the tag names applies (add `_` at the end). An example of this is the `global` attribute which should be passed as `global_`.\n",
+ "- Some attributes, such as `id` in the `container` tag, are mandatory and should be passed as keyword arguments to the tag constructor (positional arguments are treated as child tags or text).\n",
+ "\n",
+ "### Values\n",
+ "\n",
+ "- The value of an attribute can be a string, an integer, or a boolean. For types `bool` and `int`, the value is converted to a string (lowercased for `bool`). If you need to pass a float, you should convert it to a string before passing it to the tag constructor, e.g. `container(version=\"1.0\")`.\n",
+ "- Note that we are _not_ escaping the values ourselves. In the xml file, the value of the `selection` attribute in the `document` tag is `music.timestamp &gt; now() - 86400` (`&gt;` is the escaped form of `>`). When passing this value to the `document` tag constructor in Python, we should _not_ escape the `>` character, i.e. `document(selection=\"music.timestamp > now() - 86400\")`; the value is escaped automatically during XML serialization.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "careful-savage",
+ "metadata": {},
+ "source": [
+ "## Deploy the Vespa application\n",
+ "\n",
+ "Deploy `application_package` on the local machine using Docker,\n",
+ "without leaving the notebook, by creating an instance of\n",
+ "[VespaDocker](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaDocker). `VespaDocker` connects\n",
+ "to the local Docker daemon socket and starts the [Vespa docker image](https://hub.docker.com/r/vespaengine/vespa/).\n",
+ "\n",
+ "If this step fails, please check\n",
+ "that the Docker daemon is running, and that the Docker daemon socket can be used by clients (Configurable under advanced settings in Docker Desktop).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "canadian-blood",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Waiting for configuration server, 0/60 seconds...\n",
+ "Waiting for application to come up, 0/300 seconds.\n",
+ "Application is up!\n",
+ "Finished deployment.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from vespa.deployment import VespaDocker\n",
+ "\n",
+ "vespa_docker = VespaDocker()\n",
+ "app = vespa_docker.deploy(application_package=application_package)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aaae2f91",
+ "metadata": {},
+ "source": [
+ "`app` now holds a reference to a [Vespa](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa) instance.\n",
+ "See this [notebook](https://pyvespa.readthedocs.io/en/latest/authenticating-to-vespa-cloud.html) for details on authenticating to Vespa Cloud.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "sealed-mustang",
+ "metadata": {},
+ "source": [
+ "## Feeding documents to Vespa\n",
+ "\n",
+ "Now, let us feed some documents to Vespa. We will feed one document with a timestamp of 24 hours and 1 second (86401 seconds) ago and another document with the current timestamp. We will then visit the documents to verify that document-expiry works as expected.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "e9d3facd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "\n",
+ "docs_to_feed = [\n",
+ " {\n",
+ " \"id\": \"1\",\n",
+ " \"fields\": {\n",
+ " \"artist\": \"Snoop Dogg\",\n",
+ " \"title\": \"Gin and Juice\",\n",
+ " \"timestamp\": int(time.time()) - 86401,\n",
+ " },\n",
+ " },\n",
+ " {\n",
+ " \"id\": \"2\",\n",
+ " \"fields\": {\n",
+ " \"artist\": \"Dr.Dre\",\n",
+ " \"title\": \"Still D.R.E\",\n",
+ " \"timestamp\": int(time.time()),\n",
+ " },\n",
+ " },\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "6185fbce",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from vespa.io import VespaResponse\n",
+ "\n",
+ "\n",
+ "def callback(response: VespaResponse, id: str):\n",
+ " if not response.is_successful():\n",
+ " print(f\"Error when feeding document {id}: {response.get_json()}\")\n",
+ "\n",
+ "\n",
+ "app.feed_iterable(docs_to_feed, schema=application_name, callback=callback)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8430dd98",
+ "metadata": {},
+ "source": [
+ "## Verify document expiry through visiting\n",
+ "\n",
+ "[Visiting](https://docs.vespa.ai/en/visiting.html) is a feature to efficiently get or process a set of documents, identified by a [document selection](https://docs.vespa.ai/en/reference/document-select-language.html) expression.\n",
+ "Here is how you can use visiting in pyvespa:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "450a925f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[{'pathId': '/document/v1/music/music/docid/',\n",
+ " 'documents': [{'id': 'id:music:music::2',\n",
+ " 'fields': {'artist': 'Dr.Dre',\n",
+ " 'title': 'Still D.R.E',\n",
+ " 'timestamp': 1727175623}}],\n",
+ " 'documentCount': 1}]"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "visit_results = []\n",
+ "for slice_ in app.visit(\n",
+ " schema=application_name,\n",
+ " content_cluster_name=f\"{application_name}_content\",\n",
+ " timeout=\"5s\",\n",
+ "):\n",
+ " for response in slice_:\n",
+ " visit_results.append(response.json)\n",
+ "visit_results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7e8fefc9",
+ "metadata": {},
+ "source": [
+ "We can see that the document with the timestamp from just over 24 hours ago is not returned by the visit, while the document with the current timestamp is returned.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28591491",
+ "metadata": {},
+ "source": [
+ "## Cleanup\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "e5064bd2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vespa_docker.container.stop()\n",
+ "vespa_docker.container.remove()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1872b31",
+ "metadata": {},
+ "source": [
+ "## Next steps\n",
+ "\n",
+ "This is just an introduction to the advanced configuration options available in Vespa. For more details, see the [Vespa documentation](https://docs.vespa.ai/en/reference/services.html).\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3.11.4 64-bit",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.19"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tests/integration/test_integration_docker.py b/tests/integration/test_integration_docker.py
index 7bfd6d0f..121fe3dc 100644
--- a/tests/integration/test_integration_docker.py
+++ b/tests/integration/test_integration_docker.py
@@ -25,7 +25,9 @@
QueryTypeField,
AuthClient,
Struct,
+ ServicesConfiguration,
)
+from vespa.configuration.services import *
from vespa.deployment import VespaDocker
from vespa.application import VespaSync
from vespa.exceptions import VespaError
@@ -1621,3 +1623,104 @@ def callback(response: VespaResponse, id: str):
def tearDown(self) -> None:
self.vespa_docker.container.stop(timeout=CONTAINER_STOP_TIMEOUT)
self.vespa_docker.container.remove()
+
+
+class TestDocumentExpiry(unittest.TestCase):
+ def setUp(self) -> None:
+ application_name = "music"
+ self.application_name = application_name
+ music_schema = Schema(
+ name=application_name,
+ document=Document(
+ fields=[
+ Field(
+ name="artist",
+ type="string",
+ indexing=["attribute", "summary"],
+ ),
+ Field(
+ name="title",
+ type="string",
+ indexing=["attribute", "summary"],
+ ),
+ Field(
+ name="timestamp",
+ type="long",
+ indexing=["attribute", "summary"],
+ attribute=["fast-access"],
+ ),
+ ]
+ ),
+ )
+ # Create a ServicesConfiguration with document-expiry set to 1 day (timestamp > now() - 86400)
+ services_config = ServicesConfiguration(
+ application_name=application_name,
+ services_config=services(
+ container(
+ search(),
+ document_api(),
+ document_processing(),
+ id=f"{application_name}_container",
+ version="1.0",
+ ),
+ content(
+ redundancy("1"),
+ documents(
+ document(
+ type=application_name,
+ mode="index",
+ selection="music.timestamp > now() - 86400",
+ ),
+ garbage_collection="true",
+ ),
+ nodes(node(distribution_key="0", hostalias="node1")),
+ id=f"{application_name}_content",
+ version="1.0",
+ ),
+ ),
+ )
+ self.application_package = ApplicationPackage(
+ name=application_name,
+ schema=[music_schema],
+ services_config=services_config,
+ )
+ self.vespa_docker = VespaDocker(port=8089)
+ self.app = self.vespa_docker.deploy(
+ application_package=self.application_package
+ )
+
+ def test_document_expiry(self):
+ docs_to_feed = [
+ {
+ "id": "1",
+ "fields": {
+ "artist": "Snoop Dogg",
+ "title": "Gin and Juice",
+ "timestamp": int(time.time()) - 86401,
+ },
+ },
+ {
+ "id": "2",
+ "fields": {
+ "artist": "Dr.Dre",
+ "title": "Still D.R.E",
+ "timestamp": int(time.time()),
+ },
+ },
+ ]
+ self.app.feed_iterable(docs_to_feed, schema=self.application_name)
+ visit_results = []
+ for slice_ in self.app.visit(
+ schema=self.application_name,
+ content_cluster_name=f"{self.application_name}_content",
+ timeout="5s",
+ ):
+ for response in slice_:
+ visit_results.append(response.json)
+ # Visit results: [{'pathId': '/document/v1/music/music/docid/', 'documents': [{'id': 'id:music:music::2', 'fields': {'artist': 'Dr.Dre', 'title': 'Still D.R.E', 'timestamp': 1726836495}}], 'documentCount': 1}]
+ self.assertEqual(len(visit_results), 1)
+ self.assertEqual(visit_results[0]["documentCount"], 1)
+
+ def tearDown(self) -> None:
+ self.vespa_docker.container.stop(timeout=CONTAINER_STOP_TIMEOUT)
+ self.vespa_docker.container.remove()
diff --git a/tests/unit/test_configuration.py b/tests/unit/test_configuration.py
index ca379fca..09db3ad4 100644
--- a/tests/unit/test_configuration.py
+++ b/tests/unit/test_configuration.py
@@ -1,7 +1,17 @@
import unittest
from lxml import etree
import xml.etree.ElementTree as ET
-from vespa.configuration.vt import *
+from vespa.configuration.vt import (
+ VT,
+ vt,
+ create_tag_function,
+ attrmap,
+ valmap,
+ to_xml,
+ compare_xml,
+ vt_escape,
+)
+
from vespa.configuration.services import *
@@ -186,18 +196,7 @@ def test_generate_colbert_services(self):
generated_xml = generated_services.to_xml()
# Validate against relaxng
self.assertTrue(validate_services(etree.fromstring(str(generated_xml))))
- # Check all nodes and attributes being equal
- tree_original = ET.fromstring(self.xml_schema.encode("utf-8"))
- tree_generated = ET.fromstring(str(generated_xml))
- for original, generated in zip(tree_original.iter(), tree_generated.iter()):
- # print(f"Original: {original.tag}, {original.attrib}, {original.text}")
- # print(f"Generated: {generated.tag}, {generated.attrib}, {generated.text}")
- self.assertEqual(original.tag, generated.tag)
- self.assertEqual(original.attrib, generated.attrib)
- self.assertEqual(
- original.text.strip() if original.text else None,
- generated.text.strip() if generated.text else None,
- )
+ self.assertTrue(compare_xml(self.xml_schema, str(generated_xml)))
class TestBillionscaleServiceConfiguration(unittest.TestCase):
@@ -426,21 +425,21 @@ def test_generate_billion_scale_services(self):
requestthreads(persearch("2")),
feeding(concurrency("1.0")),
summary(
- io(read("directio")),
+ io_(read("directio")),
store(
cache(
maxsize_percent("5"),
compression(
- vt_type("lz4")
- ), # Using vt_type as type is a reserved keyword
+ type_("lz4")
+ ), # Using type_ as type is a reserved keyword
),
logstore(
chunk(
maxsize("16384"),
compression(
- vt_type(
+ type_(
"zstd"
- ), # Using vt_type as type is a reserved keyword
+ ), # Using type_ as type is a reserved keyword
level("3"),
),
),
@@ -459,16 +458,154 @@ def test_generate_billion_scale_services(self):
# Validate against relaxng
self.assertTrue(validate_services(etree.fromstring(str(generated_xml))))
# Check all nodes and attributes being equal
- tree_original = ET.fromstring(self.xml_schema.encode("utf-8"))
- tree_generated = ET.fromstring(str(generated_xml))
- for original, generated in zip(tree_original.iter(), tree_generated.iter()):
- # print(f"Original: {original.tag}, {original.attrib}, {original.text}")
- # print(f"Generated: {generated.tag}, {generated.attrib}, {generated.text}")
- self.assertEqual(original.tag, generated.tag)
- self.assertEqual(original.attrib, generated.attrib)
- orig_text = original.text or ""
- gen_text = generated.text or ""
- self.assertEqual(orig_text.strip(), gen_text.strip())
+ self.assertTrue(compare_xml(self.xml_schema, str(generated_xml)))
+
+
+class TestValidateServices(unittest.TestCase):
+ def setUp(self):
+ # Prepare some sample valid and invalid XML data
+ self.valid_xml_content = """
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
+
+"""
+ self.invalid_xml_content = """
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
+
+"""
+
+ # Create temporary files with valid and invalid XML content
+ self.valid_xml_file = "valid_test.xml"
+ self.invalid_xml_file = "invalid_test.xml"
+
+ with open(self.valid_xml_file, "w") as f:
+ f.write(self.valid_xml_content)
+
+ with open(self.invalid_xml_file, "w") as f:
+ f.write(self.invalid_xml_content)
+
+ # Create etree.Element from valid XML content
+ self.valid_xml_element = etree.fromstring(self.valid_xml_content)
+
+ def tearDown(self):
+ # Clean up temporary files
+ os.remove(self.valid_xml_file)
+ os.remove(self.invalid_xml_file)
+
+ def test_validate_valid_xml_content(self):
+ # Test with valid XML content as string
+ result = validate_services(self.valid_xml_content)
+ self.assertTrue(result)
+
+ def test_validate_invalid_xml_content(self):
+ # Test with invalid XML content as string
+ result = validate_services(self.invalid_xml_content)
+ self.assertFalse(result)
+
+ def test_validate_valid_xml_file(self):
+ # Test with valid XML file path
+ result = validate_services(self.valid_xml_file)
+ self.assertTrue(result)
+
+ def test_validate_invalid_xml_file(self):
+ # Test with invalid XML file path
+ result = validate_services(self.invalid_xml_file)
+ self.assertFalse(result)
+
+ def test_validate_valid_xml_element(self):
+ # Test with valid etree.Element
+ result = validate_services(self.valid_xml_element)
+ self.assertTrue(result)
+
+ def test_validate_nonexistent_file(self):
+ # Test with a non-existent file path
+ result = validate_services("nonexistent.xml")
+ self.assertFalse(result)
+
+ def test_validate_invalid_input_type(self):
+ # Test with invalid input type
+ result = validate_services(123)
+ self.assertFalse(result)
+
+
+class TestDocumentExpiry(unittest.TestCase):
+ def setUp(self):
+ self.xml_schema = """
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
+
+
+"""
+
+ def test_xml_validation(self):
+ to_validate = etree.fromstring(self.xml_schema.encode("utf-8"))
+ # Validate against relaxng
+ self.assertTrue(validate_services(to_validate))
+
+ def test_document_expiry(self):
+ application_name = "music"
+ generated = services(
+ container(
+ search(),
+ document_api(),
+ document_processing(),
+ id=f"{application_name}_container",
+ version="1.0",
+ ),
+ content(
+ redundancy("1"),
+ documents(
+ document(
+ type=application_name,
+ mode="index",
+ selection="music.timestamp > now() - 86400",
+ ),
+ garbage_collection="true",
+ ),
+ nodes(node(distribution_key="0", hostalias="node1")),
+ id=f"{application_name}_content",
+ version="1.0",
+ ),
+ )
+ generated_xml = generated.to_xml()
+ # Validate against relaxng
+ self.assertTrue(validate_services(etree.fromstring(str(generated_xml))))
+ # Compare the generated XML with the schema
+ self.assertTrue(compare_xml(self.xml_schema, str(generated_xml)))
if __name__ == "__main__":
diff --git a/tests/unit/test_package.py b/tests/unit/test_package.py
index fb6f7cfe..ea5f6fb1 100644
--- a/tests/unit/test_package.py
+++ b/tests/unit/test_package.py
@@ -35,6 +35,7 @@
ApplicationConfiguration,
)
from vespa.configuration.vt import compare_xml
+from vespa.configuration.services import *
class TestField(unittest.TestCase):
@@ -1805,3 +1806,83 @@ def test_default_service_config_to_text(self):
self.assertTrue(
compare_xml(app_package.services_to_text_vt, expected_result),
)
+
+ def test_document_expiry(self):
+ # Create a Schema with name music and a field with name artist, title and timestamp
+ # Ref https://docs.vespa.ai/en/documents.html#document-expiry
+ application_name = "music"
+ music_schema = Schema(
+ name=application_name,
+ document=Document(
+ fields=[
+ Field(
+ name="artist",
+ type="string",
+ indexing=["attribute", "summary"],
+ ),
+ Field(
+ name="title",
+ type="string",
+ indexing=["attribute", "summary"],
+ ),
+ Field(
+ name="timestamp",
+ type="long",
+ indexing=["attribute", "summary"],
+ attribute=["fast-access"],
+ ),
+ ]
+ ),
+ )
+ # Create a ServicesConfiguration with document-expiry set to 1 day (timestamp > now() - 86400)
+ services_config = ServicesConfiguration(
+ application_name=application_name,
+ services_config=services(
+ container(
+ search(),
+ document_api(),
+ document_processing(),
+ id=f"{application_name}_container",
+ version="1.0",
+ ),
+ content(
+ redundancy("1"),
+ documents(
+ document(
+ type=application_name,
+ mode="index",
+ selection="music.timestamp > now() - 86400",
+ ),
+ garbage_collection="true",
+ ),
+ nodes(node(distribution_key="0", hostalias="node1")),
+ id=f"{application_name}_content",
+ version="1.0",
+ ),
+ ),
+ )
+ application_package = ApplicationPackage(
+ name=application_name,
+ schema=[music_schema],
+ services_config=services_config,
+ )
+ expected = """
+
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
+
+
+"""
+ self.assertEqual(expected, application_package.services_to_text)
+ self.assertTrue(validate_services(application_package.services_to_text))
diff --git a/vespa/application.py b/vespa/application.py
index f57848e4..a22dcd57 100644
--- a/vespa/application.py
+++ b/vespa/application.py
@@ -37,6 +37,9 @@
import gzip
from requests.models import PreparedRequest
from io import BytesIO
+import logging
+
+logging.getLogger("urllib3").setLevel(logging.ERROR)
VESPA_CLOUD_SECRET_TOKEN: str = "VESPA_CLOUD_SECRET_TOKEN"
diff --git a/vespa/configuration/services.py b/vespa/configuration/services.py
index 653c38b2..115d399c 100644
--- a/vespa/configuration/services.py
+++ b/vespa/configuration/services.py
@@ -1,5 +1,9 @@
from vespa.configuration.vt import VT, create_tag_function, voids
from vespa.configuration.relaxng import RELAXNG
+from lxml import etree
+from pathlib import Path
+import os
+from typing import Union
# List of XML tags (customized for Vespa configuration)
services_tags = [
@@ -165,14 +169,37 @@
_g[sanitized_name] = create_tag_function(tag, tag in voids)
-def validate_services(xml_schema: str) -> bool:
+def validate_services(xml_input: Union[Path, str, etree.Element]) -> bool:
"""
- Validate an XML schema against the RelaxNG schema file for services.xml
+ Validate an XML input against the RelaxNG schema file for services.xml
Args:
- xml_schema (str): XML schema to validate
-
+ xml_input (Path or str or etree.Element): The XML input to validate.
Returns:
- bool: True if the XML schema is valid, False otherwise
+ True if the XML input is valid according to the RelaxNG schema, False otherwise.
"""
- return RELAXNG["services"].validate(xml_schema)
+ try:
+ if isinstance(xml_input, etree._Element):
+ xml_tree = etree.ElementTree(xml_input)
+ elif isinstance(xml_input, etree._ElementTree):
+ xml_tree = xml_input
+ elif isinstance(xml_input, (str, Path)):
+ if isinstance(xml_input, Path) or os.path.exists(xml_input):
+ # Assume it's a file path
+ xml_tree = etree.parse(str(xml_input))
+ elif isinstance(xml_input, str):
+ # May have a Unicode string with an encoding declaration
+ if "encoding" in xml_input:
+ xml_tree = etree.ElementTree(etree.fromstring(xml_input.encode()))
+ else:
+ # Assume it's a string containing XML content
+ xml_tree = etree.ElementTree(etree.fromstring(xml_input))
+ else:
+ raise TypeError("xml_input must be a Path, str, or etree.Element.")
+ except Exception as e:
+ # Handle parsing exceptions
+ print(f"Error parsing XML input: {e}")
+ return False
+
+ is_valid = RELAXNG["services"].validate(xml_tree)
+ return is_valid
diff --git a/vespa/configuration/vt.py b/vespa/configuration/vt.py
index b2a645c2..74ba1265 100644
--- a/vespa/configuration/vt.py
+++ b/vespa/configuration/vt.py
@@ -4,11 +4,14 @@
from fastcore.utils import patch
import xml.etree.ElementTree as ET
-# If the vespa tags correspond to reserved Python keywords, they are replaced with the following:
+# If the vespa tags correspond to reserved Python keywords or commonly used names,
+# they are replaced with the following:
replace_reserved = {
- "type": "vt_type",
- "class": "cls",
- "for": "fr",
+ "type": "type_",
+ "class": "class_",
+ "for": "for_",
+ "time": "time_",
+ "io": "io_",
}
restore_reserved = {v: k for k, v in replace_reserved.items()}
@@ -65,11 +68,17 @@ def __iter__(self):
def attrmap(o):
+ """Map a Python-safe attribute name to its XML form: `_global` becomes `global`, leading underscores are stripped, and `_` becomes `-`."""
o = dict(_global="global").get(o, o)
return o.lstrip("_").replace("_", "-")
def valmap(o):
+ """Convert a value to its string representation for XML: booleans become 'true' or 'false', integers become strings, and non-string iterables are joined with spaces."""
+ if isinstance(o, bool):
+ return str(o).lower()
+ elif isinstance(o, int):
+ return str(o)
return o if isinstance(o, str) else " ".join(map(str, o))
@@ -84,8 +93,22 @@ def _flatten_tuple(tup):
def _preproc(c, kw, attrmap=attrmap, valmap=valmap):
+ """
+ Preprocess the children and attributes of a VT structure.
+
+ :param c: Children of the VT structure
+ :param kw: Attributes of the VT structure
+ :param attrmap: Callable that maps attribute names
+ :param valmap: Callable that maps attribute values
+
+ :return: Tuple of children and attributes
+ """
+
+ # If the children are a single generator, map, or filter, convert it to a tuple
if len(c) == 1 and isinstance(c[0], (types.GeneratorType, map, filter)):
c = tuple(c[0])
+ # Create the attributes dictionary by mapping the keys and values
+ # TODO: Check if any of Vespa supported attributes are camelCase
attrs = {attrmap(k.lower()): valmap(v) for k, v in kw.items() if v is not None}
return _flatten_tuple(c), attrs
@@ -99,7 +122,7 @@ def vt(
valmap: callable = valmap,
**kw,
):
- "Create an `VT` structure for `to_xml()`"
+ "Create a VT structure with `tag`, `children` and `attrs`"
# NB! fastcore.xml uses tag.lower() for tag names. This is not done here.
return VT(tag, *_preproc(c, kw, attrmap=attrmap, valmap=valmap), void_=void_)
@@ -110,7 +133,7 @@ def vt(
def Xml(*c, version="1.0", encoding="UTF-8", **kwargs) -> VT:
- "An top level XML tag, with `encoding` and children `c`"
+ "A top level XML tag, with `encoding` and children `c`"
res = vt("?xml", *c, version=version, encoding=encoding, void_="?")
return res
@@ -173,8 +196,10 @@ def _to_xml(elm, lvl, indent, do_escape):
# Handle the case where children are text or elements
res = f"{sp}<{stag}{attr_str}>"
- # If the children are just text, don't introduce newlines
- if len(cs) == 1 and isinstance(cs[0], str):
+ # If the children are just text or int, don't introduce newlines
+ if len(cs) == 1 and isinstance(cs[0], (str, int)):
+ if isinstance(cs[0], int):
+ cs = [str(cs[0])]  # keep cs a sequence so cs[0] is the whole number, not its first digit
res += f"{esc_fn(cs[0].strip())}{stag}>{nl if indent else ''}"
else:
# If there are multiple children, properly indent them