diff --git a/docs/sphinx/source/application-packages.ipynb b/docs/sphinx/source/application-packages.ipynb new file mode 100644 index 00000000..c25d49f4 --- /dev/null +++ b/docs/sphinx/source/application-packages.ipynb @@ -0,0 +1,319 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e05d0811", + "metadata": {}, + "source": [ + "![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)\n", + "\n", + "# Application packages\n", + "\n", + "Vespa is configured using an [application package](https://docs.vespa.ai/en/application-packages.html).\n", + "Pyvespa provides an API to generate a deployable application package.\n", + "\n", + "An application package has at a minimum a [schema](https://docs.vespa.ai/en/schemas.html)\n", + "and [services.xml](https://docs.vespa.ai/en/reference/services.html).\n", + "\n", + "Example - create an empty application package:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e3477a6", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.package import ApplicationPackage\n", + "\n", + "app_package = ApplicationPackage(name=\"myschema\")" + ] + }, + { + "cell_type": "markdown", + "id": "e3f1e7d5", + "metadata": {}, + "source": [ + "To inspect an application package, dump it to disk using\n", + "[to_files](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage.to_files):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d05523a8", + "metadata": {}, + "outputs": [], + "source": [ + "import tempfile, os\n", + "\n", + "temp_dir = tempfile.TemporaryDirectory()\n", + "os.environ[\"TMP_APP_DIR\"] = temp_dir.name\n", + "app_package.to_files(temp_dir.name)\n", + "print(temp_dir.name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e3a4dc05", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "./services.xml\r\n", + "./schemas/myschema.sd\r\n", + "./search/query-profiles/types/root.xml\r\n", + "./search/query-profiles/default.xml\r\n" + ] + } + ], + "source": [ + "!cd $TMP_APP_DIR && find . -type f" + ] + }, + { + "cell_type": "markdown", + "id": "c038d33a", + "metadata": {}, + "source": [ + "Ignore these files for now:\n", + "\n", + " ./search/query-profiles/types/root.xml\n", + " ./search/query-profiles/default.xml" + ] + }, + { + "cell_type": "markdown", + "id": "7b01cd09", + "metadata": {}, + "source": [ + "## Schema\n", + "\n", + "Use a schema to Create fields, fieldsets and a ranking function - dump the empty schema (An empty schema is created, with the same name as the application package):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "923edec8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "schema myschema {\r\n", + " document myschema {\r\n", + " }\r\n", + "}" + ] + } + ], + "source": [ + "!cat $TMP_APP_DIR/schemas/myschema.sd" + ] + }, + { + "cell_type": "markdown", + "id": "5a1cbaf2", + "metadata": {}, + "source": [ + "Add fields, a fieldset and a ranking function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c83c1945", + "metadata": {}, + "outputs": [], + "source": [ + "from vespa.package import Field, FieldSet, RankProfile\n", + "\n", + "app_package.schema.add_fields(\n", + " Field(name = \"id\", type = \"string\", indexing = [\"attribute\", \"summary\"]),\n", + " Field(name = \"title\", type = \"string\", indexing = [\"index\", \"summary\"], index = \"enable-bm25\"),\n", + " Field(name = \"body\", type = \"string\", indexing = [\"index\", \"summary\"], index = \"enable-bm25\")\n", + ")\n", + "\n", + "app_package.schema.add_field_set(\n", + " FieldSet(name = \"default\", fields = [\"title\", \"body\"])\n", + ")\n", + "\n", + "app_package.schema.add_rank_profile(\n", + " RankProfile(name = \"default\", first_phase = \"bm25(title) + bm25(body)\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f721bdfd", + "metadata": {}, + "source": [ + "Dump application package again, show schema:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fcd3de2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "schema myschema {\r\n", + " document myschema {\r\n", + " field id type string {\r\n", + " indexing: attribute | summary\r\n", + " }\r\n", + " field title type string {\r\n", + " indexing: index | summary\r\n", + " index: enable-bm25\r\n", + " }\r\n", + " field body type string {\r\n", + " indexing: index | summary\r\n", + " index: enable-bm25\r\n", + " }\r\n", + " }\r\n", + " fieldset default {\r\n", + " fields: title, body\r\n", + " }\r\n", + " rank-profile default {\r\n", + " first-phase {\r\n", + " expression: bm25(title) + bm25(body)\r\n", + " }\r\n", + " }\r\n", + "}" + ] + } + ], + "source": [ + "app_package.to_files(temp_dir.name)\n", + "!cat $TMP_APP_DIR/schemas/myschema.sd" + ] + }, + { + "cell_type": "markdown", + "id": "7cc78157", + "metadata": {}, + "source": [ + "Note how the indexing settings are written to the schema.\n", + "\n", + "> **_Pyvespa generally does not support all indexing options in Vespa - it is made for easy experimentation.\n", + " To configure setting an unsupported indexing option (or any other unsupported option),\n", + " dump the application package, modify the schema file\n", + " and deploy the application package from the directory, or as a zipped file.\n", + " [Read more](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)_**" + ] + }, + { + "cell_type": "markdown", + "id": "cfd73872", + "metadata": {}, + "source": [ + "At this point, review the Vespa documentation:\n", + "* [field](https://docs.vespa.ai/en/schemas.html#field)\n", + "* [fieldset](https://docs.vespa.ai/en/schemas.html#fieldset)\n", + "* [rank-profile](https://docs.vespa.ai/en/ranking.html#rank-profiles)" + ] + }, + { + "cell_type": "markdown", + "id": "a51353a4", + "metadata": {}, + "source": [ + "## Services\n", + "\n", + "In `services.xml` you will find a container and content cluster -\n", + "see the [Vespa Overview](https://docs.vespa.ai/en/overview.html).\n", + "This is a file you will normally not change or need to know much about - dump the default file:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4abae84e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\r\n", + "\r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + " 1\r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + " \r\n", + "" + ] + } + ], + "source": [ + "!cat $TMP_APP_DIR/services.xml" + ] + }, + { + "cell_type": "markdown", + "id": "d6477c44", + "metadata": {}, + "source": [ + "Observe:\n", + "* A content cluster (this is where the index is stored) called `myschema_content` is created.\n", + " This is information not normally needed, unless using\n", + " [delete_all_docs](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.delete_all_docs)\n", + " to quickly remove all documents from a schema\n", + "\n", + "Remove the temporary application package file dump:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84ce16e8", + "metadata": {}, + "outputs": [], + "source": [ + "temp_dir.cleanup()" + ] + }, + { + "cell_type": "markdown", + "id": "e242ac80", + "metadata": {}, + "source": [ + "## Next step: Deploy, feed and query\n", + "\n", + "Once the schema is ready for deployment, decide deployment option and deploy the application package:\n", + "* [Deploy to local container](https://pyvespa.readthedocs.io/en/latest/deploy-docker.html)\n", + "* [Deploy to Vespa Cloud](https://pyvespa.readthedocs.io/en/latest/deploy-vespa-cloud.html)\n", + "\n", + "Use the guides on the pyvespa site to feed and query data." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "python3", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb b/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb index 748b02dc..d7fb498b 100644 --- a/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb +++ b/docs/sphinx/source/examples/semantic-retrieval-for-question-answering-applications.ipynb @@ -923,4 +923,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +}