From 7668fcb6667ad2d108879de743b3d7a3ce8907a2 Mon Sep 17 00:00:00 2001
From: Kristian Aune
Date: Wed, 27 Sep 2023 09:36:50 +0200
Subject: [PATCH] Move routing stuff to one document

---
 _data/sidebar.yml | 2 -
 en/document-processing.html | 2 +-
 en/indexing.html | 519 +-----------
 en/operations-selfhosted/routing.html | 785 +++++++++++++++++-
 en/operations/reindexing.html | 2 +-
 .../using-kubernetes-with-vespa.html | 2 +-
 en/performance/sizing-feeding.html | 2 +-
 en/reads-and-writes.html | 4 +-
 en/reference/routingpolicies.html | 272 ------
 en/reference/services-routing.html | 295 ------
 en/reference/vespa-cmdline-tools.html | 2 +-
 11 files changed, 796 insertions(+), 1091 deletions(-)
 delete mode 100644 en/reference/routingpolicies.html
 delete mode 100644 en/reference/services-routing.html

diff --git a/_data/sidebar.yml b/_data/sidebar.yml
index 597c439e23..a914a42b32 100644
--- a/_data/sidebar.yml
+++ b/_data/sidebar.yml
@@ -370,8 +370,6 @@ docs:
       url: /en/reference/config-files.html
     - page: mTLS Reference
       url: /en/reference/mtls.html
-    - page: Routingpolicies Reference
-      url: /en/reference/routingpolicies.html
     - page: Internal Configuration File Reference
       url: /en/reference/internal-config-files.html
     - page: Healthchecks Reference
diff --git a/en/document-processing.html b/en/document-processing.html
index 8f2d2fd146..2d9b0fb648 100644
--- a/en/document-processing.html
+++ b/en/document-processing.html
@@ -92,7 +92,7 @@

Deploying a Document Processor

For example, the route default/chain.my-chain indexing would route feed operations through the chain "my-chain" in the "default" container cluster, and then to the "indexing" hop, which resolves to the specified indexing chain for each content cluster the document should be sent to. - More details can be found in indexing: + More details can be found in indexing:

diff --git a/en/indexing.html b/en/indexing.html
index eef91fccf6..71633e241d 100644
--- a/en/indexing.html
+++ b/en/indexing.html
@@ -6,340 +6,27 @@
 ---

-Indexing is the process of routing document writes to indexing processors, -processing (indexing) documents and writing the documents to content clusters. -

-Refer to the overview. -The primary index configuration is the schema. -services.xml configures how indexing is distributed to the nodes. -

-This article documents the default indexing, how to configure indexing for different clusters -and how to add custom document processing. -

-See #13193 -for a discussion on using default as a name. -

- -document-processing is an example of custom document processing, and useful for testing routing. -

- - - -

Routing

-

- A normal Vespa configuration has container and content cluster(s), - with one or more document types defined in schemas. - Routing document writes means routing documents to the indexing container cluster, - then the right content cluster. + Indexing is the process of routing + document writes to indexing processors, + processing (indexing) documents and writing the documents to content clusters.

- The indexing cluster is a container cluster - - see multiple container clusters for variants. - Add the document-api - feed endpoint to this cluster. - The mapping from document type to content cluster is in - document in the content cluster. - From - album-recommendation: + Refer to the overview. + The primary index configuration is the schema. + services.xml configures how indexing is distributed to the nodes.

-
-<services version="1.0">
-
-    <container id="container" version="1.0">
-        <document-api />
-        <search />
-        <nodes>
-            <node hostalias="node1" />
-        </nodes>
-    </container>
-
-    <content id="music" version="1.0">
-        <redundancy>1</redundancy>
-        <documents>
-            <document type="music" mode="index" />
-        </documents>
-        <nodes>
-            <node hostalias="node1" distribution-key="0" />
-        </nodes>
-    </content>
-
-</services>
-
-

-Given this configuration, Vespa knows which is the container cluster used for indexing, -and which content cluster that stores the music document type. -Use vespa-route -to display routing generated from this configuration: -

-
-$ vespa-route
-There are 6 route(s):
-    1. default
-    2. default-get
-    3. music
-    4. music-direct
-    5. music-index
-    6. storage/cluster.music
-
-There are 2 hop(s):
-    1. container/chain.indexing
-    2. indexing
-

-Note the default route. This route is auto-generated by Vespa, -and is used when no other route is specified when using /document/v1. -default points to indexing: + This article documents the default indexing, how to configure indexing for different clusters + and how to add custom document processing.

-
-$ vespa-route --route default
-The route 'default' has 1 hop(s):
-    1. indexing
-
-
-$ vespa-route --hop indexing
-The hop 'indexing' has selector:
-       [DocumentRouteSelector]
-And 1 recipient(s):
-    1. music
-
-
-$ vespa-route --route music
-The route 'music' has 1 hop(s):
-    1. [MessageType:music]
-

-In short, the default route handles documents of type music. -Vespa will route to the container cluster with document-api - -note the chain.indexing above. -This is a set of built-in document processors that does the indexing (below). -

-Refer to the trace appendix for routing details. + See #13193 + for a discussion on using default as a name.

- - - -

chain.indexing

-This indexing chain is set up on the container once a content cluster has mode="index". -

-The -IndexingProcessor -annotates the document based on the indexing script -generated from the schema. Example: + + document-processing is an example of custom document processing, and useful for testing routing.

-
-$ vespa-get-config -n vespa.configdefinition.ilscripts \
-  -i container/docprocchains/chain/indexing/component/com.yahoo.docprocs.indexing.IndexingProcessor
-
-maxtermoccurrences 100
-fieldmatchmaxlength 1000000
-ilscript[0].doctype "music"
-ilscript[0].docfield[0] "artist"
-ilscript[0].docfield[1] "artistId"
-ilscript[0].docfield[2] "title"
-ilscript[0].docfield[3] "album"
-ilscript[0].docfield[4] "duration"
-ilscript[0].docfield[5] "year"
-ilscript[0].docfield[6] "popularity"
-ilscript[0].content[0] "clear_state | guard { input artist | tokenize normalize stem:"BEST" | summary artist | index artist; }"
-ilscript[0].content[1] "clear_state | guard { input artistId | summary artistId | attribute artistId; }"
-ilscript[0].content[2] "clear_state | guard { input title | tokenize normalize stem:"BEST" | summary title | index title; }"
-ilscript[0].content[3] "clear_state | guard { input album | tokenize normalize stem:"BEST" | index album; }"
-ilscript[0].content[4] "clear_state | guard { input duration | summary duration; }"
-ilscript[0].content[5] "clear_state | guard { input year | summary year | attribute year; }"
-ilscript[0].content[6] "clear_state | guard { input popularity | summary popularity | attribute popularity; }"
-
-

- Refer to linguistics for more details. -

-

-By default, the indexing chain is set up on the first container cluster in services.xml. -When having multiple container clusters, it is recommended to configure this explicitly, see -multiple container clusters. -

- - - -

Document selection

-

-The document -can have a selection string, -normally used to expire documents. -This is also evaluated during feeding, so documents that would immediately expire are dropped. -This is not an error: the document API will report 200, which can be confusing. -

-The evaluation is done in the - -DocumentRouteSelector at the feeding endpoint - before any processing/indexing. -I.e. the document is evaluated using the selection string (drop it or not), -then where to route it, based on document type. -

-Example: the selection is configured to not match the document being fed: -

-
-<content id="music" version="1.0">
-    <redundancy>1</redundancy>
-    <documents>
-        <document type="music" mode="index" selection='music.album == "thisstringwillnotmatch"'/>
-
- -
-$ vespa-feeder --trace 6 doc.json
-
-<trace>
-    [1564576570.693] Source session accepted a 4096 byte message. 1 message(s) now pending.
-    [1564576570.713] Sequencer sending message with sequence id '-1163801147'.
-    [1564576570.721] Recognized 'default' as route 'indexing'.
-    [1564576570.727] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
-    [1564576570.811] Running routing policy 'DocumentRouteSelector'.
-    [1564576570.822] Policy 'DocumentRouteSelector' assigned a reply to this branch.
-    [1564576570.828] Sequencer received reply with sequence id '-1163801147'.
-    [1564576570.828] Source session received reply. 0 message(s) now pending.
-</trace>
-
-Messages sent to vespa (route default) :
-----------------------------------------
-PutDocument:	ok: 0 msgs/sec: 0.00 failed: 0 ignored: 1 latency(min, max, avg): 9223372036854775807, -9223372036854775808, 0
-
-

-Without the selection (i.e. everything matches): -

-
-$ vespa-feeder --trace 6 doc.json
-
-<trace>
-    [1564576637.147] Source session accepted a 4096 byte message. 1 message(s) now pending.
-    [1564576637.168] Sequencer sending message with sequence id '-1163801147'.
-    [1564576637.176] Recognized 'default' as route 'indexing'.
-    [1564576637.180] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
-    [1564576637.256] Running routing policy 'DocumentRouteSelector'.
-    [1564576637.268] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'.
-    ...
-</trace>
-
-Messages sent to vespa (route default) :
-----------------------------------------
-PutDocument:	ok: 1 msgs/sec: 1.05 failed: 0 ignored: 0 latency(min, max, avg): 845, 845, 845
-
-

-In the last case, in the -DocumentRouteSelector routing policy, -the document matched the selection string / there was no selection string, -and the document was forwarded to the next hop in the route. -

- - - -

Document processing

-

-Add custom processing of documents using document processing. -The normal use case is to add document processors in the default route, before indexing. Example: -

-
-<services version="1.0">
-
-    <container id="container" version="1.0">
-        <document-api />
-        <search />
-        <document-processing>
-            <chain id="default">
-                <documentprocessor
-                    id="com.mydomain.example.Rot13DocumentProcessor"
-                    bundle="album-recommendation-docproc" />
-            </chain>
-        </document-processing>
-        <nodes>
-            <node hostalias="node1" />
-        </nodes>
-    </container>
-
-    <content id="music" version="1.0">
-        <redundancy>1</redundancy>
-        <documents>
-            <document type="music" mode="index" />
-        </documents>
-        <nodes>
-            <node hostalias="node1" distribution-key="0" />
-        </nodes>
-    </content>
-
-</services>
-
-

-Note that a new hop default/chain.default is added, -and the default route is changed to include this: -

-
-$ vespa-route
-
-There are 6 route(s):
-    1. default
-    2. default-get
-    3. music
-    4. music-direct
-    5. music-index
-    6. storage/cluster.music
-
-There are 3 hop(s):
-    1. default/chain.default
-    2. default/chain.indexing
-    3. indexing
-
-
-$ vespa-route --route default
-
-The route 'default' has 2 hop(s):
-    1. default/chain.default
-    2. indexing
-
-

-Note that the document processing chain must be called default -to automatically be included in the default route. -

- - -

Inherit indexing chain

-

- An alternative to the above is inheriting the indexing chain - use this when getting this error: -

-
-Indexing cluster 'XX' specifies the chain 'default' as indexing chain.
-As the 'default' chain is run by default, using it as the indexing chain will run it twice.
-Use a different name for the indexing chain.
-
-

- Call the chain something other than default, and let it inherit indexing: -

-
-<services version="1.0">
-
-    <container id="container" version="1.0">
-        <document-api />
-        <search />
-        <document-processing>
-            <chain id="offer-processing" inherits="indexing">
-                <documentprocessor id="processor.OfferDocumentProcessor"/>
-            </chain>
-        </document-processing>
-        <nodes>
-            <node hostalias="node1" />
-        </nodes>
-    </container>
-
-    <content id="music" version="1.0">
-        <redundancy>1</redundancy>
-        <documents>
-            <document type="offer" mode="index"/>
-            <document-processing cluster="default" chain="offer-processing"/>
-        </documents>
-        <nodes>
-            <node hostalias="node1" distribution-key="0" />
-        </nodes>
-    </content>
-
-</services>
-
-

See #13193 for details.

@@ -378,185 +65,3 @@

Date indexing

{% include note.html content='The date field above is placed outside the document section, as its content is generated from the document input.' %} - - - -

Multiple container clusters

-

-Vespa can be configured to use more than one container cluster. -Use cases can be to separate search and document processing -or having different document processing clusters due to capacity constraints or dependencies. -Example with separate search and feeding/indexing container clusters: -

-
-<services version="1.0">
-
-    <container id="container-search" version="1.0">
-        <search />
-        <nodes>
-            <node hostalias="node1" />
-        </nodes>
-    </container>
-
-    <container id="container-indexing" version="1.0">
-        <http>
-            <server id="httpServer2" port="8081" />
-        </http>
-        <document-api />
-        <document-processing />
-        <nodes>
-            <node hostalias="node1" />
-        </nodes>
-    </container>
-
-    <content id="music" version="1.0">
-        <redundancy>1</redundancy>
-        <documents>
-            <document type="music" mode="index" />
-            <document-processing cluster="container-indexing" />
-        </documents>
-        <nodes>
-            <node hostalias="node1" distribution-key="0" />
-        </nodes>
-    </content>
-
-</services>
-
-

Notes:

- -

-Observe the container-indexing/chain.indexing hop, -and the indexing chain is set up on the container-indexing cluster: -

-
-$ vespa-route
-
-There are 6 route(s):
-    1. default
-    2. default-get
-    3. music
-    4. music-direct
-    5. music-index
-    6. storage/cluster.music
-
-There are 2 hop(s):
-    1. container-indexing/chain.indexing
-    2. indexing
-
-
-$ curl -s http://localhost:8081 | python -m json.tool | grep -C 3 chain.indexing
-
-        {
-            "bundle": "container-disc:7.0.0",
-            "class": "com.yahoo.messagebus.jdisc.MbusClient",
-            "id": "chain.indexing@MbusClient",
-            "serverBindings": []
-        },
-        {
---
-            "class": "com.yahoo.docproc.jdisc.DocumentProcessingHandler",
-            "id": "com.yahoo.docproc.jdisc.DocumentProcessingHandler",
-            "serverBindings": [
-                "mbus://*/chain.indexing"
-            ]
-        },
-        {
-
-
- - - -

Appendix: trace

-

-Below is a trace example, no selection string: -

-
-$ cat doc.json
-[
-{
-    "put": "id:mynamespace:music::123",
-    "fields": {
-         "album": "Bad",
-         "artist": "Michael Jackson",
-         "title": "Bad",
-         "year": 1987,
-         "duration": 247
-    }
-}
-]
-
-$ vespa-feeder --trace 6 doc.json
-<trace>
-    [1564571762.403] Source session accepted a 4096 byte message. 1 message(s) now pending.
-    [1564571762.420] Sequencer sending message with sequence id '-1163801147'.
-    [1564571762.426] Recognized 'default' as route 'indexing'.
-    [1564571762.429] Recognized 'indexing' as HopBlueprint(selector = { '[DocumentRouteSelector]' }, recipients = { 'music' }, ignoreResult = false).
-    [1564571762.489] Running routing policy 'DocumentRouteSelector'.
-    [1564571762.493] Component '[MessageType:music]' selected by policy 'DocumentRouteSelector'.
-    [1564571762.493] Resolving '[MessageType:music]'.
-    [1564571762.520] Running routing policy 'MessageType'.
-    [1564571762.520] Component 'music-index' selected by policy 'MessageType'.
-    [1564571762.520] Resolving 'music-index'.
-    [1564571762.520] Recognized 'music-index' as route 'container/chain.indexing [Content:cluster=music]'.
-    [1564571762.520] Recognized 'container/chain.indexing' as HopBlueprint(selector = { '[LoadBalancer:cluster=container;session=chain.indexing]' }, recipients = {  }, ignoreResult = false).
-    [1564571762.526] Running routing policy 'LoadBalancer'.
-    [1564571762.538] Component 'tcp/vespa-container:19101/chain.indexing' selected by policy 'LoadBalancer'.
-    [1564571762.538] Resolving 'tcp/vespa-container:19101/chain.indexing [Content:cluster=music]'.
-    [1564571762.580] Sending message (version 7.83.27) from client to 'tcp/vespa-container:19101/chain.indexing' with 179.853 seconds timeout.
-    [1564571762.581] Message (type 100004) received at 'container/container.0' for session 'chain.indexing'.
-    [1564571762.581] Message received by MbusServer.
-    [1564571762.582] Request received by MbusClient.
-    [1564571762.582] Running routing policy 'Content'.
-    [1564571762.582] Selecting route
-    [1564571762.582] No cluster state cached. Sending to random distributor.
-    [1564571762.582] Too few nodes seen up in state. Sending totally random.
-    [1564571762.582] Component 'tcp/vespa-container:19114/default' selected by policy 'Content'.
-    [1564571762.582] Resolving 'tcp/vespa-container:19114/default'.
-    [1564571762.586] Sending message (version 7.83.27) from 'container/container.0' to 'tcp/vespa-container:19114/default' with 179.995 seconds timeout.
-    [1564571762.587181] Message (type 100004) received at 'storage/cluster.music/distributor/0' for session 'default'.
-    [1564571762.587245] music/distributor/0 CommunicationManager: Received message from message bus
-    [1564571762.587510] Communication manager: Sending Put(BucketId(0x2000000000000020), id:mynamespace:music::123, timestamp 1564571762000000, size 275)
-    [1564571762.587529] Communication manager: Passing message to source session
-    [1564571762.587547] Source session accepted a 1 byte message. 1 message(s) now pending.
-    [1564571762.587681] Sending message (version 7.83.27) from 'storage/cluster.music/distributor/0' to 'storage/cluster.music/storage/0/default' with 180.00 seconds timeout.
-    [1564571762.587960] Message (type 10) received at 'storage/cluster.music/storage/0' for session 'default'.
-    [1564571762.588052] music/storage/0 CommunicationManager: Received message from message bus
-    [1564571762.588263] PersistenceThread: Processing message in persistence layer
-    [1564571762.588953] Communication manager: Sending PutReply(id:mynamespace:music::123, BucketId(0x2000000000000020), timestamp 1564571762000000)
-    [1564571762.589023] Sending reply (version 7.83.27) from 'storage/cluster.music/storage/0'.
-    [1564571762.589332] Reply (type 11) received at 'storage/cluster.music/distributor/0'.
-    [1564571762.589448] Source session received reply. 0 message(s) now pending.
-    [1564571762.589459] music/distributor/0Communication manager: Received reply from message bus
-    [1564571762.589679] Communication manager: Sending PutReply(id:music:music::123, BucketId(0x0000000000000000), timestamp 1564571762000000)
-    [1564571762.589807] Sending reply (version 7.83.27) from 'storage/cluster.music/distributor/0'.
-    [1564571762.590] Reply (type 200004) received at 'container/container.0'.
-    [1564571762.590] Routing policy 'Content' merging replies.
-    [1564571762.590] Reply received by MbusClient.
-    [1564571762.590] Sending reply from MbusServer.
-    [1564571762.590] Sending reply (version 7.83.27) from 'container/container.0'.
-    [1564571762.612] Reply (type 200004) received at client.
-    [1564571762.613] Routing policy 'LoadBalancer' merging replies.
-    [1564571762.613] Routing policy 'MessageType' merging replies.
-    [1564571762.615] Routing policy 'DocumentRouteSelector' merging replies.
-    [1564571762.622] Sequencer received reply with sequence id '-1163801147'.
-    [1564571762.622] Source session received reply. 0 message(s) now pending.
-</trace>
-
-Messages sent to vespa (route default) :
-----------------------------------------
-PutDocument:	ok: 1 msgs/sec: 3.30 failed: 0 ignored: 0 latency(min, max, avg): 225, 225, 225
-
-
diff --git a/en/operations-selfhosted/routing.html b/en/operations-selfhosted/routing.html
index f3f8286643..36439d68c2 100644
--- a/en/operations-selfhosted/routing.html
+++ b/en/operations-selfhosted/routing.html
@@ -5,6 +5,7 @@
 - /documentation/routing.html
 - /en/routing.html
 - /en/reference/services-routing.html
+- /en/reference/routingpolicies.html
 ---

@@ -13,16 +14,16 @@ configuration which is appropriate for most cases, so no explicit routing configuration is necessary. However, explicit routing can be used in advanced use cases such as sending different document streams to different document processing -clusters, or through multiple consecutive clusters etc.

- -

There are other, more in-depth, articles on routing: +clusters, or through multiple consecutive clusters etc. +

+

There are other, more in-depth, articles on routing: