diff --git a/en/operations/batch-delete.html b/en/operations/batch-delete.html index cc31dcbeb4..f5b404cee9 100644 --- a/en/operations/batch-delete.html +++ b/en/operations/batch-delete.html @@ -104,3 +104,94 @@
+ + + ++ This is an end-to-end example of how to track the number of documents and delete a subset using a + selection string. +
++ Feed a batch of documents, e.g. using the vector-search + sample application:
++$ vespa feed <(python3 feed.py 100000 3) ++
+ See the number of documents on a node using the + + content.proton.documentdb.documents.total metric (here 100,000): +
++$ docker exec vespa curl -s http://localhost:19092/prometheus/v1/values | grep ^content.proton.documentdb.documents.total + + content_proton_documentdb_documents_total_max{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000 + + content_proton_documentdb_documents_total_last{metrictype="standard",instance="searchnode",documenttype="vector",clustername="vectors",vespa_service="vespa_searchnode",} 100000.0 1695383025000 ++
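To poll the count while feeding, the value can be pulled out of the Prometheus output. The following is a minimal sketch, assuming the line format shown above (metric name with labels, then value, then timestamp); `doc_count` is a hypothetical helper name, not part of Vespa.

```shell
# Sketch: print the latest document count from Prometheus-format metrics on stdin.
# Assumes the "_last" line format shown above: name{labels} value timestamp.
# doc_count is a hypothetical helper name used only for this illustration.
doc_count() {
  # The value is the second-to-last field; %d drops the trailing ".0".
  awk '/^content_proton_documentdb_documents_total_last/ { printf "%d\n", $(NF-1) }'
}

# Example of use (container name and port as in the command above):
#   docker exec vespa curl -s http://localhost:19092/prometheus/v1/values | doc_count
```

With more than one document type or content node in the output, this prints one line per matching metric.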
+ The metric above is useful for tracking progress while feeding. + An alternative is to visit all documents, print their IDs and count them: +
++$ vespa visit --field-set "[id]" | wc -l + 100000 ++
At this point, there are 100,000 documents in the index.
+ + ++ Define the subset of documents to delete, e.g. by age or other criteria. + In this example, select roughly 1% of the documents by hashing the document ID. Do a test run: +
++$ vespa visit --field-set "[id]" --selection 'id.hash().abs() % 100 == 0' | wc -l + 1016 ++
+ Hence, the selection string id.hash().abs() % 100 == 0
hits 1,016 documents.
+
+ Delete the documents; the response shows the number of documents deleted: +
++$ curl -X DELETE \ + "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors" + + { + "pathId":"/document/v1/mynamespace/vector/docid", + "documentCount":1016 + } ++
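The selection in the URL above is the percent-encoded form of the selection string used with `vespa visit`. As a rough sketch of how to derive it for other selections, the hypothetical helper `urlencode` below encodes an ASCII string for use in a query parameter, writing spaces as `+` as in the request above:

```shell
# Sketch: percent-encode an ASCII string for use as a query parameter value.
# urlencode is a hypothetical helper, not a Vespa or curl command.
# Assumes ASCII input only; multi-byte characters are not handled.
urlencode() {
  local rest="$1" out="" c
  while [ -n "$rest" ]; do
    c="${rest%"${rest#?}"}"   # first character of the remaining input
    rest="${rest#?}"          # drop that character
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;                 # unreserved: keep as-is
      ' ') out="$out+" ;;                               # space becomes +
      *) out="$out$(printf '%%%02X' "'$c")" ;;          # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'id.hash().abs() % 100 == 0'
```

The output can be pasted after `selection=` in the request URL.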
In case of a large result set, a continuation token might be returned in the response, too:
++"continuation": "AAAAEAAAA" ++
If so, add the token and redo the request:
++$ curl -X DELETE \ + "http://localhost:8080/document/v1/mynamespace/vector/docid?selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors&continuation=AAAAEAAAA" ++
+ Repeat as long as there are tokens in the output. + The token changes in every response. +
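The repeat-until-done steps above can be sketched as a small loop. This is an illustration under assumptions, not an official tool: the function names are hypothetical, the endpoint and query are the ones from this example, the token is extracted with `sed` rather than a JSON parser, and the token is assumed to be URL-safe.

```shell
# Sketch: repeat the selection DELETE until the response has no continuation token.
# Assumptions: endpoint, namespace, document type and cluster as in the example above;
# the token is URL-safe; function names are hypothetical.
BASE_URL="${BASE_URL:-http://localhost:8080/document/v1/mynamespace/vector/docid}"
QUERY='selection=id.hash%28%29.abs%28%29+%25+100+%3D%3D+0&cluster=vectors'

# Print the "continuation" value from a JSON response on stdin; print nothing if absent.
extract_continuation() {
  tr -d '\n' | sed -n 's/.*"continuation"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Issue one DELETE request; $1 is the full URL.
send_delete() {
  curl -s -X DELETE "$1"
}

# Send requests until no token is returned, printing each response.
delete_all() {
  local url="${BASE_URL}?${QUERY}"
  local response token
  while :; do
    response="$(send_delete "$url")" || return 1
    printf '%s\n' "$response"
    token="$(printf '%s' "$response" | extract_continuation)"
    if [ -z "$token" ]; then
      break
    fi
    # The token changes in every response, so always use the latest one.
    url="${BASE_URL}?${QUERY}&continuation=${token}"
  done
}
```

After loading these functions into a shell, run `delete_all`; each response, including its `documentCount`, is printed as it arrives.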
+ + +Check that all documents matching the selection criterion are deleted:
++$ vespa visit --selection 'id.hash().abs() % 100 == 0' --field-set "[id]" | wc -l + 0 ++
List remaining documents:
++$ vespa visit --field-set "[id]" | wc -l + 98984 +diff --git a/en/vespa-cli.html b/en/vespa-cli.html index 05d2a8d5f9..8df2ff6bc9 100644 --- a/en/vespa-cli.html +++ b/en/vespa-cli.html @@ -144,7 +144,7 @@