Vespa sample application - search as you type

Uses N-grams to simulate substring search.

Quick Start

Requirements:

  • Docker Desktop installed and running. 4GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting.
  • Alternatively, deploy using Vespa Cloud.
  • Operating system: Linux, macOS or Windows 10 Pro (Docker requirement).
  • Architecture: x86_64 or arm64.
  • Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.
  • Java 17 installed.
  • Apache Maven. This sample app uses custom Java components, and Maven is used to build the application.

Validate the environment; at least 4GB of memory must be available:

$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"

Install Vespa CLI:

$ brew install vespa-cli

For local deployment using the Docker image:

$ vespa config set target local

Pull and start the Vespa Docker container image:

$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa

Download this sample application:

$ vespa clone incremental-search/search-as-you-type myapp && cd myapp

Build the application package:

$ mvn clean package -U

Download feed file:

$ curl -L -o search-as-you-type-index.jsonl \
  https://data.vespa-cloud.com/sample-apps-data/search-as-you-type-index.jsonl

Verify that the configuration service (deploy API) is ready:

$ vespa status deploy --wait 300

Deploy the application:

$ vespa deploy --wait 300

Deployment note

It is possible to deploy this app to Vespa Cloud.

Wait for the application endpoint to become available:

$ vespa status --wait 300

Run the Vespa System Tests, a set of basic tests that verify that the application is working as expected:

$ vespa test src/test/application/tests/system-test/search-as-you-type-test.json

Feed documents:

$ while read -r line; do printf '%s\n' "$line" > tmp.json; vespa document tmp.json; done < search-as-you-type-index.jsonl

Run a query:

$ vespa query \
 'yql=select * from doc where ([{"defaultIndex":"grams"}]userInput(@query))' \
 'hits=10' \
 'query=xgb'

Check out the website by opening http://localhost:8080/site/ in a browser, or fetch it from the command line:

$ curl -s http://localhost:8080/site/

Shut down and remove the Docker container:

$ docker rm -f vespa

Details

N-grams

Substring searches are slow on large amounts of data. An N-gram search can serve as a faster, but less precise, substring-like search. The fields title and content are re-indexed into the fields gram_title and gram_content, which use an N-gram index. In this example the gram size is set to 3, but other sizes can be used. A smaller gram size matches more documents, but is also more likely to return irrelevant hits.
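
To make the trade-off concrete, here is a toy Python model of gram matching. It is only a sketch and not how Vespa tokenizes or matches internally: the helper names ngrams and gram_match are hypothetical, and the model simply treats a query as matching when all of its grams occur somewhere in the document.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Return the overlapping character n-grams of text, lowercased."""
    text = text.lower()
    if len(text) < n:
        # Text shorter than the gram size is kept as a single short gram.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def gram_match(query: str, document: str, n: int = 3) -> bool:
    """Toy model: true when every gram of the query occurs in the document."""
    doc_grams = set(ngrams(document, n))
    query_grams = ngrams(query, n)
    return bool(query_grams) and all(g in doc_grams for g in query_grams)


# A substring-like hit: the gram "xgb" occurs in the document.
print(gram_match("xgb", "XGBoost tutorial"))

# Less precise than substring search: both grams of "abcd" occur,
# although "abcd" itself is not a substring of the document.
print(gram_match("abcd", "abc xbcd"), "abcd" in "abc xbcd")

# A smaller gram size matches shorter queries.
print(gram_match("xg", "xgboost", n=3), gram_match("xg", "xgboost", n=2))
```

In this model the second example is a false positive, which is the kind of imprecision the paragraph above refers to.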

Weighted combination of searches

If we can get a hit on a whole word, this is most likely a more relevant hit than a hit on only a part of a word. Therefore, we search through both the default fieldset and the grams fieldset, and we weight hits on the default fieldset higher than other hits. These weights can be seen in the weighted_doc_rank rank profile.
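
The idea can be sketched in Python as a weighted sum of two match scores. This is an illustration only: the function name combined_score, the weights 2.0 and 1.0, and the sample scores are all assumptions, and the actual expression and weights are the ones defined in the weighted_doc_rank rank profile.

```python
def combined_score(default_score: float, gram_score: float,
                   default_weight: float = 2.0,
                   gram_weight: float = 1.0) -> float:
    """Weighted sum in which whole-word (default) matches count more.

    The weights are hypothetical placeholders, not the values used by
    the weighted_doc_rank rank profile.
    """
    return default_weight * default_score + gram_weight * gram_score


# Hypothetical hits: one matched only on grams, one also on a whole word.
hits = [
    {"id": "gram-only", "default": 0.0, "grams": 0.8},
    {"id": "whole-word", "default": 0.6, "grams": 0.6},
]

ranked = sorted(
    hits,
    key=lambda h: combined_score(h["default"], h["grams"]),
    reverse=True,
)
print([h["id"] for h in ranked])
```

With these made-up numbers the whole-word hit ranks first even though its gram score is lower, because the default match carries the larger weight.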

Highlighting

Text highlights are generated by including summary: dynamic in a field. As searches on default and grams match with different parts of the text, the highlights of these matches will also be different. The line contentScore*highlightWeight >= gramContentScore in src/main/resources/site/js/main.js decides which of these highlights should be shown on the website. The variable highlightWeight can be tweaked to prioritize default highlighting or grams highlighting.
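
The selection rule can be paraphrased in Python as follows. The real logic lives in the JavaScript file main.js; this sketch only mirrors the quoted comparison. The function name choose_highlight, the two highlight parameters, and the default weight of 1.0 are hypothetical.

```python
def choose_highlight(content_score: float, gram_content_score: float,
                     content_highlight: str, gram_highlight: str,
                     highlight_weight: float = 1.0) -> str:
    """Pick which highlighted snippet to display.

    Mirrors the comparison contentScore*highlightWeight >= gramContentScore.
    The default highlight_weight is an assumed placeholder value.
    """
    if content_score * highlight_weight >= gram_content_score:
        return content_highlight
    return gram_highlight


# With a neutral weight, the higher-scoring grams highlight is shown.
print(choose_highlight(0.5, 0.8, "default snippet", "grams snippet"))

# Raising the weight prioritizes the default highlight.
print(choose_highlight(0.5, 0.8, "default snippet", "grams snippet",
                       highlight_weight=2.0))
```

A weight above 1 favors the default highlight, and a weight below 1 favors the grams highlight.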