Adds significance models documentation (#3231)
MariusArhaug authored Oct 18, 2024
1 parent 725da6e commit 1e07543
Showing 8 changed files with 361 additions and 5 deletions.
4 changes: 3 additions & 1 deletion _data/sidebar.yml
@@ -125,10 +125,12 @@ docs:
url: /en/xgboost.html
- page: Ranking With LightGBM Models
url: /en/lightgbm.html
- page: Stateless model evaluation
- page: Stateless Model Evaluation
url: /en/stateless-model-evaluation.html
- page: Ranking With BM25
url: /en/reference/bm25.html
- page: Significance Model
url: /en/significance.html
- page: Ranking With nativeRank
url: /en/nativerank.html
- page: Accelerated OR search using the WAND algorithm
71 changes: 71 additions & 0 deletions en/operations-selfhosted/vespa-cmdline-tools.html
@@ -15,6 +15,14 @@
use-cases we recommend the <a href="../vespa-cli.html">Vespa CLI</a> which
should work against most Vespa applications regardless of how they are deployed.' %}

<p>
You can run these tools in the <a href="https://hub.docker.com/r/vespaengine/vespa/tags">Vespa Docker image</a>:
</p>
<pre>
docker run --entrypoint bash vespaengine/vespa /opt/vespa/bin/[tool] [args]
</pre>


<!--h2 id="vespa-config-ctl">vespa-config-ctl</h2-->
<!--h2 id="vespa-config-loadtester">vespa-config-loadtester</h2-->
@@ -1908,6 +1916,69 @@ <h2 id="vespa-set-node-state">vespa-set-node-state</h2>

<!--h2 id="vespa-slobrok-cmd">vespa-slobrok-cmd</h2-->

<h2 id="vespa-significance">vespa-significance</h2>
<p>
Generates a <a href="../significance.html#significance-model-file">significance model file</a> from Vespa documents.
Available in Vespa as of version 8.426.8.
</p>
<p>
The generated model uses the same tokenizer as the default query processor; see <a href="../linguistics.html">linguistics in Vespa</a> for details.
If the application uses a custom tokenizer, the model generator must be modified accordingly.
Tokens are lowercased but not stemmed,
which matches how the model is applied to query terms.
</p>
<p>Synopsis: <code>vespa-significance generate [options]</code></p>
<p>Example:</p>
<pre>
$ vespa-significance generate --in vespa-dump.jsonl --out en_model.json --field text --language en
</pre>
<p>When running in a container, it is useful to mount a folder that holds the feed documents and receives the generated model file, e.g.:</p>
<pre>
$ podman run -it --entrypoint bash -v $PWD/data:/data -w /data vespaengine/vespa:latest /opt/vespa/bin/vespa-significance generate --in docs.jsonl --out model.zst --field text --language en
</pre>
<table class="table">
<thead>
<tr>
<th>Option</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>-h, --help</th>
<td>Help text</td>
</tr>
<tr>
<th>-i, --in &lt;input file&gt;</th>
<td>
JSON Lines (JSONL) file where each line is a <a href="../reference/document-json-format.html">Vespa document in JSON format</a>.
</td>
</tr><tr>
<th>-o, --out &lt;output file&gt;</th>
<td>
<a href="../significance.html#significance-model-file">Significance model file</a> in JSON format.
</td>
</tr><tr>
<th> -f, --field &lt;field&gt;</th>
<td>
Name of the text field to use for the significance model.
</td>
</tr><tr>
<th> -l, --language &lt;language&gt;</th>
<td>
Language of the text field, specified as a language code, e.g. <code>en</code> for English.<br/>
The code is used by the OpenNLP tokenizer; see the <a href="../linguistics.html#default-languages">supported languages and their codes</a>.
</td>
</tr>
<tr>
<th> --zst &lt;compression&gt;</th>
<td>
If set to <code>true</code>, compresses the output file with <a href="https://facebook.github.io/zstd/">zstandard</a>.
Default is <code>false</code>.
</td>
</tr>
</tbody>
</table>
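<p>
As a rough illustration of the <code>--in</code> format, an input file could look like the two lines below.
This sketch assumes the feed-style <code>put</code> form from the document JSON format reference;
the namespace, document type, IDs and text are made up, and only the <code>text</code> field name is taken from the example command above.
</p>
<pre>
{"put": "id:mynamespace:doc::1", "fields": {"text": "The quick brown fox jumps over the lazy dog"}}
{"put": "id:mynamespace:doc::2", "fields": {"text": "Search engines rank documents by relevance"}}
</pre>
<p>
Assuming the options behave as described in the table, a compressed model could then be generated with
(file names are placeholders):
</p>
<pre>
$ vespa-significance generate --in docs.jsonl --out model.json.zst --field text --language en --zst true
</pre>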


<h2 id="vespa-start-configserver">vespa-start-configserver</h2>
13 changes: 13 additions & 0 deletions en/reference/query-api-reference.html
@@ -83,6 +83,7 @@ <h2 id="parameters">Parameters</h2>
<li><a href="#ranking.globalphase.rerankcount">ranking.globalPhase.rerankCount</a></li>
<li><a href="#ranking.matching">ranking.matching</a></li>
<li><a href="#ranking.matchPhase">ranking.matchPhase</a></li>
<li><a href="#ranking.significance.useModel">ranking.significance.useModel</a></li>
</ul>
</dd>

@@ -697,6 +698,18 @@ <h2 id="ranking">Ranking</h2>
</p>
</td>
</tr>
<tr>
<th>ranking.significance.useModel</th>
<td></td>
<td>Boolean</td>
<td>false</td>
<td>
<p id="ranking.significance.useModel">
Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.
Overrides the <a href="schema-reference.html#significance">use-model</a> setting in the rank profile.
</p>
</td>
</tr>
<tr>
<th>ranking.freshness</th>
<td></td>
36 changes: 36 additions & 0 deletions en/reference/schema-reference.html
@@ -132,6 +132,7 @@ <h2 id="elements">Elements</h2>
<a href="#inputs">inputs</a>
<a href="#constants">constants</a>
<a href="#onnx-model">onnx-model</a>
<a href="#significance">significance</a>
<a href="#rank-properties">rank-properties</a>
<a href="#match-features">match-features</a>
<a href="#mutate">mutate</a>
@@ -1493,6 +1494,10 @@ <h2 id="rank-profile">rank-profile</h2>
<td>Zero or many</td>
<td>An onnx model to make available in this profile.</td>
</tr>
<tr><td><a href="#significance">significance</a></td>
<td>Zero or one</td>
<td>Enables the use of significance models defined in services.xml.</td>
</tr>
<tr><td><a href="#rank-properties">rank-properties</a></td>
<td>Zero or one</td>
<td>List of any rank property key-values to be used by rank features.</td>
@@ -2484,6 +2489,37 @@ <h2 id="onnx-model">onnx-model</h2>
</table>
<p>For more details including examples, see <a href="../onnx.html">ranking with ONNX models.</a></p>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#rank-profile">rank-profile</a>.
Configures a <a href="../significance.html">significance model</a>.
<pre>
significance {
use-model: true
}
</pre>
</p>

<p>
The body must contain:
<table class="table">
<thead>
<tr><th>name</th>
<th>occurrence</th>
<th>description</th></tr>
</thead>
<tbody>
<tr>
<td>use-model</td>
<td>One</td>
<td>Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.</td>
</tr>
</tbody>
</table>
</p>
<p>
For more details, see <a href="../significance.html">Significance Model</a>.
</p>
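<p>
To show where the element sits in a complete schema, here is a minimal sketch.
The schema name <code>doc</code>, the field <code>text</code>, the profile name <code>with_significance</code>
and the first-phase expression are assumptions chosen for illustration only:
</p>
<pre>
schema doc {
    document doc {
        field text type string {
            indexing: index | summary
        }
    }
    rank-profile with_significance {
        significance {
            use-model: true
        }
        first-phase {
            expression: nativeRank(text)
        }
    }
}
</pre>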


<h2 id="document-summary">document-summary</h2>
1 change: 1 addition & 0 deletions en/reference/services-container.html
@@ -27,6 +27,7 @@
<a href="services-processing.html#chain">chain</a>
<a href="services-search.html#renderer">renderer</a>
<a href="services-search.html#threadpool">threadpool</a>
<a href="services-search.html#significance">significance</a>
<a href="services-docproc.html">document-processing</a>
<a href="#include">include [dir]</a>
<a href="services-docproc.html#documentprocessor">documentprocessor</a>
58 changes: 57 additions & 1 deletion en/reference/services-search.html
@@ -33,6 +33,7 @@
<a href="#source">source [id]</a>
<a href="#searcher">searcher [id, class, bundle, provides, before, after]</a>
<a href="#renderer">renderer [id, class, bundle]</a>
<a href="#significance">significance</a>
<a href="#threadpool">threadpool</a>
<a href="#threadpool-threads">threads [ boost ]</a>
<a href="#threadpool-queue">queue</a>
@@ -328,7 +329,62 @@ <h2 id="renderer">renderer</h2>
bundle="the name in &lt;artifactId&gt; in pom.xml" /&gt;
</pre>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#searcher">searcher</a>.
Specifies one or more global significance <a href="#model">models</a>.
</p>

<pre data-test="file" data-path="my-app/src/main/application/services.xml">
&lt;significance&gt;
&lt;model model-id="significance-en-wikipedia-v1"/&gt;
&lt;model url="https://some/uri/my-model.model.multilingual.json"/&gt;
&lt;model path="models/my-model.no.json.zst"/&gt;
&lt;/significance&gt;
</pre>

<p>
The models are either provided by <em>Vespa</em> or generated with the <a href="../operations-selfhosted/vespa-cmdline-tools.html#vespa-significance">vespa-significance tool</a>.
The order determines model precedence, with the last element having the highest priority.
To use these models, the schema must <a href="schema-reference.html#significance">enable significance models in the rank-profile</a>.
</p>

<p>
Sub-elements:
<ul>
<li><a href="#model">model</a> (required, one or more)</li>
</ul>
</p>

<h2 id="model">model</h2>
<p>
Contained in <a href="#significance">significance</a>.
Specifies a <a href="../significance.html#global-significance-model">global significance model</a>.
A model is identified by <code>model-id</code>, by a <code>url</code>, or by a <code>path</code> to a model file in the application package.
</p>
<p>
Models with <code>model-id</code> are provided by <em>Vespa</em> and listed <a href="https://cloud.vespa.ai/en/model-hub#significance-models">here</a>.
Example with <code>model-id</code>:
<pre>
&lt;model model-id="significance-en-wikipedia-v1"/&gt;
</pre>
</p>

<p>
Models specified with <code>url</code> or <code>path</code> are JSON files, which can optionally be compressed with <a href="https://facebook.github.io/zstd/">zstandard</a>.
Model files can be generated using the <a href="../operations-selfhosted/vespa-cmdline-tools.html#vespa-significance">vespa-significance tool</a>.
Example with <code>url</code>:
<pre>
&lt;model url="https://some/uri/mymodel.multilingual.json"/&gt;
</pre>

Models with <code>path</code> should be placed in the application package.
The path is relative to the application package root.
Example with <code>path</code>:
<pre>
&lt;model path="models/mymodel.no.json.zst"/&gt;
</pre>
</p>

<h2 id="chain">chain</h2>
<p>
@@ -339,7 +395,7 @@ <h2 id="chain">chain</h2>
Note that <a href="#provider">provider</a> and <a href="#source">source</a> elements are also chains.
Specify a search chain in a query using <a href="query-api-reference.html#searchchain">searchChain</a>.
</p>
<p>Example which inherits from the built in <em>vespa</em> chain so that
<p>Example which inherits from the built in <em>vespa</em> chain so that
the searcher can dispatch queries to the content clusters:</p>
<pre>
&lt;chain id="common" inherits="vespa"&gt;