Add significance documentation #3231

Merged
merged 18 commits into from
Oct 18, 2024
4 changes: 3 additions & 1 deletion _data/sidebar.yml
@@ -125,10 +125,12 @@ docs:
url: /en/xgboost.html
- page: Ranking With LightGBM Models
url: /en/lightgbm.html
- page: Stateless model evaluation
- page: Stateless Model Evaluation
url: /en/stateless-model-evaluation.html
- page: Ranking With BM25
url: /en/reference/bm25.html
- page: Significance Model
url: /en/significance.html
- page: Ranking With nativeRank
url: /en/nativerank.html
- page: Accelerated OR search using the WAND algorithm
71 changes: 71 additions & 0 deletions en/operations-selfhosted/vespa-cmdline-tools.html
@@ -15,6 +15,14 @@
use-cases we recommend the <a href="../vespa-cli.html">Vespa CLI</a> which
should work against most Vespa applications regardless of how they are deployed.' %}

<p>
You can run these tools in the <a href="https://hub.docker.com/r/vespaengine/vespa/tags">Vespa Docker image</a>:
</p>
<pre>
docker run --entrypoint bash vespaengine/vespa /opt/vespa/bin/[tool] [args]
</pre>


<!--h2 id="vespa-config-ctl">vespa-config-ctl</h2-->
<!--h2 id="vespa-config-loadtester">vespa-config-loadtester</h2-->
@@ -1908,6 +1916,69 @@ <h2 id="vespa-set-node-state">vespa-set-node-state</h2>

<!--h2 id="vespa-slobrok-cmd">vespa-slobrok-cmd</h2-->

<h2 id="vespa-significance">vespa-significance</h2>
<p>
Generates a <a href="../significance.html#significance-model-file">significance model file</a> from Vespa documents.
Available in Vespa as of version 8.426.8.
</p>
<p>
The generated model uses the same tokenizer as the default query processor; see <a href="../linguistics.html">linguistics in Vespa</a> for details.
When using a custom tokenizer, the model generator must be modified accordingly.
Tokens are converted to lowercase without stemming,
which corresponds to how the model is applied to query terms.
</p>
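<p>The behavior described above can be illustrated with a small sketch that lowercases tokens without stemming and counts, for each token, how many documents contain it. It assumes the model is built from per-token document frequencies, as IDF-style significance statistics usually are. This is not the vespa-significance implementation: the regular-expression tokenizer is a hypothetical stand-in for the OpenNLP tokenizer, and the function names are illustrative.</p>

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical stand-in for the OpenNLP tokenizer used by Vespa:
# splits on runs of word characters. Real tokenization differs per language.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase tokens with no stemming, mirroring how query terms are looked up."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def document_frequencies(texts: Iterable[str]) -> tuple[int, Counter]:
    """Return (document count, number of documents containing each token)."""
    doc_count = 0
    df: Counter = Counter()
    for text in texts:
        doc_count += 1
        # A token counts once per document, however often it repeats in it.
        df.update(set(tokenize(text)))
    return doc_count, df


if __name__ == "__main__":
    n, df = document_frequencies([
        "Running shoes for running",
        "Trail RUNNING guide",
        "Shoes on sale",
    ])
    # "run" is absent because tokens are not stemmed.
    print(n, df["running"], df["shoes"], df["run"])  # 3 2 2 0
```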
<p>Synopsis: <code>vespa-significance generate [options]</code></p>
<p>Example:</p>
<pre>
$ vespa-significance generate --in vespa-dump.jsonl --out en_model.json --field text --language en
</pre>
<p>When running in Docker, mount a directory that contains the input documents (in Vespa feed JSON format) and receives the generated model file, e.g.:</p>
<pre>
$ docker run -it --entrypoint bash -v $PWD/data:/data -w /data vespaengine/vespa:latest /opt/vespa/bin/vespa-significance generate --in docs.jsonl --out model.json.zst --field text --language en --zst true
</pre>
<table class="table">
<thead>
<tr>
<th>Option</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>-h, --help</th>
<td>Help text</td>
</tr>
<tr>
<th>-i, --in &lt;input file&gt;</th>
<td>
JSON Lines (JSONL) file where each line is a <a href="../reference/document-json-format.html">Vespa document in JSON format</a>.
</td>
</tr><tr>
<th>-o, --out &lt;output file&gt;</th>
<td>
<a href="../significance.html#significance-model-file">Significance model file</a> in JSON format.
</td>
</tr><tr>
<th> -f, --field &lt;field&gt;</th>
<td>
Name of the text field used to build the significance model.
</td>
</tr><tr>
<th> -l, --language &lt;language&gt;</th>
<td>
Language of the text field, specified as a language code, e.g. <code>en</code> for English.<br/>
It is used by the OpenNLP tokenizer; see the <a href="../linguistics.html#default-languages">supported languages and their codes</a>.
</td>
</tr>
<tr>
<th> --zst &lt;compression&gt;</th>
<td>
If set to <code>true</code>, compresses the output file with <a href="https://facebook.github.io/zstd/">zstandard</a>.
Default is <code>false</code>.
</td>
</tr>
</tbody>
</table>
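<p>The <code>--in</code> and <code>--field</code> options can be pictured with the following sketch, which reads JSON Lines in the Vespa document JSON format and yields the text of one field. It assumes the field value is a plain string under <code>fields</code> and simply skips documents that lack it; how vespa-significance itself treats missing or non-string fields is not covered here, and the document IDs are hypothetical.</p>

```python
import json
from typing import Iterable, Iterator


def field_texts(lines: Iterable[str], field: str) -> Iterator[str]:
    """Yield the value of `field` from each JSONL line in Vespa document JSON format."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        doc = json.loads(line)
        value = doc.get("fields", {}).get(field)
        # Skip documents where the field is missing or not a plain string.
        if isinstance(value, str):
            yield value


if __name__ == "__main__":
    sample = [
        '{"put": "id:mynamespace:doc::1", "fields": {"text": "Hello world"}}',
        '{"put": "id:mynamespace:doc::2", "fields": {"title": "No text field"}}',
        "",
    ]
    print(list(field_texts(sample, "text")))  # ['Hello world']
```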


<h2 id="vespa-start-configserver">vespa-start-configserver</h2>
13 changes: 13 additions & 0 deletions en/reference/query-api-reference.html
@@ -83,6 +83,7 @@ <h2 id="parameters">Parameters</h2>
<li><a href="#ranking.globalphase.rerankcount">ranking.globalPhase.rerankCount</a></li>
<li><a href="#ranking.matching">ranking.matching</a></li>
<li><a href="#ranking.matchPhase">ranking.matchPhase</a></li>
<li><a href="#ranking.significance.useModel">ranking.significance.useModel</a></li>
</ul>
</dd>

@@ -697,6 +698,18 @@ <h2 id="ranking">Ranking</h2>
</p>
</td>
</tr>
<tr>
<th>ranking.significance.useModel</th>
<td></td>
<td>Boolean</td>
<td>false</td>
<td>
<p id="ranking.significance.useModel">
Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.
Overrides the <a href="schema-reference.html#significance">use-model</a> setting in the rank profile.
</p>
</td>
</tr>
<tr>
<th>ranking.freshness</th>
<td></td>
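<p>As a sketch of how <code>ranking.significance.useModel</code> might be passed in a query request, the following builds a GET URL for the query API. The endpoint <code>http://localhost:8080/search/</code>, the YQL statement, and the <code>text</code> field are assumptions for a local deployment, not part of this change.</p>

```python
from urllib.parse import urlencode


def search_url(endpoint: str, yql: str, use_model: bool) -> str:
    """Build a query API GET URL that sets ranking.significance.useModel."""
    params = {
        "yql": yql,
        # Overrides the use-model setting of the rank profile for this query only.
        "ranking.significance.useModel": "true" if use_model else "false",
    }
    # urlencode percent-encodes the YQL (spaces become '+').
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url(
        "http://localhost:8080/search/",
        'select * from sources * where text contains "shoes"',
        use_model=True,
    ))
```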
36 changes: 36 additions & 0 deletions en/reference/schema-reference.html
@@ -132,6 +132,7 @@ <h2 id="elements">Elements</h2>
<a href="#inputs">inputs</a>
<a href="#constants">constants</a>
<a href="#onnx-model">onnx-model</a>
<a href="#significance">significance</a>
<a href="#rank-properties">rank-properties</a>
<a href="#match-features">match-features</a>
<a href="#mutate">mutate</a>
@@ -1490,6 +1491,10 @@ <h2 id="rank-profile">rank-profile</h2>
<td>Zero or many</td>
<td>An onnx model to make available in this profile.</td>
</tr>
<tr><td><a href="#significance">significance</a></td>
<td>Zero or one</td>
<td>Enables the use of significance models defined in services.xml.</td>
</tr>
<tr><td><a href="#rank-properties">rank-properties</a></td>
<td>Zero or one</td>
<td>List of any rank property key-values to be used by rank features.</td>
@@ -2481,6 +2486,37 @@ <h2 id="onnx-model">onnx-model</h2>
</table>
<p>For more details including examples, see <a href="../onnx.html">ranking with ONNX models.</a></p>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#rank-profile">rank-profile</a>.
Configures the use of <a href="../significance.html">significance models</a>:
</p>
<pre>
significance {
    use-model: true
}
</pre>

<p>
The body must contain:
</p>
<table class="table">
<thead>
<tr><th>name</th>
<th>occurrence</th>
<th>description</th></tr>
</thead>
<tbody>
<tr>
<td>use-model</td>
<td>One</td>
<td>Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.</td>
</tr>
</tbody>
</table>
<p>
For more details, see <a href="../significance.html">Significance Model</a>.
</p>
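<p>The interaction between this rank-profile setting and the <code>ranking.significance.useModel</code> query parameter can be summarized as a small resolution function. This is a hypothetical model of the documented rule that the query parameter overrides <code>use-model</code>; treating an unset value as <code>false</code> is an assumption based on the query parameter's default.</p>

```python
from typing import Optional


def effective_use_model(profile_use_model: Optional[bool],
                        query_use_model: Optional[bool]) -> bool:
    """Decide whether significance models apply to a query (illustrative only).

    profile_use_model: value of `significance { use-model: ... }`, or None if unset.
    query_use_model: value of ranking.significance.useModel, or None if not sent.
    """
    if query_use_model is not None:
        # The query parameter overrides the rank profile.
        return query_use_model
    if profile_use_model is not None:
        return profile_use_model
    # Assumption: nothing set means significance models are not used.
    return False


if __name__ == "__main__":
    print(effective_use_model(True, None))   # True: rank profile enables models
    print(effective_use_model(True, False))  # False: query disables them
```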


<h2 id="document-summary">document-summary</h2>
1 change: 1 addition & 0 deletions en/reference/services-container.html
@@ -27,6 +27,7 @@
<a href="services-processing.html#chain">chain</a>
<a href="services-search.html#renderer">renderer</a>
<a href="services-search.html#threadpool">threadpool</a>
<a href="services-search.html#significance">significance</a>
<a href="services-docproc.html">document-processing</a>
<a href="#include">include [dir]</a>
<a href="services-docproc.html#documentprocessor">documentprocessor</a>
58 changes: 57 additions & 1 deletion en/reference/services-search.html
@@ -33,6 +33,7 @@
<a href="#source">source [id]</a>
<a href="#searcher">searcher [id, class, bundle, provides, before, after]</a>
<a href="#renderer">renderer [id, class, bundle]</a>
<a href="#significance">significance</a>
<a href="#threadpool">threadpool</a>
<a href="#threadpool-threads">threads [ boost ]</a>
<a href="#threadpool-queue">queue</a>
@@ -328,7 +329,62 @@ <h2 id="renderer">renderer</h2>
bundle="the name in &lt;artifactId&gt; in pom.xml" /&gt;
</pre>

<h2 id="significance">significance</h2>
<p>
Contained in <code>search</code>.
Specifies one or more global significance <a href="#model">models</a>.
</p>

<pre data-test="file" data-path="my-app/src/main/application/services.xml">
&lt;significance&gt;
&lt;model model-id="significance-en-wikipedia-v1"/&gt;
&lt;model url="https://some/uri/my-model.model.multilingual.json"/&gt;
&lt;model path="models/my-model.no.json.zst"/&gt;
&lt;/significance&gt;
</pre>

<p>
Models are either provided by <em>Vespa</em> or generated with the <a href="../operations-selfhosted/vespa-cmdline-tools.html#vespa-significance">vespa-significance tool</a>.
The order determines model precedence: the last element has the highest priority.
To use these models, the schema must <a href="schema-reference.html#significance">enable significance models in the rank profile</a>.
</p>
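<p>The precedence rule can be illustrated by reading a <code>significance</code> element like the one above and listing its models from highest to lowest priority. This is a sketch only: it assumes each <code>model</code> uses exactly one of <code>model-id</code>, <code>url</code>, or <code>path</code>, and it does not model how Vespa chooses among models for a given language.</p>

```python
import xml.etree.ElementTree as ET

CONFIG = """
<significance>
  <model model-id="significance-en-wikipedia-v1"/>
  <model url="https://some/uri/my-model.model.multilingual.json"/>
  <model path="models/my-model.no.json.zst"/>
</significance>
"""


def models_by_precedence(significance_xml: str) -> list[tuple[str, str]]:
    """Return (attribute, value) per <model>, highest priority first."""
    root = ET.fromstring(significance_xml.strip())
    models = []
    for model in root.findall("model"):
        # Assumption: exactly one of these attributes identifies the model.
        for attr in ("model-id", "url", "path"):
            if attr in model.attrib:
                models.append((attr, model.attrib[attr]))
                break
    # The last element has the highest priority, so reverse document order.
    return models[::-1]


if __name__ == "__main__":
    for attr, value in models_by_precedence(CONFIG):
        print(f"{attr}: {value}")
```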

<p>
Sub-elements:
</p>
<ul>
<li><a href="#model">model</a> (required, one or more)</li>
</ul>

<h2 id="model">model</h2>
<p>
Contained in <a href="#significance">significance</a>.
Specifies a <a href="../significance.html#global-significance-model">global significance model</a>.
A model is identified by a <code>model-id</code>, by a <code>url</code> to a model file, or by a <code>path</code> to a model file in the application package.
</p>
<p>
Models with a <code>model-id</code> are provided by <em>Vespa</em> and listed in the <a href="https://cloud.vespa.ai/en/model-hub#significance-models">model hub</a>.
Example with <code>model-id</code>:
</p>
<pre>
&lt;model model-id="significance-en-wikipedia-v1"/&gt;
</pre>

<p>
Models specified with <code>url</code> or <code>path</code> are JSON files, which can also be compressed with <a href="https://facebook.github.io/zstd/">zstandard</a>.
Model files can be generated using the <a href="../operations-selfhosted/vespa-cmdline-tools.html#vespa-significance">vespa-significance tool</a>.
Example with <code>url</code>:
</p>
<pre>
&lt;model url="https://some/uri/mymodel.multilingual.json"/&gt;
</pre>
<p>
Models with a <code>path</code> should be placed in the application package;
the path is relative to the application package root.
Example with <code>path</code>:
</p>
<pre>
&lt;model path="models/mymodel.no.json.zst"/&gt;
</pre>
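<p>To illustrate that a <code>path</code> is resolved relative to the application package root, the sketch below joins the two. As an extra safeguard that is an assumption of this example rather than documented Vespa behavior, it rejects paths that escape the package. The directory <code>/tmp/my-app</code> in the usage is hypothetical.</p>

```python
from pathlib import Path


def resolve_model_path(app_root: str, model_path: str) -> Path:
    """Resolve a <model path="..."> value against the application package root."""
    root = Path(app_root).resolve()
    resolved = (root / model_path).resolve()
    # Hypothetical safeguard: refuse paths such as "../x" that leave the package.
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"{model_path!r} is outside the application package")
    return resolved


if __name__ == "__main__":
    print(resolve_model_path("/tmp/my-app", "models/mymodel.no.json.zst"))
```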

<h2 id="chain">chain</h2>
<p>
@@ -339,7 +395,7 @@ <h2 id="chain">chain</h2>
Note that <a href="#provider">provider</a> and <a href="#source">source</a> elements are also chains.
Specify a search chain in a query using <a href="query-api-reference.html#searchchain">searchChain</a>.
</p>
<p>Example which inherits from the built in <em>vespa</em> chain so that
<p>Example which inherits from the built in <em>vespa</em> chain so that
the searcher can dispatch queries to the content clusters:</p>
<pre>
&lt;chain id="common" inherits="vespa"&gt;