Add significance documentation #3231

Merged
merged 18 commits into from
Oct 18, 2024
4 changes: 4 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
.PHONY: serve

serve:
	bundle exec jekyll serve
4 changes: 3 additions & 1 deletion _data/sidebar.yml
Original file line number Diff line number Diff line change
Expand Up @@ -125,10 +125,12 @@ docs:
url: /en/xgboost.html
- page: Ranking With LightGBM Models
url: /en/lightgbm.html
- page: Stateless model evaluation
- page: Stateless Model Evaluation
url: /en/stateless-model-evaluation.html
- page: Ranking With BM25
url: /en/reference/bm25.html
- page: Significance Model
url: /en/significance.html
- page: Ranking With nativeRank
url: /en/nativerank.html
- page: Accelerated OR search using the WAND algorithm
58 changes: 58 additions & 0 deletions en/operations-selfhosted/vespa-cmdline-tools.html
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,14 @@
use-cases we recommend the <a href="../vespa-cli.html">Vespa CLI</a> which
should work against most Vespa applications regardless of how they are deployed.' %}

<p>
You can run these tools in the <a href="https://hub.docker.com/r/vespaengine/vespa/tags">Vespa Docker image</a>:
</p>
<pre>
docker run --entrypoint bash vespaengine/vespa /opt/vespa/bin/[tool] [args]
</pre>


<!--h2 id="vespa-config-ctl">vespa-config-ctl</h2-->
<!--h2 id="vespa-config-loadtester">vespa-config-loadtester</h2-->
Expand Down Expand Up @@ -1908,6 +1916,56 @@ <h2 id="vespa-set-node-state">vespa-set-node-state</h2>

<!--h2 id="vespa-slobrok-cmd">vespa-slobrok-cmd</h2-->

<h2 id="vespa-significance">vespa-significance</h2>
<p>
Generates a <a href="../significance.html#significance-model-file">significance model file</a>.
</p>
<p>Synopsis: <code>vespa-significance generate [options]</code></p>
<p>Example:</p>
<pre>
$ vespa-significance generate --in vespa-dump.jsonl --out en_model.json --field text --language en
</pre>
<p>When running in a container, it is useful to mount a folder that holds the input documents and receives the generated model file, e.g.:</p>
<pre>
$ podman run -it --entrypoint bash -v $PWD/data:/data -w /data vespaengine/vespa:latest /opt/vespa/bin/vespa-significance generate --in docs.jsonl --out model.zst --field text --language en
</pre>
<table class="table">
<thead>
<tr>
<th>Option</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<th>-h, --help</th>
<td>Help text</td>
</tr>
<tr>
<th>-i, --in &lt;input file&gt;</th>
<td>
JSON Lines (JSONL) file where each line is a <a href="../reference/document-json-format.html">Vespa document in JSON format</a>.
</td>
</tr><tr>
<th>-o, --out &lt;output file&gt;</th>
<td>
<a href="../significance.html#significance-model-file">Significance model file</a> in JSON format.
</td>
</tr><tr>
<th>-f, --field &lt;field&gt;</th>
<td>
Name of the text field to use for the significance model.
</td>
</tr><tr>
<th>-l, --language &lt;language&gt;</th>
<td>
<p>
Language of the text field, specified as a language code, e.g. <code>en</code> for English.<br/>
The code is used by the OpenNLP tokenizer; see the <a href="../linguistics.html#default-languages">supported languages and their codes</a>.
</p>
</td>
</tr>
</tbody>
</table>
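<p>
The input file format described above can be illustrated with a minimal sketch that writes a small JSONL file.
It assumes the <code>put</code> form of the Vespa document JSON format is accepted as input; the namespace, document type, document ids, and the <code>write_jsonl</code> helper are hypothetical,
and the field name <code>text</code> is chosen only to match the <code>--field text</code> example.
</p>

```python
import json
from pathlib import Path

# Hypothetical documents: the namespace ("mynamespace"), document type ("doc")
# and ids are made up for illustration.
docs = [
    {"id": "id:mynamespace:doc::1", "text": "The quick brown fox jumps over the lazy dog"},
    {"id": "id:mynamespace:doc::2", "text": "A second short document about foxes"},
]


def write_jsonl(documents, path):
    """Write one document operation per line (JSON Lines).

    Assumption: each line uses the "put" form of the Vespa document JSON format.
    """
    with open(path, "w", encoding="utf-8") as out:
        for doc in documents:
            operation = {"put": doc["id"], "fields": {"text": doc["text"]}}
            out.write(json.dumps(operation, ensure_ascii=False) + "\n")
    return path


path = write_jsonl(docs, Path("docs.jsonl"))
```

<p>
The resulting <code>docs.jsonl</code> could then be passed to the tool with <code>--in docs.jsonl --field text</code>.
</p>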


<h2 id="vespa-start-configserver">vespa-start-configserver</h2>
13 changes: 13 additions & 0 deletions en/reference/query-api-reference.html
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ <h2 id="parameters">Parameters</h2>
<li><a href="#ranking.globalphase.rerankcount">ranking.globalPhase.rerankCount</a></li>
<li><a href="#ranking.matching">ranking.matching</a></li>
<li><a href="#ranking.matchPhase">ranking.matchPhase</a></li>
<li><a href="#ranking.significance.useModel">ranking.significance.useModel</a></li>
</ul>
</dd>

Expand Down Expand Up @@ -697,6 +698,18 @@ <h2 id="ranking">Ranking</h2>
</p>
</td>
</tr>
<tr>
<th>ranking.significance.useModel</th>
<td></td>
<td>Boolean</td>
<td>false</td>
<td>
<p id="ranking.significance.useModel">
Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.
Overrides <a href="schema-reference.html#significance">use-model</a> set in the rank profile.
</p>
</td>
</tr>
<tr>
<th>ranking.freshness</th>
<td></td>
36 changes: 36 additions & 0 deletions en/reference/schema-reference.html
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ <h2 id="elements">Elements</h2>
<a href="#inputs">inputs</a>
<a href="#constants">constants</a>
<a href="#onnx-model">onnx-model</a>
<a href="#significance">significance</a>
<a href="#rank-properties">rank-properties</a>
<a href="#match-features">match-features</a>
<a href="#mutate">mutate</a>
Expand Down Expand Up @@ -1490,6 +1491,10 @@ <h2 id="rank-profile">rank-profile</h2>
<td>Zero or many</td>
<td>An onnx model to make available in this profile.</td>
</tr>
<tr><td><a href="#significance">significance</a></td>
<td>Zero or one</td>
<td>Enables the use of significance models defined in services.xml.</td>
</tr>
<tr><td><a href="#rank-properties">rank-properties</a></td>
<td>Zero or one</td>
<td>List of any rank property key-values to be used by rank features.</td>
Expand Down Expand Up @@ -2481,6 +2486,37 @@ <h2 id="onnx-model">onnx-model</h2>
</table>
<p>For more details including examples, see <a href="../onnx.html">ranking with ONNX models.</a></p>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#rank-profile">rank-profile</a>.
Configures a <a href="../significance.html">significance model</a>.
<pre>
significance {
use-model: true
}
</pre>
</p>

<p>
The body must contain:
<table class="table">
<thead>
<tr><th>name</th>
<th>occurrence</th>
<th>description</th></tr>
</thead>
<tbody>
<tr>
<td>use-model</td>
<td>One</td>
<td>Enable or disable the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.</td>
</tr>
</tbody>
</table>
</p>
<p>
For more details see <a href="../significance.html">Significance Model.</a>
</p>


<h2 id="document-summary">document-summary</h2>
1 change: 1 addition & 0 deletions en/reference/services-container.html
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
<a href="services-processing.html#chain">chain</a>
<a href="services-search.html#renderer">renderer</a>
<a href="services-search.html#threadpool">threadpool</a>
<a href="services-search.html#significance">significance</a>
<a href="services-docproc.html">document-processing</a>
<a href="#include">include [dir]</a>
<a href="services-docproc.html#documentprocessor">documentprocessor</a>
59 changes: 58 additions & 1 deletion en/reference/services-search.html
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
<a href="#source">source [id]</a>
<a href="#searcher">searcher [id, class, bundle, provides, before, after]</a>
<a href="#renderer">renderer [id, class, bundle]</a>
<a href="#significance">significance</a>
<a href="#threadpool">threadpool</a>
<a href="#threadpool-threads">threads [ boost ]</a>
<a href="#threadpool-queue">queue</a>
Expand Down Expand Up @@ -328,7 +329,63 @@ <h2 id="renderer">renderer</h2>
bundle="the name in &lt;artifactId&gt; in pom.xml" /&gt;
</pre>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#searcher">searcher</a>.
Specifies one or more global significance <a href="#model">models</a>.
</p>

<pre data-test="file" data-path="my-app/src/main/application/services.xml">
&lt;significance&gt;
&lt;model model-id="wikimedia"/&gt;
&lt;model url="https://some/uri/my-model.model.multilingual.json"/&gt;
&lt;model path="models/my-model.no.json.zst"/&gt;
&lt;/significance&gt;
</pre>

<p>
The models are either provided by <em>Vespa</em> or generated with the <a href="vespa-cmdline-tools.html#vespa-significance">vespa-significance</a> command-line tool.
The order determines model precedence, with the last element having the highest priority.
To use these models, the schema must <a href="schema-reference.html#significance">enable significance models in the rank-profile</a>.
</p>

<p>
Sub-elements:
<ul>
<li><a href="#model">model</a> (required, one or more)</li>
</ul>
</p>

<h2 id="model">model</h2>
<p>
Contained in <a href="#significance">significance</a>.
Specifies a <a href="../significance.html#global-significance-model">global significance model</a>.
Models are identified by <code>model-id</code>, by a <code>url</code> to a model file, or by a <code>path</code> to a model file in the application package.
</p>
<p>
Models with <code>model-id</code> are provided by <em>Vespa</em>.
So far the only model available is <em>wikimedia</em>, which is generated from English Wikipedia.
Example with <code>model-id</code>:
<pre>
&lt;model model-id="wikimedia"/&gt;
</pre>
</p>

<p>
Models specified with <code>url</code> or <code>path</code> are JSON files, which can also be <a href="https://facebook.github.io/zstd/">zstandard</a> compressed.
Model files can be generated using the <a href="vespa-cmdline-tools.html#vespa-significance">vespa-significance</a> tool.
Example with <code>url</code>:
<pre>
&lt;model url="https://some/uri/mymodel.multilingual.json"/&gt;
</pre>

Models with <code>path</code> should be placed in the application package.
The path is relative to the application package root.
Example with <code>path</code>:
<pre>
&lt;model path="models/mymodel.no.json.zst"/&gt;
</pre>
</p>

<h2 id="chain">chain</h2>
<p>
Expand All @@ -339,7 +396,7 @@ <h2 id="chain">chain</h2>
Note that <a href="#provider">provider</a> and <a href="#source">source</a> elements are also chains.
Specify a search chain in a query using <a href="query-api-reference.html#searchchain">searchChain</a>.
</p>
<p>Example which inherits from the built in <em>vespa</em> chain so that
<p>Example which inherits from the built in <em>vespa</em> chain so that
the searcher can dispatch queries to the content clusters:</p>
<pre>
&lt;chain id="common" inherits="vespa"&gt;