Revised documentation.
glebashnik committed Oct 4, 2024
1 parent 588e36a commit 130f6d7
Showing 8 changed files with 256 additions and 161 deletions.
4 changes: 4 additions & 0 deletions Makefile
@@ -0,0 +1,4 @@
.PHONY: serve

serve:
	bundle exec jekyll serve
4 changes: 2 additions & 2 deletions _data/sidebar.yml
@@ -125,11 +125,11 @@ docs:
url: /en/xgboost.html
- page: Ranking With LightGBM Models
url: /en/lightgbm.html
- page: Stateless model evaluation
- page: Stateless Model Evaluation
url: /en/stateless-model-evaluation.html
- page: Ranking With BM25
url: /en/reference/bm25.html
- page: Using Significance Model
- page: Significance Model
url: /en/significance.html
- page: Ranking With nativeRank
url: /en/nativerank.html
37 changes: 24 additions & 13 deletions en/operations-selfhosted/vespa-cmdline-tools.html
@@ -15,6 +15,14 @@
use-cases we recommend the <a href="../vespa-cli.html">Vespa CLI</a> which
should work against most Vespa applications regardless of how they are deployed.' %}

<p>
You can run these tools in the <a href="https://hub.docker.com/r/vespaengine/vespa/tags">Vespa Docker image</a>:
</p>
<pre>
docker run --entrypoint bash vespaengine/vespa ./opt/vespa/bin/[tool] [args]
</pre>


<!--h2 id="vespa-config-ctl">vespa-config-ctl</h2-->
<!--h2 id="vespa-config-loadtester">vespa-config-loadtester</h2-->
@@ -1909,12 +1917,18 @@ <h2 id="vespa-set-node-state">vespa-set-node-state</h2>
<!--h2 id="vespa-slobrok-cmd">vespa-slobrok-cmd</h2-->

<h2 id="vespa-significance">vespa-significance</h2>
<p>The <em>vespa-significance</em> cli is a tool that generates a significance model <a href="../reference/significance-reference.html#significance-file-format">file</a>. Its input is a <a href="../reference/document-json-format.html"><em>vespa-feed</em></a> file.
<p>
<em>vespa-significance</em> generates a <a href="../significance.html#significance-model-file">significance model file</a>
from documents in a <a href="../reference/document-json-format.html">vespa-feed file</a>.
</p>
<p>Synopsis: <code>vespa-significance [options]</code></p>
<p>Example</p>
<p>Synopsis: <code>vespa-significance generate [options]</code></p>
<p>Example:</p>
<pre>
$ vespa-significance --in vespa-dump.jsonl --out en_model.json --field text --language EN --doc-type "en"
$ vespa-significance generate --in vespa-dump.jsonl --out en_model.json --field text --language en --doc-type en
</pre>
<p>When running in a container, it is useful to mount a folder that holds the vespa-feed documents and receives the generated model file, e.g.:</p>
<pre>
$ podman run -it --entrypoint bash -v $PWD/data:/data -w /data vespaengine/vespa /opt/vespa/bin/vespa-significance generate -i docs.jsonl -o model.zst -f text -l en -d en
</pre>
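<p>
To try the commands above, a small vespa-feed file is needed. The following is a minimal sketch in Python that writes one <em>put</em> operation per line (JSONL). The namespace <code>mynamespace</code>, the sample texts and the helper name <code>write_feed</code> are assumptions made for illustration; the document type <code>en</code> and the field <code>text</code> match the <code>-d</code> and <code>-f</code> arguments used in the examples.
</p>

```python
import json
from pathlib import Path

# Hypothetical values for illustration only; adjust to your own schema.
NAMESPACE = "mynamespace"  # assumption, not taken from the Vespa documentation
DOC_TYPE = "en"            # corresponds to --doc-type en
FIELD = "text"             # corresponds to --field text


def write_feed(path, texts):
    """Write one put operation per line (JSONL) and return the number written."""
    with open(path, "w", encoding="utf-8") as out:
        for i, text in enumerate(texts, start=1):
            operation = {
                "put": f"id:{NAMESPACE}:{DOC_TYPE}::{i}",
                "fields": {FIELD: text},
            }
            out.write(json.dumps(operation) + "\n")
    return len(texts)


if __name__ == "__main__":
    # Creates ./data/docs.jsonl, the folder mounted as /data in the example above.
    Path("data").mkdir(exist_ok=True)
    count = write_feed("data/docs.jsonl", ["hello world", "hello vespa"])
    print(f"wrote {count} documents")
```

<p>
The resulting <code>data/docs.jsonl</code> can then be passed to the tool with <code>-i docs.jsonl</code> when <code>data</code> is mounted as the working directory.
</p>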
<table class="table">
<thead>
@@ -1931,33 +1945,30 @@ <h2 id="vespa-significance">vespa-significance</h2>
<tr>
<th>-i, --in &lt;input file&gt;</th>
<td>
<a href="../reference/document-json-format.html">Vespa-feed</a> file to be used for generating the significance model
<a href="../reference/document-json-format.html">vespa-feed file</a> with documents in JSON or JSONL format.
</td>
</tr><tr>
<th>-o, --out &lt;output file&gt;</th>
<td>
Output file for the significance model, with <a href="../significance.html#significance-file-format">this</a> JSON file format
<a href="../significance.html#significance-model-file">Significance model file</a> in JSON format.
</td>
</tr><tr>
<th> -f, --field &lt;field&gt;</th>
<td>
Name of the text field to be used for significance model
Name of the text field to use for the significance model.
</td>
</tr><tr>
<th> -l, --language &lt;language&gt;</th>
<td>
<p>
Language of the text field, must be a valid language code from the <a href="https://www.rfc-editor.org/rfc/rfc5646">RFC5646</a> standard.
<br >
It is used with
OpenNLP's tokenizer to tokenize the text field based on that language's rules.
</p>
Language of the text field, specified as a code, e.g. <code>en</code> for English.<br>
It is used by the OpenNLP tokenizer; see the <a href="../linguistics.html#default-languages">supported languages and their codes</a>.
</td>
</tr><tr>
<th> -d, --doc-type &lt;doc-id&gt;</th>
<td>
<p>Document type identifier for the vespa dump file. <br>
It becomes a part of the id for <a href="../reference/document-json-format.html#put">put</a> operations in the vespa-feed file. <code>&#123; "put": "id::&lt;doc-id&gt;::1" &#125; </code>
It becomes part of the id for <a href="../reference/document-json-format.html#put">put</a> operations in the vespa-feed file. <code>&#123; "put": "id::&lt;doc-id&gt;::1" &#125; </code>
</p>
</td>
</tr>
5 changes: 3 additions & 2 deletions en/reference/query-api-reference.html
@@ -704,8 +704,9 @@ <h2 id="ranking">Ranking</h2>
<td>Boolean</td>
<td>false</td>
<td>
<p id="ranking.sorting">
Override query to enable or disable to use user provided <a href="services-search.html#significance">significance model</a>.
<p id="ranking.significance.useModel">
Enables or disables the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.
Overrides <a href="schema-reference.html#significance">use-model</a> set in the rank profile.
</p>
</td>
</tr>
30 changes: 17 additions & 13 deletions en/reference/schema-reference.html
@@ -2488,7 +2488,8 @@ <h2 id="onnx-model">onnx-model</h2>

<h2 id="significance">significance</h2>
<p>
Contained in <a href="#rank-profile">rank-profile</a>. True or false. By default this is false. When enabled, Vespa uses the significance calculation based on the <a href="../significance.html#example">significance models</a> provided in services.xml for the rank-profile in which it is defined.
Contained in <a href="#rank-profile">rank-profile</a>.
Configures a <a href="../significance.html">significance model</a>.
<pre>
significance {
use-model: true
@@ -2497,22 +2498,25 @@ <h2 id="significance">significance</h2>
</p>

<p>
The body of a significance field must contain:
<table class="table">
The body must contain:
<table class="table">
<thead>
<tr><th>Name</th><th>Occurrence</th><th>Description</th></tr>
<tr><th>name</th>
<th>occurrence</th>
<th>description</th></tr>
</thead>
<tbody>
<tr>
<td>use-model</td>
<td>One</td>
<td>
Tell the rank-profile to use the significance model defined in services.xml.
</td>
</tr>
<tr>
<td>use-model</td>
<td>One</td>
<td>Enable or disable the use of significance models specified in <a href="services-search.html#significance">services.xml</a>.</td>
</tr>
</tbody>
</table>
<p>For more details including examples, see <a href="../significance.html">using Significance Model.</a></p>
</table>
</p>
<p>
For more details, see <a href="../significance.html">Significance Model</a>.
</p>


<h2 id="document-summary">document-summary</h2>
74 changes: 46 additions & 28 deletions en/reference/services-search.html
@@ -331,43 +331,61 @@ <h2 id="renderer">renderer</h2>

<h2 id="significance">significance</h2>
<p>
To use a specifically generated <a href="../significance.html">significance model</a>, a significance element is added. This element can include multiple models. Their order determines the model precedence for a given language, with the last element having the highest. The models' document frequency is used to set a token's significance. To enable the use of these models, the schema needs to have a rank-profile with the <a href="schema-reference.html#significance"><em>significance</em> element</a>, with its <em>use-model</em> field set to <em>true</em>.
Contained in <a href="#searcher">searcher</a>.
Specifies one or more global significance <a href="#model">models</a>.
</p>

<p>Example with multiple <a href="config-files.html#model">model</a> files. These models are either provided by <em>Vespa</em> or can be generated with the <a href="vespa-cmdline-tools.html#vespa-significance">vespa-significance</a> cli. </p>
<pre data-test="file" data-path="my-app/src/main/application/services.xml">
&lt;significance&gt;
&lt;significance&gt;
&lt;model model-id="wikimedia"/&gt;
&lt;model url="https://some/uri/bibel-multilingual.json" /&gt;
&lt;model path="models/reddit-norge.no.json.zst" /&gt;
&lt;model url="https://some/uri/my-model.model.multilingual.json"/&gt;
&lt;model path="models/my-model.no.json.zst"/&gt;
&lt;/significance&gt;
</pre>

<p>
The models are either provided by <em>Vespa</em> or generated with the <a href="vespa-cmdline-tools.html#vespa-significance">vespa-significance</a> CLI.
The order determines model precedence, with the last element having the highest priority.
To use these models, the schema needs to <a href="schema-reference.html#significance">enable significance models in the rank-profile</a>.
</p>
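<p>
The precedence rule can be illustrated with a short Python sketch. It is not Vespa's implementation: the <code>Model</code> class, the <code>select_model</code> function and the language sets are hypothetical, and the sketch assumes the languages covered by each model file are known up front.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str             # hypothetical label for a <model> element
    languages: frozenset  # language codes the model covers (assumed known)


def select_model(models, language):
    """Return the last configured model that covers `language`, or None."""
    # Later elements take precedence, so scan the list from the end.
    for model in reversed(models):
        if language in model.languages:
            return model
    return None


if __name__ == "__main__":
    # Same order as the <model> elements in the example above.
    configured = [
        Model("wikimedia", frozenset({"en"})),
        Model("my-model.multilingual", frozenset({"en", "no"})),
        Model("my-model.no", frozenset({"no"})),
    ]
    for lang in ("en", "no", "de"):
        chosen = select_model(configured, lang)
        print(lang, chosen.name if chosen else None)
```

<p>
Under these assumptions, English resolves to the multilingual model rather than <em>wikimedia</em>, because the multilingual model is listed later.
</p>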

<p>
Sub-elements:
<ul>
<li><a href="#model">model</a> (required, one or more)</li>
</ul>
</p>

<h3 id="significance-reference-config">significance reference config</h3>
<table class="table">
<thead>
<tr>
<th>Name</th>
<th>Occurrence</th>
<th>Description</th>
<th>Type</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr>
<td>model</td>
<td>One To Many</td>
<td>Use to point to the significance model file</td>
<td><a href="#model-config-reference">model-type</a></td>
<td>N/A</td>
</tr>

</tbody>
</table>
<h2 id="model">model</h2>
<p>
Contained in <a href="#significance">significance</a>.
Specifies a <a href="../significance.html#global-significance-model">global significance model</a>.
Models are identified by <code>model-id</code> or by providing <code>url</code> or <code>path</code> to a model file in the application package.
</p>
<p>
Models with <code>model-id</code> are provided by <em>Vespa</em>.
So far the only model available is <em>wikimedia</em>, which is generated from English Wikipedia.
Example with <code>model-id</code>:
<pre>
&lt;model model-id="wikimedia"/&gt;
</pre>
</p>

<p>
Models specified with <code>url</code> or <code>path</code> are JSON files, which can also be <a href="https://facebook.github.io/zstd/">zstandard</a> compressed.
Model files can be generated using the <a href="vespa-cmdline-tools.html#vespa-significance">vespa-significance</a> tool.
Example with <code>url</code>:
<pre>
&lt;model url="https://some/uri/mymodel.multilingual.json"/&gt;
</pre>

Models with <code>path</code> should be placed in the application package.
The path is relative to the application package root.
Example with <code>path</code>:
<pre>
&lt;model path="models/mymodel.no.json.zst"/&gt;
</pre>
</p>
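<p>
A significance model is built from document frequencies, i.e. the number of documents in which each token occurs. The Python sketch below shows that counting step only. It uses a naive lowercase and whitespace tokenizer as an assumption (the <em>vespa-significance</em> tool uses the OpenNLP tokenizer), the function name is hypothetical, and it does not produce the actual model file format.
</p>

```python
from collections import Counter


def document_frequencies(texts):
    """Return (document count, {token: number of documents containing it})."""
    frequencies = Counter()
    for text in texts:
        # Naive tokenizer (assumption): lowercase, split on whitespace.
        # A set ensures each token is counted once per document.
        tokens = set(text.lower().split())
        frequencies.update(tokens)
    return len(texts), dict(frequencies)


if __name__ == "__main__":
    count, df = document_frequencies(["Hello world", "hello Vespa"])
    print(count, df)
```

<p>
Tokens that occur in few documents are the ones a significance calculation would typically treat as more informative.
</p>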

<h2 id="chain">chain</h2>
<p>
@@ -378,7 +396,7 @@ <h2 id="chain">chain</h2>
Note that <a href="#provider">provider</a> and <a href="#source">source</a> elements are also chains.
Specify a search chain in a query using <a href="query-api-reference.html#searchchain">searchChain</a>.
</p>
<p>Example which inherits from the built in <em>vespa</em> chain so that
<p>Example which inherits from the built in <em>vespa</em> chain so that
the searcher can dispatch queries to the content clusters:</p>
<pre>
&lt;chain id="common" inherits="vespa"&gt;
103 changes: 0 additions & 103 deletions en/significance.html

This file was deleted.
