Merge pull request #3445 from vespa-engine/bratseth/pack-and_binarize
Document pack_bits and binarize
kkraune authored Oct 30, 2024
2 parents 8ec224a + 9c2dd13 commit 18d19a9
Showing 1 changed file with 62 additions and 21 deletions.
83 changes: 62 additions & 21 deletions en/reference/indexing-language-reference.html
@@ -200,12 +200,9 @@ <h3 id="arithmetics">Arithmetics</h3>
</p>



<h3 id="converters">Converters</h3>

<p>There are several expressions that allow you to convert from one data type to another.
These are often used within a <code>for_each</code> to convert
e.g. an array of strings to an array of integers.</p>
<p>These expressions let you convert from one data type to another.</p>

<table class="table">
<thead>
@@ -216,21 +213,65 @@ <h3 id="converters">Converters</h3>
<th>Description</th>
</tr>
</thead><tbody>
<tr>
<td><code>embed</code></td>
<td>String</td>
<td>A tensor of the type of the receiving field</td>
<td><p id="embed">Invokes an <a href="../embedding.html">embedder</a> to convert a text to a point in a tensor space.
Arguments are given space separated, as in <code>embed colbert chunk</code>.
The first argument is the id of the embedder, and can be omitted when only one is configured.
Any additional arguments are passed to the embedder implementation.</p></td>
</tr>
<tr>
<td><code>hash</code></td>
<td>String</td>
<td>Any string</td>
<td><p id="hash">Converts the input to a hash value (using SipHash).
<tr>
<td><code>binarize [threshold]</code></td>
<td>Any tensor</td>
<td>Any tensor</td>
<td>
<p id="binarize">
Replaces each value in a tensor with 0 or 1.
The optional argument is the threshold: values strictly larger than the threshold become 1,
all other values become 0. The default threshold is 0.
This is useful for creating suitable input to <a href="#pack_bits">pack_bits</a>.
</p>
</td>
</tr>
<tr>
<td><code>embed [id]</code></td>
<td>String</td>
<td>A tensor</td>
<td><p id="embed">Invokes an <a href="../embedding.html">embedder</a> to convert text to one or more vector embeddings.
The output tensor type is the type required by the expression that follows, provided the specific embedder supports it.
Arguments are given space separated, as in <code>embed colbert chunk</code>.
The first argument is the id of the embedder, and can be omitted when only one is configured.
Any additional arguments are passed to the embedder implementation.</p></td>
</tr>
<tr>
<td><code>hash</code></td>
<td>String</td>
<td>int or long</td>
<td><p id="hash">Converts the input to a hash value (using SipHash).
The hash will be int or long depending on the target field.</p></td>
</tr>
<tr>
<td><code>pack_bits</code></td>
<td>A tensor</td>
<td>A tensor</td>
<td>
<p id="pack_bits">
Packs the values of a binary tensor into bytes with 1 bit per value in big-endian order.
</p>
<p>
The input tensor must:
<ul>
<li>Only have values that are 0 or 1</li>
<li>Have a single dense dimension</li>
</ul>
It can have any value type and any number of sparse dimensions.
</p>
<p>
The output tensor will have:
<ul>
<li><code>int8</code> as the value type.</li>
<li>A dense dimension whose size is the input size divided by 8, rounded up to the nearest integer.</li>
<li>The same sparse dimensions as before.</li>
</ul>
The resulting tensor can be unpacked during ranking using
<a href="ranking-expressions.html#unpack-bits">unpack_bits</a>.
Use <a href="#binarize">binarize</a> to convert a tensor to the binary form
that this expression requires as input.
</p>
</td>
</tr>
<tr>
<td><code>to_array</code></td>
@@ -540,9 +581,9 @@ <h3 id="other-expressions">Other expressions</h3>
<td><code>random [ &lt;max&gt; ]</code></td>
<td>
<p id="random">
Returns a random integer value.
Lowest value is 0 and the highest value is determined either by the argument or,
if no argument is given, the execution value.
</p>
</td>
</tr>
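
The <code>binarize</code> and <code>pack_bits</code> semantics documented in this change can be illustrated with a small Python sketch. This is not Vespa's implementation: the function names only mirror the expression names, the tensor is modeled as a plain list holding the values of one dense dimension, and padding the last byte with zero bits when the size is not a multiple of 8 is an assumption made for the illustration.

```python
def binarize(values, threshold=0.0):
    """Map each value to 1 if it is strictly larger than threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def pack_bits(bits):
    """Pack 0/1 values into int8 values, 1 bit per value, big-endian order.

    Assumption for this sketch: a final, incomplete group of bits is
    padded with zeros in the low-order positions.
    """
    if any(b not in (0, 1) for b in bits):
        raise ValueError("pack_bits input must only contain 0 or 1")
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for i, bit in enumerate(bits[start:start + 8]):
            # The first value of each group lands in the most significant bit.
            byte |= int(bit) << (7 - i)
        # int8 is signed, so bit patterns 128..255 read as -128..-1.
        packed.append(byte - 256 if byte >= 128 else byte)
    return packed


embedding = [0.5, -1.0, 0.0, 2.0, -0.1, -0.2, -0.3, -0.4, 0.9]
bits = binarize(embedding)    # 9 values, each 0 or 1
packed = pack_bits(bits)      # 2 int8 values, since 9 / 8 rounds up to 2
```

Nine input values yield two output values here, which matches the "divided by 8, rounded up" rule for the dense dimension size.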