Merge pull request #2947 from vespa-engine/kkraune/field-size
Add field size guide entry
kkraune authored Oct 20, 2023
2 parents f5c8b8e + f21e8df commit 1373ba2
Showing 3 changed files with 52 additions and 25 deletions.
11 changes: 5 additions & 6 deletions en/faq.md
@@ -147,9 +147,11 @@ of a double. This can happen in two cases:
### Documents

#### What limits apply to json document size?
There is no hard limit.
Vespa requires a document to be able to load into memory in serialized form.
Vespa is not optimized for huge documents.
There is no hard limit, see [field size](/en/schemas.html#field-size).

#### Is there any size limitation in multivalued fields?
No enforced limit, except resource usage (memory).
See [field size](/en/schemas.html#field-size).

#### Can a document have lists (key value pairs)?
E.g. a product is offered in a list of stores with a quantity per store.
@@ -174,9 +176,6 @@ Workaround would be to use maps to store the wildcard fields.
Map needs to be defined with <code>indexing: attribute</code> and hence will be stored in memory.
Refer to [map](reference/schema-reference.html#map).

#### Is there any size limitation in multivalued fields?
No enforced limit, except resource usage (memory).

#### Can we set a limit for the number of elements that can be stored in an array?
Implement a [document processor](document-processing.html) for this.

5 changes: 3 additions & 2 deletions en/reference/schema-reference.html
@@ -3547,8 +3547,9 @@ <h2 id="match">match</h2>
<tr><td>max-length</td>
<td>index</td>
<td><p id="max-length">
Limit the length of the field that will be used for matching. By default, only the first 1M characters
are indexed.
Limit the length of the field that will be used for matching.
By default, only the first 1M characters are indexed.
<a href="/en/schemas.html#field-size">Example</a>.
</p></td>
</tr>

61 changes: 44 additions & 17 deletions en/schemas.html
@@ -75,33 +75,26 @@



<h2 id="schema-concepts">Schema concepts</h2>

<p>An overview of the most important schema concepts, see the
<a href="reference/schema-reference.html">schema reference</a> for a complete list.</p>


<h3 id="document">document</h3>

<p>A <a href="documents.html">document</a> is the unit the rank-profile evaluates, and is returned in query results.
Documents have fields - reads and writes update full documents or some fields of documents.
Refer to the <a href="reference/schema-reference.html#document">schema reference</a>.</p>

<p>Documents can have relations, field values can be imported from
<a href="parent-child.html">parent documents</a>.</p>

<h2 id="document">document</h2>
<p>
A <a href="documents.html">document</a> is the unit the rank-profile evaluates, and is returned in query results.
Documents have fields - reads and writes update full documents or some fields of documents.
Refer to the <a href="reference/schema-reference.html#document">schema reference</a>.
</p>
<p>Documents can have relations, field values can be imported from <a href="parent-child.html">parent documents</a>.</p>
<p>Note that the document id is not a field of the document - add this explicitly if needed.</p>


<h3 id="field">field</h3>

<h2 id="field">field</h2>
<p>A field has a type, see <a href="reference/schema-reference.html#field">field reference</a> for a full list.</p>

<p>A field contained in a document can be written to, read from and queried - this is the normal field use.
A field can also be generated (i.e. a <em>synthetic field</em>) -
in this case, the field definition is <em>outside</em> the document.
See <a href="operations/reindexing.html">reindexing</a> for examples.</p>


<h3 id="multivalue-field">Multivalue field</h3>
<p>
A field can be <em>single value</em>, like a string, or <em>multivalue</em>, like an array of strings -
see the <a href="reference/schema-reference.html#field">field type list</a>.
@@ -117,6 +110,40 @@ <h3 id="field">field</h3>
</p>


<h3 id="field-size">Field size</h3>
<p>
There is no general setting for the maximum field size in bytes.
Examples of fields with potentially large values include
<a href="/en/reference/schema-reference.html#string">string</a>
and <a href="/en/reference/schema-reference.html#raw">raw</a> fields.
Other large values include multivalue fields with many elements,
like an <a href="/en/reference/schema-reference.html#array">array</a>,
<a href="/en/reference/schema-reference.html#weightedset">weightedset</a> or
<a href="/en/reference/schema-reference.html#tensor">tensor</a>.
This is relevant when the field is returned in query responses -
large result sets and parallel queries require the Container with the query endpoint
to keep many field instances in memory simultaneously.
Use a <a href="/en/document-summaries.html">summary class</a> to tune which fields to return in query responses,
and keep result sets smaller using <a href="/en/reference/query-language-reference.html#limit-offset">limit</a>
or <a href="/en/reference/query-api-reference.html#hits">hits</a>.
</p>
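<p>
As a minimal sketch of the advice above, assuming a hypothetical schema <code>doc</code>
with a small <code>title</code> field and a potentially large <code>body</code> field
(all names here are illustrative, not from the Vespa sample applications),
a document summary class can leave the large field out of query responses:
</p>
<pre>
schema doc {
    document doc {
        field title type string {
            indexing: index | summary
        }
        # Potentially large field, still searchable but not returned by the class below
        field body type string {
            indexing: index | summary
        }
    }
    # Hypothetical summary class returning only the small field
    document-summary titleonly {
        summary title {}
    }
}
</pre>
<p>
A query could then select this class and cap the result set with request parameters like
<code>presentation.summary=titleonly&amp;hits=10</code>.
</p>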
<p>
Vespa requires a document to be able to load into memory in serialized form.
A document in json format is serialized in the Container hosting the
<a href="/en/document-v1-api-guide.html">document-api</a> endpoint,
and persisted in the content node <a href="/en/proton.html#document-store">document store</a>.
</p>
<p>
A text field is capped at <a href="/en/reference/schema-reference.html#max-length">max-length</a>
characters when indexing.
Increase this to index all terms in large string fields, for example:
</p>
<pre>
match {
max-length: 15000000
}
</pre>
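<p>
For context, the <code>match</code> block goes inside a field definition.
A sketch, assuming a hypothetical schema <code>doc</code> with a string field named <code>body</code>:
</p>
<pre>
schema doc {
    document doc {
        # Hypothetical large text field
        field body type string {
            indexing: index | summary
            match {
                # Index up to 15M characters instead of the default first 1M
                max-length: 15000000
            }
        }
    }
}
</pre>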


<h2 id="indexing">indexing</h2>
<p>