Improve lucene-linguistics no-java app (#1402)
* Add testing of non-java Lucene linguistic sample app
* remove tuning
Jo Kristian Bergum authored Mar 8, 2024
1 parent ea537bc, commit 3e0a71f
Showing 9 changed files with 122 additions and 60 deletions.
@@ -1,27 +1,56 @@
-# Lucene Linguistics in non-Java Vespa applications

-In non-java projects it is possible to use Lucene Linguistics as a jar bundle.
+<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->

-Download and add the Vespa bundle jar into the `components` directory:
-```shell
-(mkdir -p components && cd components && curl -L https://github.com/dainiusjocas/vespa-lucene-linguistics-bundle/releases/download/v.0.0.3/lucene-linguistics-bundle-0.0.3-deploy.jar --output lucene-linguistics-bundle-0.0.3-deploy.jar)
-```
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="https://vespa.ai/assets/vespa-ai-logo-heather.svg">
+  <source media="(prefers-color-scheme: light)" srcset="https://vespa.ai/assets/vespa-ai-logo-rock.svg">
+  <img alt="#Vespa" width="200" src="https://vespa.ai/assets/vespa-ai-logo-rock.svg" style="margin-bottom: 25px;">
+</picture>

-Deploy the application package:
-```shell
-vespa deploy -w 100
-```
+# Vespa sample applications - Lucene Linguistics

-Run a query:
-```shell
-vespa query 'query=Vespa' 'language=lt'
-```
+This app demonstrates using [Lucene Linguistics](https://docs.vespa.ai/en/lucene-linguistics.html).

-The logs should contain record:
-```text
-[2023-08-16 11:21:04.847] INFO container Container.com.yahoo.language.lucene.AnalyzerFactory Analyzer for language=lt is from a list of default language analyzers.
-```

-Profit.
+<p data-test="run-macro init-deploy examples/lucene-linguistics/non-java">
+Requires at least Vespa 8.315.19
+</p>

+## To try this application

+Follow [Vespa getting started](https://cloud.vespa.ai/en/getting-started)
+through the <code>vespa deploy</code> step, cloning `examples/lucene-linguistics/non-java` instead of `album-recommendation`.

+Feed 3 sample documents in Norwegian, Swedish, and Finnish:

+<pre data-test="exec">
+vespa feed ext/*.json
+</pre>

+Example queries:

+<pre data-test="exec" data-test-assert-contains="id:no:doc::1">
+vespa query 'yql=select * from doc where userQuery()'\
+  'language=no' 'summary=debug-text-tokens' \
+  'query=tips til utendørsaktiviteter'
+</pre>

+<pre data-test="exec" data-test-assert-contains="id:sv:doc::1">
+vespa query 'yql=select * from doc where userQuery()'\
+  'language=sv' 'summary=debug-text-tokens' \
+  'query=tips til utomhusaktiviteter'
+</pre>

+<pre data-test="exec" data-test-assert-contains="id:fi:doc::1">
+vespa query 'yql=select * from doc where userQuery()'\
+  'language=fi' 'summary=debug-text-tokens' \
+  'query=vinkkejä ulkoilma-aktiviteetteihin'
+</pre>

+### Terminate container

+Remove the container after use (Only relevant for local deployments)
+<pre data-test="exec">
+$ docker rm -f vespa
+</pre>

-The jar is hosted on [Github](https://github.com/dainiusjocas/vespa-lucene-linguistics-bundle/releases).
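The three example queries added to the README pass `yql`, `language`, `summary`, and `query` as request parameters. As a rough sketch, the same parameters can be assembled into a GET URL for Vespa's HTTP query API. The endpoint `http://localhost:8080/search/` is an assumption that only fits a default local deployment, and `build_query_url` is a hypothetical helper name, not part of the sample app.

```python
from urllib.parse import urlencode

# Assumption: a local Vespa container listening on localhost:8080.
ENDPOINT = "http://localhost:8080/search/"

# (language code, query text) pairs copied from the README queries.
EXAMPLE_QUERIES = [
    ("no", "tips til utendørsaktiviteter"),
    ("sv", "tips til utomhusaktiviteter"),
    ("fi", "vinkkejä ulkoilma-aktiviteetteihin"),
]


def build_query_url(language: str, query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the same parameters as the CLI example."""
    params = {
        "yql": "select * from doc where userQuery()",
        "language": language,
        "summary": "debug-text-tokens",
        "query": query,
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 by default.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    for language, query in EXAMPLE_QUERIES:
        print(build_query_url(language, query))
```

Building the URL with `urlencode` avoids hand-escaping characters such as `*`, spaces, and `ø`; the resulting URL can be fetched with any HTTP client once the application is deployed.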
@@ -0,0 +1,7 @@
+{
+    "put": "id:fi:doc::1",
+    "fields": {
+        "text": "Tervetuloa retkeilemään! Tässä oppaassa jaamme vinkkejä retkeilyreitin suunnitteluun ja valmistautumiseen. Olipa suunnitelmissasi päiväretki lähiluontoon tai pidempi vaellusreissu kansallispuistossa, löydät täältä tarvittavat tiedot ja neuvoja unohtumattoman retken järjestämiseksi.",
+        "language": "fi"
+    }
+}
@@ -0,0 +1,7 @@
+{
+    "put": "id:no:doc::1",
+    "fields": {
+        "text": "Velkommen til naturopplevelser! I denne guiden deler vi tips om planlegging og forberedelser til utendørsaktiviteter. Enten du planlegger en dagstur i nærområdet eller en lengre fjelltur i nasjonalparken, finner du her nødvendig informasjon og råd for å arrangere en minneverdig tur.",
+        "language": "no"
+    }
+}
@@ -0,0 +1,7 @@
+{
+    "put": "id:sv:doc::1",
+    "fields": {
+        "text": "Välkommen till naturäventyr! I den här guiden delar vi tips om planering och förberedelser inför utomhusaktiviteter. Oavsett om du planerar en dagsutflykt i närområdet eller en längre vandringsresa i nationalparken, hittar du här nödvändig information och råd för att arrangera en minnesvärd tur.",
+        "language": "sv"
+    }
+}
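The three feed files share one shape: a `put` operation whose document id uses the language code as the namespace, plus `text` and `language` fields. A minimal sketch of generating such an operation follows; `make_put` and `to_feed_json` are hypothetical helper names, and the text passed in the usage lines is a shortened excerpt of the Norwegian sample, not a new document from the commit.

```python
import json


def make_put(language: str, text: str, number: int = 1) -> dict:
    """Build one feed operation shaped like the sample documents."""
    return {
        # Document id layout: id:<namespace>:<document type>::<user part>.
        "put": f"id:{language}:doc::{number}",
        "fields": {"text": text, "language": language},
    }


def to_feed_json(operation: dict) -> str:
    """Serialize a feed operation as indented JSON."""
    # ensure_ascii=False keeps characters such as "ä" and "ø" readable.
    return json.dumps(operation, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    print(to_feed_json(make_put("no", "Velkommen til naturopplevelser!")))
```

Keeping the `language` field in sync with the id namespace matters here because the schema uses that field to select the analyzer at indexing time.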
@@ -0,0 +1,27 @@
+schema doc {
+
+    document doc {
+        field language type string {
+            indexing: set_language | summary | index
+            match: word
+        }
+        field text type string {
+            indexing: summary | index
+            index: enable-bm25
+        }
+    }
+
+    fieldset default {
+        fields: text
+    }
+    document-summary debug-text-tokens {
+        summary documentid {}
+        summary language {}
+        summary text {}
+        summary text_tokens {
+            source: text
+            tokens
+        }
+        from-disk
+    }
+}
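The `debug-text-tokens` summary class exposes a `text_tokens` field derived from `text`, which makes it possible to inspect how each document was tokenized. The sketch below pulls that field out of a query response, assuming Vespa's default JSON result layout (`root.children[].fields`). `extract_tokens` is a hypothetical helper, and `SAMPLE_RESPONSE` is a hand-written stand-in whose token values are placeholders, not real analyzer output.

```python
def extract_tokens(response: dict) -> dict:
    """Map each hit's document id to its text_tokens summary field."""
    hits = response.get("root", {}).get("children", [])
    tokens_by_id = {}
    for hit in hits:
        fields = hit.get("fields", {})
        # Prefer the documentid summary field, fall back to the hit id.
        doc_id = fields.get("documentid", hit.get("id"))
        tokens_by_id[doc_id] = fields.get("text_tokens", [])
    return tokens_by_id


# Hypothetical response: structure assumed, token values are placeholders.
SAMPLE_RESPONSE = {
    "root": {
        "children": [
            {
                "id": "id:no:doc::1",
                "fields": {
                    "documentid": "id:no:doc::1",
                    "language": "no",
                    "text_tokens": ["placeholder-1", "placeholder-2"],
                },
            }
        ]
    }
}

if __name__ == "__main__":
    print(extract_tokens(SAMPLE_RESPONSE))
```

Comparing the extracted tokens across the `no`, `sv`, and `fi` documents is one way to check that the language-specific analyzers are actually being applied.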
This file was deleted.