diff --git a/examples/README.md b/examples/README.md
index e2b7ef2c8..6a4048333 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -43,6 +43,11 @@ Generic [request-response](https://docs.vespa.ai/en/jdisc/processing.html) proce
+### Lucene Linguistics
+The [lucene-linguistics](lucene-linguistics) directory contains two sample application packages:
+1. A bare-minimum app.
+2. An app that shows advanced configuration of the Lucene-based `Linguistics` implementation.
+
----
Note: Applications with _pom.xml_ are Java/Maven projects and must be built before being deployed.
diff --git a/examples/lucene-linguistics/README.md b/examples/lucene-linguistics/README.md
new file mode 100644
index 000000000..413c2d527
--- /dev/null
+++ b/examples/lucene-linguistics/README.md
@@ -0,0 +1,195 @@
+
+
+![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)
+
+# Vespa LuceneLinguistics Demos
+
+A few examples of how to get started with `lucene-linguistics`:
+
+- `non-java`: an absolute minimum to get started;
+- `minimal`: minimal Java based project using Lucene Linguistics;
+- `advanced-configuration`: demonstrates the configurability;
+- `going-crazy`: demonstrates the advanced setup;
+
+## Getting started
+
+The procedure is the same for all application packages:
+go to the application package directory and run the following commands:
+
+```shell
+# Make sure that the Docker daemon is running
+# Make sure that the Vespa CLI is installed
+brew install vespa-cli
+# Maven must be 3.6+
+brew install maven
+
+docker run --rm --detach \
+ --name vespa \
+ --hostname vespa-container \
+ --publish 8080:8080 \
+ --publish 19071:19071 \
+ --publish 19050:19050 \
+ vespaengine/vespa:8.224.19
+
+# To observe the logs from LuceneLinguistics, run this in a separate terminal
+docker logs vespa -f | grep -i "lucene"
+
+vespa status deploy --wait 300
+
+(mvn clean package && vespa deploy -w 100)
+
+vespa feed src/main/application/ext/document.json
+vespa query 'yql=select * from lucene where default contains "dogs"' \
+ 'model.locale=en'
+
+# After this query, a log entry like this should appear:
+# [2023-08-02 19:57:12.106] INFO container Container.com.yahoo.language.lucene.AnalyzerFactory Analyzer for language=en is from a list of default language analyzers.
+```
+
+The query should return:
+```json
+{
+ "root": {
+ "id": "toplevel",
+ "relevance": 1.0,
+ "fields": {
+ "totalCount": 1
+ },
+ "coverage": {
+ "coverage": 100,
+ "documents": 1,
+ "full": true,
+ "nodes": 1,
+ "results": 1,
+ "resultsFull": 1
+ },
+ "children": [
+ {
+ "id": "id:mynamespace:lucene::mydocid",
+ "relevance": 0.16343879032006287,
+ "source": "content",
+ "fields": {
+ "sddocname": "lucene",
+ "documentid": "id:mynamespace:lucene::mydocid",
+ "mytext": "Cats and Dogs"
+ }
+ }
+ ]
+ }
+}
+```
+
+### Observing query rewrites
+
+```shell
+vespa query 'yql=select * from lucene where default contains "dogs"' \
+ 'model.locale=en' \
+ 'trace.level=2' | jq '.trace.children | last | .children[] | select(.message) | select(.message | test("YQL.*")) | .message'
+```
+Output:
+```shell
+"YQL+ query parsed: [select * from lucene where default contains \"dog\" timeout 10000]"
+```
+Note that `dogs` is rewritten as `dog`.
+
+Change `model.locale` to another language, change the query, and observe how the analysis differs.
+
+### Observing the indexed tokens
+
+It is possible to explore the tokens directly in the index.
+To do that you can run these commands **inside** the running Vespa Docker container.
+
+```shell
+# Open a shell inside the Vespa Docker container
+docker exec -it vespa bash
+# Trigger a flush to disk
+vespa-proton-cmd --local triggerFlush
+
+# Show the posting lists
+vespa-index-inspect showpostings \
+ --indexdir /opt/vespa/var/db/vespa/search/cluster.content/n0/documents/lucene/0.ready/index/$(ls /opt/vespa/var/db/vespa/search/cluster.content/n0/documents/lucene/0.ready/index/)/ \
+ --field mytext --transpose
+# =>
+# docId = 1
+# field = 0 "mytext"
+# element = 0, elementLen = 2, elementWeight = 1
+# pos = 0, word = "cat"
+# pos = 1, word = "dog"
+
+# Show the tokens
+vespa-index-inspect dumpwords \
+ --indexdir /opt/vespa/var/db/vespa/search/cluster.content/n0/documents/lucene/0.ready/index/$(ls /opt/vespa/var/db/vespa/search/cluster.content/n0/documents/lucene/0.ready/index/)/ \
+ --wordnum \
+ --field mytext
+# =>
+# 1 cat 1
+# 2 dog 1
+```
+
+Have fun!
+
+## Common Issues
+
+The `lucene-linguistics` component is highly configurable.
+It has an optional `configDir` configuration parameter of type `path`.
+`configDir` is a directory that stores linguistics resources, e.g. stopword dictionaries, and is relative to the root directory of the Vespa application package (VAP).
+
+There are several known problems that might happen when `configDir` is misconfigured.
+
+### `configDir` is specified but doesn't exist
+
+If `configDir` doesn't exist, `vespa deploy` fails with an error like this:
+
+```shell
+Uploading application package ... failed
+Error: invalid application package (400 Bad Request)
+Invalid application:
+Unable to send file specified in com.yahoo.language.lucene.lucene-analysis:
+/opt/vespa/var/db/vespa/config_server/serverdb/tenants/default/sessions/4/lucene (No such file or directory)
+```
+
+### Empty directory can't be referenced
+
+If `configDir` is set to `foo`, a directory that is empty, deployment fails with a misleading error message:
+```shell
+Uploading application package ... failed
+Error: invalid application package (400 Bad Request)
+Invalid application:
+Unable to send file specified in com.yahoo.language.lucene.lucene-analysis:
+/opt/vespa/var/db/vespa/config_server/serverdb/tenants/default/sessions/8/foo (No such file or directory)
+```
+
+### Application package root cannot be used as `configDir`
+
+If you set `configDir` to `.`, the application package is deployed(!) but
+the services do not converge, and the following error appears:
+```shell
+Uploading application package ... done
+
+Success: Deployed target/application.zip
+WARNING Jar file 'vespa-lucene-linguistics-poc-0.0.1-deploy.jar' uses non-public Vespa APIs: [com.yahoo.language.simple]
+
+Waiting up to 1m40s for query service to become available ...
+Error: service 'query' is unavailable: services have not converged
+```
+
+The Vespa logs then fill up with warnings like these:
+```shell
+[2023-08-02 20:30:47.675] WARNING configproxy stderr Exception in thread "Rpc executorpool-6-thread-5" java.lang.RuntimeException: More than one file reference found for file 'fbcf5c3dc81d9540'
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.filedistribution.FileDownloader.getFileFromFileSystem(FileDownloader.java:109)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.filedistribution.FileDownloader.getFileFromFileSystem(FileDownloader.java:100)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.filedistribution.FileDownloader.getFutureFile(FileDownloader.java:80)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.filedistribution.FileDownloader.getFile(FileDownloader.java:70)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.config.proxy.filedistribution.FileDistributionRpcServer.downloadFile(FileDistributionRpcServer.java:109)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat com.yahoo.vespa.config.proxy.filedistribution.FileDistributionRpcServer.lambda$getFile$0(FileDistributionRpcServer.java:84)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
+[2023-08-02 20:30:47.675] WARNING configproxy stderr \tat java.base/java.lang.Thread.run(Thread.java:833)
+```
+
+### Harmless warning
+`vespa deploy` always warns with:
+```shell
+WARNING Jar file 'vespa-lucene-linguistics-poc-0.0.1-deploy.jar' uses non-public Vespa APIs: [com.yahoo.language.simple]
+```
+You can ignore this warning.
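The three `configDir` problems described under "Common Issues" in the README above (missing directory, empty directory, application package root) could be caught locally before running `vespa deploy`. The following is a minimal sketch of such a pre-deploy check, not part of Vespa or the Vespa CLI. The class name `ConfigDirCheck`, the default package root `src/main/application`, and the default `configDir` value `linguistics` are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper: not part of Vespa, only mirrors the README's three failure cases.
public class ConfigDirCheck {

    /** Returns a description of the problem, or empty if configDir looks usable. */
    static Optional<String> check(Path packageRoot, String configDir) throws IOException {
        Path root = packageRoot.toAbsolutePath().normalize();
        Path dir = root.resolve(configDir).normalize();

        // Case: "." (or anything resolving to the package root) deploys but never converges.
        if (dir.equals(root)) {
            return Optional.of("configDir must not be the application package root");
        }
        // Case: the directory is missing, so deployment fails with "No such file or directory".
        if (!Files.isDirectory(dir)) {
            return Optional.of("configDir does not exist: " + dir);
        }
        // Case: the directory exists but is empty, which produces the same misleading error.
        try (Stream<Path> entries = Files.list(dir)) {
            if (entries.findAny().isEmpty()) {
                return Optional.of("configDir is empty: " + dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass the real package root and configDir as arguments.
        Path packageRoot = Path.of(args.length > 0 ? args[0] : "src/main/application");
        String configDir = args.length > 1 ? args[1] : "linguistics";
        System.out.println(check(packageRoot, configDir).orElse("configDir looks fine"));
    }
}
```

Running it from an application package directory before `mvn clean package && vespa deploy` gives a clearer message than the deployment errors shown in the README.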
diff --git a/examples/lucene-linguistics/advanced-configuration/README.md b/examples/lucene-linguistics/advanced-configuration/README.md
new file mode 100644
index 000000000..be49d1ceb
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/README.md
@@ -0,0 +1,66 @@
+# Vespa Lucene Linguistics
+
+This Vespa application package (VAP) demonstrates the configuration options of the `lucene-linguistics` package.
+Configurability is arguably the main benefit of `LuceneLinguistics` compared to other `Linguistics` implementations.
+
+## Custom Lucene Analyzers
+
+There are multiple ways to use a Lucene `Analyzer` for a language.
+Each analyzer is identified by a language key, e.g. `en` for English.
+These are the `Analyzer` sources, in descending order of priority:
+1. Created through the `Linguistics` component configuration.
+2. An `Analyzer` wrapped into a Vespa ``.
+3. A list of [default Analyzers](https://github.com/vespa-engine/vespa/blob/5d26801bc63c35705e708d3cc7086f0b0103e909/lucene-linguistics/src/main/java/com/yahoo/language/lucene/DefaultAnalyzers.java) per language.
+4. The `StandardAnalyzer`.
+
+### Add a Lucene Analyzer component
+
+Vespa provides a `ComponentRegistry` mechanism.
+`LuceneLinguistics` accepts a `ComponentRegistry<Analyzer>` in its constructor.
+At start time, the Vespa container automatically collects all components of type `Analyzer`.
+
+To declare such components:
+```xml
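+<component id="en" class="org.apache.lucene.analysis.core.SimpleAnalyzer" bundle="vespa-lucene-linguistics-poc" />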
+```
+Where:
+- `id` should contain a language code.
+- `class` should be the implementing class.
+Note that it can be a class straight from the Lucene library.
+You can also create an `Analyzer` class inside your VAP and refer to it.
+- `bundle` must be your application package `artifactId` as specified in the `pom.xml`.
+
+There are two types of `Analyzer` components:
+1. Components that don't require any setup.
+2. Components that require setup (e.g. a constructor with arguments).
+
+The previous component declaration example is of type (1).
+
+The (2) type requires a bit more work.
+
+Create a class (e.g. for the Polish language):
+```java
+package ai.vespa.linguistics.pl;
+
+import com.yahoo.container.di.componentgraph.Provider;
+import org.apache.lucene.analysis.Analyzer;
+
+public class PolishAnalyzer implements Provider<Analyzer> {
+ @Override
+ public Analyzer get() {
+ return new org.apache.lucene.analysis.pl.PolishAnalyzer();
+ }
+ @Override
+ public void deconstruct() {}
+}
+```
+
+Add a component declaration into the `services.xml` file:
+```xml
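+<component id="pl" class="ai.vespa.linguistics.pl.PolishAnalyzer" bundle="vespa-lucene-linguistics-poc" />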
+```
+Handling of the Polish language is now available.
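The analyzer priority order listed under "Custom Lucene Analyzers" in the README above can be pictured as a lookup chain: each source is consulted in turn for the language key, and `StandardAnalyzer` is the fallback. The sketch below only models that order with plain maps and strings. It is not the actual `AnalyzerFactory` code, and the class name `AnalyzerPriority`, the source labels, and the analyzer names in the maps are placeholders invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of the priority order only; all names here are placeholders.
public class AnalyzerPriority {

    /**
     * Walks the sources in insertion order (highest priority first) and returns
     * the first analyzer registered for the language, else the fallback.
     */
    static String resolve(Map<String, Map<String, String>> sourcesByPriority, String language) {
        for (Map.Entry<String, Map<String, String>> source : sourcesByPriority.entrySet()) {
            String analyzer = source.getValue().get(language);
            if (analyzer != null) {
                return analyzer + " (from " + source.getKey() + ")";
            }
        }
        return "StandardAnalyzer (fallback)";
    }

    /** Builds example sources; a LinkedHashMap keeps them in priority order. */
    static Map<String, Map<String, String>> exampleSources() {
        Map<String, Map<String, String>> sources = new LinkedHashMap<>();
        sources.put("component configuration", Map.of("de", "configured-de"));
        sources.put("component registry", Map.of("de", "registry-de", "pl", "PolishAnalyzer"));
        sources.put("default analyzers", Map.of("en", "EnglishAnalyzer", "pl", "default-pl"));
        return sources;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> sources = exampleSources();
        for (String language : new String[] {"de", "pl", "en", "xx"}) {
            System.out.println(language + " -> " + resolve(sources, language));
        }
    }
}
```

In this model, `de` is served by the configuration even though the registry also has an entry, `pl` is served by the registry ahead of the defaults, and an unknown language such as `xx` ends up with the fallback.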
diff --git a/examples/lucene-linguistics/advanced-configuration/pom.xml b/examples/lucene-linguistics/advanced-configuration/pom.xml
new file mode 100644
index 000000000..c98a31c90
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/pom.xml
@@ -0,0 +1,85 @@
+
+
+
+ 4.0.0
+ ai.vespa
+ vespa-lucene-linguistics-poc
+ 0.0.2
+ container-plugin
+
+ false
+ UTF-8
+ true
+ 8.227.41
+ 5.7.1
+
+
+
+
+ org.apache.lucene
+ lucene-analysis-stempel
+ 9.7.0
+
+
+ com.yahoo.vespa
+ lucene-linguistics
+ ${vespa.version}
+
+
+ com.yahoo.vespa
+ linguistics
+ ${vespa.version}
+ provided
+
+
+ com.yahoo.vespa
+ application
+ ${vespa.version}
+ provided
+
+
+ org.junit.jupiter
+ junit-jupiter
+ ${junit.version}
+ test
+
+
+
+
+
+
+ com.yahoo.vespa
+ bundle-plugin
+ ${vespa.version}
+ true
+
+ false
+
+
+
+ com.yahoo.vespa
+ vespa-application-maven-plugin
+ ${vespa.version}
+
+
+
+ packageApplication
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-compiler-plugin
+ 3.11.0
+
+
+ 17
+
+
+
+
+
diff --git a/examples/lucene-linguistics/advanced-configuration/src/main/application/ext/document.json b/examples/lucene-linguistics/advanced-configuration/src/main/application/ext/document.json
new file mode 100644
index 000000000..f7733dd3a
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/src/main/application/ext/document.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::mydocid",
+ "fields": {
+ "language": "en",
+ "mytext": "Cats and Dogs"
+ }
+}
diff --git a/examples/lucene-linguistics/advanced-configuration/src/main/application/schemas/lucene.sd b/examples/lucene-linguistics/advanced-configuration/src/main/application/schemas/lucene.sd
new file mode 100644
index 000000000..86f7b7c3c
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/src/main/application/schemas/lucene.sd
@@ -0,0 +1,15 @@
+schema lucene {
+
+ document lucene {
+ field language type string {
+ indexing: set_language
+ }
+ field mytext type string {
+ indexing: summary | index
+ }
+ }
+
+ fieldset default {
+ fields: mytext
+ }
+}
diff --git a/examples/lucene-linguistics/advanced-configuration/src/main/application/services.xml b/examples/lucene-linguistics/advanced-configuration/src/main/application/services.xml
new file mode 100644
index 000000000..8df239609
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/src/main/application/services.xml
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
diff --git a/examples/lucene-linguistics/advanced-configuration/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java b/examples/lucene-linguistics/advanced-configuration/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java
new file mode 100644
index 000000000..f2697b4bc
--- /dev/null
+++ b/examples/lucene-linguistics/advanced-configuration/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java
@@ -0,0 +1,14 @@
+package ai.vespa.linguistics.pl;
+
+import com.yahoo.container.di.componentgraph.Provider;
+import org.apache.lucene.analysis.Analyzer;
+
+public class PolishAnalyzer implements Provider<Analyzer> {
+ @Override
+ public Analyzer get() {
+ return new org.apache.lucene.analysis.pl.PolishAnalyzer();
+ }
+
+ @Override
+ public void deconstruct() {}
+}
diff --git a/examples/lucene-linguistics/going-crazy/README.md b/examples/lucene-linguistics/going-crazy/README.md
new file mode 100644
index 000000000..6d5b77282
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/README.md
@@ -0,0 +1,32 @@
+# Vespa Lucene Linguistics: Going Crazy
+
+## TL;DR
+
+Search problems get really complicated when you need to deal with multilingual aspects.
+Lucene has a battle-tested and standards-compliant set of libraries to help you solve these problems.
+
+## Context
+
+The goals of this application package are:
+- set up OpenNLP tokenizers;
+- set up Lemmagen token filters with sample resource files;
+- construct an analyzer entirely in Java code and register it as a component;
+
+## Analysis components
+
+Lucene has plenty of components [available](https://lucene.apache.org/core/9_7_0/index.html).
+One of them is [`analysis-opennlp`](https://lucene.apache.org/core/9_7_0/analysis/opennlp/index.html).
+
+### OpenNLP
+
+The OpenNLP library adds one tokenizer, identified as `openNlp`, and three token filters:
+`openNlpLemmatizer`, `openNlpChunker`, `openNlppos`.
+
+Let's set up an `org.apache.lucene.analysis.opennlp.OpenNLPTokenizerFactory` tokenizer and
+an `org.apache.lucene.analysis.snowball.SnowballPorterFilterFactory` token filter.
+
+### Feed Documents
+
+```shell
+vespa feed src/main/application/ext/documents/*
+```
diff --git a/examples/lucene-linguistics/going-crazy/pom.xml b/examples/lucene-linguistics/going-crazy/pom.xml
new file mode 100644
index 000000000..38792c9a0
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/pom.xml
@@ -0,0 +1,112 @@
+
+
+
+ 4.0.0
+ ai.vespa
+ vespa-lucene-linguistics-crazy
+ 0.0.2
+ container-plugin
+
+ false
+ UTF-8
+ true
+ 8.227.41
+ 5.7.1
+ 9.7.0
+
+
+
+
+ com.yahoo.vespa
+ lucene-linguistics
+ ${vespa.version}
+
+
+ org.apache.lucene
+ lucene-core
+ ${lucene.version}
+
+
+ org.apache.lucene
+ lucene-analysis-common
+ ${lucene.version}
+
+
+ org.apache.lucene
+ lucene-analysis-opennlp
+ ${lucene.version}
+
+
+ org.apache.lucene
+ lucene-analysis-stempel
+ ${lucene.version}
+
+
+ eu.hlavki.text
+ jlemmagen
+ 1.0
+
+
+ org.slf4j
+ slf4j-api
+
+
+
+
+ com.yahoo.vespa
+ linguistics
+ ${vespa.version}
+ provided
+
+
+ com.yahoo.vespa
+ application
+ ${vespa.version}
+ provided
+
+
+ org.junit.jupiter
+ junit-jupiter
+ ${junit.version}
+ test
+
+
+
+
+
+
+ com.yahoo.vespa
+ bundle-plugin
+ ${vespa.version}
+ true
+
+ false
+
+
+
+ com.yahoo.vespa
+ vespa-application-maven-plugin
+ ${vespa.version}
+
+
+
+ packageApplication
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-compiler-plugin
+ 3.11.0
+
+
+ 17
+
+
+
+
+
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/de-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/de-doc.json
new file mode 100644
index 000000000..c65025fd6
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/de-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::de-doc",
+ "fields": {
+ "language": "de",
+ "mytext": "Katzen und Hunde"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/en-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/en-doc.json
new file mode 100644
index 000000000..a8dbb4e03
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/en-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::en-doc",
+ "fields": {
+ "language": "en",
+ "mytext": "Cats and Dogs"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/fr-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/fr-doc.json
new file mode 100644
index 000000000..f7f1d4303
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/fr-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::fr-doc",
+ "fields": {
+ "language": "fr",
+ "mytext": "Les chats et les chiens"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/it-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/it-doc.json
new file mode 100644
index 000000000..bad2d467f
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/it-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::it-doc",
+ "fields": {
+ "language": "it",
+ "mytext": "Cani e gatti"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/nl-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/nl-doc.json
new file mode 100644
index 000000000..0257c0509
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/nl-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::nl-doc",
+ "fields": {
+ "language": "nl",
+ "mytext": "Katten en honden"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/pl-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/pl-doc.json
new file mode 100644
index 000000000..950d858bd
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/pl-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::pl-doc",
+ "fields": {
+ "language": "pl",
+ "mytext": "Koty i psy"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/sk-doc.json b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/sk-doc.json
new file mode 100644
index 000000000..7fc79ad6b
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/ext/documents/sk-doc.json
@@ -0,0 +1,7 @@
+{
+ "put": "id:mynamespace:lucene::sk-doc",
+ "fields": {
+ "language": "sk",
+ "mytext": "Mačky a psy"
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin
new file mode 100644
index 000000000..9e8dfa5bc
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-tokens-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-tokens-1.0-1.9.3.bin
new file mode 100644
index 000000000..eb7d7708f
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/de/opennlp-de-ud-gsd-tokens-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin
new file mode 100644
index 000000000..d3a277923
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin
new file mode 100644
index 000000000..10c7d02d2
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/en/opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-1.0-1.9.3fr-ud-ftb-sentence-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-1.0-1.9.3fr-ud-ftb-sentence-1.0-1.9.3.bin
new file mode 100644
index 000000000..7ca04d3d2
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-1.0-1.9.3fr-ud-ftb-sentence-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-fr-ud-ftb-tokens-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-fr-ud-ftb-tokens-1.0-1.9.3.bin
new file mode 100644
index 000000000..3343de95a
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/fr/opennlp-fr-ud-ftb-tokens-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-sentence-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-sentence-1.0-1.9.3.bin
new file mode 100644
index 000000000..446a3a4ec
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-sentence-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-tokens-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-tokens-1.0-1.9.3.bin
new file mode 100644
index 000000000..9f58f8d61
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/it/opennlp-it-ud-vit-tokens-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-sentence-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-sentence-1.0-1.9.3.bin
new file mode 100644
index 000000000..f5f28f0f6
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-sentence-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-tokens-1.0-1.9.3.bin b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-tokens-1.0-1.9.3.bin
new file mode 100644
index 000000000..b721a04c8
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/nl/opennlp-nl-ud-alpino-tokens-1.0-1.9.3.bin differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/sk/mlteast-sk.lem b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/sk/mlteast-sk.lem
new file mode 100644
index 000000000..dc1d57820
Binary files /dev/null and b/examples/lucene-linguistics/going-crazy/src/main/application/linguistics/sk/mlteast-sk.lem differ
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/schemas/lucene.sd b/examples/lucene-linguistics/going-crazy/src/main/application/schemas/lucene.sd
new file mode 100644
index 000000000..86f7b7c3c
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/schemas/lucene.sd
@@ -0,0 +1,15 @@
+schema lucene {
+
+ document lucene {
+ field language type string {
+ indexing: set_language
+ }
+ field mytext type string {
+ indexing: summary | index
+ }
+ }
+
+ fieldset default {
+ fields: mytext
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/application/services.xml b/examples/lucene-linguistics/going-crazy/src/main/application/services.xml
new file mode 100644
index 000000000..1240c8236
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/application/services.xml
@@ -0,0 +1,125 @@
+
+
+
+
+
+
+
+ linguistics
+
+ -
+
+ openNLP
+
+
- de/opennlp-de-ud-gsd-sentence-1.0-1.9.3.bin
+ - de/opennlp-de-ud-gsd-tokens-1.0-1.9.3.bin
+
+
+
+ -
+ snowballPorter
+
+
- German2
+
+
+
+
+ -
+
+ openNLP
+
+
- en/opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin
+ - en/opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin
+
+
+
+ -
+ snowballPorter
+
+
- English
+
+
+
+
+ -
+
+ openNLP
+
+
- fr/opennlp-1.0-1.9.3fr-ud-ftb-sentence-1.0-1.9.3.bin
+ - fr/opennlp-fr-ud-ftb-tokens-1.0-1.9.3.bin
+
+
+
+ -
+ snowballPorter
+
+
- French
+
+
+
+
+ -
+
+ openNLP
+
+
- it/opennlp-it-ud-vit-sentence-1.0-1.9.3.bin
+ - it/opennlp-it-ud-vit-tokens-1.0-1.9.3.bin
+
+
+
+ -
+ snowballPorter
+
+
- Italian
+
+
+
+
+ -
+
+ openNLP
+
+
- nl/opennlp-nl-ud-alpino-sentence-1.0-1.9.3.bin
+ - nl/opennlp-nl-ud-alpino-tokens-1.0-1.9.3.bin
+
+
+
+ -
+ snowballPorter
+
+
- Dutch
+
+
+ - reversestring
+
+
+ -
+
+
-
+ lemmagen
+
+
+
- sk/mlteast-sk.lem
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
diff --git a/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilter.java b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilter.java
new file mode 100644
index 000000000..754b34052
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilter.java
@@ -0,0 +1,49 @@
+package ai.vespa.linguistics.lemmagen;
+
+import eu.hlavki.text.lemmagen.api.Lemmatizer;
+import org.apache.lucene.analysis.TokenFilter;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
+import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
+
+import java.io.IOException;
+
+/**
+ * Code is loosely based on
+ * https://github.com/vhyza/elasticsearch-analysis-lemmagen/blob/master/src/main/java/org/elasticsearch/index/analysis/LemmagenFilter.java
+ */
+public final class LemmagenTokenFilter extends TokenFilter {
+
+ private final CharTermAttribute termAttr = addAttribute(CharTermAttribute.class);
+ private final KeywordAttribute keywordAttr = addAttribute(KeywordAttribute.class);
+ private final Lemmatizer lemmatizer;
+
+ public LemmagenTokenFilter(final TokenStream input, final Lemmatizer lemmatizer) {
+ super(input);
+ this.lemmatizer = lemmatizer;
+ }
+
+ public boolean incrementToken() throws IOException {
+ if (!input.incrementToken()) {
+ return false;
+ }
+ CharSequence lemma = lemmatizer.lemmatize(termAttr);
+ if (!keywordAttr.isKeyword() && !equalCharSequences(lemma, termAttr)) {
+ termAttr.setEmpty().append(lemma);
+ }
+ return true;
+ }
+
+ private boolean equalCharSequences(CharSequence s1, CharSequence s2) {
+ int len1 = s1.length();
+ int len2 = s2.length();
+ if (len1 != len2)
+ return false;
+ for (int i = len1; --i >= 0;) {
+ if (s1.charAt(i) != s2.charAt(i)) {
+ return false;
+ }
+ }
+ return true;
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilterFactory.java b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilterFactory.java
new file mode 100644
index 000000000..6eb8ef1c2
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/lemmagen/LemmagenTokenFilterFactory.java
@@ -0,0 +1,62 @@
+package ai.vespa.linguistics.lemmagen;
+
+import eu.hlavki.text.lemmagen.LemmatizerFactory;
+import eu.hlavki.text.lemmagen.api.Lemmatizer;
+import org.apache.lucene.analysis.TokenFilterFactory;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.util.ResourceLoader;
+import org.apache.lucene.util.ResourceLoaderAware;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Map;
+
+/**
+ * https://lucene.apache.org/core/9_7_0/
+ * https://github.com/vhyza/elasticsearch-analysis-lemmagen
+ * Loosely based on
+ * https://github.com/vhyza/elasticsearch-analysis-lemmagen/blob/master/src/main/java/org/elasticsearch/index/analysis/LemmagenFilterFactory.java
+ * Also inspired by
+ * https://github.com/hlavki/jlemmagen-lucene/blob/master/src/main/java/org/apache/lucene/analysis/lemmagen/LemmagenFilterFactory.java
+ */
+public class LemmagenTokenFilterFactory extends TokenFilterFactory
+ implements ResourceLoaderAware {
+
+ // SPI name
+ public static final String NAME = "lemmagen";
+
+ // Configuration key
+ private static final String LEXICON_KEY = "lexicon";
+ private Lemmatizer lemmatizer = null;
+ private final String lexiconPath;
+
+ /** Creates a new LemmagenTokenFilterFactory */
+ public LemmagenTokenFilterFactory(Map<String, String> args) {
+ super(args);
+ lexiconPath = require(args, LEXICON_KEY);
+ if (!args.isEmpty()) {
+ throw new IllegalArgumentException("Unknown parameters: " + args);
+ }
+ }
+
+ private Lemmatizer createLemmatizer(InputStream lexiconInputStream) {
+ try {
+ return LemmatizerFactory.read(lexiconInputStream);
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public void inform(ResourceLoader loader) throws IOException {
+ this.lemmatizer = createLemmatizer(loader.openResource(lexiconPath));
+ }
+
+ public LemmagenTokenFilterFactory() {
+ throw defaultCtorException();
+ }
+
+ public TokenStream create(TokenStream input) {
+ return new LemmagenTokenFilter(input, lemmatizer);
+ }
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java
new file mode 100644
index 000000000..f2697b4bc
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/java/ai/vespa/linguistics/pl/PolishAnalyzer.java
@@ -0,0 +1,14 @@
+package ai.vespa.linguistics.pl;
+
+import com.yahoo.container.di.componentgraph.Provider;
+import org.apache.lucene.analysis.Analyzer;
+
+public class PolishAnalyzer implements Provider<Analyzer> {
+ @Override
+ public Analyzer get() {
+ return new org.apache.lucene.analysis.pl.PolishAnalyzer();
+ }
+
+ @Override
+ public void deconstruct() {}
+}
diff --git a/examples/lucene-linguistics/going-crazy/src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenFilterFactory b/examples/lucene-linguistics/going-crazy/src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
new file mode 100644
index 000000000..39ee6fc5d
--- /dev/null
+++ b/examples/lucene-linguistics/going-crazy/src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
@@ -0,0 +1 @@
+ai.vespa.linguistics.lemmagen.LemmagenTokenFilterFactory
diff --git a/examples/lucene-linguistics/minimal/README.md b/examples/lucene-linguistics/minimal/README.md
new file mode 100644
index 000000000..cdb2673cc
--- /dev/null
+++ b/examples/lucene-linguistics/minimal/README.md
@@ -0,0 +1,3 @@
+# Minimal `lucene-linguistics` setup
+
+This application package contains a bare-minimum setup to get started with `lucene-linguistics`.
diff --git a/examples/lucene-linguistics/minimal/pom.xml b/examples/lucene-linguistics/minimal/pom.xml
new file mode 100644
index 000000000..4ece49073
--- /dev/null
+++ b/examples/lucene-linguistics/minimal/pom.xml
@@ -0,0 +1,73 @@
+
+
+
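+    <modelVersion>4.0.0</modelVersion>
+    <groupId>ai.vespa</groupId>
+    <artifactId>lucene-linguistics-minimal</artifactId>
+    <version>0.0.1</version>
+    <packaging>container-plugin</packaging>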
+
+ false
+ UTF-8
+ true
+ 8.227.41
+
+
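+    <dependencies>
+        <dependency>
+            <groupId>com.yahoo.vespa</groupId>
+            <artifactId>lucene-linguistics</artifactId>
+            <version>${vespa.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>com.yahoo.vespa</groupId>
+            <artifactId>linguistics</artifactId>
+            <version>${vespa.version}</version>
+            <scope>provided</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.yahoo.vespa</groupId>
+            <artifactId>application</artifactId>
+            <version>${vespa.version}</version>
+            <scope>provided</scope>
+        </dependency>
+    </dependencies>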
+
+
+
+
+ com.yahoo.vespa
+ bundle-plugin
+ ${vespa.version}
+ true
+
+ false
+
+
+
+ com.yahoo.vespa
+ vespa-application-maven-plugin
+ ${vespa.version}
+
+
+
+ packageApplication
+
+
+
+
+
+ org.apache.maven.plugins
+ maven-compiler-plugin
+ 3.11.0
+
+
+ 17
+
+
+
+
+
diff --git a/examples/lucene-linguistics/minimal/src/main/application/ext/document.json b/examples/lucene-linguistics/minimal/src/main/application/ext/document.json
new file mode 100644
index 000000000..f7733dd3a
--- /dev/null
+++ b/examples/lucene-linguistics/minimal/src/main/application/ext/document.json
@@ -0,0 +1,7 @@
+{
+    "put": "id:mynamespace:lucene::mydocid",
+    "fields": {
+        "language": "en",
+        "mytext": "Cats and Dogs"
+    }
+}
diff --git a/examples/lucene-linguistics/minimal/src/main/application/schemas/lucene.sd b/examples/lucene-linguistics/minimal/src/main/application/schemas/lucene.sd
new file mode 100644
index 000000000..86f7b7c3c
--- /dev/null
+++ b/examples/lucene-linguistics/minimal/src/main/application/schemas/lucene.sd
@@ -0,0 +1,15 @@
+schema lucene {
+
+    document lucene {
+        field language type string {
+            indexing: set_language
+        }
+        field mytext type string {
+            indexing: summary | index
+        }
+    }
+
+    fieldset default {
+        fields: mytext
+    }
+}
diff --git a/examples/lucene-linguistics/minimal/src/main/application/services.xml b/examples/lucene-linguistics/minimal/src/main/application/services.xml
new file mode 100644
index 000000000..9de5c9879
--- /dev/null
+++ b/examples/lucene-linguistics/minimal/src/main/application/services.xml
@@ -0,0 +1,22 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 1
+
+
+
+
+
+
diff --git a/examples/lucene-linguistics/non-java/.gitignore b/examples/lucene-linguistics/non-java/.gitignore
new file mode 100644
index 000000000..44be31f2f
--- /dev/null
+++ b/examples/lucene-linguistics/non-java/.gitignore
@@ -0,0 +1 @@
+components
diff --git a/examples/lucene-linguistics/non-java/README.md b/examples/lucene-linguistics/non-java/README.md
new file mode 100644
index 000000000..a17745528
--- /dev/null
+++ b/examples/lucene-linguistics/non-java/README.md
@@ -0,0 +1,27 @@
+# Lucene Linguistics in non-Java Vespa applications
+
+In non-Java projects, Lucene Linguistics can be used as a prebuilt bundle jar.
+
+Download the Vespa bundle jar into the `components` directory:
+```shell
+(mkdir -p components && cd components && curl -L https://github.com/dainiusjocas/vespa-lucene-linguistics-bundle/releases/download/v0.0.2/lucene-linguistics-bundle-0.0.2-deploy.jar --output lucene-linguistics-bundle-0.0.2-deploy.jar)
+```
+
+Deploy the application package:
+```shell
+vespa deploy -w 100
+```
+
+Run a query:
+```shell
+vespa query 'query=Vespa' 'language=lt'
+```
+
+The logs should contain a record like this:
+```text
+[2023-08-16 11:21:04.847] INFO container Container.com.yahoo.language.lucene.AnalyzerFactory Analyzer for language=lt is from a list of default language analyzers.
+```
+
+Profit.
+
+The jar is hosted on [GitHub](https://github.com/dainiusjocas/vespa-lucene-linguistics-bundle/releases).
diff --git a/examples/lucene-linguistics/non-java/schemas/lucene.sd b/examples/lucene-linguistics/non-java/schemas/lucene.sd
new file mode 100644
index 000000000..86f7b7c3c
--- /dev/null
+++ b/examples/lucene-linguistics/non-java/schemas/lucene.sd
@@ -0,0 +1,15 @@
+schema lucene {
+
+    document lucene {
+        field language type string {
+            indexing: set_language
+        }
+        field mytext type string {
+            indexing: summary | index
+        }
+    }
+
+    fieldset default {
+        fields: mytext
+    }
+}
diff --git a/examples/lucene-linguistics/non-java/services.xml b/examples/lucene-linguistics/non-java/services.xml
new file mode 100644
index 000000000..662398418
--- /dev/null
+++ b/examples/lucene-linguistics/non-java/services.xml
@@ -0,0 +1,20 @@
+
+
+
+
+
+
+
+
+
+
+
+ 1
+
+
+
+
+
+