Search is not correctly working for words containing non-common characters like Turkish "İ" #33003
Comments
Then saving the doc with
does not succeed either
Now, in the Turkish alphabet, the lowercase form of I (code point 73, U+0049) is ı (code point 305, U+0131).
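For reference, a small Python sketch of the four Turkish I letters and what Unicode's default, locale-independent lowercasing does to them. Python's `str.lower()` applies the full Unicode case mappings (including SpecialCasing), so it serves as a stand-in here; the assumption is that a Java container lowercasing with a non-Turkish locale follows the same default rules. Note that the default mapping sends I to i (not ı), and sends İ to two code points: i followed by U+0307 COMBINING DOT ABOVE. The `describe` helper is hypothetical.

```python
import unicodedata

# The four Turkish I letters: dotless/dotted, upper/lower case.
LETTERS = ["I", "\u0131", "\u0130", "i"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME -> lowercase code points' for one character."""
    lowered = ch.lower()  # full Unicode lowercase mapping, not Turkish-aware
    targets = " ".join(f"U+{ord(c):04X}" for c in lowered)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {targets}"


for ch in LETTERS:
    print(describe(ch))

# The stored value and the "incorrect" query from the report.
STORED = "\u00dcR\u00dcNLER\u0130"   # "ÜRÜNLERİ", 8 characters
DOTLESS = "\u00dcR\u00dcNLERI"       # "ÜRÜNLERI", 8 characters
print(len(STORED.lower()), len(DOTLESS.lower()))  # 9 8
```

The length difference is the crux: lowercasing "ÜRÜNLERİ" this way yields a 9-code-point string that can never equal an 8-code-point term.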
Hey, thanks for the detailed ticket. Attribute fields are not subject to linguistic processing at indexing or query time, so this is unrelated to language settings/set_language. This issue is related to case folding, as using
Tracing with tracelevel=9 while using cased matching shows that the faulty lowercasing in the container is avoided.
Without cased matching (the default), you get the following trace:
So it looks like the lowercasing in the stateless container layer is the issue here.
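To make this diagnosis concrete, here is a toy Python model of the mismatch. It is an assumption, not Vespa's actual code: the index side is modeled as folding each character with a one-to-one (simple) lowercase mapping, so İ becomes a plain i, while the container is modeled as applying full Unicode lowercasing, so İ becomes i plus U+0307. The helpers `simple_fold`, `container_fold` and `matches` are hypothetical. Under these assumptions the model reproduces all three reported results: the exact keyword fails, the dotless "I" keyword matches, and cased matching (no folding on either side) finds the document.

```python
# Toy model (hypothetical, not Vespa code) of case folding on the two sides
# of an uncased exact match against an attribute field.


def simple_fold(text: str) -> str:
    """Index side (assumed): one-to-one lowercase mapping per character.

    Taking the first code point of str.lower() for a single character is an
    approximation of the simple mapping; for U+0130 it yields U+0069.
    """
    return "".join(ch.lower()[0] for ch in text)


def container_fold(text: str) -> str:
    """Query side (assumed): full Unicode lowercasing, as str.lower() does."""
    return text.lower()


def matches(stored: str, query: str, cased: bool = False) -> bool:
    """Exact-term match; both sides are folded unless cased matching is used."""
    if cased:
        return stored == query
    return simple_fold(stored) == container_fold(query)


STORED = "\u00dcR\u00dcNLER\u0130"        # "ÜRÜNLERİ", as in the document
QUERY_DOTTED = "\u00dcR\u00dcNLER\u0130"  # same keyword: "ÜRÜNLERİ"
QUERY_DOTLESS = "\u00dcR\u00dcNLERI"      # "incorrect" keyword: "ÜRÜNLERI"

print(matches(STORED, QUERY_DOTTED))              # False: the reported bug
print(matches(STORED, QUERY_DOTLESS))             # True: found via plain I
print(matches(STORED, QUERY_DOTTED, cased=True))  # True: cased matching works
```

The inference behind the model: since "ÜRÜNLERI" finds the document, the index side must fold İ to a plain i; the only remaining difference is the extra combining dot produced on the query side.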
Describe the bug
Although the document is stored with the field value "ÜRÜNLERİ", it cannot be found by searching for exactly the same keyword "ÜRÜNLERİ" (but it is found by "ÜRÜNLERI").
To Reproduce
Given the schema as
And document indexed as :
Then the following search query does not return the doc:
but this one, with the "incorrect" "I", does return the doc:
Expected behavior
The search returns the doc for the search term "ÜRÜNLERİ".
Environment
docker image: vespaengine/vespa:8.452.13
Vespa version
8.452.13
Additional context
The issue can be reproduced with the attached application package:
vespa_encoding_issue.zip
Indexing request:
Search request: