ClickHouse provides several types of data skipping (secondary) indexes, for example "N-gram Bloom Filter" indexes. These are documented here:
https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes
Indexes of this type are highly sensitive to the choice of the index parameters (`n`, `size_of_bloom_filter_in_bytes`, `number_of_hash_functions`). If these constants are off, the index becomes ineffective.
I was helping a customer tune their n-gram filter indexes today, and I found that the documentation of the tuning process is not "idiot-proof" enough. The current docs mention several UDFs that help calculate the parameters, but they also mention 4300 as the number of n-grams per granule without explaining how this number can be calculated. (I found this comment in GitHub which helped me with that, but it is really not obvious.)
Can we please rewrite the entire tuning process in a more user-friendly manner?
EDIT: I would say we can remove the UDFs. Normal formulas would be just fine.
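To illustrate what "normal formulas" could look like, here is a minimal sketch based on the standard Bloom filter equations (not taken from the ClickHouse source or docs). The function names are hypothetical, the 0.0001 target false-positive rate is an assumed example value, and the n-gram counter is only a rough approximation: it counts distinct character n-grams over Python strings, which may not match exactly how ClickHouse tokenizes a granule. Only the 4300 figure comes from the docs mentioned above.

```python
import math


def count_distinct_ngrams(rows, n):
    """Rough estimate of distinct n-grams in one granule's worth of strings.

    Assumption: character-level n-grams over Python strings; ClickHouse's
    own tokenization may differ.
    """
    grams = set()
    for text in rows:
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return len(grams)


def bloom_filter_size_bytes(num_ngrams, false_positive_rate):
    """Standard formula: m = -n * ln(p) / (ln 2)^2 bits, rounded up to bytes."""
    bits = -num_ngrams * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_functions(size_bytes, num_ngrams):
    """Standard formula: k = (m / n) * ln 2, rounded to a whole number >= 1."""
    return max(1, round(size_bytes * 8 / num_ngrams * math.log(2)))


def expected_false_positive_rate(size_bytes, num_hashes, num_ngrams):
    """Standard formula: p = (1 - e^(-k * n / m))^k."""
    bits = size_bytes * 8
    return (1 - math.exp(-num_hashes * num_ngrams / bits)) ** num_hashes


# Example: 4300 n-grams per granule (the figure quoted in the docs) and an
# assumed target false-positive rate of 0.0001.
ngrams_per_granule = 4300
size = bloom_filter_size_bytes(ngrams_per_granule, 0.0001)
hashes = optimal_hash_functions(size, ngrams_per_granule)
rate = expected_false_positive_rate(size, hashes, ngrams_per_granule)
print(size, hashes, rate)
```

The idea would be to estimate the number of distinct n-grams per granule from a sample of real data first, then derive `size_of_bloom_filter_in_bytes` and `number_of_hash_functions` from that count and the false-positive rate you can tolerate.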