This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

consider support for additional hll(p) register sizes (currently only 5 is used) #89

Open
tea-dragon opened this issue Mar 16, 2015 · 0 comments

Comments

@tea-dragon
Contributor

(this issue takes over for #88 after clarification)

There is a trade-off between bits per register and accuracy at ultra-high cardinalities. The current HLLP implementation uses 5 bits per register, the same as HLL. That makes the two easier to compare and is more memory-efficient for most use cases. However, iirc, the original paper recommends going up to 6 bits regardless, and making the register size non-constant would not be terribly difficult, apart from serialization format concerns.
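
To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch. It assumes the usual HyperLogLog setup where a register stores a rank (leading zeros plus one) taken from the hash bits left over after the precision `p` index bits. The function names and the 32/64-bit hash widths are illustrative assumptions, not taken from this library's code.

```python
def max_rank(hash_bits: int, p: int) -> int:
    """Largest rank a register can ever need to hold.

    Assumes p bits of the hash pick the register and the rank is
    (leading zeros in the remaining bits) + 1.
    """
    return hash_bits - p + 1


def bits_needed(hash_bits: int, p: int) -> int:
    """Smallest register width that can store every possible rank."""
    return max_rank(hash_bits, p).bit_length()


def register_max(width: int) -> int:
    """Largest value a register of the given width can hold before saturating."""
    return (1 << width) - 1


def dense_bytes(p: int, width: int) -> int:
    """Size of a bit-packed dense register array, rounded up to whole bytes."""
    return ((1 << p) * width + 7) // 8


if __name__ == "__main__":
    p = 14  # example precision, chosen only for illustration
    for hash_bits in (32, 64):
        print(hash_bits, "bit hash: max rank", max_rank(hash_bits, p),
              "needs", bits_needed(hash_bits, p), "bits per register")
    for width in (4, 5, 6):
        print(width, "bits per register:", dense_bytes(p, width), "bytes,",
              "saturates at rank", register_max(width))
```

Under these assumptions a 5-bit register covers every rank a 32-bit hash can produce, while a 64-bit hash at `p = 14` can produce ranks up to 51, which only fit in 6 bits. The extra bit costs 2 KiB at that precision (12288 vs. 10240 bytes).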

Additionally, although the lower cardinality space is fairly well covered by the sparse set representation, there may also be some benefit to allowing an even smaller register size. This might work even better with some kind of additional, secondary dynamic switch, e.g. "SPARSE -> NORMAL_4 -> NORMAL_5 -> NORMAL_6" or something. Getting good runtime performance in that case may be tricky, though.
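
A minimal sketch of what the dense half of such a switch could look like, assuming the sparse stage is handled elsewhere. The class name `WideningRegisters` and the widen-on-overflow policy are hypothetical, and registers are kept as a plain Python list rather than bit-packed. A packed implementation would have to repack the whole array on each widening step, which is where the runtime cost mentioned above would show up.

```python
class WideningRegisters:
    """Dense registers that start narrow and widen only when a rank overflows.

    Hypothetical illustration of NORMAL_4 -> NORMAL_5 -> NORMAL_6;
    not the library's actual data structure.
    """

    WIDTHS = (4, 5, 6)  # consecutive widths, so widening is width += 1

    def __init__(self, p: int) -> None:
        self.p = p
        self.width = self.WIDTHS[0]
        self.registers = [0] * (1 << p)

    def _max_value(self) -> int:
        return (1 << self.width) - 1

    def update(self, index: int, rank: int) -> None:
        # Widen until the rank fits or the largest width is reached.
        while rank > self._max_value() and self.width < self.WIDTHS[-1]:
            self.width += 1  # a packed array would be repacked here
        # At the largest width, clamp instead of overflowing.
        rank = min(rank, self._max_value())
        # Standard HLL register update: keep the maximum rank seen.
        if rank > self.registers[index]:
            self.registers[index] = rank


if __name__ == "__main__":
    regs = WideningRegisters(p=4)
    regs.update(0, 10)
    print(regs.width)  # still the narrowest width
    regs.update(1, 20)
    print(regs.width)  # widened once
```

Widening only ever moves in one direction here, so merging two sketches of different widths would simply take the larger width first.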

The easiest solution is to add a config parameter for the register size and somehow deal with the serialization issues.
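
One way the serialization side might be handled is to write the register width into a small header so that a reader never has to guess it. The sketch below is only an assumption about how that could look: the three-byte header layout, `FORMAT_VERSION`, and all function names are made up for illustration and do not describe the library's existing format.

```python
import struct

FORMAT_VERSION = 2          # hypothetical version that carries the width
ALLOWED_WIDTHS = (4, 5, 6)  # hypothetical set of configurable widths


def pack_registers(registers, width):
    """Bit-pack register values, least significant bits first."""
    limit = 1 << width
    acc, nbits, out = 0, 0, bytearray()
    for value in registers:
        if not 0 <= value < limit:
            raise ValueError("register value does not fit in width")
        acc |= value << nbits
        nbits += width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero padded
    return bytes(out)


def unpack_registers(data, width, count):
    """Inverse of pack_registers for `count` registers."""
    mask = (1 << width) - 1
    acc, nbits, regs = 0, 0, []
    it = iter(data)
    for _ in range(count):
        while nbits < width:
            acc |= next(it) << nbits
            nbits += 8
        regs.append(acc & mask)
        acc >>= width
        nbits -= width
    return regs


def serialize(p, width, registers):
    """Header (version, precision, register width) followed by packed registers."""
    if width not in ALLOWED_WIDTHS:
        raise ValueError("unsupported register width")
    if len(registers) != 1 << p:
        raise ValueError("expected 2**p registers")
    return struct.pack(">BBB", FORMAT_VERSION, p, width) + pack_registers(registers, width)


def deserialize(blob):
    """Return (p, width, registers) from a blob written by serialize."""
    version, p, width = struct.unpack_from(">BBB", blob, 0)
    if version != FORMAT_VERSION or width not in ALLOWED_WIDTHS:
        raise ValueError("unrecognized format")
    return p, width, unpack_registers(blob[3:], width, 1 << p)


if __name__ == "__main__":
    regs = [i % 32 for i in range(16)]
    blob = serialize(4, 5, regs)
    print(len(blob), deserialize(blob) == (4, 5, regs))
```

Carrying the width in the header costs one byte per sketch, and it lets sketches written with different config values coexist in the same store.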
