
Cuckoo filter constructors overestimate insert success probability #9

Open
jbapple-cloudera opened this issue Dec 11, 2016 · 1 comment


@jbapple-cloudera
Contributor

Making tables that hold one item per 12.5 bits often fails. For instance, when asked to construct a filter to hold 2,013,265 items, the constructor makes a filter of size 3 mebibytes; this is too ambitious, and Add() often starts failing before all of the items are inserted. In my experience, 16,106,127 is even more error-prone as a constructor argument if I expect to actually call Add() that many times.
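
For concreteness, here is the sizing arithmetic as a sketch, assuming the default 12-bit fingerprints in 4-slot buckets and a bucket count rounded up to a power of two (this is back-of-the-envelope math, not the constructor's actual code):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t items = 2013265;
  const uint64_t slots_per_bucket = 4;
  // Round the bucket count up to the next power of two.
  uint64_t buckets = 1;
  while (buckets * slots_per_bucket < items) buckets <<= 1;
  const uint64_t slots = buckets * slots_per_bucket;
  const uint64_t bytes = slots * 12 / 8;  // 12 bits per slot
  std::printf("buckets=%llu  size=%llu bytes  occupancy=%.4f\n",
              (unsigned long long)buckets, (unsigned long long)bytes,
              (double)items / (double)slots);
  // Prints: buckets=524288  size=3145728 bytes  occupancy=0.9600
  return 0;
}
```

That is, the 3 MiB table must reach about 96% occupancy for every Add() to succeed, which is right at the achievable load factor for 4-way cuckoo tables, so the last insertions are racing the table's capacity.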

I suspect the limit on the number of kickouts Add() performs should not be a constant. "An Analysis of Random-Walk Cuckoo Hashing" shows a polylogarithmic upper bound on the length of kickout chains in a different but related cuckoo hash table.
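
As a sketch of what a non-constant limit might look like, scaled polylogarithmically with table size in the spirit of that analysis (the constant and exponent here are placeholders I made up, not tuned values):

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical replacement for a fixed kickout cap: grow the limit as
// c * log^2(n) in the number of slots. c = 8 is a guess to be tuned.
size_t MaxKickouts(size_t num_slots) {
  const double lg = std::log2(static_cast<double>(num_slots));
  return static_cast<size_t>(8.0 * lg * lg);
}
```

For the ~2-million-slot table above (2,097,152 slots), this gives about 3,500 kickouts; the right constants would need measurement.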

@dave-andersen
Member

I suggest a quick hack: reduce 0.96 (which is close to the bound) to 0.94. This is a fairly insignificant increase in memory consumption for nearly all workloads, but it substantially reduces the tail failure probability.
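
A minimal sketch of that tweak, assuming the constructor picks a power-of-two bucket count and doubles it when the implied occupancy is too high (kMaxLoadFactor, NextPow2, and BucketCount are illustrative names, not the library's actual internals):

```cpp
#include <algorithm>
#include <cstdint>

constexpr double kMaxLoadFactor = 0.94;  // was effectively 0.96

uint64_t NextPow2(uint64_t x) {
  uint64_t p = 1;
  while (p < x) p <<= 1;
  return p;
}

uint64_t BucketCount(uint64_t max_items, uint64_t slots_per_bucket) {
  uint64_t buckets =
      NextPow2(std::max<uint64_t>(1, max_items / slots_per_bucket));
  const double occupancy =
      double(max_items) / double(buckets * slots_per_bucket);
  if (occupancy > kMaxLoadFactor) buckets <<= 1;  // leave Add() headroom
  return buckets;
}
```

For the 2,013,265-item example above, the implied occupancy of ~0.96 now exceeds the threshold, so the table doubles to 6 MiB; requested sizes whose occupancy already lands at or below 0.94 are unaffected.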

If we switch to a true BFS-based insert as I suggested in the other issue, we can increase the maximum effective cuckoo count a bit without incurring as much of a slowdown. It would then be worth doing a brief empirical analysis of what the constants should be under polylog scaling, or we could just go big and feel safe for almost all workloads.
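
A minimal sketch of a BFS insert, over a toy 4-way table of 32-bit fingerprints (0 meaning empty); AltBucket and the layout are illustrative only, not the library's code:

```cpp
#include <cstdint>
#include <vector>

constexpr int kSlots = 4;     // slots per bucket
constexpr int kMaxDepth = 3;  // longest eviction chain to consider

struct Table {
  std::vector<uint32_t> cells;
  size_t buckets;  // must be a power of two
  explicit Table(size_t b) : cells(b * kSlots, 0), buckets(b) {}

  uint32_t& at(size_t bucket, int slot) { return cells[bucket * kSlots + slot]; }

  // Partial-key alternate bucket; applying it twice returns the original.
  size_t AltBucket(size_t bucket, uint32_t fp) const {
    return (bucket ^ (fp * 0x5bd1e995u)) & (buckets - 1);
  }

  struct Node { size_t bucket; int parent, parent_slot; };

  // Breadth-first search for the shortest eviction chain ending in a free
  // slot; returns false if no chain of length <= kMaxDepth exists.
  bool BfsInsert(size_t b1, uint32_t fp) {
    std::vector<Node> nodes = {{b1, -1, -1}, {AltBucket(b1, fp), -1, -1}};
    size_t begin = 0;
    for (int depth = 0; ; ++depth) {
      const size_t end = nodes.size();
      for (size_t i = begin; i < end; ++i) {
        for (int s = 0; s < kSlots; ++s) {
          if (at(nodes[i].bucket, s) != 0) continue;
          // Free slot found: unwind the path, shifting each displaced
          // fingerprint one hop, then place fp in the freed root slot.
          int cur = static_cast<int>(i), slot = s;
          while (nodes[cur].parent != -1) {
            const int p = nodes[cur].parent, ps = nodes[cur].parent_slot;
            at(nodes[cur].bucket, slot) = at(nodes[p].bucket, ps);
            cur = p;
            slot = ps;
          }
          at(nodes[cur].bucket, slot) = fp;
          return true;
        }
      }
      if (depth == kMaxDepth) return false;  // table looks full from here
      // Expand the frontier: every occupant's alternate bucket is a child.
      for (size_t i = begin; i < end; ++i)
        for (int s = 0; s < kSlots; ++s)
          nodes.push_back({AltBucket(nodes[i].bucket, at(nodes[i].bucket, s)),
                           static_cast<int>(i), s});
      begin = end;
    }
  }
};
```

Because BFS finds a shortest eviction chain, a depth-d search examines O(4^d) buckets but moves at most d fingerprints, so the effective cuckoo count can grow without paying the random walk's long-tail cost on each insert. (A real implementation would also deduplicate visited buckets.)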
