Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question]false positive rate #36

Open
MainHanzo opened this issue Aug 29, 2018 · 4 comments
Open

[Question]false positive rate #36

MainHanzo opened this issue Aug 29, 2018 · 4 comments

Comments

@MainHanzo
Copy link

Hello everyone, i am doing some research on the performance of the bloom filter, cuckoo filter and simd-block etc. And when i tried to compare the performance i found that the false positive rate of the cuckoo filter is not mentioned. So I would like to ask how much is the false positive rate of the cuckoo filter, how is the fp rate defined?

@dnbaker
Copy link
Contributor

dnbaker commented Aug 29, 2018

I recommend looking at the manuscript.

The key difference regarding false positive rates is that bloom filters' false positive rate approaches 1 as a function of the number of elements, while a cuckoo filter's error rate caps out at a relatively small rate (often 1-5%) regardless of how full the structure becomes.
This comes at the cost of insertion increasing as the structure becomes more full.

The other disadvantage is that sketches cannot be merged or compared as concise representations of sets the way a bloom filter can, which eliminates a major use case for them.

@jbapple-cloudera
Copy link
Contributor

@MainHanzo , when you run bulk-insert-and-query.exe, the false positive rate is the column labeled "ε". When you run conext-table3.exe, the false positive rate is in the row labeled "false positive rate".

@MainHanzo
Copy link
Author

@jbapple-cloudera @dnbaker Thank you for all your help.
@dnbaker However, I didn't get the meaning of"the sketches", do they refer to the cuckoo filters? and "concise representation of sets the way a bloom filter can". May i ask you explain more of that?
@jbapple-cloudera I have another question: If the fp rate are changing when the elements inserted into the filter, what is the man-set fp rate for? Is that just the fp rate for the very beginning?
I'm sorry, this is from a very beginner. Thank you very much for your patience.

@dnbaker
Copy link
Contributor

dnbaker commented Aug 31, 2018

@MainHanzo Correct. Both cuckoo and bloom filters are sketch data structures, and a filled sketch data structure is often referred to as a sketch.

Both bloom filters and cuckoo filters provide inexact set membership queries. However, bloom filters can approximate set operations by using bitwise & and |s, while merging cuckoo filters is much more expensive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants