Feature request: find_or_insert? #2
Comments
Hi @d33tah, I'm surprised it's not fast enough for your use case! Are you hashing the raw data and inserting the hash or is
I just ran a quick check using xxhash (though in reality the choice of hash here has no impact):
pub fn xxhash_bench(c: &mut Criterion) {
    use std::hash::BuildHasher;
    use std::hash::Hasher;
    use twox_hash::*;

    // Really any u64/hash bytes is fine, but this shows how to use xxhash.
    let mut bloom = CompressedBitmap::new(FilterSize::KeyBytes2);
    let mut hasher = RandomXxHashBuilder64::default().build_hasher();
    hasher.write(&[42]); // Hash the raw data bytes
    let hash = hasher.finish().to_le_bytes();

    c.bench_function("xxhash_insert", |b| b.iter(|| bloom.insert_hash(hash)));
    c.bench_function("xxhash_lookup hit", |b| {
        b.iter(|| black_box(bloom.contains_hash(hash)))
    });
    c.bench_function("xxhash_lookup miss, partial match", |b| {
        b.iter(|| black_box(bloom.contains_hash([1, 2, 3, 4, 5, 6, 7, 8])))
    });
}
And got:
I'm compiling with
To answer your actual question: sounds like a great idea! Maybe something like:
fn try_insert<T: AsRef<[u8]>>(&self, hash: T) -> bool
Where the returned
I think this crate could do with some love, to be honest. Adding a generic parameter for the default hasher (xxhash?) that the user can override, instead of requiring them to pass hashes directly, and adding more utility methods seem like obvious improvements.
My original use case was dealing with hashes, and I didn't want the bloom filter to then re-hash the hash I was giving it, which is why it wound up like this. Now that it's public, that is likely not the most useful form this crate could take...
Dom
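For anyone following along, here is a minimal sketch of what a single-pass `try_insert` could look like. It is not the crate's implementation: `ToyFilter` is a hypothetical stand-in that uses a plain bit set indexed by the first two bytes of the hash, it takes `&mut self` rather than the `&self` in the signature above, and the return value is an assumption (`true` when the hash was newly inserted, `false` when it was probably already present).

```rust
/// Hypothetical stand-in for a Bloom-style filter: a plain 65,536-bit set
/// indexed by the first two bytes of an already-computed hash.
pub struct ToyFilter {
    bits: Vec<u64>,
}

impl ToyFilter {
    pub fn new() -> Self {
        ToyFilter { bits: vec![0u64; 65_536 / 64] }
    }

    /// Map a hash to (word index, bit mask). Missing bytes are treated as 0.
    fn index<T: AsRef<[u8]>>(hash: T) -> (usize, u64) {
        let bytes = hash.as_ref();
        let b0 = bytes.first().copied().unwrap_or(0) as usize;
        let b1 = bytes.get(1).copied().unwrap_or(0) as usize;
        let bit = (b0 << 8) | b1;
        (bit / 64, 1u64 << (bit % 64))
    }

    /// Read-only membership check.
    pub fn contains_hash<T: AsRef<[u8]>>(&self, hash: T) -> bool {
        let (word, mask) = Self::index(hash);
        self.bits[word] & mask != 0
    }

    /// Check and set in one pass over the bit set.
    /// Assumed semantics: returns true if the bit was newly set,
    /// false if it was already set.
    pub fn try_insert<T: AsRef<[u8]>>(&mut self, hash: T) -> bool {
        let (word, mask) = Self::index(hash);
        let was_set = self.bits[word] & mask != 0;
        self.bits[word] |= mask;
        !was_set
    }
}
```

The point of the sketch is only that the index is computed once and the same word is both tested and updated, so a caller never needs a `contains_hash` call followed by an insert for the same hash.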
@domodwyer thanks for the RUSTFLAGS hint! It made the program about twice as fast. I guess the next low-hanging fruit is the try_insert function you proposed. If you're interested in adding it, please ping me once it's ready for testing. :)
Hi! Thanks for the library! I played with it for a while, though unfortunately it wasn't fast enough for my use case. Does it use the CPU's built-in hash capabilities to create hashes quickly?
Anyway, I might have a feature request: I was trying to use the library to count how many entries are in a given set, but found that it's probably hashing the same entries twice. Here's the relevant code:
Would it make sense to add insert_if_not_exists() that returns whether the hash was already there?
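To illustrate the counting pattern described above, here is a rough sketch using only the standard library. It is not the code from the original report: `std::collections::HashSet` stands in for the filter (so the count is exact rather than probabilistic), and `hash_entry` and both counting functions are hypothetical names. `HashSet::insert` already returns `true` only when the value was not present, which is the behaviour being requested.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hash an entry once, up front (hypothetical helper).
fn hash_entry(entry: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    entry.hash(&mut hasher);
    hasher.finish()
}

/// Two calls per entry: a membership check, then a separate insert.
/// The set has to locate the same key twice for every new entry.
pub fn count_distinct_two_calls(entries: &[&str]) -> usize {
    let mut seen: HashSet<u64> = HashSet::new();
    let mut count = 0;
    for entry in entries {
        let h = hash_entry(entry);
        if !seen.contains(&h) {
            seen.insert(h);
            count += 1;
        }
    }
    count
}

/// One call per entry: `insert` reports whether the value was new.
pub fn count_distinct_one_call(entries: &[&str]) -> usize {
    let mut seen: HashSet<u64> = HashSet::new();
    let mut count = 0;
    for entry in entries {
        if seen.insert(hash_entry(entry)) {
            count += 1;
        }
    }
    count
}
```

Both functions return the same count; the second simply folds the check and the insert into one operation, which is what an `insert_if_not_exists()` on the filter would allow.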