Implementations of Locality Sensitive Hashing schemes.
Implementation of Minwise Independent Hashing.
Example usage:
using LocalitySensitive
documents = readlines("resources/benchmark_data.csv")
mh = MinHash()
fingerprints = fingerprint_all(mh, [shingle(d, size=4) for d in documents])
estimate_jaccard(fingerprints[1], fingerprints[2])
mhind = MinHashIndex(minhash=mh, threshold=0.9)
for f in fingerprints
push!(mhind, f)
end
pairs = similar_pairs(mhind)
find_similar(mhind, fingerprint(mh, shingle(documents[210], size=4)))