Skip to content

Latest commit

 

History

History
24 lines (20 loc) · 925 Bytes

README.md

File metadata and controls

24 lines (20 loc) · 925 Bytes

LocalitySensitive

Build Status Coveralls

Implementations of Locality Sensitive Hashing schemes.

MinHash

Implementation of Minwise Independent Hashing.

Example usage:

using LocalitySensitive

documents = readlines("resources/benchmark_data.csv")
mh = MinHash()
fingerprints = fingerprint_all(mh, [shingle(d, size=4) for d in documents])
estimate_jaccard(fingerprints[1], fingerprints[2])
mhind = MinHashIndex(minhash=mh, threshold=0.9)
for f in fingerprints
    push!(mhind, f)
end
pairs = similar_pairs(mhind)
find_similar(mhind, fingerprint(mh, shingle(documents[210], size=4)))