- Splink (Python, SQL, Spark) - Scalable Fellegi-Sunter and rule-based entity resolution using your choice of SQL or Spark backend.
- Zingg (Python, Java) - Scalable, active learning model for entity resolution.
- dedupe (Python) - Active learning and flexible Python tooling for entity resolution.
- PyJedAI (Python, Java) - State-of-the-art entity resolution clustering algorithms.
- DeepMatcher (Python) - Deep learning-based entity resolution.
- FastLink (R) - Easy, scalable Fellegi-Sunter entity resolution on your laptop.
- RecordLinkage (Python) - Toolkit for prototyping entity resolution systems.
- dblink (R, Spark) - Scalable Bayesian graphical entity resolution.
- exchanger (R, C++) - More flexible Bayesian graphical entity resolution on your laptop.
- RELAIS (R, SQL, Java) - Record linkage software used at the Italian National Statistics Institute.
- ER-Evaluation (Python) - End-to-end evaluation, including summary statistics for monitoring, principled performance metric estimators, and error analysis.
- clevr (R) - Performance metrics and error tables.
- jellyfish (Python, C) - Fast string distance and phonetic matching.
- py_stringmatching (Python, C) - Large set of string comparison functions and tokenization methods.
- textdistance (Python) - Very large collection of sequence comparison functions, including token-based distances.
- SecondString (Java) - Java implementation of string comparison functions.
- StringCompare (Python, C++) - Time- and space-efficient implementation of common string distance functions. Designed for maintainability and extensibility.
- Comparator (R, C++) - Efficient string comparison functions in R.
- Entity Embed (Python, PyTorch) - PyTorch text embedding model for blocking.
- FaceNet-PyTorch (Python, PyTorch) - Embeddings for facial identity resolution.
- cleanco (Python) - Company name cleaning.
- libpostal (C, and bindings for Python, Java, Go, Ruby, PHP, and NodeJS) - Multinational address parsing.
- Ftfy (Python) - Fixes text (Unicode artifacts) for you.
- PyJanitor (Python) - Clean code for clean data.
- ProbablePeople - Western name parser.
- python-nameparser (Python) - Separate names into individual components.
- Nominally - Name parser for record linkage.
- GreatExpectations (Python) - Data quality checks.
- validate (R) - Data quality checks in R.
- blocking (R) - Blocking based on approximate nearest neighbours.
- ElasticSearch - Search text.
- DeezyMatch (Python) - Deep embedding and approximate nearest-neighbor blocking for entity resolution.
- StarSpace (C++, Python) - Embedding model suitable for similarity learning.
- Automated Data Inc - AI-driven, low-code.
- Tilores - Flexible entity resolution platform.
- Senzing - Pre-configured entity resolution and entity management for people and organizations.
- Match Data Pro - Batch entity resolution in the browser. Integrates with Senzing.
- Reltio - Cloud master data management with ER functionality.
- Quantexa - Entity resolution and graph analytics.
- Dataladders Data Match
- WinPure Clean and Match
- AWS Entity Resolution - Rule-based entity resolution.
- Google Cloud Entity Reconciliation - Part of Enterprise Knowledge Graph.
- Syntini Data Matching
- Amperity
- Hands-On Entity Resolution with Splink - Practical entity resolution with Splink and cloud computing.
- Linking Sensitive Data - Introduction to privacy-preserving record linkage.
- The Four Generations of Entity Resolution - Review of academic research in the field.
- Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection