This repository contains Jupyter notebooks that demonstrate four approaches to anonymizing Personally Identifiable Information (PII) in log files: regular expressions (regex), the Presidio library, an NLP model, and a custom deep learning NLP model. Sample log files are included for testing and experimentation.
Regex Approach: Uses regular expressions to match PII patterns in log files and replace them with anonymized values.
Presidio Library Approach: Uses the Presidio library for PII detection and anonymization in log files.
NLP Model Approach: Develops an NLP model to detect and anonymize PII using techniques such as tokenization and sequence labeling.
Custom NLP Model Approach: Builds a custom NLP model using deep learning techniques tailored specifically for PII detection and anonymization in log files.
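As a rough illustration of the regex approach described above, the sketch below masks a few common PII types in a single log line. It is not taken from the notebooks: the pattern set, the placeholder labels, the `anonymize_line` helper, and the sample log line are all assumptions made for this example, and real log formats will likely need broader or stricter patterns.

```python
import re

# Hypothetical patterns for illustration only; tune them to your own logs.
# Order matters: emails are masked first so later patterns never see them.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def anonymize_line(line: str) -> str:
    """Replace every match of each pattern with a <LABEL> placeholder."""
    for label, pattern in PII_PATTERNS.items():
        line = pattern.sub(f"<{label}>", line)
    return line


# Made-up log line, not one of the repository's sample files.
sample = (
    "2024-01-15 10:32:01 INFO login user=jane.doe@example.com "
    "ip=192.168.1.20 phone=555-123-4567"
)
print(anonymize_line(sample))
```

Replacing matches with a typed placeholder such as `<EMAIL>` keeps the log line readable and preserves what kind of value was removed. The trade-off of this approach is that it only catches PII that follows a predictable pattern, which is presumably why the other approaches rely on trained models for free-form values such as names.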