Anonymize db #566

Open
wants to merge 11 commits into
base: main
Contributor

@gadogado gadogado commented Mar 1, 2019

Overview

This PR adds Postgres comments to every column that should be anonymized when the database is dumped via pg_dump. During the dump, the fake_pipe library reads those column comments and replaces the data with Faker-generated values for the target type named in each comment, e.g., sentence, word, bcrypt_password, md5, etc. (supported conversions)

Changes

There's a new migration that adds all of the comments. It needs to be run on the database before it's dumped and anonymized. Once that's done, you can use the new rake tasks to dump, restore, and even set up an anon user for testing.
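
For reviewers who haven't used column comments before, here is a rough sketch of how such statements could be generated. It is not the migration from this PR: the table and column names are made up, and the `anon: <type>` comment format is my assumption about what fake_pipe reads, so check its README before relying on it.

```ruby
# Hypothetical mapping of table -> column -> anonymization type.
# Table and column names are illustrative, not taken from this PR.
ANON_COLUMNS = {
  "users" => { "email" => "email", "bio" => "sentence" }
}.freeze

# Quote a Postgres identifier: wrap in double quotes, double any embedded ones.
def quote_ident(name)
  '"' + name.gsub('"', '""') + '"'
end

# Quote a Postgres string literal: wrap in single quotes, double any embedded ones.
def quote_literal(value)
  "'" + value.gsub("'", "''") + "'"
end

# Build one COMMENT ON COLUMN statement per annotated column.
# The "anon: <type>" comment text is an assumed format, not confirmed by this PR.
def comment_statements(columns)
  columns.flat_map do |table, cols|
    cols.map do |column, type|
      "COMMENT ON COLUMN #{quote_ident(table)}.#{quote_ident(column)} " \
        "IS #{quote_literal("anon: #{type}")};"
    end
  end
end

puts comment_statements(ANON_COLUMNS)
```

Inside a Rails migration, each generated statement could be passed to `execute`.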

How to test

  1. Load your local database with whatever data you want to anonymize (or use the existing one).
  2. Run the migrations.
  3. Dump and anonymize the output file: rake db:anon_dump (takes ~8 mins on my local)
  4. Restore from the anon dump file that was created: rake db:anon_restore
  5. Set up one of the users for dev testing: rake db:setup_anon_user
  6. Do a manual audit and make sure everything is anonymized. I usually use something like Postico for this.
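
The manual audit in step 6 could be backed by a small script. The sketch below is only an idea, not part of this PR: `unchanged_columns` is a hypothetical helper, and it assumes you have sampled the same rows before and after anonymization, in the same order (e.g. ordered by id).

```ruby
# List the sensitive columns whose values survived anonymization unchanged.
# `before` and `after` are arrays of hashes keyed by column name, matched by
# position; `sensitive` names the columns that are expected to change.
def unchanged_columns(before, after, sensitive)
  sensitive.select do |column|
    before.zip(after).any? do |old_row, new_row|
      # Skip rows with no counterpart or no original value to compare.
      next false if new_row.nil? || old_row[column].nil?
      old_row[column] == new_row[column]
    end
  end
end

# Made-up sample data for illustration only.
before = [{ "id" => 1, "email" => "jane@example.com", "name" => "Jane" }]
after  = [{ "id" => 1, "email" => "x9@example.org",   "name" => "Jane" }]

p unchanged_columns(before, after, %w[email name])
```

Any column it reports is worth a closer look in Postico; an empty result does not prove the dump is clean, since it only covers the sampled rows.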

Possible issues

  1. The dump process should ignore ACLs and owners. If there are any issues, you can create a missing user with the usual: CREATE USER postgres SUPERUSER;
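
For reference, skipping owners and ACLs at dump time comes down to two standard pg_dump options, `--no-owner` and `--no-acl`. This is a minimal sketch of building that command, not the rake task from this PR; `dump_argv` and the database name are placeholders.

```ruby
require "shellwords"

# Build a pg_dump argument list that leaves ownership and privilege (ACL)
# statements out of the dump. The database name is supplied by the caller.
def dump_argv(database)
  ["pg_dump", "--no-owner", "--no-acl", "--dbname", database]
end

argv = dump_argv("my_app_development") # placeholder database name

# Print the shell-escaped command instead of running it.
puts argv.shelljoin
```

To actually run it, something like `system(*argv, out: "dump.sql")` would write the plain SQL dump to a file.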

TODOS

  • fork fake_pipe to add a number mutator

@gadogado gadogado marked this pull request as ready for review March 4, 2019 19:22
@gadogado gadogado changed the title [WIP] Anonymize db Anonymize db Mar 5, 2019
@gadogado gadogado requested a review from sethherr April 5, 2019 00:23
@codeclimate

codeclimate bot commented Apr 5, 2019

Code Climate has analyzed commit d40ac72 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (80% is the threshold).

This pull request will bring the total coverage in the repository to 87.4% (0.0% change).

View more on Code Climate.

@sethherr sethherr changed the base branch from master to main June 22, 2020 19:51