Work toward containerizing tools and processes from the BitCurator Environment
Following ideas laid out here: https://docs.google.com/presentation/d/1RSWwAiHWogOHjsAuEH56XSP4R_jJMGiy0vfVwVH3JN0/edit#slide=id.ge91208f789_0_269
Additional context here: https://docs.google.com/document/d/1QVoQsSnWWXnq-_oThAkwksiAIkxs7EliPRK7Tx-1fpg/edit?usp=sharing
-
sample-data contains two directories:
- disk-images - two disk images from the M57.Biz corpus (https://downloads.digitalcorpora.org/corpora/scenarios/2009-m57-patents/), and a third disk image created for an introduction to Bitcurator workshop at BUF 2019 (h/t to Dianne Dietrich, Marty Gengenbach, and Amy Berish).
- logical-files - sample files for testing. Many of these came from the BUF 2019 workshop materials, and any sensitive PII included (e.g., SSNs, DOBs, GPAs) are entirely made up.
- email - sample MBOX files exported form Gmail. These are from farrell's personal email, but are the output of public newsletter subscriptions so aren't private
-
docker-files - as of this writing, consists of three Dockerfiles shared in September 2021, April 2022, and May 2022. These have been organized according to the base operating system, including Kali Linux, Ubuntu rolling, and Ubunutu 20.04 LTS (Focal). As of May 2022, we cannot build an image properly using 22.04 LTS (bulk_extractor issues).
-
testings.txt - notes from testing various parts of this.
More to come, I'm sure.