Full LENZ ingest #63

Closed
wants to merge 33 commits into from

richardmatthewsdev
Contributor

As an API user, I want access to all LENZ data so that I can build out demo experiences on the NZL website.

Acceptance criteria

  • Ensure that security controls and steps are followed as per protocol document
  • Confirm appearance of S3 bucket that holds copy of LENZ source data
  • Consider the best way to store this source data for ongoing access
  • Consider the best way to store PDFs for ongoing access
  • NZL website demo confirms the fields needed for their search and access experiences
  • Mapping rules from source fields to the API schema are confirmed with the Tech Lead before processing
  • All required data is available in the API

Notes

Acts SoD: 79.4 GB, 3,625,900 files, 39,346 folders
Bills SoD: 4.51 GB, 239,702 files, 7,145 folders
Regulations SoD: 24.4 GB, 803,340 files, 54,437 folders
SOPs SoD: 876 MB, 47,652 files, 4,934 folders

Plus Agency, Deemed Regs, etc., which are much smaller.
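
To get a feel for the overall size of the ingest, the four figures in the Notes can be totalled with a short script. This is only a sketch: the `Collection` class and `totals` helper are hypothetical names made up for illustration, 876 MB is assumed to equal 0.876 GB (decimal units), and the smaller Agency and Deemed Regs sets are left out because no figures are given for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    """One source-of-data set from the Notes (hypothetical helper type)."""
    name: str
    size_gb: float
    files: int
    folders: int


# Figures copied from the Notes; 876 MB is assumed to be 0.876 GB.
COLLECTIONS = [
    Collection("Acts SoD", 79.4, 3_625_900, 39_346),
    Collection("Bills SoD", 4.51, 239_702, 7_145),
    Collection("Regulations SoD", 24.4, 803_340, 54_437),
    Collection("SOPs SoD", 0.876, 47_652, 4_934),
]


def totals(collections):
    """Return (total size in GB, total files, total folders)."""
    return (
        round(sum(c.size_gb for c in collections), 3),
        sum(c.files for c in collections),
        sum(c.folders for c in collections),
    )


if __name__ == "__main__":
    size_gb, files, folders = totals(COLLECTIONS)
    print(f"{size_gb} GB, {files:,} files, {folders:,} folders")
```

Under those assumptions the four listed sets come to roughly 109 GB across about 4.7 million files, which is worth keeping in mind when choosing how to store the source data and PDFs for ongoing access.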

richardmatthewsdev and others added 30 commits March 5, 2024 13:46
…er as it is going to work with more formats than just PDF
…_harvester into rm/extract-data-from-documents
Remove need for checking if the file is a binary when writing it to the filesystem.
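
One way the last commit message above could be realised is to always write raw bytes, so the writer never has to decide whether a file is binary. The sketch below is an assumption about the general technique, not the code from this pull request; `write_document` is a hypothetical name, and UTF-8 is assumed for any text content.

```python
from pathlib import Path
from typing import Union


def write_document(path: Union[str, Path], content: Union[bytes, str]) -> Path:
    """Write content to disk as bytes, whatever the file type.

    Hypothetical helper: text is encoded as UTF-8 (an assumption), and
    bytes are written unchanged, so no binary/text check is needed.
    """
    target = Path(path)
    # Create any missing parent folders before writing.
    target.parent.mkdir(parents=True, exist_ok=True)
    data = content.encode("utf-8") if isinstance(content, str) else content
    target.write_bytes(data)
    return target
```

Because everything goes through `write_bytes`, a PDF, an XML file, and a plain-text file all take the same path through the code.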
2 participants