SJ SLOWNESS WITH LARGE EXTRACTIONS: As Tim I want SJ to handle large extractions efficiently, so that my workflow is smooth and my experience is good #47

Merged: richardmatthewsdev merged 25 commits into main from rm/large-extractions on Nov 2, 2023


@richardmatthewsdev richardmatthewsdev commented Oct 30, 2023

Acceptance Criteria

  • SJ handles large extractions efficiently and operators do not get frustrated with slow page loads

Solution Notes

We have decided to let the harvest operator split the documents that result from an Extraction Definition. Instead of a single large XML file full of records, we end up with many small files that can then flow into the normal use case. Only XML is supported for now, but we will look at changing this in the future.

Since we have removed per page from the extraction definition, these fields no longer work.
Initial idea for a sidekiq worker that can split large files into lots of small ones.
Initial working version for the XML format.
I think this could be refactored in the future if we move the record selector to the harvest block.
Add the ability for the operator to say if a file needs to be split or not.
Hide the extraction definition preview button if the file is large.
If the user says the document needs to be split, queue it after the extraction has completed.
Add tests for functionality introduced by the Split Worker.
Add test for preventing the Transformation when the extracted file needs to be split.
ERB Lint and Prettier
Split large files into batches of 100 records, rather than individually, to make transforming and dealing with them more efficient.
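
For readers who want a concrete picture of the batching described in the commits above, here is a minimal sketch of it, not the code from this PR. It assumes the records can be selected with an XPath expression and that each output file reuses the source root element as its wrapper. The method name `split_xml_records` and its parameters are hypothetical, and REXML is used only because it ships with Ruby; the actual worker may use a different parser and write files to disk rather than return strings.

```ruby
require "rexml/document"

# Hypothetical helper (not from this PR): split one large XML document into
# smaller XML strings, each holding at most batch_size records.
def split_xml_records(xml, record_xpath:, batch_size: 100)
  source = REXML::Document.new(xml)
  records = REXML::XPath.match(source, record_xpath)

  records.each_slice(batch_size).map do |batch|
    # Assumption: every batch is wrapped in an element named after the source root.
    wrapper = REXML::Element.new(source.root.name)
    # deep_clone copies each record so the source document is left untouched.
    batch.each { |record| wrapper.add_element(record.deep_clone) }
    wrapper.to_s
  end
end

# Usage: 250 records become three documents of 100, 100 and 50 records.
xml = "<records>" + (1..250).map { |i| "<record><id>#{i}</id></record>" }.join + "</records>"
batches = split_xml_records(xml, record_xpath: "//record")
puts batches.length
```

Batching by 100 rather than writing one file per record keeps the number of files, and the downstream work per file, manageable while each file stays small enough to load quickly.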

@isabel-anastasiadis-boost isabel-anastasiadis-boost left a comment

Looks good!

Allow the user to select that the extraction needs to be split from creation.
Allow the user to run sample and transform data when the extraction needs to be split.
@richardmatthewsdev richardmatthewsdev merged commit d0fc391 into main Nov 2, 2023
7 checks passed
@richardmatthewsdev richardmatthewsdev deleted the rm/large-extractions branch November 2, 2023 01:08
3 participants