SJ SLOWNESS WITH LARGE EXTRACTIONS: As Tim I want SJ to handle large extractions efficiently, so that my workflow is smooth and my experience is good #47
Merged
Since we have removed "per page" from the extraction definition, these fields no longer work.
Initial idea for a Sidekiq worker that can split large files into lots of small ones.
Initial working version for the XML format.
I think this could be refactored in the future if we move the record selector to the harvest block.
Add the ability for the operator to say if a file needs to be split or not.
Hide the extraction definition preview button if the file is large.
If the user says the document needs to be split, queue it after the extraction has completed.
Add tests for functionality introduced by the Split Worker.
Add test for preventing the Transformation when the extracted file needs to be split.
ERB Lint and Prettier
richardmatthewsdev requested review from isabel-anastasiadis-boost and motizuki, November 1, 2023 00:54
app/frontend/js/modals/transformationDefinitionSettingsModal.js
motizuki approved these changes, Nov 1, 2023
Code review feedback :)
Split large files into batches of 100 records, rather than individually, to make transforming and dealing with them more efficient.
isabel-anastasiadis-boost approved these changes, Nov 1, 2023
Looks good!
Allow the user to select that the extraction needs to be split from creation. Allow the user to run sample and transform data when the extraction needs to be split.
Acceptance Criteria
Solution Notes
We have decided to let the harvest operator split the documents that result from an Extraction Definition. Instead of a single large XML file full of records, we end up with many small files that can then flow into the normal use case. Currently we only support XML, but we will look to change this in the future.
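
As a rough illustration of the splitting idea, here is a minimal sketch in plain Ruby. It is not the Split Worker from this PR: the method name `split_into_batches`, the `records` wrapper element, the XPath argument, and the use of the REXML library are all assumptions made for the example. It also parses the whole document in memory, which a worker handling very large files would probably want to avoid.

```ruby
require "rexml/document"

# Hypothetical helper, not the project's actual Split Worker.
# Splits one large XML document into smaller documents holding at most
# batch_size records each. record_xpath selects the record elements.
def split_into_batches(xml, record_xpath, batch_size = 100)
  source = REXML::Document.new(xml)
  records = REXML::XPath.match(source, record_xpath)

  records.each_slice(batch_size).map do |batch|
    document = REXML::Document.new
    root = document.add_element("records") # assumed wrapper element name
    # deep_clone copies each record so the source document is left untouched
    batch.each { |record| root.add_element(record.deep_clone) }
    document.to_s
  end
end
```

Called with a file of 250 records and the default batch size, this returns three XML strings holding 100, 100, and 50 records. Each string could then be written out as its own small file and handled like any other extracted document.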