SJ SLOWNESS WITH LARGE EXTRACTIONS: As Tim I want SJ to handle large extractions efficiently, so that my workflow is smooth and my experience is good #47

richardmatthewsdev · 2023-10-30T03:46:11Z

Acceptance Criteria

SJ handles large extractions efficiently and operators do not get frustrated with slow page loads

Solution Notes

We have decided to let the harvest operator split the documents that result from an Extraction Definition. Instead of a single large XML file full of records, we end up with many small ones that can then flow through the normal workflow. Currently only XML is supported; we will look to support other formats in the future.
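As a rough illustration of the idea (not the code in this PR), splitting one XML document into several smaller ones could look like the sketch below. The `split_xml` helper, the `<records>` wrapper element, and the use of REXML are assumptions made for the example; the real worker may use a different parser and writes files rather than returning strings.

```ruby
require "rexml/document"

# Hypothetical helper: returns one small XML string per record matched by
# record_xpath. The method name and the <records> wrapper are assumptions
# for illustration only.
def split_xml(xml, record_xpath)
  source = REXML::Document.new(xml)

  REXML::XPath.match(source, record_xpath).map do |record|
    out = REXML::Document.new
    root = out.add_element("records")
    # deep_clone copies the record and its children without detaching it
    # from the source document.
    root.add_element(record.deep_clone)
    out.to_s
  end
end

large = "<feed><record id='1'/><record id='2'/><record id='3'/></feed>"
parts = split_xml(large, "//record")
puts parts.length
```

This sketch parses the whole document in memory, so it only demonstrates the shape of the output; a very large file would more likely need a streaming parser.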

isabel-anastasiadis-boost

Looks good!

richardmatthewsdev added 13 commits October 30, 2023 16:44

fix(large_files): Small bug on extraction job page

8275274

Since we have removed per page from the extraction definition these fields no longer work.

feat(large_files): File Split Worker

f75a9e0

Initial idea for a sidekiq worker that can split large files into lots of small ones.

feat(large_files): Split Large XML files

0f51fff

Initial working with XML format.

feat(large_files): Add split related fields to the extraction definition

1a9bb84

I think this could be refactored in the future if we move the record selector to the harvest block.

feat(large_files): UI Elements for Extraction Definition

e3f5fa3

Add the ability for the operator to say if a file needs to be split or not.

feat(large_files): Hide Preview

2b4318e

Hide the extraction definition preview button if the file is large.

feat(large_documents): Add the Split Worker into the workflow

e56b5d1

If the user says the document needs to be split, queue it after the extraction has completed.
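The decision described in this commit can be pictured with the sketch below. Every name in it (`ExtractionJob`, `FakeQueue`, `after_extraction`, the worker symbols) is a hypothetical stand-in so the example runs without Sidekiq, and routing straight to transformation when no split is requested is an assumption rather than something the PR states.

```ruby
# Hypothetical stand-ins; the real application enqueues Sidekiq workers.
ExtractionJob = Struct.new(:id, :split, keyword_init: true)

# Minimal in-memory queue so the example needs no external services.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(worker, job_id)
    @jobs << [worker, job_id]
  end
end

# Called once the extraction has completed. If the operator flagged the
# extraction as needing a split, queue the split step; otherwise (assumed)
# carry on to the transformation step.
def after_extraction(job, queue)
  worker = job.split ? :split_worker : :transformation_worker
  queue.push(worker, job.id)
end

queue = FakeQueue.new
after_extraction(ExtractionJob.new(id: 1, split: true), queue)
after_extraction(ExtractionJob.new(id: 2, split: false), queue)
p queue.jobs
```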

test(large_files): Split Worker Spec

47fcd22

Add tests for functionality introduced by the Split Worker.

test(large_files): Extraction Execution

78aec15

Add test for preventing the Transformation when the extracted file needs to be split.

Brakeman, ERB Lint, and Prettier

8abd08c

Design Review feedback + modal refactoring

f68f930

test(large_files): Update tests following split_selector rename

4302ec5

style(large_files): Linters

6241bf7

ERB Lint and Prettier

richardmatthewsdev requested review from isabel-anastasiadis-boost and motizuki November 1, 2023 00:54

richardmatthewsdev added 2 commits November 1, 2023 14:05

Design review changes

c48baa9

Prettier

04a140b

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/entrypoints/application.js (outdated)

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/js/modals/createModal.js (outdated)

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/js/modals/transformationDefinitionSettingsModal.js (outdated)

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/js/modals/updateExtractionDefinitionModal.js (outdated)

isabel-anastasiadis-boost reviewed Nov 1, 2023

config/brakeman.ignore (outdated)

richardmatthewsdev added 3 commits November 1, 2023 16:51

Code review feedback breaking up modals

7c58519

Prettier

faf1959

Add a note to file access brakeman ignore

099a6e8

motizuki approved these changes Nov 1, 2023

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/js/modals/transformationDefinitionSettingsModal.js (outdated)

isabel-anastasiadis-boost reviewed Nov 1, 2023

app/frontend/js/modals/updateExtractionDefinitionModal.js (outdated)

richardmatthewsdev added 2 commits November 2, 2023 10:05

refactor(large_files): Transformation Definition Settings Modal

55ede60

Code review feedback :)

Prettier

1bc75c3

richardmatthewsdev added 2 commits November 2, 2023 10:11

Rename bind of record selector for clarity

e924cda

feat(large_files): Batches

48b0b74

Split large files into batches of 100 records, rather than individually, to make transforming and dealing with them more efficient.
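A minimal sketch of the batching idea, assuming the records are already available as an array: `batch_records`, `BATCH_SIZE`, and the hash shape are hypothetical names chosen for the example, and only the batch size of 100 comes from the commit message.

```ruby
BATCH_SIZE = 100 # batch size taken from the commit message

# Hypothetical helper: groups records into numbered batches so that each
# output file holds up to batch_size records instead of a single record.
def batch_records(records, batch_size = BATCH_SIZE)
  records.each_slice(batch_size).map.with_index(1) do |batch, number|
    { file_number: number, records: batch }
  end
end

batches = batch_records((1..250).to_a)
puts batches.map { |b| b[:records].size }.inspect
```

With 250 records this yields three batches of 100, 100, and 50 records, so downstream steps handle 3 files rather than 250.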

isabel-anastasiadis-boost approved these changes Nov 1, 2023

richardmatthewsdev added 3 commits November 2, 2023 13:36

feat(large_files): Feedback from Dan

8783e0e

Allow the user to select that the extraction needs to be split from creation. Allow the user to run sample and transform data when the extraction needs to be split.

Prettier

65db8b8

Update wording of split tooltip

b6fb736

richardmatthewsdev merged commit d0fc391 into main Nov 2, 2023
7 checks passed

richardmatthewsdev deleted the rm/large-extractions branch November 2, 2023 01:08
