Implement recursive folder support for S3 bucket syncs #284

alanking · 2024-09-30T18:34:27Z

Currently, all S3 bucket syncs treat the entire bucket as a flat directory. While this reflects how S3 buckets actually store objects, treating "/" characters as separators between "sub-folders" in the bucket could massively improve performance. The Minio.list_objects call in the S3 bucket sync task specifies recursive=True:

irods_capability_automated_ingest/irods_capability_automated_ingest/tasks/s3_bucket_sync.py, line 122 in ec34cb1:

itr = client.list_objects(bucket_name, prefix=prefix, recursive=True)

This should probably be recursive=False, but making that switch would require a lot of other changes in the sync task.
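To make the difference between the two modes concrete, here is a minimal model of S3 listing semantics in plain Python. It is an illustration under stated assumptions, not the Minio API: `list_level` is a hypothetical helper that mimics what a listing returns for a given prefix. With recursive=True every key under the prefix comes back as one flat list; with recursive=False keys are grouped at the next "/" into "common prefixes" (pseudo sub-folders), which, as far as I understand, is what Minio's recursive=False does by listing with a "/" delimiter.

```python
from typing import Iterable, List, Tuple


def list_level(keys: Iterable[str], prefix: str = "",
               recursive: bool = True) -> Tuple[List[str], List[str]]:
    """Model of S3 listing semantics; NOT the Minio API (hypothetical helper).

    recursive=True:  every key under `prefix` is returned, as if the bucket
                     were one flat directory.
    recursive=False: keys are grouped at the first "/" after `prefix`; each
                     group is returned once as a "common prefix" (a
                     pseudo sub-folder) instead of its individual keys.
    """
    objects: List[str] = []
    prefixes: List[str] = []
    seen = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not recursive and "/" in rest:
            # Collapse everything below the next "/" into one folder entry.
            folder = prefix + rest.split("/", 1)[0] + "/"
            if folder not in seen:
                seen.add(folder)
                prefixes.append(folder)
        else:
            objects.append(key)
    return objects, prefixes


keys = ["dir1/sub/c.txt", "a.txt", "dir2/d.txt", "dir1/b.txt"]
print(list_level(keys, recursive=True))
# (['a.txt', 'dir1/b.txt', 'dir1/sub/c.txt', 'dir2/d.txt'], [])
print(list_level(keys, recursive=False))
# (['a.txt'], ['dir1/', 'dir2/'])
print(list_level(keys, prefix="dir1/", recursive=False))
# (['dir1/b.txt'], ['dir1/sub/'])
```

With recursive=False, the objects inside dir1/ only show up once the caller lists again with prefix="dir1/", which is why flipping the flag alone is not enough: the sync task would have to descend into each returned prefix itself.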

Additionally, this would greatly improve a potential implementation of #282. As it stands, such an implementation would require a query that holds all of the data objects under the target collection. This would mean holding in memory both the entire S3 bucket listing (possibly; this depends on how Minio.list_objects is implemented) and the entire contents of the target collection, either of which could be very large.
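A minimal sketch of how a folder-by-folder approach could bound memory use, assuming a non-recursive lister is available. `Lister`, `walk_levels`, and `diff_level` are hypothetical names, not part of this repository or of the Minio client, and the set of existing names is assumed to come from some query scoped to the one sub-collection matching the current prefix. Each step compares only the objects directly under one prefix with one sub-collection, instead of the whole bucket with the whole target collection.

```python
from collections import deque
from typing import Callable, Iterator, List, Set, Tuple

# Hypothetical interface: given a prefix, return (object keys directly under
# it, sub-folder prefixes directly under it), e.g. a wrapper around a
# non-recursive listing call.
Lister = Callable[[str], Tuple[List[str], List[str]]]


def walk_levels(list_one_level: Lister,
                root: str = "") -> Iterator[Tuple[str, List[str]]]:
    """Breadth-first walk yielding (prefix, objects directly under prefix).

    Only one level's object listing is materialized per step, plus the
    queue of sub-folder prefixes that are still waiting to be visited.
    """
    pending = deque([root])
    while pending:
        prefix = pending.popleft()
        objects, subfolders = list_one_level(prefix)
        yield prefix, objects
        pending.extend(subfolders)


def diff_level(prefix: str, objects: List[str],
               existing: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare one bucket level with one sub-collection's data object names.

    Returns (names only in the bucket, names only in the collection).
    """
    in_bucket = {key[len(prefix):] for key in objects}
    return in_bucket - existing, existing - in_bucket
```

Swapping `popleft()` for `pop()` turns this into a depth-first walk; either way, the object listing held at any moment is a single level rather than the whole bucket.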


alanking added the enhancement New feature or request label Sep 30, 2024

alanking modified the milestone: 0.6.0 Sep 30, 2024
