Add capability of copying files from lustre to s3 bucket on AWS #2873

Closed
wants to merge 86 commits into from

Conversation

weihuang-jedi
Contributor

@weihuang-jedi weihuang-jedi commented Aug 28, 2024

Description

Add the capability for global-workflow to copy files from Lustre to an S3 bucket on AWS and on other CSPs.

Resolves #2872

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? YES
  • Does this change require an update to any of the following submodules? YES (If YES, please add a link to any PRs that are pending.)
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

[Wei.Huang@epicweiaws-130 wxflow]$ git diff fsutils.py
diff --git a/src/wxflow/fsutils.py b/src/wxflow/fsutils.py
index af9e5b8..5d6d4da 100644
--- a/src/wxflow/fsutils.py
+++ b/src/wxflow/fsutils.py
@@ -81,6 +81,9 @@ def cp(source: str, target: str) -> None:
     if os.path.isdir(target):
         target = os.path.join(target, os.path.basename(source))

+    if os.path.isfile(target):
+        return
+
     try:
         shutil.copy2(source, target)
(I will file a wxflow PR if the general idea is OK with reviewers.)
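
For readers who want to see the effect of the wxflow change in isolation, here is a minimal, self-contained sketch of a `cp` helper with the same early return. It is not the actual wxflow implementation: the docstring and the `except` clause are assumptions, since the diff above only shows the lines around the `try`.

```python
import os
import shutil


def cp(source: str, target: str) -> None:
    """Copy source to target, doing nothing if the target file already exists."""
    # If target is a directory, copy into it under the source's base name.
    if os.path.isdir(target):
        target = os.path.join(target, os.path.basename(source))

    # Proposed addition: leave an existing target file untouched.
    if os.path.isfile(target):
        return

    try:
        shutil.copy2(source, target)
    except OSError as exc:
        # Assumed error handling; not visible in the diff above.
        raise OSError(f"unable to copy {source} to {target}") from exc
```

One consequence worth weighing: with this check an existing target is never refreshed, even when the source has changed since the first copy.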

How has this been tested?

  • Clone and build on AWS
  • Run coupled C48 on AWS

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

weihuang-jedi and others added 30 commits June 18, 2024 23:05
@weihuang-jedi weihuang-jedi marked this pull request as ready for review August 29, 2024 16:08
@weihuang-jedi
Contributor Author

I understand that this PR is far from perfect,
but I want to show one way we can copy product data from /lustre to an S3 bucket.
We would like to see what other people think of this and iterate toward a better approach.
Thanks,
Wei

@WalterKolczynski-NOAA
Contributor

Unless there is a need to populate the bucket right away, I think this is the wrong approach. It will be annoying to add and maintain, and it also injects a bunch of unneeded code into production scripts.

Instead of adding a bunch of code to all of the jobs that write to COM as you've started here with products, we should do it at the end of the cycle, either by piggybacking on the existing archive job or creating a new job similar to the archive job that only runs on AWS. Then all the copying can be done in one go and in one place segregated from everything else.
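
As a rough illustration of the "one go, one place" idea, the sketch below walks a directory tree once and hands every file to an upload function. Everything here is hypothetical: the function names, the `com_root` and `key_prefix` parameters, and the key layout are assumptions for illustration, not existing global-workflow or wxflow code. The only real API used is boto3's S3 client method `upload_file(Filename, Bucket, Key)`, and boto3 is assumed to be installed on the cloud platform.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_transfers(com_root: str, key_prefix: str) -> List[Tuple[str, str]]:
    """Pair every file under com_root with the object key it would get in the bucket."""
    root = Path(com_root)
    prefix = key_prefix.strip("/")
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            key = f"{prefix}/{rel}" if prefix else rel
            pairs.append((str(path), key))
    return pairs


def boto3_uploader(bucket: str) -> Callable[[str, str], None]:
    """Return an upload function backed by boto3 (imported lazily, assumed installed)."""
    import boto3  # third-party AWS SDK, only needed when actually uploading

    client = boto3.client("s3")
    return lambda filename, key: client.upload_file(filename, bucket, key)


def copy_cycle_to_bucket(com_root: str, key_prefix: str,
                         upload: Callable[[str, str], None]) -> int:
    """Upload every planned file with the given function; return the number of files sent."""
    transfers = plan_transfers(com_root, key_prefix)
    for filename, key in transfers:
        upload(filename, key)
    return len(transfers)
```

A dedicated end-of-cycle job could then call something like `copy_cycle_to_bucket(com_root, "gfs", boto3_uploader("my-bucket"))`, where the prefix and bucket name are placeholders. Passing the uploader in as a function keeps the directory walk testable without AWS credentials.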

Contributor

@aerorahul aerorahul left a comment

I agree with @WalterKolczynski-NOAA's assessment and review.

@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as draft September 17, 2024 14:51
@weihuang-jedi
Contributor Author

@WalterKolczynski-NOAA and @aerorahul,
I was thinking of something similar to what Walter said; my only worry is that doing the copy at the archive stage will increase the total wall-clock time of global-workflow.
The approach I took here is easy to get working but, as Walter said, makes a lot of changes to the current production code.
Let me rethink this request and see if I can work out an approach similar to the archive job.
Certainly, any suggestions or comments are more than welcome.

Collaborator

We should never update this script in this fashion -- ever.
It is unsupported legacy code that happens to have some vestigial logic on EXPDIR.
These updates are speculative and irrelevant.

Contributor Author

That is a nice catch. I should not make any changes here.
I will remove those, even though this PR is not going anywhere.

Collaborator

Right, just wanted to keep you up to speed.

@DavidHuber-NOAA
Contributor

Opened issue NOAA-EMC/wxflow#42 to add bucket transfer capability to wxflow.

@aerorahul
Contributor

Closing after consulting w/ @weihuang-jedi
This capability will be coordinated in a future sprint.

@aerorahul aerorahul closed this Nov 12, 2024
Development

Successfully merging this pull request may close these issues.

Add copy products to s3 bucket capability on AWS
5 participants