[IT-2361] Custom Batch compute environment #180

zaro0508 · 2023-04-10T17:44:12Z

Set up a custom Batch compute environment to be used with Nextflow
Tower in a Tower Launch configuration, so that we can avoid some of the
limitations of the Tower Forge configuration.

depends on Sage-Bionetworks/aws-infra#392

zaro0508 · 2023-04-10T17:47:09Z

templates/nextflow-launch.yaml

+    MaxValue: 16
+    Default: 1
+Resources:
+  UserManagedPolicy:


This is set up following the docs for configuring AWS Batch manually; however, I'm not sure how it's used. Is it needed at all? Would the user actually be using nextflow-launch-iam-policy.yaml instead?

zaro0508 · 2023-04-10T17:50:02Z

templates/nextflow-launch.yaml

+            'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplate'
+          Version: !ImportValue
+            'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplateLatestVersionNumber'
+  JobQueueOnDemand:


These on-demand and spot queues are separate, which requires users to specify which queue to put their jobs on. I assume NF Tower has a way for users to specify this, either via the NF CLI or the console?

Correct, there is a way to do this within the console and it looks like this:

[screenshot not preserved]

thomasyu888 · 2023-04-10T18:00:08Z

Thanks @zaro0508 for this work. Bruno is on vacation for the next 2 weeks, so I'd like to wait for him to be back to fully test this.

In the interim, I think that we need to have a system for triggering the production deployment rather than relying on the successful deployment to dev (like using github tags/releases). The reason is that some of these changes have a big impact on the way users interact with Tower, so I'd want to go in and make sure everything works as expected by running test workflows on the dev deployment.

zaro0508 · 2023-04-11T22:24:16Z

By all means @thomasyu888, do what you think is best. If you need IT help, please open a Jira issue and let us know.

thomasyu888 · 2023-04-12T16:25:31Z

Thanks @zaro0508! I created a ticket to track: https://sagebionetworks.jira.com/browse/IT-2790

BrunoGrandePhD

I haven't taken a close look at the resources since this needs to be restructured to be created for each Tower project. Once we have that set up, we can test the deployment.

BrunoGrandePhD · 2023-05-16T15:31:58Z

templates/nextflow-launch.yaml

+    MaxValue: 16
+    Default: 1
+Resources:
+  UserManagedPolicy:


I'm not sure if they meant "IAM user" or "Tower user" in this context. Either way, Tower users wouldn't be launching workflows directly on AWS Batch themselves. They would be launching them via Tower. I also don't see how this policy gets used later in the documentation, so I'm not sure we need it.

Here's the general flow:

1. A user submits a workflow using the Tower web UI or API.

2. The Tower ECS service uses its IAM role to assume a project-specific role (the goal of [IT-2360] Setup IAM roles for tower #181), which has permissions to that project's Batch resources and S3 buckets.

3. The project-specific role submits the Nextflow head job to an on-demand queue (we configure Tower so it knows which Batch queue to use).

4. The Nextflow head job uses its own role to submit the worker jobs to either the on-demand or spot queue.
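
For reference, the role hand-off and the two queues in the flow above could be laid out roughly as follows. This is only a sketch under assumptions, not the template in this PR: apart from `JobQueueOnDemand`, every parameter and resource name here is hypothetical, the compute environments are assumed to be defined elsewhere, and the inline policy is deliberately reduced to `batch:SubmitJob` (a real project role would need more, including S3 access).

```yaml
# Illustrative CloudFormation fragment. Names other than JobQueueOnDemand
# are hypothetical and are not taken from templates/nextflow-launch.yaml.
Parameters:
  TowerServiceRoleArn:
    Type: String
    Description: ARN of the IAM role used by the Tower ECS service (assumed input)
  OnDemandComputeEnvironmentArn:
    Type: String
    Description: ARN of an on-demand Batch compute environment (assumed input)
  SpotComputeEnvironmentArn:
    Type: String
    Description: ARN of a spot Batch compute environment (assumed input)
Resources:
  # Second step of the flow: a project-specific role that the Tower
  # service role is trusted to assume.
  ProjectRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Ref TowerServiceRoleArn
            Action: sts:AssumeRole
      Policies:
        - PolicyName: submit-head-job
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # Third step: the project role may submit the head job,
              # but only to the on-demand queue.
              - Effect: Allow
                Action: batch:SubmitJob
                Resource:
                  - !Ref JobQueueOnDemand  # Ref on a job queue returns its ARN
                  - !Sub 'arn:aws:batch:${AWS::Region}:${AWS::AccountId}:job-definition/*'
  # Queue for the Nextflow head job (and, optionally, worker jobs).
  JobQueueOnDemand:
    Type: AWS::Batch::JobQueue
    Properties:
      Priority: 1
      State: ENABLED
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref OnDemandComputeEnvironmentArn
  # Separate queue for worker jobs that can tolerate interruption.
  JobQueueSpot:
    Type: AWS::Batch::JobQueue
    Properties:
      Priority: 1
      State: ENABLED
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref SpotComputeEnvironmentArn
```

The head job's own role (the last step of the flow) is left out to keep the fragment short; it would need `batch:SubmitJob` on both queues.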

BrunoGrandePhD · 2023-05-16T15:33:24Z

templates/nextflow-launch.yaml

We will need to deploy these resources once per Tower project, similar to what I described in my review for #181. I'll let you determine the best way to lay this out using Sceptre because I know tower-project.j2 is already big enough as it is.

> We will need to deploy these resources once per Tower project

I have moved the deployment of nextflow-launch.yaml into tower-project.j2.

Set up to create a Tower launch stack for every project. We move the Tower launch template to the aws-infra repo [1] and deploy it as a nested stack.

[1] https://github.com/Sage-Bionetworks/aws-infra
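
As a rough illustration of the nested-stack approach, a parent template can pull in a launch template through an `AWS::CloudFormation::Stack` resource. This is a sketch under assumptions, not the actual contents of tower-project.j2: the resource name, the template URL, and the `ProjectName` parameter are all hypothetical placeholders.

```yaml
# Illustrative only: resource name, URL, and parameter are hypothetical.
Resources:
  TowerLaunchStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # Nested stack templates must be referenced by an S3 URL.
      TemplateURL: https://example-bucket.s3.amazonaws.com/templates/nextflow-launch.yaml
      Parameters:
        # Values passed here must match parameters declared in the nested template.
        ProjectName: example-project
```

Declaring one such resource per Tower project gives each project its own set of Batch resources while keeping the launch template itself in a single place.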

dpulls · 2023-07-10T19:21:00Z

🎉 All dependencies have been resolved !

sonarcloud · 2023-12-15T04:47:00Z

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

zaro0508 added 3 commits April 10, 2023 10:10

WIP: setup tower launch resources

7ca9719

Merge branch 'main' into it-2361

94e9368

make resource names more generic

f5d54cc

zaro0508 requested a review from a team as a code owner April 10, 2023 17:44

Merge branch 'dev' into it-2361

a2a9cb6

BrunoGrandePhD suggested changes May 16, 2023

zaro0508 added 2 commits May 19, 2023 08:00

Merge branch 'dev' into it-2361

1c2341d

create tower launch stack for every project

bb1407c

Set up to create a Tower launch stack for every project. We move the Tower launch template to the aws-infra repo [1] and deploy it as a nested stack. [1] https://github.com/Sage-Bionetworks/aws-infra

zaro0508 requested review from thomasyu888 and BrunoGrandePhD May 19, 2023 15:46

BrunoGrandePhD removed their request for review May 25, 2023 20:32

zaro0508 requested a review from BWMac October 22, 2024 14:38

BWMac approved these changes Oct 24, 2024
