[IT-2361] Custom Batch compute environment #180
base: dev
templates/nextflow-launch.yaml
Outdated
MaxValue: 16
Default: 1
Resources:
UserManagedPolicy:
This is set up from the "configure AWS Batch manually" docs; however, I'm not sure how it's used. Is it needed at all? Would the user actually be using nextflow-launch-iam-policy.yaml instead?
I'm not sure if they meant "IAM user" or "Tower user" in this context. Either way, Tower users wouldn't be launching workflows directly on AWS Batch themselves. They would be launching them via Tower. I also don't see how this policy gets used later in the documentation, so I'm not sure we need it.
Here's the general flow:
- A user submits a workflow using the Tower web UI or API.
- The Tower ECS service uses its IAM role to assume a project-specific role (the goal of [IT-2360] Setup IAM roles for tower #181), which has permissions to that project's Batch resources and S3 buckets.
- The project-specific role submits the Nextflow head job to an on-demand queue (we configure Tower so it knows which Batch queue to use).
- The Nextflow head job uses its own role to submit the worker jobs to either the on-demand or spot queue.
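The role chain in the flow above could be sketched in CloudFormation roughly as follows. This is a minimal sketch under stated assumptions, not the template from this PR or from #181: the resource name `TowerProjectRole`, the parameter `TowerEcsServiceRoleArn`, and the inline policy are hypothetical placeholders.

```yaml
# Hypothetical sketch: a project-specific role that the Tower ECS service role may assume.
AWSTemplateFormatVersion: '2010-09-09'
Parameters:
  TowerEcsServiceRoleArn:          # assumed parameter name
    Type: String
    Description: ARN of the IAM role used by the Tower ECS service
Resources:
  TowerProjectRole:                # assumed resource name
    Type: AWS::IAM::Role
    Properties:
      # Trust policy: only the Tower service role may assume this role.
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Ref TowerEcsServiceRoleArn
            Action: sts:AssumeRole
      Policies:
        - PolicyName: SubmitHeadJobs
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - batch:SubmitJob
                  - batch:DescribeJobs
                # Placeholder; in practice, scope batch:SubmitJob to the
                # project's job queues and job definitions.
                Resource: '*'
```

The Nextflow head job would need a separate role with similar Batch permissions so that it can submit the worker jobs.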
templates/nextflow-launch.yaml
Outdated
'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplate'
Version: !ImportValue
'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplateLatestVersionNumber'
JobQueueOnDemand:
These on-demand and spot queues are separate, which requires users to specify which queue to put their jobs on. I assume NF Tower has a way for users to specify this, using either the NF CLI or the console?
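For illustration, two separate queues might be declared along these lines. This is only a sketch: `JobQueueSpot`, `ComputeEnvironmentOnDemand`, and `ComputeEnvironmentSpot` are assumed names (only `JobQueueOnDemand` appears in the diff above), and the compute environments would have to be defined elsewhere in the same template.

```yaml
# Hypothetical fragment: one queue per compute environment.
Resources:
  JobQueueOnDemand:
    Type: AWS::Batch::JobQueue
    Properties:
      Priority: 1
      State: ENABLED
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref ComputeEnvironmentOnDemand   # assumed name
  JobQueueSpot:                                                 # assumed name
    Type: AWS::Batch::JobQueue
    Properties:
      Priority: 1
      State: ENABLED
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref ComputeEnvironmentSpot       # assumed name
```

On the Nextflow side, the `queue` process directive is the usual way to choose a Batch queue when using the `awsbatch` executor, so a pipeline configuration can point worker jobs at either queue.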
Thanks @zaro0508 for this work. Bruno is on vacation for the next 2 weeks, so I'd like to wait for him to be back to fully test this. In the interim, I think we need a system for triggering the production deployment (such as GitHub tags/releases) rather than relying on a successful deployment to dev. The reason is that some of these changes have a big impact on the way users interact with Tower, so I'd want to go in and make sure everything works as expected by running test workflows on the dev deployment.
By all means @thomasyu888, do what you think is best. If you need IT help, please enter a Jira issue and let us know.
Thanks @zaro0508! I created a ticket to track: https://sagebionetworks.jira.com/browse/IT-2790
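One way a release-triggered production deployment could look, assuming GitHub Actions is used for CI (the thread does not say which CI system this repository uses). The workflow name, file path, and the `prod` Sceptre path are hypothetical, and AWS credential setup is omitted.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/deploy-prod.yaml
name: deploy-prod
on:
  release:
    types: [published]      # runs only when a GitHub release is published
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Sceptre
        run: pip install sceptre
      - name: Deploy production stacks
        # "prod" is an assumed Sceptre stack group path.
        run: sceptre launch prod --yes
```

With a trigger like this, merging to dev would deploy only the dev stacks, and production would change only after someone has tested dev and cut a release.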
I haven't taken a close look at the resources since this needs to be restructured to be created for each Tower project. Once we have that set up, we can test the deployment.
templates/nextflow-launch.yaml
Outdated
We will need to deploy these resources once per Tower project, similar to what I described in my review for #181. I'll let you determine the best way to lay this out using Sceptre because I know tower-project.j2 is already big enough as it is.
> We will need to deploy these resources once per Tower project

I have moved the deployment of nextflow-launch.yaml into tower-project.j2.
Set up a Tower launch stack for every project. We move the Tower launch template to the aws-infra repo [1] and deploy it as a nested stack. [1] https://github.com/Sage-Bionetworks/aws-infra
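A nested stack of this kind is typically declared with an `AWS::CloudFormation::Stack` resource. The fragment below is a sketch only: the resource name, the template URL, and the `ProjectName` parameter are placeholders, not the actual values used in tower-project.j2 or aws-infra.

```yaml
# Hypothetical fragment: deploy the launch template as a nested stack, once per project.
Resources:
  TowerLaunchStack:                # assumed resource name
    Type: AWS::CloudFormation::Stack
    Properties:
      # Placeholder URL; the real template location in aws-infra is not shown here.
      TemplateURL: https://example-bucket.s3.amazonaws.com/templates/nextflow-launch.yaml
      Parameters:
        ProjectName: example-project   # assumed parameter of the nested template
```

Because the resource lives inside the per-project template, rendering that template once per Tower project yields one launch stack per project.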
🎉 All dependencies have been resolved!
Quality Gate passed. Kudos, no new issues were introduced! 0 New issues
Set up a custom Batch compute environment to be used with Nextflow
Tower in a Tower Launch configuration so that we can avoid some of the
limitations of the Tower Forge configuration.
Depends on Sage-Bionetworks/aws-infra#392
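As a rough illustration of what a custom (non-Forge) compute environment can look like in CloudFormation, here is a minimal sketch. The parameter names, the resource name, and the vCPU limits are assumptions; the two `!ImportValue` exports are the ones visible in the reviewed diff, and placing the first under `LaunchTemplateId` is also an assumption.

```yaml
# Hypothetical fragment: a managed on-demand compute environment using an imported launch template.
Parameters:
  SubnetIds:                       # assumed parameter
    Type: List<AWS::EC2::Subnet::Id>
  InstanceProfileArn:              # assumed parameter: ECS instance profile ARN
    Type: String
Resources:
  ComputeEnvironmentOnDemand:      # assumed resource name
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      State: ENABLED
      ComputeResources:
        Type: EC2
        MinvCpus: 0
        MaxvCpus: 16               # illustrative limit
        InstanceTypes:
          - optimal
        Subnets: !Ref SubnetIds
        InstanceRole: !Ref InstanceProfileArn
        # Security groups must come from SecurityGroupIds here or from the launch template.
        LaunchTemplate:
          LaunchTemplateId: !ImportValue
            'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplate'
          Version: !ImportValue
            'Fn::Sub': '${AWS::Region}-nextflow-ecs-cluster-EcsLaunchTemplateLatestVersionNumber'
```

Owning the compute environment in a template like this keeps instance types, launch template, and scaling limits under the team's control instead of leaving them to Tower Forge.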