Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect AWS/P5 (H100) instances to CI #1133

Closed
wants to merge 2 commits into from
Closed

Connect AWS/P5 (H100) instances to CI #1133

wants to merge 2 commits into from

Conversation

olupton
Copy link
Collaborator

@olupton olupton commented Oct 31, 2024

Make use of _test_slurm_pyxis.yaml a little less verbose, and add a new GPU_TYPE input to it. Also change the default to non-exclusive allocation, with exclusive allocation available on request.

@olupton olupton requested a review from yhtang October 31, 2024 13:47
Make use of _test_slurm_pyxis.yaml a little less verbose, and add a new
GPU_TYPE input to it. Also change the default to non-exclusive
allocation, with exclusive allocation available on request.
@mjsML mjsML self-requested a review October 31, 2024 13:58
Copy link
Member

@mjsML mjsML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is making a massive assumption w.r.t scheduling. IIUC we are tearing down the SLURM cluster in AWS?

@@ -2,29 +2,11 @@ name: ~test multi-node jobs via SLURM+Pyxis

on:
workflow_call:
secrets:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you remove it? It will make the workflow less adaptable to more clusters.

run-test:
name: ${{ inputs.NAME }}
runs-on: jumpbox
env:
OUTPUT_BASEDIR: /nfs/cluster
SLURM_CLUSTER: ${{ inputs.GPU_TYPE == 'A100' && secrets.CLUSTER_LOGIN_USER || secrets.AWS_LOGIN_USER }}@${{ inputs.GPU_TYPE == 'A100' && vars.HOSTNAME_SLURM_LOGIN || vars.HOSTNAME_AWS_SLURM }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic, if necessary, shall be better placed in the calling workflow to maximize reusability of this workflow (which is essentially a function to be called by the master workflow).

@olupton olupton closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants