Migration of the nodes to the secondary cluster #253

Merged
72 changes: 50 additions & 22 deletions resources.yaml
@@ -4,8 +4,8 @@
default: vggp-v60-j322-692e75a7c101-main
gpu: vggp-v60-gpu-j322-692e75a7c101-main-kernel-4.18.0-477.21.1.el8_8-nvidia
secure: vggp-v60-secure-j322-692e75a7c101-main
htcondor-secondary: vgcn~workers+internal~rockylinux-8.6-x86_64~2023-10-26~43739~htcondor-secondary~ebb20b8~kysrpex_local_build
[yamllint, GitHub Actions] resources.yaml 7:121 [line-length] line too long (129 > 120 characters)
htcondor-secondary-gpu: vgcn~workers-gpu+internal~rockylinux-8.6-x86_64~2023-11-16~34096~htcondor-secondary~a23fbb0~kysrpex_local_build
[yamllint, GitHub Actions] resources.yaml 8:121 [line-length] line too long (137 > 120 characters)
network: bioinf
secgroups:
- ufr-ingress
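
The two yamllint warnings above are caused by the long image names. One possible way to handle them, sketched here under assumptions, is yamllint's inline `disable-line` directive. The `images:` parent key and the indentation are guesses, since neither is visible in this diff; only the two key/value pairs are taken from the change.

```yaml
# Sketch only: the parent key `images:` and the two-space nesting are assumptions.
images:
  # yamllint disable-line rule:line-length
  htcondor-secondary: vgcn~workers+internal~rockylinux-8.6-x86_64~2023-10-26~43739~htcondor-secondary~ebb20b8~kysrpex_local_build
  # yamllint disable-line rule:line-length
  htcondor-secondary-gpu: vgcn~workers-gpu+internal~rockylinux-8.6-x86_64~2023-11-16~34096~htcondor-secondary~a23fbb0~kysrpex_local_build
```

A `disable-line` comment placed on its own line applies to the line that follows it. The alternative would be raising `max` for the `line-length` rule in the repository's yamllint configuration, which relaxes the limit for every file rather than only these two entries.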
@@ -50,21 +50,24 @@
# mem_limit_policy: hard
# mem_reserved_size: 2048

worker-fetch:
worker-fetch-htcondor-secondary:
count: 1
flavor: c1.c36m100d50
group: upload
worker-interactive:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-interactive-htcondor-secondary:
count: 13 #8
flavor: c1.c36m100d50
group: interactive
docker: true
image: secure
image: htcondor-secondary
secondary_htcondor_cluster: true
volume:
size: 1024
type: default
worker-c28m475:
count: 12 #19
worker-c28m475-htcondor-secondary:
count: 10 #19
flavor: c1.c28m475d50
group: compute
docker: true
@@ -74,7 +77,9 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c28m225:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c28m225-htcondor-secondary:
count: 0 #7
flavor: c1.c28m225d50
group: compute_test
@@ -85,7 +90,9 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c36m100:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c36m100-htcondor-secondary:
count: 17 #32
flavor: c1.c36m100d50
group: compute
@@ -96,8 +103,10 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c36m225:
count: 15 #15
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c36m225-htcondor-secondary:
count: 14 #15
flavor: c1.c36m225d50
group: compute
docker: true
@@ -107,7 +116,9 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c36m900:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c36m900-htcondor-secondary:
count: 1 #1 it's a c1.c36m975d50 host with probably a faulty memory bank
flavor: c1.c36m900d50
group: compute
@@ -118,7 +129,9 @@
cgroups:
mem_limit_policy: soft
mem_reserved_size: 2048
worker-c36m975:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c36m975-htcondor-secondary:
count: 8 #8
flavor: c1.c36m975d50
group: compute
@@ -129,7 +142,9 @@
cgroups:
mem_limit_policy: soft
mem_reserved_size: 2048
worker-c28m935:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c28m935-htcondor-secondary:
count: 4 #4
flavor: c1.c28m935d50
group: compute
@@ -140,7 +155,9 @@
cgroups:
mem_limit_policy: soft
mem_reserved_size: 2048
worker-c28m875:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c28m875-htcondor-secondary:
count: 2 #2
flavor: c1.c28m875d50
group: compute
@@ -151,15 +168,19 @@
cgroups:
mem_limit_policy: soft
mem_reserved_size: 2048
worker-c64m2:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c64m2-htcondor-secondary:
count: 1 #1
flavor: c1.c60m1975d50
group: compute
docker: true
volume:
size: 1024
type: default
worker-c120m225:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c120m225-htcondor-secondary:
count: 12 #12
flavor: c1.c120m225d50
group: compute
@@ -170,7 +191,9 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c120m425:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c120m425-htcondor-secondary:
count: 22
flavor: c1.c120m425d50
group: compute
@@ -181,7 +204,9 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c125m425:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c125m425-htcondor-secondary:
count: 16 #16
flavor: c1.c125m425d50
group: compute
@@ -192,31 +217,34 @@
cgroups:
mem_limit_policy: hard
mem_reserved_size: 2048
worker-c14m40g1:
image: htcondor-secondary
secondary_htcondor_cluster: true
worker-c14m40g1-htcondor-secondary:
count: 4 #4
flavor: g1.c14m40g1d50
group: compute_gpu
image: gpu
docker: true
volume:
size: 1024
type: default
cgroups:
mem_limit_policy: soft
mem_reserved_size: 1024
worker-c8m40g1:
image: htcondor-secondary-gpu
secondary_htcondor_cluster: true
worker-c8m40g1-htcondor-secondary:
count: 4 #4
flavor: g1.c8m40g1d50
group: compute_gpu
image: gpu
docker: true
volume:
size: 1024
type: default
cgroups:
mem_limit_policy: soft
mem_reserved_size: 1024

image: htcondor-secondary-gpu
secondary_htcondor_cluster: true

training-ga-e:
count: 3
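
Because the page dropped the diff's `+`/`-` markers and all indentation, the result of the change is hard to read from the hunks alone. As a rough sketch, one migrated GPU group would read as follows once the last hunk is applied. The `deployment:` parent key and the nesting depth are assumptions, as they do not appear in this diff; the keys and values are copied from the hunk.

```yaml
# Sketch only: `deployment:` and the indentation are assumed, not shown in the diff.
deployment:
  worker-c8m40g1-htcondor-secondary:    # renamed from worker-c8m40g1
    count: 4 #4
    flavor: g1.c8m40g1d50
    group: compute_gpu
    docker: true
    volume:
      size: 1024
      type: default
    cgroups:
      mem_limit_policy: soft
      mem_reserved_size: 1024
    image: htcondor-secondary-gpu       # replaces the earlier `image: gpu`
    secondary_htcondor_cluster: true    # newly added flag
```

The CPU groups follow the same pattern, with `image: htcondor-secondary` in place of the GPU image.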