Make cgroups the default for all resource types, apply the hard memory policy, and set 2 GB as the default reserved memory #312

Merged
39 changes: 0 additions & 39 deletions resources.yaml
@@ -20,7 +20,7 @@

# 18/6/2024: Updated the max number of possible workers
nodes_inventory:
c1.c28m225d50: 5 #(16.04.2024: RZ swapped the underlying servers for a 4 in 1 node and this will be of a different flavor and we need to wait to get the hardware)

GitHub Actions / yamllint: 23:121 [line-length] line too long (164 > 120 characters)
c1.c28m475d50: 19
c1.c36m100d50: 30
c1.c36m225d50: 15
@@ -59,9 +59,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: hard
-mem_reserved_size: 2048
image: default

worker-c28m225:
@@ -72,9 +69,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: hard
-mem_reserved_size: 2048
image: default

worker-c36m100:
@@ -85,9 +79,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: hard
-mem_reserved_size: 2048
image: default

worker-c36m225:
@@ -98,9 +89,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: hard
-mem_reserved_size: 2048
image: default

worker-c36m900:
@@ -111,9 +99,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: soft
-mem_reserved_size: 2048
image: default

worker-c36m975:
@@ -124,9 +109,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: soft
-mem_reserved_size: 2048
image: default

# 18/06/24: Hardware is still connected to the old cloud
@@ -138,9 +120,6 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: soft
-# mem_reserved_size: 2048
# image: default

# 18/06/24: Hardware is still connected to the old cloud
@@ -152,9 +131,6 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: soft
-# mem_reserved_size: 2048
# image: default

worker-c64m2:
@@ -176,9 +152,6 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: hard
-# mem_reserved_size: 2048
# image: default

# 18/06/24: Hardware is still connected to the old cloud
@@ -190,9 +163,6 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: hard
-# mem_reserved_size: 2048
# image: default

worker-c125m425:
@@ -203,9 +173,6 @@
volume:
size: 1024
type: default
-cgroups:
-mem_limit_policy: hard
-mem_reserved_size: 2048
image: default

# 18/06/24: Hardware is still connected to the old cloud.
@@ -217,12 +184,9 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: soft
-# mem_reserved_size: 1024
# image: gpu

# 18/06/24: Hardware is still connected to the old cloud. This GPU flavor shares the host with the flavor c1.c28m935d50.

GitHub Actions / yamllint: 189:121 [line-length] line too long (122 > 120 characters)
# worker-c8m40g1:
# count: 4 #4
# flavor: g1.c8m40g1d50
@@ -231,9 +195,6 @@
# volume:
# size: 1024
# type: default
-# cgroups:
-# mem_limit_policy: soft
-# mem_reserved_size: 1024
# image: gpu

# Trainings
12 changes: 2 additions & 10 deletions userdata.yaml.j2
@@ -12,17 +12,9 @@ write_files:
GalaxyDockerHack = {{ docker }}
STARTD_ATTRS = GalaxyTraining, GalaxyGroup, GalaxyCluster, GalaxyDockerHack
Rank = StringListMember(MY.GalaxyGroup, TARGET.Group)
-{% if cgroups is defined -%}
BASE_CGROUP = htcondor
-{% if cgroups.mem_limit_policy is defined -%}
-CGROUP_MEMORY_LIMIT_POLICY = {{ cgroups.mem_limit_policy }}
-{% endif -%}
-{% if cgroups.mem_reserved_size is defined -%}
-RESERVED_MEMORY = {{ cgroups.mem_reserved_size }}
-{% else -%}
-RESERVED_MEMORY = 1024
-{% endif -%}
-{% endif %}
+CGROUP_MEMORY_LIMIT_POLICY = hard
+RESERVED_MEMORY = 2048
owner: root:root
path: /etc/condor/config.d/99-cloud-init.conf
permissions: "0644"
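
Reviewer note: a rough sketch of what the now hard-coded `RESERVED_MEMORY = 2048` means per worker. HTCondor subtracts `RESERVED_MEMORY` (in MiB) from the memory it detects before advertising the rest to jobs. The helper names below are hypothetical, and reading the `mNNN` part of a flavor name such as `c1.c28m225d50` as nominal GiB of RAM is an assumption. The memory a VM actually reports is usually a little lower than the nominal figure, so treat the numbers as upper bounds.

```python
import re

# Value written to 99-cloud-init.conf by this PR (MiB).
RESERVED_MEMORY_MIB = 2048


def flavor_memory_mib(flavor: str) -> int:
    """Nominal memory of a flavor in MiB.

    Assumption: a name like 'c1.c28m225d50' encodes 28 cores and 225 GiB RAM.
    """
    match = re.search(r"\.c(\d+)m(\d+)", flavor)
    if match is None:
        raise ValueError(f"unrecognised flavor name: {flavor!r}")
    return int(match.group(2)) * 1024


def advertised_memory_mib(detected_mib: int, reserved_mib: int = RESERVED_MEMORY_MIB) -> int:
    """Memory left for job slots after the reserve is subtracted."""
    if reserved_mib >= detected_mib:
        raise ValueError("reserved memory exceeds detected memory")
    return detected_mib - reserved_mib


if __name__ == "__main__":
    for flavor in ("c1.c28m225d50", "g1.c8m40g1d50"):
        nominal = flavor_memory_mib(flavor)
        print(flavor, nominal, advertised_memory_mib(nominal))
```

On the small GPU flavor the 2048 MiB reserve is a noticeably larger share of the node than on the large-memory workers, which may be worth checking when those commented-out flavors are re-enabled.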