gcp: k8s version updates, transitions to pd-balanced disks, towards n2- nodes #3131

consideRatio · 2023-09-13T14:32:31Z

This resolves #2947: the 2i2c cluster's core node pool now has a pd-balanced disk and can therefore run the ingress-nginx controller with adequate performance.

Since the cloudbank and 2i2c clusters were actively in use, I moved all existing core node pool workloads to a temporary node pool, created via the cloud console, as an intermediate step.

k8s cluster upgrades

meom-ige from 1.26 -> 1.27
callysto from 1.25 -> 1.27
2i2c-uk from 1.24 -> 1.27
m2lines from 1.25 -> 1.27
linked-earth from 1.26 -> 1.27
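
Several of the jumps above span more than one minor release. As a rough sketch, assuming the upgrade is stepped one minor version at a time (an assumption about the procedure, not something stated in this PR), the intermediate versions can be enumerated with a hypothetical helper `minor_upgrade_path`:

```python
def minor_upgrade_path(current: str, target: str) -> list:
    """Return the minor versions to step through, ending at `target`.

    Assumes "MAJOR.MINOR" strings and one-minor-at-a-time upgrades;
    both are assumptions of this sketch, not taken from the PR.
    """
    cur_major, cur_minor = (int(p) for p in current.split("."))
    tgt_major, tgt_minor = (int(p) for p in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("major version changes are out of scope for this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


# Start and target versions copied from the list above.
upgrades = {
    "meom-ige": ("1.26", "1.27"),
    "callysto": ("1.25", "1.27"),
    "2i2c-uk": ("1.24", "1.27"),
    "m2lines": ("1.25", "1.27"),
}

for cluster, (current, target) in upgrades.items():
    steps = minor_upgrade_path(current, target)
    print(f"{cluster}: {current} -> " + " -> ".join(steps))
```

The largest jump here is 2i2c-uk, which would pass through 1.25 and 1.26 before reaching 1.27.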

standard disk -> pd-balanced on core nodes

2i2c
2i2c-uk
callysto
cloudbank
meom-ige
m2lines

transitions to n2

2i2c transitions from n1- to n2-
m2lines transitions from n1- to n2-
meom-ige transitions from n1- to n2- and, being a daskhub, also from -highmem-2 to -highmem-4 so that it can fit a memory-hungry prometheus-server
linked-earth transitions from e2- to n2- (historically this was me testing it out)
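
To make the naming concrete: the predefined machine types mentioned here follow a series-class-vCPUs pattern, such as n1-highmem-2. A minimal sketch of the meom-ige change, using a hypothetical helper `switch_machine_type` that only handles names of that three-part shape:

```python
def switch_machine_type(machine_type, series=None, vcpus=None):
    """Swap the series and/or vCPU count of a name like "n1-highmem-2".

    Hypothetical helper for illustration; it only understands the
    three-part "series-class-vcpus" shape used in this PR.
    """
    parts = machine_type.split("-")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"unexpected machine type format: {machine_type!r}")
    old_series, machine_class, old_vcpus = parts
    new_series = series if series is not None else old_series
    new_vcpus = vcpus if vcpus is not None else int(old_vcpus)
    return f"{new_series}-{machine_class}-{new_vcpus}"


# meom-ige: n1- to n2-, and -highmem-2 to -highmem-4
print(switch_machine_type("n1-highmem-2", series="n2", vcpus=4))
```

Passing only `series` keeps the size unchanged, which matches the clusters that move from n1- to n2- without resizing.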

terraform/gcp/projects/meom-ige.tfvars

yuvipanda

I'd eventually like us to do a sweep through the various clusters and look at resizing prometheus as well, now that things are less broken there. But no need to block that on this one, although I'd have preferred this PR to have dealt with just the 2i2c cluster.

yuvipanda · 2023-09-13T20:11:26Z

Ah, I see perhaps that this resizing is in response to the oscillating pagerduty alerts? Is a bit unclear to me, but ok to try if that is the case.

consideRatio · 2023-09-13T20:12:24Z

> Ah, I see perhaps that this resizing is in response to the oscillating pagerduty alerts? Is a bit unclear to me, but ok to try if that is the case.

Yes! It was apparently very broken: for example, a user pod was stuck for 36 hours with DNS issues while trying to mount NFS.

This will force a recreation of core nodes, but not having this has turned out to break the pilot-hubs cluster and meom-ige, so we really need to do this if there is a project without it already.

github-actions · 2023-09-13T23:36:16Z

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| gcp | linked-earth | No | | Yes | Following helm chart values files were modified: common.values.yaml |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | linked-earth | prod | Following helm chart values files were modified: common.values.yaml |

github-actions · 2023-09-13T23:45:43Z

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/6179039712

consideRatio requested a review from a team as a code owner September 13, 2023 14:32

github-actions bot assigned consideRatio Sep 13, 2023

consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from ac4f203 to a5e4262 Compare September 13, 2023 20:02

yuvipanda reviewed Sep 13, 2023

terraform/gcp/projects/meom-ige.tfvars (outdated)

consideRatio changed the title ~~2i2c, terraform: update core node from n1- to n2-highmem-4~~ 2i2c and meom-ige, terraform: update core nodes to n2-highmem-4 with pd-balanced disks Sep 13, 2023

yuvipanda approved these changes Sep 13, 2023

terraform, gcp: make core node pd-balanced

7c717b7

This will force a recreation of core nodes, but not having this has turned out to break the pilot-hubs cluster and meom-ige, so we really need to do this if there is a project without it already.

consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from a5e4262 to de5d712 Compare September 13, 2023 22:29

consideRatio changed the title ~~2i2c and meom-ige, terraform: update core nodes to n2-highmem-4 with pd-balanced disks~~ gcp: k8s version updates, transitions to pd-balanced disks Sep 13, 2023

consideRatio added 5 commits September 14, 2023 00:46

2i2c, terraform: update core node from n1- to n2-highmem-4

27b739d

meom-ige: upgrade k8s 1.26 -> 1.27, n2-highmem-4, pd-balanced

79e3a1e

callysto: upgrade k8s from 1.25 to 1.27, and use pd-balanced disks

5ec0ec0

2i2c-uk: upgrade k8s 1.24 -> 1.27, and use pd-balanced disks

4a113fc

m2lines: upgrade k8s 1.25 -> 1.27, and use pd-balanced disks

991d121

consideRatio force-pushed the pr/2i2c-pilot-hubs-update-core-node-pool branch from de5d712 to 991d121 Compare September 13, 2023 22:46

consideRatio added 3 commits September 14, 2023 01:16

cloudbank: use pd-balanced disks

5bc922a

qcl: update metadata to reflect current state

2c4a941

linked-earth: upgrade k8s 1.26 -> 1.27, transition from e2- to n2-

a862afe

consideRatio changed the title ~~gcp: k8s version updates, transitions to pd-balanced disks~~ gcp: k8s version updates, transitions to pd-balanced disks, towards n2- nodes Sep 13, 2023

consideRatio requested a review from yuvipanda September 13, 2023 23:43

yuvipanda approved these changes Sep 13, 2023

consideRatio merged commit 31ba2d8 into 2i2c-org:master Sep 13, 2023
8 checks passed
