diff --git a/CODEOWNERS b/CODEOWNERS
index e445a94..f97fc52 100644
--- a/CODEOWNERS
+++ b/CODEOWNERS
@@ -1 +1 @@
-* @Fovty @jonathan-mayer @JTaeuber
+* @Fovty @jonathan-mayer @JTaeuber @samuel-esp
diff --git a/README.md b/README.md
index a9f063c..a04db4c 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,19 @@
-Python Kubernetes Downscaler
-=====================
+# Python Kubernetes Downscaler
+
+![GitHub Release](https://img.shields.io/github/v/release/caas-team/py-kube-downscaler?style=flat&link=%2F..%2F..%2Fcommits%2F)
+![GitHub Issues](https://img.shields.io/github/issues/caas-team/py-kube-downscaler)
+![GitHub License](https://img.shields.io/github/license/caas-team/py-kube-downscaler)
+![Slack Workspace](https://img.shields.io/badge/slack-py--kube--downscaler-dark_green?style=flat&logo=slack&link=https%3A%2F%2Fcommunityinviter.com%2Fapps%2Fpy-kube-downscaler%2Fpy-kube-downscaler)

This is a fork of
[hjacobs/kube-downscaler](https://codeberg.org/hjacobs/kube-downscaler)
which is no longer maintained.

Scale down / "pause" Kubernetes workload (`Deployments`, `StatefulSets`,
-`HorizontalPodAutoscalers`, `DaemonSets`, `CronJobs`, `Jobs`, `PodDisruptionBudgets`, `Argo Rollouts` and `Keda ScaledObjects` too!) during non-work hours.
+`HorizontalPodAutoscalers`, `DaemonSets`, `CronJobs`, `Jobs`, `PodDisruptionBudgets`, `Argo Rollouts` and `Keda ScaledObjects` too!) during non-work hours.

-**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+**Table of Contents** _generated with [DocToc](https://github.com/thlorenz/doctoc)_

- [Python Kubernetes Downscaler](#python-kubernetes-downscaler)
  - [Concepts](#concepts)
@@ -40,7 +45,6 @@ Scale down / "pause" Kubernetes workload (`Deployments`, `StatefulSets`,

-
## Concepts

> :memo: `Deployments` are interchangeable by any kind of _supported workload_ for this whole guide unless explicitly stated otherwise.
@@ -56,37 +60,36 @@ conditions are met:

If true, the schedules are evaluated in the following order:

-  - `downscaler/downscale-period` or `downscaler/downtime`
-    annotation on the workload definition
-  - `downscaler/upscale-period` or `downscaler/uptime`
-    annotation on the workload definition
-  - `downscaler/downscale-period` or `downscaler/downtime`
-    annotation on the workload\'s namespace
-  - `downscaler/upscale-period` or `downscaler/uptime`
-    annotation on the workload\'s namespace
-  - `--upscale-period` or `--default-uptime` CLI argument
-  - `--downscale-period` or `--default-downtime` CLI argument
-  - `UPSCALE_PERIOD` or `DEFAULT_UPTIME` environment variable
-  - `DOWNSCALE_PERIOD` or `DEFAULT_DOWNTIME` environment
-    variable
+  - `downscaler/downscale-period` or `downscaler/downtime`
+    annotation on the workload definition
+  - `downscaler/upscale-period` or `downscaler/uptime`
+    annotation on the workload definition
+  - `downscaler/downscale-period` or `downscaler/downtime`
+    annotation on the workload\'s namespace
+  - `downscaler/upscale-period` or `downscaler/uptime`
+    annotation on the workload\'s namespace
+  - `--upscale-period` or `--default-uptime` CLI argument
+  - `--downscale-period` or `--default-downtime` CLI argument
+  - `UPSCALE_PERIOD` or `DEFAULT_UPTIME` environment variable
+  - `DOWNSCALE_PERIOD` or `DEFAULT_DOWNTIME` environment
+    variable

- The workload\'s **namespace** is not part of the exclusion list:

-  - If you provide an exclusion list, it will be used in place
-    of the default (which includes only `kube-system`).
+  - If you provide an exclusion list, it will be used in place
+    of the default (which includes only `kube-system`).

- The workload\'s label does not match the labels list.

- The **workload\'s name** is not part of the exclusion list

- The workload is not marked for exclusion (annotation
-    `downscaler/exclude: "true"` or
-    `downscaler/exclude-until: "2024-04-05"`)
+  `downscaler/exclude: "true"` or
+  `downscaler/exclude-until: "2024-04-05"`)

- There are no active pods that force the whole cluster into uptime
  (annotation `downscaler/force-uptime: "true"`)

-
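To make the precedence above concrete, here is a minimal sketch (the namespace and deployment names are invented for illustration) that combines a namespace-level downtime schedule with a per-workload exclusion:

```bash
# Everything in the team-dev namespace follows the namespace-level downtime schedule
$ kubectl annotate namespace team-dev 'downscaler/downtime=Sat-Sun 00:00-24:00 UTC'

# One deployment in that namespace opts out of downscaling entirely
$ kubectl annotate deploy critical-api -n team-dev 'downscaler/exclude=true'
```

Because workload-level annotations are evaluated before namespace-level ones, a `downscaler/uptime` or `downscaler/downtime` annotation on an individual deployment would likewise take precedence over the namespace schedule shown here.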
### Minimum replicas

The deployment, by default, **will be scaled down to zero replicas**. This can
@@ -95,7 +98,6 @@ or via CLI with `--downtime-replicas`.

Ex: `downscaler/downtime-replicas: "1"`

-
### Specific workload

In case of `HorizontalPodAutoscalers`, the `minReplicas` field cannot be set to zero and thus
@@ -103,20 +105,18 @@ In case of `HorizontalPodAutoscalers`, the `minReplicas` field cannot be set to

-> See later in [#Usage notes](#notes)

-
Regarding `CronJobs`, their state will be defined to `suspend: true` as you might expect.

-
### Example use cases

-- Deploy the downscaler to a test (non-prod) cluster with a default
-  uptime or downtime time range to scale down all deployments during
-  the night and weekend.
-- Deploy the downscaler to a production cluster without any default
-  uptime/downtime setting and scale down specific deployments by
-  setting the `downscaler/uptime` (or `downscaler/downtime`)
-  annotation. This might be useful for internal tooling frontends
-  which are only needed during work time.
+- Deploy the downscaler to a test (non-prod) cluster with a default
+  uptime or downtime time range to scale down all deployments during
+  the night and weekend.
+- Deploy the downscaler to a production cluster without any default
+  uptime/downtime setting and scale down specific deployments by
+  setting the `downscaler/uptime` (or `downscaler/downtime`)
+  annotation. This might be useful for internal tooling frontends
+  which are only needed during work time.

You need to combine the downscaler with an elastic cluster autoscaler
to actually **save cloud costs**. The [official cluster
@@ -125,14 +125,12 @@ and the
[kube-aws-autoscaler](https://github.com/hjacobs/kube-aws-autoscaler)
were tested to work fine with the downscaler.

-
## Usage

### Helm Chart

For detailed information on deploying the `py-kube-downscaler` using our Helm chart, please refer to the [Helm Chart README](./chart/README.md#Deploy-py-kube-downscaler-using-Helm-chart) in the chart directory.

-
### Example configuration

The example configuration uses the `--dry-run` as a safety flag to
@@ -152,12 +150,12 @@ $ kubectl run nginx --image=nginx
$ kubectl annotate deploy nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 America/Buenos_Aires'
```

-
### Notes

Note that the _default grace period_ of 15 minutes applies to the new
nginx deployment, i.e.
-* if the current time is not within `Mon-Fri 9-17 (Buenos Aires timezone)`,
+
+- if the current time is not within `Mon-Fri 9-17 (Buenos Aires timezone)`,
  it will downscale not immediately, but after 15 minutes. The downscaler
will eventually log something like:

@@ -168,20 +166,20 @@ INFO: Scaling down Deployment default/nginx from 1 to 0 replicas (uptime: Mon-Fr

Note that in cases where a `HorizontalPodAutoscaler` (HPA) is used along
with Deployments, consider the following:

-- If downscale to 0 replicas is desired, the annotation should be
-  applied on the `Deployment`. This is a special case, since
-  `minReplicas` of 0 on HPA is not allowed. Setting Deployment
-  replicas to 0 essentially disables the HPA. In such a case, the HPA
-  will emit events like `failed to get memory utilization: unable to
-  get metrics for resource memory: no metrics returned from resource
-  metrics API` as there is no Pod to retrieve metrics from.
-- If downscale greater than 0 is desired, the annotation should be
-  applied on the HPA. This allows for dynamic scaling of the Pods even
-  during downtime based upon the external traffic as well as maintain
-  a lower `minReplicas` during downtime if there is no/low traffic. **If
-  the Deployment is annotated instead of the HPA, it leads to a race
-  condition** where `py-kube-downscaler` scales down the Deployment and HPA
-  upscales it as its `minReplicas` is higher.
+- If downscale to 0 replicas is desired, the annotation should be
+  applied on the `Deployment`. This is a special case, since
+  `minReplicas` of 0 on HPA is not allowed. Setting Deployment
+  replicas to 0 essentially disables the HPA. In such a case, the HPA
+  will emit events like `failed to get memory utilization: unable to
+get metrics for resource memory: no metrics returned from resource
+metrics API` as there is no Pod to retrieve metrics from.
+- If downscale greater than 0 is desired, the annotation should be
+  applied on the HPA. This allows for dynamic scaling of the Pods even
+  during downtime based upon the external traffic as well as maintain
+  a lower `minReplicas` during downtime if there is no/low traffic. **If
+  the Deployment is annotated instead of the HPA, it leads to a race
+  condition** where `py-kube-downscaler` scales down the Deployment and HPA
+  upscales it as its `minReplicas` is higher.

To enable Downscaler on HPA with `--downtime-replicas=1`,
ensure to add the following annotations to Deployment and HPA.
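For example, a minimal sketch reusing the nginx names from the example above (adjust names, schedules, and the replica floor to your environment):

```bash
# Keep one replica during downtime instead of scaling to zero
$ kubectl annotate deploy nginx 'downscaler/downtime-replicas=1'

# Apply the same uptime window to both objects so they agree on when downtime starts
$ kubectl annotate deploy nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 America/Buenos_Aires'
$ kubectl annotate hpa nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 America/Buenos_Aires'
```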
@@ -194,11 +192,12 @@ $ kubectl annotate hpa nginx 'downscaler/uptime=Mon-Fri 09:00-17:00 America/Buen

## Installation

-KubeDownscaler offers two installation methods.
+KubeDownscaler offers two installation methods.
+
- **Cluster Wide Access**: This method is dedicated to users who have total access to the Cluster and aspire to adopt the
-tool throughout the cluster
-- **Limited Access**: This method is dedicated to users who only have access to a limited number of namespaces and can
-adopt the tool only within them
+  tool throughout the cluster
+- **Limited Access**: This method is dedicated to users who only have access to a limited number of namespaces and can
+  adopt the tool only within them

### Cluster Wide Access Installation

@@ -212,7 +211,7 @@ $ helm install py-kube-downscaler py-kube-downscaler/py-kube-downscaler

This command will deploy:

-- **Deployment**: main deployment
+- **Deployment**: main deployment
- **ConfigMap**: used to supply parameters to the deployment
- **ServiceAccount**: represents the Cluster Identity of the KubeDownscaler
- **ClusterRole**: needed to access all the resources that can be modified by the KubeDownscaler
@@ -225,6 +224,7 @@ It is possible to further customize it by changing the parameters present in the

**RBAC-Prerequisite**: This installation mode requires permission to deploy Service Account, Role and RoleBinding

The Limited Access installation requires the user to fill the following parameters inside values.yaml
+
- **constrainedDownscaler**: true (mandatory)
- **constrainedNamespaces**: [namespace1,namespace2,namespace3,...] (list of namespaces - mandatory)
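For example, a minimal `values.yaml` sketch for this mode (the namespace names are placeholders; check the chart's values file for the authoritative key layout):

```yaml
# Run py-kube-downscaler in constrained (namespace-scoped) mode
constrainedDownscaler: true
# Only these namespaces will be watched and scaled
constrainedNamespaces:
  - namespace1
  - namespace2
  - namespace3
```

With these values set, the chart deploys namespace-scoped Roles and RoleBindings instead of cluster-wide permissions, as described below.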
@@ -236,7 +236,7 @@ $ helm install py-kube-downscaler py-kube-downscaler/py-kube-downscaler --namesp

This command will deploy:

-- **Deployment**: main deployment
+- **Deployment**: main deployment
- **ConfigMap**: used to supply parameters to the deployment
- **ServiceAccount**: represents the Cluster Identity of the KubeDownscaler

@@ -245,7 +245,7 @@ For each namespace inside constrainedNamespaces, the chart will deploy

- **Role**: needed to access all the resources that can be modified by the KubeDownscaler (inside that namespace)
- **RoleBinding**: links the ServiceAccount used by KubeDownscaler to the Role inside that namespace

-If RBAC permissions are misconfigured and the KubeDownscaler is unable to access resources in one of the specified namespaces,
+If RBAC permissions are misconfigured and the KubeDownscaler is unable to access resources in one of the specified namespaces,
a warning message will appear in the logs indicating a `403 Error`

## Configuration

@@ -271,15 +271,14 @@ DEFAULT_DOWNTIME="Sat-Sun 00:00-24:00 CET,Fri-Fri 20:00-24:00 CET"

Each time specification can be in one of two formats:

-- Recurring specifications have the format
-  `<WEEKDAY-FROM>-<WEEKDAY-TO-INCLUSIVE> <HH>:<MM>-<HH>:<MM> <TIMEZONE>`.
-  The timezone value can be any [Olson
-  timezone](https://en.wikipedia.org/wiki/Tz_database), e.g.
-  \"US/Eastern\", \"PST\" or \"UTC\".
-- Absolute specifications have the format `<TIME_FROM>-<TIME_TO>`
-  where each `