chore: update the sysdig alerts for new pod counts and PVC limits #56

Merged
thegentlemanphysicist merged 1 commit into main from SSOTEAM-1612
Mar 11, 2024

Conversation

thegentlemanphysicist
Contributor

No description provided.

Terraform Format and Style 🖌: success

Terraform Initialization ⚙️: success

Terraform Plan 📖: success

Show Plan
module.c6af30-team.sysdig_monitor_alert_metric.prod_db_pod_restarts_gte_1: Refreshing state... [id=10777729]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_pods_low: Refreshing state... [id=10777730]
module.c6af30-team.sysdig_monitor_alert_metric.prod_db_pods_low: Refreshing state... [id=10777732]
module.c6af30-team.sysdig_monitor_alert_promql.prod_sso_db_pv_gt_80: Refreshing state... [id=10777733]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_med: Refreshing state... [id=10777726]
module.c6af30-team.sysdig_monitor_alert_metric.prod_db_pods_high: Refreshing state... [id=10777727]
module.c6af30-team.sysdig_monitor_dashboard.pods_cpu: Refreshing state... [id=296848]
module.c6af30-team.sysdig_monitor_dashboard.pv_usage: Refreshing state... [id=296849]
module.c6af30-team.sysdig_monitor_alert_promql.prod_sso_db_pv_gt_60: Refreshing state... [id=10777731]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_pods_med: Refreshing state... [id=10777734]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_spike_high: Refreshing state... [id=10777728]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_high: Refreshing state... [id=10777725]
module.c6af30-team.sysdig_monitor_alert_metric.prod_keycloak_pods_high: Refreshing state... [id=10777724]
module.eb75ad-team.sysdig_monitor_alert_metric.dev_backup_storage_pv_usage_gt_med: Refreshing state... [id=16074248]
module.eb75ad-team.sysdig_monitor_alert_promql.dev_db_pv_usage_ninety: Refreshing state... [id=14625871]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_backup_storage_pv_usage_gt_med: Refreshing state... [id=9921933]
module.c6af30-team.sysdig_monitor_dashboard.pv_overall: Refreshing state... [id=296847]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_dr_pod: Refreshing state... [id=15328795]
module.eb75ad-team.sysdig_monitor_alert_promql.test_kc_disk_log_pv_usage_sixty: Refreshing state... [id=15959116]
module.eb75ad-team.sysdig_monitor_alert_metric.dev_dr_pod: Refreshing state... [id=15328793]
module.eb75ad-team.sysdig_monitor_dashboard.general_pod_performance: Refreshing state... [id=404905]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_db_pod_restarts_gte_1: Refreshing state... [id=9921905]
module.eb75ad-team.sysdig_monitor_alert_promql.dev_kc_disk_log_pv_usage_sixty: Refreshing state... [id=15959117]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_med: Refreshing state... [id=9921897]
module.eb75ad-team.sysdig_monitor_alert_promql.prod_db_pv_usage_low: Refreshing state... [id=9921935]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_db_pods_high: Refreshing state... [id=9921906]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_log_pv_med: Refreshing state... [id=9921922]
module.eb75ad-team.sysdig_monitor_alert_promql.dev_db_pv_usage_seventyfive: Refreshing state... [id=14625866]
module.eb75ad-team.sysdig_monitor_alert_downtime.test_dr_pod_downtime: Refreshing state... [id=15346484]
module.eb75ad-team.sysdig_monitor_alert_metric.test_dr_pod: Refreshing state... [id=15328794]
module.eb75ad-team.sysdig_monitor_alert_metric.test_backup_storage_pv_usage_gt_med: Refreshing state... [id=16074249]
module.eb75ad-team.sysdig_monitor_alert_promql.test_db_pv_usage_seventyfive: Refreshing state... [id=14625869]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_high: Refreshing state... [id=9921898]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_sustained: Refreshing state... [id=15961444]
module.e4ca1d-team.sysdig_monitor_dashboard.pods_cpu: Refreshing state... [id=405694]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_spike_high: Refreshing state... [id=9921904]
module.eb75ad-team.sysdig_monitor_alert_downtime.dev_dr_pod_downtime: Refreshing state... [id=15338912]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_high: Refreshing state... [id=9921902]
module.eb75ad-team.sysdig_monitor_alert_promql.prod_kc_disk_log_pv_usage_sixty: Refreshing state... [id=15959118]
module.eb75ad-team.sysdig_monitor_alert_promql.prod_db_pv_usage_med: Refreshing state... [id=9921934]
module.eb75ad-team.sysdig_monitor_alert_promql.test_db_pv_usage_ninety: Refreshing state... [id=14625872]
module.e4ca1d-team.sysdig_monitor_dashboard.pv_overall: Refreshing state... [id=405696]
module.e4ca1d-team.sysdig_monitor_dashboard.general_pod_performance: Refreshing state... [id=405697]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_low: Refreshing state... [id=9921901]
module.eb75ad-team.sysdig_monitor_dashboard.pv_usage: Refreshing state... [id=363085]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_db_pods_low: Refreshing state... [id=9921900]
module.eb75ad-team.sysdig_monitor_alert_downtime.prod_dr_pod_downtime: Refreshing state... [id=15346483]
module.eb75ad-team.sysdig_monitor_dashboard.pods_cpu: Refreshing state... [id=363086]
module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_med: Refreshing state... [id=9921903]
module.eb75ad-team.sysdig_monitor_alert_promql.prod_minio_pvc_storage_low: Refreshing state... [id=14080753]
module.eb75ad-team.sysdig_monitor_dashboard.pv_overall: Refreshing state... [id=363087]
module.e4ca1d-team.sysdig_monitor_dashboard.pv_usage: Refreshing state... [id=405695]

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "terraform apply":

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_cpu_usage_sustained has changed
  ~ resource "sysdig_monitor_alert_metric" "prod_keycloak_cpu_usage_sustained" {
        id                    = "15961444"
        name                  = "[GOLD CUST PROD] Keycloak - Sustained Elevated CPU"
      ~ version               = 1 -> 2
        # (9 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the
relevant attributes using ignore_changes, the following plan may include
actions to undo or respond to these changes.

─────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_log_pv_high will be created
  + resource "sysdig_monitor_alert_metric" "prod_keycloak_log_pv_high" {
      + enabled               = true
      + id                    = (known after apply)
      + metric                = "max(avg(sysdig_container_fs_used_percent)) > 90"
      + multiple_alerts_by    = []
      + name                  = "[GOLD CUST PROD] SSO - Log PV Usage over 90%"
      + notification_channels = [
          + 132277,
          + 57336,
          + 57341,
        ]
      + scope                 = "kubernetes.cluster.name in (\"gold\") and kubernetes.namespace.name in (\"eb75ad-prod\") and kubernetes.deployment.name in (\"sso-keycloak\")"
      + severity              = 0
      + team                  = (known after apply)
      + trigger_after_minutes = 2
      + version               = (known after apply)

      + custom_notification {
          + title = "{{__alert_name__}} is {{__alert_status__}}"
        }
    }

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_log_pv_med will be updated in-place
  ~ resource "sysdig_monitor_alert_metric" "prod_keycloak_log_pv_med" {
        id                    = "9921922"
        name                  = "[GOLD CUST PROD] SSO - Log PV Usage over 70%"
      ~ severity              = 2 -> 4
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_high will be updated in-place
  ~ resource "sysdig_monitor_alert_metric" "prod_keycloak_pods_high" {
        id                    = "9921902"
      ~ metric                = "sum(min(kube_pod_sysdig_status_ready)) < 3" -> "sum(min(kube_pod_sysdig_status_ready)) < 4"
        name                  = "[GOLD CUST PROD] SSO - Ready Pods High"
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_low will be updated in-place
  ~ resource "sysdig_monitor_alert_metric" "prod_keycloak_pods_low" {
        id                    = "9921901"
      ~ metric                = "sum(max(kube_pod_sysdig_status_ready)) < 5" -> "sum(max(kube_pod_sysdig_status_ready)) < 7"
        name                  = "[GOLD CUST PROD] SSO - Ready Pods Low"
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eb75ad-team.sysdig_monitor_alert_metric.prod_keycloak_pods_med will be updated in-place
  ~ resource "sysdig_monitor_alert_metric" "prod_keycloak_pods_med" {
        id                    = "9921897"
      ~ metric                = "sum(min(kube_pod_sysdig_status_ready)) < 4" -> "sum(min(kube_pod_sysdig_status_ready)) < 6"
        name                  = "[GOLD CUST PROD] SSO - Ready Pods Med"
        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eb75ad-team.sysdig_monitor_alert_promql.prod_db_pv_usage_high will be created
  + resource "sysdig_monitor_alert_promql" "prod_db_pv_usage_high" {
      + enabled               = true
      + id                    = (known after apply)
      + name                  = "[GOLD CUST PROD] SSO DB PV over 90%"
      + notification_channels = [
          + 132277,
          + 57336,
          + 57341,
        ]
      + promql                = "avg(kubelet_volume_stats_used_bytes{namespace=\"eb75ad-prod\", persistentvolumeclaim=~\"storage-volume-sso-patroni-.*\"}*100 / kubelet_volume_stats_capacity_bytes{namespace=\"eb75ad-prod\", persistentvolumeclaim=~\"storage-volume-sso-patroni-.*\"}) by (persistentvolumeclaim) > 90"
      + severity              = 0
      + team                  = (known after apply)
      + trigger_after_minutes = 2
      + version               = (known after apply)

      + custom_notification {
          + title = "{{__alert_name__}} is {{__alert_status__}}"
        }
    }

Plan: 2 to add, 4 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't
guarantee to take exactly these actions if you run "terraform apply" now.

Pusher: @thegentlemanphysicist, Action: pull_request
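
For readers who want to see what kind of configuration produces the `prod_db_pv_usage_high` entry in the plan above, here is a minimal sketch. Every attribute value is copied from the plan output; the layout is an assumption, since the module source is not shown in this PR conversation. The real module may well use variables or data sources for the notification channel IDs and the namespace rather than literals.

```hcl
# Sketch only: reconstructed from the plan output, not taken from the module source.
resource "sysdig_monitor_alert_promql" "prod_db_pv_usage_high" {
  name     = "[GOLD CUST PROD] SSO DB PV over 90%"
  enabled  = true
  severity = 0

  # Percentage of capacity used per Patroni PVC in the prod namespace.
  promql = "avg(kubelet_volume_stats_used_bytes{namespace=\"eb75ad-prod\", persistentvolumeclaim=~\"storage-volume-sso-patroni-.*\"}*100 / kubelet_volume_stats_capacity_bytes{namespace=\"eb75ad-prod\", persistentvolumeclaim=~\"storage-volume-sso-patroni-.*\"}) by (persistentvolumeclaim) > 90"

  # The condition must hold for 2 minutes before the alert fires.
  trigger_after_minutes = 2

  # Channel IDs as they appear in the plan; hard-coding them here is an assumption.
  notification_channels = [132277, 57336, 57341]

  custom_notification {
    title = "{{__alert_name__}} is {{__alert_status__}}"
  }
}
```

The `id`, `team`, and `version` attributes are shown as `(known after apply)` in the plan, so they are left for the provider to compute rather than set in configuration.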

thegentlemanphysicist added the enhancement label Mar 11, 2024
thegentlemanphysicist merged commit 0ac3c84 into main Mar 11, 2024
3 checks passed
thegentlemanphysicist deleted the SSOTEAM-1612 branch March 11, 2024 18:28