WIP: work for prow monitoring #841

Open
wants to merge 1 commit into
base: main
7 changes: 7 additions & 0 deletions prow/manifests/overlays/metal3/ingress.yaml
@@ -25,6 +25,13 @@ spec:
            name: hook
            port:
              number: 8888
      - path: /monitoring
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 80
  tls:
  - hosts:
    - prow.apps.test.metal3.io
156 changes: 156 additions & 0 deletions prow/monitoring/README.md
@@ -0,0 +1,156 @@
# Monitoring of K8s cluster and Prow resources

This is a work in progress (WIP) that shows how to monitor Kubernetes
cluster resources and Prow services.

The Kubernetes monitoring is based on the kubernetes-mixin project, which can be found here:
[k8s-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin)

The steps to set this up are listed below and have been tested in minikube.
In our case we need to integrate this into the grafana.yaml here.

The main open question is how the resources are generated: either we
generate them dynamically, or we take a static snapshot of the YAML
and use that. Currently there is a static snapshot in grafana-dashboard-definitions.

The kustomize.yaml resource also needs to be created. It should
automate the creation of a ConfigMap out of grafana-dashboard-definitions.

Furthermore, for Alertmanager to send alerts automatically, the Slack webhook needs to be
created as a secret. This can be done in the same way as the secrets in
`project-infra/prow/manifests/overlays/metal3`
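
The kustomization described above could be generated with a small script. The sketch below is one possible approach and is not part of this PR: `gen_kustomization` is a hypothetical helper, and the ConfigMap name `grafana-dashboards`, the default `monitoring` namespace and the `disableNameSuffixHash` option are assumptions. It lists each JSON file explicitly, since `configMapGenerator` expects individual file paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): write a kustomization file whose
# configMapGenerator bundles every *.json dashboard in a directory.
#   $1: directory with dashboard JSON files, e.g. grafana-dashboard-definitions
#   $2: output file, e.g. kustomization.yaml
#   $3: target namespace (assumed default: monitoring)
gen_kustomization() {
  dir="$1"
  out="$2"
  ns="${3:-monitoring}"
  {
    echo "apiVersion: kustomize.config.k8s.io/v1beta1"
    echo "kind: Kustomization"
    echo "namespace: ${ns}"
    echo "configMapGenerator:"
    echo "- name: grafana-dashboards"
    echo "  files:"
    for f in "$dir"/*.json; do
      # Skip the unexpanded pattern when the directory has no JSON files.
      [ -e "$f" ] || continue
      echo "  - $f"
    done
    echo "generatorOptions:"
    # Keep a stable ConfigMap name so the Grafana volume can reference it.
    echo "  disableNameSuffixHash: true"
  } > "$out"
}

# Example (run from the directory that should hold kustomization.yaml):
#   gen_kustomization grafana-dashboard-definitions kustomization.yaml
```

The file paths are written as given, so the helper should be called with a path relative to the directory that contains the kustomization file.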

## Deploying Grafana with Kubernetes-mixins


### Step 1: Install Prometheus and Grafana using Helm

NOTE: We will most likely use kustomize rather than Helm; Helm was only used for a quick proof of concept.

First, add the Helm repositories for Prometheus and Grafana:

```kubectl
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

Now, install Prometheus and Grafana:

```kubectl
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
```

This command installs the Prometheus stack, which includes Prometheus, Alertmanager, and Grafana.

### Step 2: Access Grafana

Expose the Grafana service using `kubectl port-forward`:

```kubectl
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring
```

You can now access Grafana at `http://localhost:3000`. The default login is:

- **Username:** `admin`
- **Password:** `prom-operator`

### Step 3: Generate and Create a ConfigMap for Grafana Dashboards

You can manually generate the alerts, dashboards and rules files from the
kubernetes-mixin, but first you must install some tools:

```
$ go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
$ brew install jsonnet
```

Then, grab the mixin and its dependencies:

```
$ git clone https://github.com/kubernetes-monitoring/kubernetes-mixin
$ cd kubernetes-mixin
$ jb install
```

Finally, build the mixin:

```
$ make prometheus_alerts.yaml
$ make prometheus_rules.yaml
$ make dashboards_out
```
1. To apply the rules and alerts, wrap each generated file in a `PrometheusRule` resource.

Add the following header and replace `groups` with the generated content:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-mixin-alerts
  namespace: monitoring
spec:
  "groups":
  ...
```

2. Create a ConfigMap with the Grafana dashboards:

```kubectl
kubectl create configmap grafana-dashboards --from-file=dashboards_out/ -n monitoring
```

This command creates a ConfigMap named `grafana-dashboards` in the
`monitoring` namespace, containing all the JSON files in the
`dashboards_out/` directory. The ConfigMap is mounted into Grafana
in the following steps.
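
The header step in item 1 can be scripted instead of edited by hand. The following is a minimal sketch, assuming the generated file starts with a top-level `groups` key as shown above; `wrap_rules` is a hypothetical helper name, not something provided by kubernetes-mixin.

```shell
#!/bin/sh
# Hypothetical helper: wrap a generated rules or alerts file in a
# PrometheusRule manifest and print the result to stdout.
#   $1: metadata.name for the PrometheusRule
#   $2: generated file, e.g. prometheus_alerts.yaml
wrap_rules() {
  name="$1"
  src="$2"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ${name}
  namespace: monitoring
spec:
EOF
  # Indent the generated content by two spaces so it nests under "spec:".
  sed 's/^/  /' "$src"
}

# Example:
#   wrap_rules kubernetes-mixin-alerts prometheus_alerts.yaml | kubectl apply -f -
```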

### Step 4: Configure Grafana to Load Dashboards from the ConfigMap

Patch the Grafana deployment:

1. **Edit the Grafana deployment:**

```kubectl
kubectl edit deployment prometheus-grafana -n monitoring
```

2. **Add the following under `spec` > `template` > `spec` > `volumes`:**

```yaml
volumes:
  - name: grafana-dashboards
    configMap:
      name: grafana-dashboards
```

3. **Mount the volume under `containers` > `volumeMounts`:**

```yaml
volumeMounts:
  - name: grafana-dashboards
    mountPath: /var/lib/grafana/dashboards
```

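4. **Ensure Grafana is configured to load dashboards:**

Set the following environment variables on the Grafana container so that it loads
dashboards from the mounted directory. These map to the legacy `[dashboards.json]`
settings, which Grafana 5.0 and later replaced with dashboard provisioning, so verify
them against the Grafana version in use: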

```yaml
env:
  - name: GF_DASHBOARDS_JSON_ENABLED
    value: "true"
  - name: GF_DASHBOARDS_JSON_PATH
    value: "/var/lib/grafana/dashboards"
```
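
As an alternative to editing the deployment interactively, the volume and mount from items 2 and 3 could be applied as a strategic merge patch. This is a sketch under assumptions: `write_grafana_patch` and the patch file name are hypothetical, and the container name `grafana` is assumed, so check it against the actual deployment first.

```shell
#!/bin/sh
# Hypothetical helper: write a strategic merge patch that adds the dashboards
# volume and its mount to the Grafana deployment.
#   $1: output file, e.g. grafana-dashboards-patch.yaml
write_grafana_patch() {
  cat > "$1" <<'EOF'
spec:
  template:
    spec:
      volumes:
      - name: grafana-dashboards
        configMap:
          name: grafana-dashboards
      containers:
      - name: grafana  # assumption: name of the Grafana container
        volumeMounts:
        - name: grafana-dashboards
          mountPath: /var/lib/grafana/dashboards
EOF
}

# Example:
#   write_grafana_patch grafana-dashboards-patch.yaml
#   kubectl patch deployment prometheus-grafana -n monitoring \
#     --patch-file grafana-dashboards-patch.yaml
```

A patch file can be kept in the repository and reviewed, which fits the stated plan to move from Helm to kustomize.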

### Step 5: Verify the Dashboards in Grafana

After applying the changes, Grafana should automatically load the dashboards from the ConfigMap.

1. Access Grafana at `http://localhost:3000`.
2. Navigate to "Dashboards" > "Manage" and you should see the dashboards listed and ready to use.


24 changes: 24 additions & 0 deletions prow/monitoring/additional-scrape-configs_secret.yaml
@@ -0,0 +1,24 @@
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: prow-monitoring
stringData:
  prometheus-additional.yaml: |
    - job_name: blackbox
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets:
          # ATTENTION: Keep this in sync with the list in mixins/prometheus/prober_alerts.libsonnet
          - https://prow.apps.test.metal3.io/
          # - https://monitoring.prow.apps.test.metal3.io/ Add this once we have the subdomain
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: blackbox-prober
type: Opaque
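
The relabeling above moves each target into the `target` query parameter and points the scrape address at the `blackbox-prober` service. The sketch below shows the request this should result in; `probe_url` is a hypothetical helper, and the URL is printed unencoded for readability.

```shell
#!/bin/sh
# Hypothetical helper: print the probe URL that corresponds to one target
# after the relabel_configs above have been applied.
#   $1: probed target, a value from static_configs.targets
#   $2: prober address (defaults to the __address__ replacement above)
#   $3: blackbox module (defaults to params.module above)
probe_url() {
  target="$1"
  prober="${2:-blackbox-prober}"
  module="${3:-http_2xx}"
  printf 'http://%s/probe?module=%s&target=%s\n' "$prober" "$module" "$target"
}

# Example, from a pod in the prow-monitoring namespace:
#   curl "$(probe_url https://prow.apps.test.metal3.io/)"
```

The response to such a request contains the `probe_success` metric, which is a quick way to check the prober before relying on the alerts.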
94 changes: 94 additions & 0 deletions prow/monitoring/alertmanager.yaml
@@ -0,0 +1,94 @@
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: prow
  namespace: prow-monitoring
spec:
  replicas: 3
  image: docker.io/prom/alertmanager
  listenLocal: false
  nodeSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager
  version: v0.27.0
  storage: # Note that this section is immutable so changes require deleting and recreating the resource.
    volumeClaimTemplate:
      metadata:
        name: prometheus
      spec:
        accessModes:
        - "ReadWriteOnce"
        storageClassName: "standard"
        resources:
          requests:
            storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: alertmanager
  name: alertmanager
  namespace: prow-monitoring
spec:
  ports:
  - name: http
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    alertmanager: prow
    app: alertmanager
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: alertmanager
  namespace: prow-monitoring
---
# TODO: The Slack settings below still need to be corrected
# (Slack endpoint, or a different alerting method altogether).
# Replace '{{ api_url }}' below with the URL of the Slack incoming webhook
# before running `kubectl apply -f`.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prow
  namespace: prow-monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m

    route:
      group_by: ['alertname', 'job']
      group_wait: 30s
      group_interval: 10m
      repeat_interval: 4h
      receiver: 'slack-warnings'
      routes:
      # NOTE: 'cluster-api-aws-alerts' is not defined under receivers below.
      # Alertmanager rejects a config that references an undefined receiver.
      - receiver: 'cluster-api-aws-alerts'
        group_interval: 5m
        repeat_interval: 2h
        match:
          boskos_type: aws-account

    receivers:
    - name: 'slack-warnings'
      slack_configs:
      - channel: '#prow-alerts'
        api_url: '{{ api_url }}'
        icon_url: https://avatars3.githubusercontent.com/u/3380462
        text: '{{ template "custom_slack_text" . }}'
        link_names: true

    templates:
    - '*.tmpl'
  msg.tmpl: |
    {{ define "custom_slack_text" }}{{ .CommonAnnotations.message }}{{ end }}
type: Opaque
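
The manual replacement of `'{{ api_url }}'` requested in the comment above could be done at apply time, so that the webhook URL never lands in the repository. This is a minimal sketch: `render_alertmanager` and the `SLACK_WEBHOOK_URL` variable are hypothetical names, and it assumes the URL contains neither `|` nor `&`, which `sed` would treat specially.

```shell
#!/bin/sh
# Hypothetical helper: print the manifest with the '{{ api_url }}' placeholder
# replaced by the given webhook URL. Other '{{ ... }}' templates are untouched.
#   $1: manifest path, e.g. alertmanager.yaml
#   $2: Slack incoming webhook URL
render_alertmanager() {
  # '|' is the sed delimiter because webhook URLs contain '/'.
  sed "s|{{ api_url }}|$2|" "$1"
}

# Example:
#   render_alertmanager alertmanager.yaml "$SLACK_WEBHOOK_URL" | kubectl apply -f -
```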
6 changes: 6 additions & 0 deletions prow/monitoring/alertmanager_rbac.yaml
@@ -0,0 +1,6 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: alertmanager
  namespace: prow-monitoring
66 changes: 66 additions & 0 deletions prow/monitoring/blackbox_prober.yaml
@@ -0,0 +1,66 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-prober
  namespace: prow-monitoring
  labels:
    app: blackbox-prober
spec:
  selector:
    matchLabels:
      app: blackbox-prober
  replicas: 1
  template:
    metadata:
      labels:
        app: blackbox-prober
    spec:
      containers:
      - name: blackbox-prober
        args:
        - --config.file=/etc/config/prober.yaml
        image: prom/blackbox-exporter:v0.15.1
        volumeMounts:
        - name: config
          mountPath: /etc/config/
      volumes:
      - name: config
        configMap:
          name: blackbox-prober-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-prober-config
  namespace: prow-monitoring
  labels:
    app: blackbox-prober
data:
  prober.yaml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 8s
        http:
          # valid_status_codes defaults to 2xx
          method: GET
          no_follow_redirects: false
          fail_if_ssl: false
          fail_if_not_ssl: true
          preferred_ip_protocol: "ip4" # Defaults to ip6
---
apiVersion: v1
kind: Service
metadata:
  name: blackbox-prober
  namespace: prow-monitoring
  labels:
    app: blackbox-prober
spec:
  type: ClusterIP
  ports:
  - name: blackbox-prober
    port: 80
    targetPort: 9115
  selector:
    app: blackbox-prober