Prerequisites: Docker 101, Kubernetes 101, and Gravity 101 trainings.
Note: This part of the training pertains to Gravity 5.5 and earlier.
Gravity Clusters come with a fully configured and customizable monitoring and alerting system by default. The system consists of several components, which are automatically included in a Cluster Image built with a single command, tele build.
Before getting into Gravity’s monitoring and alerts capability in more detail, let’s first discuss the various components that are involved.
There are 4 main components in the monitoring system: InfluxDB, Heapster, Grafana, and Kapacitor.
InfluxDB is an open source time series database used as the main data store for monitoring time series data. It is exposed as the Kubernetes service influxdb.monitoring.svc.cluster.local.
Heapster monitors the Kubernetes components and collects not only performance metrics about workloads, nodes, and pods, but also events generated by the Cluster. The captured statistics are reported to InfluxDB.
Grafana is an open source metrics suite that provides the dashboards in the Gravity monitoring and alerts system. The dashboards visualize the information stored in InfluxDB and are exposed as the service grafana.monitoring.svc.cluster.local. The generated credentials are placed into a secret named grafana in the monitoring namespace.
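For example, the generated Grafana credentials can be inspected like any other Kubernetes secret (they are base64-encoded, as with the InfluxDB credentials shown later):
$ kubectl -nmonitoring get secrets/grafana -oyaml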
Gravity ships with 2 pre-configured dashboards providing machine-level and pod-level overviews of the installed cluster. Within the Gravity control panel, you can access the dashboards by navigating to the Monitoring page.
By default, Grafana is running in anonymous read-only mode. Anyone who logs into Gravity can view but not modify the dashboards.
Kapacitor is the data processing engine for InfluxDB. It streams data from InfluxDB and sends alerts to the end user, and is exposed as the service kapacitor.monitoring.svc.cluster.local.
All monitoring components are running in the “monitoring” namespace in Gravity. Let’s take a look at them:
$ kubectl -nmonitoring get pods
NAME                         READY   STATUS    RESTARTS   AGE
grafana-8cb94d5dc-6dc2h      2/2     Running   0          10m
heapster-57fbfbbc7-9xtm6     1/1     Running   0          10m
influxdb-599c5f5c45-6hqmc    2/2     Running   1          10m
kapacitor-68f6d76878-8m26x   3/3     Running   0          10m
telegraf-75487b79bd-ptvzd    1/1     Running   0          10m
telegraf-node-master-x9v48   1/1     Running   0          10m
Most of the cluster metrics are collected by Heapster. Heapster runs as a part of a Deployment and collects metrics from the cluster nodes and persists them into the configured “sinks”.
The Heapster pod collects metrics from the kubelets running on the cluster nodes, which in turn query the data from cAdvisor - a container resource usage collector integrated into the kubelet that supports Docker containers natively. The cAdvisor agent running on a node discovers all running containers and collects their CPU, memory, filesystem and network usage statistics.
Both of these collectors operate on their own intervals - kubelet queries cAdvisor every 15 seconds, while Heapster scrapes metrics from all kubelets every minute.
Heapster by itself does not store any data - instead, it ships all scraped metrics to the configured sinks. In Gravity clusters the sink is an InfluxDB database that is deployed as a part of the monitoring application.
All metrics collected by Heapster are placed into the k8s database in InfluxDB. In InfluxDB the data is organized into "measurements". A measurement acts as a container for "fields" and a few other things. Applying a very rough analogy with relational databases, a measurement can be thought of as a "table" whereas the fields are "columns" of the table. In addition, each measurement can have tags attached to it which can be used to add various metadata to the data.
Each metric is stored as a separate “series” in InfluxDB. A series in InfluxDB is the collection of data that share a retention policy, a measurement and a tag set. Heapster tags each metric with different labels, such as host name, pod name, container name and others, which become “tags” on the stored series. Tags are indexed so queries on tags are fast.
When troubleshooting problems with metrics, it is sometimes useful to look into the Heapster container logs to see whether it is experiencing communication issues with the InfluxDB service or other problems:
$ kubectl -nmonitoring logs heapster-57fbfbbc7-9xtm6
In addition, any other apps that collect metrics should also submit them into the same DB in order for proper retention policies to be enforced.
As mentioned above, InfluxDB is exposed via a cluster-local Kubernetes service influxdb.monitoring.svc.cluster.local and serves its HTTP API on port 8086, so we can use it to explore the database from the CLI.
Let's enter the Gravity master container to make sure the services are resolvable and to get access to additional CLI tools:
$ sudo gravity shell
Let's ping the database to make sure it's up and running:
$ curl -sl -I http://influxdb.monitoring.svc.cluster.local:8086/ping
# Should return a 204 response.
The InfluxDB API endpoint requires authentication, so to make actual queries to the database we need to determine the credentials first. The generated credentials are kept in the influxdb secret in the monitoring namespace:
$ kubectl -nmonitoring get secrets/influxdb -oyaml
Note that the credentials in the secret are base64-encoded so you'd need to decode them:
$ echo <encoded-password> | base64 -d
$ export PASS=xxx
Once the credentials have been decoded (the username is root and the password is generated during installation), they can be supplied via a cURL command. For example, let's see what databases we currently have:
$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query --data-urlencode 'q=show databases' | jq
Now we can also see which measurements are currently being collected:
$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show measurements' | jq
Finally, we can query specific metrics if we want to using InfluxDB's SQL-like query language:
$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=select * from uptime limit 10' | jq
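Because the Heapster labels become InfluxDB tags, queries can also filter on them. As a rough sketch (the exact tag keys and values depend on what Heapster reports in your cluster), the following restricts a measurement to node-level series:
$ curl -s -u root:$PASS "http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s" --data-urlencode "q=select * from \"cpu/usage_rate\" where \"type\" = 'node' limit 5" | jq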
Refer to the InfluxDB API documentation if you want to learn more about querying the database.
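As noted earlier, other applications should submit their metrics to the same k8s database. As a minimal sketch (myapp_requests is a hypothetical measurement name), a data point can be written through the same HTTP API using the line protocol:
$ curl -s -u root:$PASS -XPOST "http://influxdb.monitoring.svc.cluster.local:8086/write?db=k8s" --data-binary 'myapp_requests,host=node-1 value=42'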
Let's now talk about how long the measurements are stored. During initial installation Gravity pre-configures InfluxDB with the following retention policies:
- default = 24 hours - is used for high precision metrics.
- medium = 4 weeks - is used for medium precision metrics.
- long = 52 weeks - keeps metrics aggregated over even larger intervals.
We can use the same InfluxDB API to see the retention policies configured in the database:
$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show retention policies' | jq
All metrics sent to InfluxDB by Heapster are saved using the default retention policy which means that all the high-resolution metrics collected are kept intact for 24 hours.
To provide historical overview some of the most commonly helpful metrics (such as CPU/memory usage, network transfer rates) are rolled up to lower resolutions and stored using the longer retention policies mentioned above.
In order to provide such downsampled metrics, Gravity uses InfluxDB “continuous queries” which are programmed to run automatically and aggregate metrics over a certain interval.
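The continuous queries themselves can be listed via the same InfluxDB API used above:
$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query --data-urlencode 'q=show continuous queries' | jq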
The Gravity monitoring system allows two types of rollup configurations for collecting metrics:
- medium = aggregates data over 5 minute intervals
- long = aggregates data over 1 hour intervals
Each of the two rollups feeds into the retention policy of the same name. For example, the long rollup aggregates data over 1-hour intervals and stores the results using the long retention policy.
The pre-configured rollups that Gravity clusters come with are stored in the rollups-default config map in the monitoring namespace:
$ kubectl -nmonitoring get configmaps/rollups-default -oyaml
The configuration of retention policies and rollups is handled by a “watcher” service that runs in a container as a part of the InfluxDB pod so all these configurations can be seen in its logs:
$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher
In addition to the rollups pre-configured by Gravity, applications can downsample their own metrics (or create different rollups for standard metrics) by configuring their own rollups through ConfigMaps.
Custom rollup ConfigMaps should be created in the monitoring namespace and assigned a monitoring label with the value rollup.
An example ConfigMap with custom metric rollups is shown below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: myrollups
  namespace: monitoring
  labels:
    monitoring: rollup
data:
  rollups: |
    [
      {
        "retention": "medium",
        "measurement": "cpu/usage_rate",
        "name": "cpu/usage_rate/medium",
        "functions": [
          {
            "function": "max",
            "field": "value",
            "alias": "value_max"
          },
          {
            "function": "mean",
            "field": "value",
            "alias": "value_mean"
          }
        ]
      }
    ]
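Assuming the manifest above is saved as myrollups.yaml (a hypothetical file name), it can be created with kubectl like any other ConfigMap:
$ kubectl create -f myrollups.yaml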
The watcher process will detect the new ConfigMap and configure an appropriate continuous query for the new rollup:
$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher
...
time="2020-01-24T05:40:13Z" level=info msg="Detected event ADDED for configmap \"myrollups\"" label="monitoring in (rollup)" watch=configmap
time="2020-01-24T05:40:13Z" level=info msg="New rollup." query="create continuous query \"cpu/usage_rate/medium\" on k8s begin select max(\"value\") as value_max, mean(\"value\") as value_mean into k8s.\"medium\".\"cpu/usage_rate/medium\" from k8s.\"default\".\"cpu/usage_rate\" group by *, time(5m) end"
Along with the dashboards mentioned above, your applications can provide their own Grafana dashboards using ConfigMaps.
Similar to creating custom rollups, in order to use a custom dashboard, the ConfigMap should be created in the monitoring namespace and assigned a monitoring label with the value dashboard.
The ConfigMap will be recognized and loaded when the application is installed. It is also possible to add new ConfigMaps at a later time: the watcher will pick them up and create the dashboards in Grafana. Similarly, if you delete a ConfigMap, the watcher will delete the corresponding dashboard from Grafana.
Dashboard ConfigMaps may contain multiple keys with dashboards; the key names are not relevant.
An example ConfigMap is shown below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: mydashboard
  namespace: monitoring
  labels:
    monitoring: dashboard
data:
  mydashboard: |
    { ... dashboard JSON ... }
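If the dashboard JSON already exists as a file, an equivalent ConfigMap can be created and labeled directly with kubectl instead of writing the manifest by hand (mydashboard.json is a hypothetical file name):
$ kubectl -nmonitoring create configmap mydashboard --from-file=mydashboard.json
$ kubectl -nmonitoring label configmap mydashboard monitoring=dashboard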
Note: by default Grafana runs in read-only mode; a separate Grafana instance is required to create custom dashboards.
The following are the default metrics captured by the Gravity Monitoring & Alerts system.
Below is a list of metrics captured by Heapster and exported to the backend:
Metric Group | Metric Name | Description
cpu | limit | CPU hard limit in millicores.
cpu | node_capacity | CPU capacity of a node.
cpu | node_allocatable | CPU allocatable of a node.
cpu | node_reservation | Share of CPU that is reserved on the node allocatable.
cpu | node_utilization | CPU utilization as a share of node allocatable.
cpu | request | CPU request (the guaranteed amount of resources) in millicores.
cpu | usage | Cumulative amount of consumed CPU time on all cores in nanoseconds.
cpu | usage_rate | CPU usage on all cores in millicores.
cpu | load | CPU load in milliloads, i.e., runnable threads * 1000.
ephemeral_storage | limit | Local ephemeral storage hard limit in bytes.
ephemeral_storage | request | Local ephemeral storage request (the guaranteed amount of resources) in bytes.
ephemeral_storage | usage | Total local ephemeral storage usage.
ephemeral_storage | node_capacity | Local ephemeral storage capacity of a node.
ephemeral_storage | node_allocatable | Local ephemeral storage allocatable of a node.
ephemeral_storage | node_reservation | Share of local ephemeral storage that is reserved on the node allocatable.
ephemeral_storage | node_utilization | Local ephemeral storage utilization as a share of ephemeral storage allocatable.
filesystem | usage | Total number of bytes consumed on a filesystem.
filesystem | limit | The total size of the filesystem in bytes.
filesystem | available | The number of available bytes remaining in the filesystem.
filesystem | inodes | The number of available inodes in the filesystem.
filesystem | inodes_free | The number of free inodes remaining in the filesystem.
disk | io_read_bytes | Number of bytes read from a disk partition.
disk | io_write_bytes | Number of bytes written to a disk partition.
disk | io_read_bytes_rate | Number of bytes read from a disk partition per second.
disk | io_write_bytes_rate | Number of bytes written to a disk partition per second.
memory | limit | Memory hard limit in bytes.
memory | major_page_faults | Number of major page faults.
memory | major_page_faults_rate | Number of major page faults per second.
memory | node_capacity | Memory capacity of a node.
memory | node_allocatable | Memory allocatable of a node.
memory | node_reservation | Share of memory that is reserved on the node allocatable.
memory | node_utilization | Memory utilization as a share of memory allocatable.
memory | page_faults | Number of page faults.
memory | page_faults_rate | Number of page faults per second.
memory | request | Memory request (the guaranteed amount of resources) in bytes.
memory | usage | Total memory usage.
memory | cache | Cache memory usage.
memory | rss | RSS memory usage.
memory | working_set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel.
accelerator | memory_total | Memory capacity of an accelerator.
accelerator | memory_used | Memory used of an accelerator.
accelerator | duty_cycle | Duty cycle of an accelerator.
accelerator | request | Number of accelerator devices requested by the container.
network | rx | Cumulative number of bytes received over the network.
network | rx_errors | Cumulative number of errors while receiving over the network.
network | rx_errors_rate | Number of errors while receiving over the network per second.
network | rx_rate | Number of bytes received over the network per second.
network | tx | Cumulative number of bytes sent over the network.
network | tx_errors | Cumulative number of errors while sending over the network.
network | tx_errors_rate | Number of errors while sending over the network per second.
network | tx_rate | Number of bytes sent over the network per second.
uptime | - | Number of milliseconds since the container was started.
Satellite is an open-source tool prepared by Gravitational that collects health information related to the Kubernetes cluster. Satellite runs on each Gravity Cluster node and has various checks assessing the health of a Cluster.
Satellite collects several metrics related to cluster health and exposes them over the Prometheus endpoint. Among the metrics collected by Satellite are:
- Etcd related metrics:
  - Current leader address
  - Etcd cluster health
- Docker related metrics:
  - Overall health of the Docker daemon
- Sysctl related metrics:
  - Status of IPv4 forwarding
  - Status of netfilter
- Systemd related metrics:
  - State of various systemd units such as etcd, flannel, kube-*, etc.
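The results of these health checks also surface in the overall cluster status, which can be reviewed from any node:
$ sudo gravity status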
The nodes also run Telegraf - an agent for collecting, processing, aggregating, and writing metrics. Metrics from several system input plugins, such as those related to CPU and memory, are captured by default as well.
Metric Name | Description
load1 (float) | System load averaged over the last 1 minute
load5 (float) | System load averaged over the last 5 minutes
load15 (float) | System load averaged over the last 15 minutes
n_users (integer) | Number of users
n_cpus (integer) | Number of CPU cores
uptime (integer, seconds) | Number of seconds since the system was started
In addition to the default metrics, Telegraf also queries the Satellite Prometheus endpoint described above and ships all metrics to the same “k8s” database in InfluxDB.
Telegraf configuration can be found here. The respective configuration files show which input plugins each Telegraf instance has enabled.
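To verify that a particular Telegraf instance is collecting and shipping metrics without errors, check its logs like any other pod in the monitoring namespace (pod names are taken from the earlier listing):
$ kubectl -nmonitoring logs telegraf-75487b79bd-ptvzd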
As mentioned above, Kapacitor is the alerting system that streams data from InfluxDB and handles the alerts sent to users. Kapacitor can be configured to send email alerts or customized with other alert types.
The following are alerts that Gravity Monitoring & Alerts system ships with by default:
Component | Alert | Description
CPU | High CPU usage | Warning at > 75% used, critical error at > 90% used
Memory | High memory usage | Warning at > 80% used, critical error at > 90% used
Systemd | Individual systemd unit health | Error when a unit is not loaded/active
Systemd | Overall systemd health | Error when systemd detects a failed service
Filesystem | High disk space usage | Warning at > 80% used, critical error at > 90% used
Filesystem | High inode usage | Warning at > 90% used, critical error at > 95% used
System | Uptime | Warning when node uptime < 5 mins
System | Kernel params | Error if a kernel parameter is not set
Etcd | Etcd instance health | Error when etcd master is down > 5 mins
Etcd | Etcd latency check | Warning when follower <-> leader latency > 500 ms, error when > 1 sec over a period of 1 min
Docker | Docker daemon health | Error when the Docker daemon is down
InfluxDB | InfluxDB instance health | Error when InfluxDB is inaccessible
Kubernetes | Kubernetes node readiness | Error when the node is not ready
In order to configure email alerts via Kapacitor you will need to create Gravity resources of type smtp and alerttarget.
An example of the configuration is shown below:
kind: smtp
version: v2
metadata:
  name: smtp
spec:
  host: smtp.host
  port: <smtp port> # 465 by default
  username: <username>
  password: <password>
---
kind: alerttarget
version: v2
metadata:
  name: email-alerts
spec:
  email: [email protected] # Email address of the alert recipient
Creating these resources will accordingly update and reload Kapacitor configuration:
$ gravity resource create -f smtp.yaml
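If the two resources above are kept in separate files, the alert target is created the same way (alerttarget.yaml being a hypothetical file name):
$ gravity resource create -f alerttarget.yaml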
In order to view the current SMTP settings or alert target:
$ gravity resource get smtp
$ gravity resource get alerttarget
Only a single alert target can be configured. To remove the current alert target, delete the corresponding Gravity resource:
$ gravity resource rm alerttarget email-alerts
To test a Kapacitor SMTP configuration you can execute the following:
$ kubectl exec -n monitoring $POD_ID -c kapacitor -- /bin/bash -c "kapacitor service-tests smtp"
If the settings are set up appropriately, the recipient should receive an email with the subject “test subject”.
Creating new alerts is as easy as using another Gravity resource of type alert. The alerts are written in TICKscript and are automatically detected, loaded, and enabled by the Gravity Monitoring and Alerts system.
For demonstration purposes let’s define an alert that always fires:
kind: alert
version: v2
metadata:
  name: my-formula
spec:
  formula: |
    var period = 5m
    var every = 1m
    var warnRate = 2
    var warnReset = 1
    var usage_rate = stream
        |from()
            .measurement('cpu/usage_rate')
            .groupBy('nodename')
            .where(lambda: "type" == 'node')
        |window()
            .period(period)
            .every(every)
    var cpu_total = stream
        |from()
            .measurement('cpu/node_capacity')
            .groupBy('nodename')
            .where(lambda: "type" == 'node')
        |window()
            .period(period)
            .every(every)
    var percent_used = usage_rate
        |join(cpu_total)
            .as('usage_rate', 'total')
            .tolerance(30s)
            .streamName('percent_used')
        |eval(lambda: (float("usage_rate.value") * 100.0) / float("total.value"))
            .as('percent_usage')
        |mean('percent_usage')
            .as('avg_percent_used')
    var trigger = percent_used
        |alert()
            .message('{{ .Level}} / Node {{ index .Tags "nodename" }} has high cpu usage: {{ index .Fields "avg_percent_used" }}%')
            .warn(lambda: "avg_percent_used" > warnRate)
            .warnReset(lambda: "avg_percent_used" < warnReset)
            .stateChangesOnly()
            .details('''
    <b>{{ .Message }}</b>
    <p>Level: {{ .Level }}</p>
    <p>Nodename: {{ index .Tags "nodename" }}</p>
    <p>Usage: {{ index .Fields "avg_percent_used" | printf "%0.2f" }}%</p>
    ''')
            .email()
            .log('/var/lib/kapacitor/logs/high_cpu.log')
            .mode(0644)
And create it:
$ gravity resource create -f formula.yaml
Custom alerts are being monitored by another “watcher” type of service that runs inside the Kapacitor pod:
$ kubectl -nmonitoring logs kapacitor-68f6d76878-8m26x watcher
time="2020-01-24T06:18:10Z" level=info msg="Detected event ADDED for configmap \"my-formula\"" label="monitoring in (alert)" watch=configmap
We can confirm the alert is firing by checking the log file after a few seconds:
$ kubectl -nmonitoring exec -ti kapacitor-68f6d76878-8m26x -c kapacitor -- cat /var/lib/kapacitor/logs/high_cpu.log
{"id":"percent_used:nodename=10.0.2.15","message":"WARNING / Node 10.0.2.15 has high cpu usage: 15%","details":"\n\u003cb\u003eWARNING / Node 10.0.2.15 has high cpu usage: 15%\u003c/b\u003e\n\u003cp\u003eLevel: WARNING\u003c/p\u003e\n\u003cp\u003eNodename: 10.0.2.15\u003c/p\u003e\n\u003cp\u003eUsage: 15.00%\u003c/p\u003e\n","time":"2020-01-24T06:30:00Z","duration":0,"level":"WARNING","data":{"series":[{"name":"percent_used","tags":{"nodename":"10.0.2.15"},"columns":["time","avg_percent_used"],"values":[["2020-01-24T06:30:00Z",15]]}]},"previousLevel":"OK","recoverable":true}
To view a currently configured custom alert you can run:
$ gravity resource get alert my-formula
In order to remove a specific alert, delete the corresponding Gravity resource:
$ gravity resource rm alert my-formula
This concludes our monitoring training.