Merge pull request #106 from kermat/mdk/add-more-drbd-metrics
Add more DRBD resource metrics
stefanotorresi authored Dec 5, 2019
2 parents 4a85365 + 68c8cf8 commit 1da69e1
Showing 5 changed files with 310 additions and 52 deletions.
154 changes: 146 additions & 8 deletions doc/metric_spec.md
@@ -205,9 +205,19 @@ The DRBD subsystems collect devices stats by parsing its configuration the JSON

0. [Sample](../test/drbd.metrics)
1. [`ha_cluster_drbd_resources`](#ha_cluster_drbd_resources)
2. [`ha_cluster_drbd_written`](#ha_cluster_drbd_written)
3. [`ha_cluster_drbd_read`](#ha_cluster_drbd_read)
4. [`ha_cluster_drbd_al_writes`](#ha_cluster_drbd_al_writes)
5. [`ha_cluster_drbd_bm_writes`](#ha_cluster_drbd_bm_writes)
6. [`ha_cluster_drbd_upper_pending`](#ha_cluster_drbd_upper_pending)
7. [`ha_cluster_drbd_lower_pending`](#ha_cluster_drbd_lower_pending)
8. [`ha_cluster_drbd_connections`](#ha_cluster_drbd_connections)
9. [`ha_cluster_drbd_connections_sync`](#ha_cluster_drbd_connections_sync)
10. [`ha_cluster_drbd_connections_received`](#ha_cluster_drbd_connections_received)
11. [`ha_cluster_drbd_connections_sent`](#ha_cluster_drbd_connections_sent)
12. [`ha_cluster_drbd_connections_pending`](#ha_cluster_drbd_connections_pending)
13. [`ha_cluster_drbd_connections_unacked`](#ha_cluster_drbd_connections_unacked)
14. [`ha_cluster_drbd_split_brain`](#ha_cluster_drbd_split_brain)
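Every entry in the list above follows the same pattern: a `ha_cluster` prefix, the `drbd` subsystem and the map key used in `drbdMetrics`, joined by underscores. The sketch below reproduces that composition with the standard library only. It assumes the `ha_cluster` prefix is added inside `NewMetricDesc`, whose body is not part of this commit, and `fqName` is a hypothetical helper written to behave like `prometheus.BuildFQName`.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName is a hypothetical helper that joins the non-empty parts with "_",
// the same way prometheus.BuildFQName composes a metric name.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return "" // no metric name, no fully qualified name
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// Map keys from drbdMetrics; the "ha_cluster" namespace is an assumption.
	keys := []string{
		"resources", "written", "read", "al_writes", "bm_writes",
		"upper_pending", "lower_pending", "connections", "connections_sync",
		"connections_received", "connections_sent", "connections_pending",
		"connections_unacked", "split_brain",
	}
	for _, k := range keys {
		// Each printed name is also the GitHub anchor of its heading below.
		fmt.Println(fqName("ha_cluster", "drbd", k))
	}
}
```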

### `ha_cluster_drbd_connections`

@@ -226,13 +236,69 @@ Either the value is `1`, or the line is absent altogether.

The total number of lines for this metric will be the cardinality of `resource` times the cardinality of `peer_node_id`.


### `ha_cluster_drbd_connections_sync`

#### Description

The percentage of the DRBD disk that is in sync with the peer on this connection. The value is a float from `0` to `100.00`.

#### Labels

- `resource`: the resource this connection is for.
- `peer_node_id`: the id of the node this connection is for
- `volume`: the volume number

### `ha_cluster_drbd_connections_received`

#### Description

Volume of net data received from the partner via the network connection, in KiB; 1 line per `resource`, per `peer_node_id`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the resource this connection is for.
- `peer_node_id`: the id of the node this connection is for
- `volume`: the volume number

### `ha_cluster_drbd_connections_sent`

#### Description

Volume of net data sent to the partner via the network connection, in KiB; 1 line per `resource`, per `peer_node_id`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the resource this connection is for.
- `peer_node_id`: the id of the node this connection is for
- `volume`: the volume number

### `ha_cluster_drbd_connections_pending`

#### Description

Number of requests sent to the partner that have not yet been answered by it; 1 line per `resource`, per `peer_node_id`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the resource this connection is for.
- `peer_node_id`: the id of the node this connection is for
- `volume`: the volume number

### `ha_cluster_drbd_connections_unacked`

#### Description

Number of requests received by the partner that have not yet been acknowledged; 1 line per `resource`, per `peer_node_id`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the resource this connection is for.
- `peer_node_id`: the id of the node this connection is for
- `volume`: the volume number
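The connection metrics above are all read from the `peer_devices` entries of each connection in the DRBD status JSON. The following is a minimal sketch of that mapping, not the exporter's code: it decodes a trimmed-down status document whose field names follow the struct tags in `drbd_metrics.go`, and prints one exposition-format line per metric. The resource name `r0` and peer id `1` are made up, the counter values come from this commit's test fixture, and the types and the `connectionLines` function are hypothetical; the real collector sends gauges through the Prometheus client instead of building strings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// peerDevice is a hypothetical, trimmed-down mirror of one "peer_devices"
// entry; the JSON tags match those used in drbd_metrics.go.
type peerDevice struct {
	Volume        int     `json:"volume"`
	Received      int     `json:"received"`
	Sent          int     `json:"sent"`
	Pending       int     `json:"pending"`
	Unacked       int     `json:"unacked"`
	PercentInSync float64 `json:"percent-in-sync"`
}

// connection and resource keep only the fields this sketch needs.
type connection struct {
	PeerNodeID  int          `json:"peer-node-id"`
	PeerDevices []peerDevice `json:"peer_devices"`
}

type resource struct {
	Name        string       `json:"name"`
	Connections []connection `json:"connections"`
}

// connectionLines decodes a status document and returns one exposition
// line per metric, per resource, per peer_node_id, per volume.
func connectionLines(data []byte) ([]string, error) {
	var resources []resource
	if err := json.Unmarshal(data, &resources); err != nil {
		return nil, err
	}
	var lines []string
	for _, r := range resources {
		for _, c := range r.Connections {
			for _, pd := range c.PeerDevices {
				labels := fmt.Sprintf(`{resource="%s",peer_node_id="%d",volume="%d"}`, r.Name, c.PeerNodeID, pd.Volume)
				metrics := []struct {
					name  string
					value float64
				}{
					{"connections_sync", pd.PercentInSync},
					{"connections_received", float64(pd.Received)},
					{"connections_sent", float64(pd.Sent)},
					{"connections_pending", float64(pd.Pending)},
					{"connections_unacked", float64(pd.Unacked)},
				}
				for _, m := range metrics {
					// 'f' with precision -1 prints the shortest exact decimal, e.g. 99.8 or 456.
					value := strconv.FormatFloat(m.value, 'f', -1, 64)
					lines = append(lines, "ha_cluster_drbd_"+m.name+labels+" "+value)
				}
			}
		}
	}
	return lines, nil
}

func main() {
	// Counter values from the test fixture; "r0" and peer id 1 are made up.
	sample := []byte(`[{"name": "r0", "connections": [{"peer-node-id": 1, "peer_devices": [
		{"volume": 0, "received": 456, "sent": 654, "pending": 3, "unacked": 4, "percent-in-sync": 100}]}]}]`)
	lines, err := connectionLines(sample)
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

Because `volume` is one of the labels, a resource with several volumes produces one set of these lines per volume on every connection.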

### `ha_cluster_drbd_resources`

@@ -250,14 +316,86 @@ Either the value is `1`, or the line is absent altogether.

The total number of lines for this metric will be the cardinality of `name` times the cardinality of `volume`.

### `ha_cluster_drbd_written`

#### Description

Amount of data, in KiB, written to the DRBD resource; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_read`

#### Description

Amount of data, in KiB, read from the DRBD resource; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_al_writes`

#### Description

Number of updates of the activity log area of the metadata; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_bm_writes`

#### Description

Number of updates of the bitmap area of the metadata; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_upper_pending`

#### Description

Number of block I/O requests forwarded to DRBD but not yet answered by DRBD; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_lower_pending`

#### Description

Number of open requests to the local I/O subsystem issued by DRBD; 1 line per `resource`, per `volume`.
Value is an integer greater than or equal to `0`.

#### Labels

- `resource`: the name of the resource.
- `volume`: the volume number

### `ha_cluster_drbd_split_brain`

#### Description

This metric signals whether a split brain is occurring, per resource and volume.
Either the value is `1`, or the line is absent altogether.

Unlike the other metrics, this one only works once a custom DRBD split-brain handler has been set up; see the instructions at the end of this section.
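Since the line is either `1` or absent, the collector has to emit this metric only for the resource/volume pairs where a split brain was detected, rather than reporting `0` for the others. The sketch below shows that present-or-absent pattern. How detections are recorded (through the split-brain handler set up at the end of this section) is not shown on this page, so the input here is a hypothetical map of detection results, and `volumeKey` and `splitBrainLines` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// volumeKey identifies one DRBD resource volume (hypothetical type).
type volumeKey struct {
	Resource string
	Volume   int
}

// splitBrainLines emits a line with value 1 for every detected pair and
// nothing at all for pairs without a split brain.
func splitBrainLines(detected map[volumeKey]bool) []string {
	var lines []string
	for k, found := range detected {
		if !found {
			continue // absent, never reported as 0
		}
		lines = append(lines, fmt.Sprintf(`ha_cluster_drbd_split_brain{resource="%s",volume="%d"} 1`, k.Resource, k.Volume))
	}
	sort.Strings(lines) // map order is random; sort for stable output
	return lines
}

func main() {
	// Made-up detection results: only r0/0 has a split brain.
	detected := map[volumeKey]bool{
		{"r0", 0}: true,
		{"r1", 0}: false,
	}
	for _, line := range splitBrainLines(detected) {
		fmt.Println(line)
	}
}
```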

#### Labels

@@ -273,7 +411,7 @@ In order to get the `split_brain` metric working:
get the hook from:
https://github.com/SUSE/ha-sap-terraform-deployments/blob/72c9d3ecf6c3f6dd18ccb7bcbde4b40722d5c641/salt/drbd_node/files/notify-split-brain-haclusterexporter-suse-metric.sh

2) In the DRBD configuration, enable the hook:

```split_brain: "/usr/lib/drbd/notify-split-brain-haclusterexporter-suse-metric.sh"```

48 changes: 44 additions & 4 deletions drbd_metrics.go
@@ -19,13 +19,23 @@ type drbdStatus struct {
Role string `json:"role"`
Devices []struct {
Volume int `json:"volume"`
Written int `json:"written"`
Read int `json:"read"`
AlWrites int `json:"al-writes"`
BmWrites int `json:"bm-writes"`
UpPending int `json:"upper-pending"`
LoPending int `json:"lower-pending"`
DiskState string `json:"disk-state"`
} `json:"devices"`
Connections []struct {
PeerNodeID int `json:"peer-node-id"`
PeerRole string `json:"peer-role"`
PeerDevices []struct {
Volume int `json:"volume"`
Received int `json:"received"`
Sent int `json:"sent"`
Pending int `json:"pending"`
Unacked int `json:"unacked"`
PeerDiskState string `json:"peer-disk-state"`
PercentInSync float64 `json:"percent-in-sync"`
} `json:"peer_devices"`
@@ -36,10 +46,20 @@ var (
drbdMetrics = metricDescriptors{
// the map key will function as an identifier of the metric throughout the rest of the code;
// it is arbitrary, but by convention we use the actual metric name
"resources": NewMetricDesc("drbd", "resources", "The DRBD resources; 1 line per name, per volume", []string{"resource", "role", "volume", "disk_state"}),
"written": NewMetricDesc("drbd", "written", "KiB written to DRBD; 1 line per res, per volume", []string{"resource", "volume"}),
"read": NewMetricDesc("drbd", "read", "KiB read from DRBD; 1 line per res, per volume", []string{"resource", "volume"}),
"al_writes": NewMetricDesc("drbd", "al_writes", "Writes to activity log; 1 line per res, per volume", []string{"resource", "volume"}),
"bm_writes": NewMetricDesc("drbd", "bm_writes", "Writes to bitmap; 1 line per res, per volume", []string{"resource", "volume"}),
"upper_pending": NewMetricDesc("drbd", "upper_pending", "Upper pending; 1 line per res, per volume", []string{"resource", "volume"}),
"lower_pending": NewMetricDesc("drbd", "lower_pending", "Lower pending; 1 line per res, per volume", []string{"resource", "volume"}),
"connections":          NewMetricDesc("drbd", "connections", "The DRBD resource connections; 1 line per resource, per peer_node_id", []string{"resource", "peer_node_id", "peer_role", "volume", "peer_disk_state"}),
"connections_sync": NewMetricDesc("drbd", "connections_sync", "The in sync percentage value for DRBD resource connections", []string{"resource", "peer_node_id", "volume"}),
"connections_received": NewMetricDesc("drbd", "connections_received", "KiB received per connection", []string{"resource", "peer_node_id", "volume"}),
"connections_sent": NewMetricDesc("drbd", "connections_sent", "KiB sent per connection", []string{"resource", "peer_node_id", "volume"}),
"connections_pending": NewMetricDesc("drbd", "connections_pending", "Pending value per connection", []string{"resource", "peer_node_id", "volume"}),
"connections_unacked": NewMetricDesc("drbd", "connections_unacked", "Unacked value per connection", []string{"resource", "peer_node_id", "volume"}),
"split_brain": NewMetricDesc("drbd", "split_brain", "Whether a split brain has been detected; 1 line per resource, per volume.", []string{"resource", "volume"}),
}
)

@@ -89,6 +109,18 @@ func (c *drbdCollector) Collect(ch chan<- prometheus.Metric) {
for _, device := range resource.Devices {
// the `resources` metric value is always 1, otherwise it's absent
ch <- c.makeGaugeMetric("resources", float64(1), resource.Name, resource.Role, strconv.Itoa(device.Volume), strings.ToLower(device.DiskState))

ch <- c.makeGaugeMetric("written", float64(device.Written), resource.Name, strconv.Itoa(device.Volume))

ch <- c.makeGaugeMetric("read", float64(device.Read), resource.Name, strconv.Itoa(device.Volume))

ch <- c.makeGaugeMetric("al_writes", float64(device.AlWrites), resource.Name, strconv.Itoa(device.Volume))

ch <- c.makeGaugeMetric("bm_writes", float64(device.BmWrites), resource.Name, strconv.Itoa(device.Volume))

ch <- c.makeGaugeMetric("upper_pending", float64(device.UpPending), resource.Name, strconv.Itoa(device.Volume))

ch <- c.makeGaugeMetric("lower_pending", float64(device.LoPending), resource.Name, strconv.Itoa(device.Volume))
}
if len(resource.Connections) == 0 {
log.Warnf("Could not retrieve connection info for resource '%s'\n", resource.Name)
@@ -106,6 +138,14 @@ func (c *drbdCollector) Collect(ch chan<- prometheus.Metric) {

ch <- c.makeGaugeMetric("connections_sync", float64(peerDev.PercentInSync), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume))

ch <- c.makeGaugeMetric("connections_received", float64(peerDev.Received), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume))

ch <- c.makeGaugeMetric("connections_sent", float64(peerDev.Sent), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume))

ch <- c.makeGaugeMetric("connections_pending", float64(peerDev.Pending), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume))

ch <- c.makeGaugeMetric("connections_unacked", float64(peerDev.Unacked), resource.Name, strconv.Itoa(conn.PeerNodeID), strconv.Itoa(peerDev.Volume))

}
}
}
80 changes: 60 additions & 20 deletions drbd_metrics_test.go
@@ -22,12 +22,12 @@ func TestDrbdParsing(t *testing.T) {
"client": false,
"quorum": true,
"size": 409600,
"read": 654321,
"written": 123456,
"al-writes": 123,
"bm-writes": 321,
"upper-pending": 1,
"lower-pending": 2
}
],
"connections": [
@@ -46,11 +46,11 @@ func TestDrbdParsing(t *testing.T) {
"peer-disk-state": "UpToDate",
"peer-client": false,
"resync-suspended": "no",
"received": 456,
"sent": 654,
"out-of-sync": 0,
"pending": 3,
"unacked": 4,
"has-sync-details": false,
"has-online-verify-details": false,
"percent-in-sync": 100
@@ -73,12 +73,12 @@ func TestDrbdParsing(t *testing.T) {
"client": false,
"quorum": true,
"size": 10200,
"read": 654321,
"written": 123456,
"al-writes": 123,
"bm-writes": 321,
"upper-pending": 1,
"lower-pending": 2
}
],
"connections": [
@@ -97,11 +97,11 @@ func TestDrbdParsing(t *testing.T) {
"peer-disk-state": "UpToDate",
"peer-client": false,
"resync-suspended": "no",
"received": 456,
"sent": 654,
"out-of-sync": 0,
"pending": 3,
"unacked": 4,
"has-sync-details": false,
"has-online-verify-details": false,
"percent-in-sync": 99.8
@@ -142,6 +142,46 @@ func TestDrbdParsing(t *testing.T) {
t.Errorf("volumes should be 0")
}

if 123456 != drbdDevs[0].Devices[0].Written {
t.Errorf("written should be 123456")
}

if 654321 != drbdDevs[0].Devices[0].Read {
t.Errorf("read should be 654321")
}

if 123 != drbdDevs[0].Devices[0].AlWrites {
t.Errorf("al-writes should be 123")
}

if 321 != drbdDevs[0].Devices[0].BmWrites {
t.Errorf("bm-writes should be 321")
}

if 1 != drbdDevs[0].Devices[0].UpPending {
t.Errorf("upper-pending should be 1")
}

if 2 != drbdDevs[0].Devices[0].LoPending {
t.Errorf("lower-pending should be 2")
}

if 456 != drbdDevs[0].Connections[0].PeerDevices[0].Received {
t.Errorf("received should be 456")
}

if 654 != drbdDevs[0].Connections[0].PeerDevices[0].Sent {
t.Errorf("sent should be 654")
}

if 3 != drbdDevs[0].Connections[0].PeerDevices[0].Pending {
t.Errorf("pending should be 3")
}

if 4 != drbdDevs[0].Connections[0].PeerDevices[0].Unacked {
t.Errorf("unacked should be 4")
}

if 100 != drbdDevs[0].Connections[0].PeerDevices[0].PercentInSync {
t.Errorf("PercentInSync doesn't correspond! fail got %f", drbdDevs[0].Connections[0].PeerDevices[0].PercentInSync)
}