Merge pull request #58 from MalloZup/documentation-metric
 document metrics specification (initial effort and skeleton, pacemaker)
MalloZup authored Oct 17, 2019
2 parents 6f568db + 3b6e564 commit 46aee69
Showing 2 changed files with 116 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -34,6 +34,10 @@ For a terraform deployment you can also read: https://github.com/SUSE/ha-sap-ter

- show SBD disk health metrics

- show DRBD metrics (local and remote disks resource metrics)

We maintain a complete [metric specification](doc/metric_spec.md), including usage and possible values.

## Devel:

Build the binary with `make` and run it on a node of the HA cluster. It exposes the metrics on port `9002` by default.
112 changes: 112 additions & 0 deletions doc/metric_spec.md
@@ -0,0 +1,112 @@
# Metrics specification:

This is a specification of metrics exposed by the ha_cluster exporter.

All metrics from the exporter start with the prefix `ha_cluster`.
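As a minimal sketch of how the prefix can be used, the snippet below keeps only the exporter's own samples from a scrape. The function name `exporter_samples` and the sample text (including the unrelated `go_goroutines` line and the help text) are illustrative assumptions, not part of the exporter.

```python
PREFIX = "ha_cluster"  # prefix stated in this specification


def exporter_samples(exposition_text):
    """Return the sample lines that start with the ha_cluster prefix."""
    samples = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(PREFIX):
            samples.append(line)
    return samples


# Hypothetical scrape output, used only for illustration.
sample_text = """\
# HELP ha_cluster_nodes_configured_total hypothetical help text
ha_cluster_nodes_configured_total 2
go_goroutines 12
ha_cluster_resources_configured_total 14
"""

print(exporter_samples(sample_text))
```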

Below you have a complete specification, ordered by component.

1. [pacemaker](#pacemaker)
2. [drbd](#drbd)
3. [sbd](#sbd)
4. [corosync](#corosync)

# Pacemaker

The Pacemaker cluster metrics are atomic metrics that represent an up-to-date snapshot of the HA cluster, retrieved by fetching the Pacemaker CIB XML.

Some Pacemaker metrics that carry labels, such as `ha_cluster_node_resources` and `ha_cluster_nodes`, share a common trait: they are either set to `1` or absent, because they track the real state of the monitored cluster resources.
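A consumer therefore has to treat a missing series as "false" rather than wait for a `0`. The following is a rough sketch of that check, assuming the metrics text has already been scraped. The helper names `parse_sample` and `series_present` are hypothetical, and the simple regular expressions do not handle escaped quotes inside label values.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Parse one sample line into (name, labels, value), or None."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    name, raw_labels, value = match.groups()
    return name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def series_present(exposition_text, metric_name, wanted_labels):
    """True if a series with these labels exists and is set to 1."""
    for line in exposition_text.splitlines():
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if name != metric_name:
            continue
        if all(labels.get(k) == v for k, v in wanted_labels.items()):
            return value == 1
    # An absent series means the state does not apply; there is no 0 sample.
    return False


scrape = """\
ha_cluster_nodes{node_name="1b115",type="dc"} 1
ha_cluster_nodes{node_name="1b211",type="online"} 1
"""

print(series_present(scrape, "ha_cluster_nodes", {"node_name": "1b115", "type": "dc"}))
print(series_present(scrape, "ha_cluster_nodes", {"node_name": "1b211", "type": "dc"}))
```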

1. [ha_cluster_node_resources](#ha_cluster_node_resources)
2. [ha_cluster_nodes](#ha_cluster_nodes)
3. [ha_cluster_nodes_configured_total](#ha_cluster_nodes_configured_total)
4. [ha_cluster_resources_configured_total](#ha_cluster_resources_configured_total)



## ha_cluster_node_resources

This metric shows the current status of a cluster resource.

A resource that was previously in the cluster but no longer is will not be monitored. Example:

```ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="cluster_md",role="started",status="active"} 1```

If the resource is removed from the cluster, the metric will be absent rather than set to `0`.


All the label values map 1:1 to the Pacemaker schema.

- `managed`: `true` or `false`, indicating whether the resource is managed by the cluster
- `node_name`: name of the cluster node
- `resource_name`: resource id/name from the Pacemaker CIB
- `role`: allowed values `Started/Stopped/Master/Slave`, or a pending state `Starting/Stopping/Migrating/Promoting/Demoting`; these are the same as the Pacemaker roles for resources
- `status`: allowed values `active/orphaned/blocked/failed/failureIgnored`, the status of the resource from the Pacemaker XML.
Additionally, the same resource can have a combination of statuses.
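The label rules above can be sketched as a small validator. This is an illustration only: the function name `check_resource_labels` is hypothetical, and values are compared case-insensitively on the assumption that the exporter lowercases them, since the examples in this document show `role="started"` while the list uses `Started`.

```python
import re

# Allowed values copied from the label list, lowercased for comparison.
ALLOWED_ROLES = {
    "started", "stopped", "master", "slave",
    "starting", "stopping", "migrating", "promoting", "demoting",
}
ALLOWED_STATUSES = {"active", "orphaned", "blocked", "failed", "failureignored"}

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def check_resource_labels(line):
    """Return the names of labels whose values are outside the allowed sets."""
    labels = dict(LABEL_RE.findall(line))
    problems = []
    if labels.get("managed") not in ("true", "false"):
        problems.append("managed")
    if labels.get("role", "").lower() not in ALLOWED_ROLES:
        problems.append("role")
    if labels.get("status", "").lower() not in ALLOWED_STATUSES:
        problems.append("status")
    return problems


good = ('ha_cluster_node_resources{managed="true",node_name="1b115",'
        'resource_name="cluster_md",role="started",status="active"} 1')
# Hypothetical malformed sample, for illustration only.
bad = ('ha_cluster_node_resources{managed="true",node_name="1b115",'
       'resource_name="cluster_md",role="unknown",status="active"} 1')

print(check_resource_labels(good))
print(check_resource_labels(bad))
```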

Example:

```
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="cluster_md",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="clvm",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="dlm",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="drbd_passive",role="master",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="fs_cluster_md",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="fs_drbd_passive",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="stonith-sbd",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="vg_cluster_md",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b211",resource_name="dlm",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b211",resource_name="fs_cluster_md",role="stopped",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b211",resource_name="vg_cluster_md",role="stopped",status="active"} 1
```
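One possible use of this output is to list the resources that are stopped on each node. The sketch below assumes the lowercase `role="stopped"` value seen in the example; the function name `stopped_by_node` is hypothetical.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def stopped_by_node(exposition_text):
    """Map node_name -> sorted names of resources whose role is 'stopped'."""
    stopped = defaultdict(list)
    for line in exposition_text.splitlines():
        if not line.startswith("ha_cluster_node_resources{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        if labels.get("role") == "stopped":
            stopped[labels.get("node_name")].append(labels.get("resource_name"))
    return {node: sorted(names) for node, names in stopped.items()}


# Three lines taken from the example above.
resources = """\
ha_cluster_node_resources{managed="true",node_name="1b115",resource_name="cluster_md",role="started",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b211",resource_name="fs_cluster_md",role="stopped",status="active"} 1
ha_cluster_node_resources{managed="true",node_name="1b211",resource_name="vg_cluster_md",role="stopped",status="active"} 1
"""

print(stopped_by_node(resources))
```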

## ha_cluster_nodes

- `node_name`: name of the cluster node
- `type`: allowed values `online/standby/standby_onfail/maintenance/pending/unclean/shutdown/expected_up/dc/member/ping/remote`. These are the possible node types in a Pacemaker HA cluster.

As with resources, anything that is absent is not shown: there is no `0` value, since the metric is a real snapshot of the HA cluster.

Examples:
```
ha_cluster_nodes{node_name="1b115",type="dc"} 1
ha_cluster_nodes{node_name="1b115",type="expected_up"} 1
ha_cluster_nodes{node_name="1b115",type="member"} 1
ha_cluster_nodes{node_name="1b115",type="online"} 1
ha_cluster_nodes{node_name="1b211",type="expected_up"} 1
ha_cluster_nodes{node_name="1b211",type="member"} 1
ha_cluster_nodes{node_name="1b211",type="online"} 1
```
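Because each node type is exposed as its own series, a consumer can collect them per node. The following sketch, with hypothetical helper names `node_types` and `designated_controller`, gathers the types and looks up which node carries `type="dc"`.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def node_types(exposition_text):
    """Map node_name -> set of type values present in the scrape."""
    types = defaultdict(set)
    for line in exposition_text.splitlines():
        # The trailing '{' excludes ha_cluster_nodes_configured_total.
        if not line.startswith("ha_cluster_nodes{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        if "node_name" in labels and "type" in labels:
            types[labels["node_name"]].add(labels["type"])
    return dict(types)


def designated_controller(exposition_text):
    """Return the name of the node with type 'dc', or None if absent."""
    for node, types in node_types(exposition_text).items():
        if "dc" in types:
            return node
    return None


# Lines taken from the example above.
nodes = """\
ha_cluster_nodes{node_name="1b115",type="dc"} 1
ha_cluster_nodes{node_name="1b115",type="online"} 1
ha_cluster_nodes{node_name="1b211",type="member"} 1
ha_cluster_nodes{node_name="1b211",type="online"} 1
"""

print(node_types(nodes))
print(designated_controller(nodes))
```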

## ha_cluster_nodes_configured_total

Shows the total number of configured nodes in the HA cluster.

Example:

```
ha_cluster_nodes_configured_total 2
```
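One consistency check a consumer might build on this metric is to compare the configured total with the number of nodes reported as `online` by `ha_cluster_nodes`. This check is an assumption made for illustration, not something the specification prescribes, and the function names are hypothetical.

```python
def read_unlabelled_value(exposition_text, metric_name):
    """Return the value of a metric without labels, or None if absent."""
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == metric_name:
            return float(parts[1])
    return None


def all_configured_nodes_online(exposition_text):
    """True if the count of online nodes equals the configured total."""
    configured = read_unlabelled_value(
        exposition_text, "ha_cluster_nodes_configured_total"
    )
    online = sum(
        1
        for line in exposition_text.splitlines()
        if line.startswith("ha_cluster_nodes{") and 'type="online"' in line
    )
    return configured is not None and online == configured


# Lines taken from the examples in this document.
cluster = """\
ha_cluster_nodes{node_name="1b115",type="online"} 1
ha_cluster_nodes{node_name="1b211",type="online"} 1
ha_cluster_nodes_configured_total 2
"""

print(all_configured_nodes_online(cluster))
```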


## ha_cluster_resources_configured_total

Shows the total number of resources configured in the HA cluster.

Example:
```
ha_cluster_resources_configured_total 14
```


# Corosync

`TODO`

# Drbd

`TODO`@MalloZup

# SBD

`TODO`
