Releases: ClusterLabs/ha_cluster_exporter
1.0.0-beta
Changed
- BC Break - Metrics timestamps are now opt-in and disabled by default. The old behaviour can be kept via the --enable-timestamps CLI flag / config option. (#118)
- BC Break - Default TCP listening port changed from 9002 to 9964. The old behaviour can be kept via the --port CLI flag / config option. (#122)
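Because timestamps are now opt-in, anything that reads the exporter's raw output has to cope with samples both with and without a trailing timestamp. In the Prometheus text exposition format the timestamp is an optional integer (milliseconds since the epoch) after the value. The following is a minimal sketch of a tolerant line parser; `parse_sample` is a hypothetical helper, not part of the exporter, and the timestamp in the usage line is made up.

```python
import re

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+(?P<ts>-?\d+))?\s*$'             # optional timestamp (ms since epoch)
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value, timestamp_ms or None), or None for non-sample lines."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None  # e.g. "# HELP" / "# TYPE" comments or blank lines
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = match.group("ts")
    return (
        match.group("name"),
        labels,
        float(match.group("value")),
        int(ts) if ts is not None else None,
    )


# Hypothetical sample with a timestamp, as emitted when timestamps are enabled.
print(parse_sample("ha_cluster_corosync_quorate 1 1570000000000"))
```

Consumers that treat a missing timestamp as "use scrape time" keep working whichever way the flag is set.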
0.4.0
0.3.0
Added
- Brand new ha_cluster_drbd_split_brain metric (#100)
- Travis "Build & release" job automatically publishes built assets to GH releases and deploys tags in OBS. (#102)
Changed
- BC Break - Refactor Pacemaker location constraints metric (#99)
Fixed
- DRBD default path now corresponds with the DRBD packages of the distros we use and support (#101)
Configuration and various fixes
Added
- Configuration mechanism via file and CLI flags (#91)
- SBD devices total count metric ha_cluster_sbd_devices_total (#92)
Changed
- BC break: rename some labels to make aggregated PromQL queries possible (#94)
Fixed
- Paths to external tools are no longer hardcoded (#87)
- BC break: ha_cluster_pacemaker_config_last_change now uses its value as the last config timestamp, rather than its timestamp itself (#89)
- ha_cluster_drbd_connections_sync number format (#86)
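Since ha_cluster_pacemaker_config_last_change now carries the last change time in its value, a consumer can derive the age of the configuration directly from the sample. The sketch below assumes the value is a Unix timestamp in seconds, which these notes do not state; `seconds_since_config_change` is a hypothetical helper name.

```python
import time


def seconds_since_config_change(sample_value, now=None):
    """Age of the last cluster config change, in seconds.

    Assumption: sample_value is a Unix timestamp in seconds.
    """
    if now is None:
        now = time.time()
    # Clamp at zero so small clock skew between hosts never yields a negative age.
    return max(0.0, now - sample_value)


# Usage with made-up numbers: a change recorded 60 seconds before "now".
print(seconds_since_config_change(1000.0, now=1060.0))
```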
RPMs
https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=18
New Metrics (DRBD, Pacemaker etc)
Added
CLI
Metrics
- ha_cluster_pacemaker_fail_count and ha_cluster_pacemaker_migration_threshold (#71)
- ha_cluster_pacemaker_config_last_change to track if the cluster CIB changes (#80)
- ha_cluster_drbd_connections_sync to track sync percentage of DRBD connections (#75)
- ha_cluster_pacemaker_constraints to track resource constraints (#84)
Other
- a root HTML landing page in the HTTP listener (#73)
RPMs
https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=17
Refactorings, docs and new metrics
Added
- All the metrics are now timestamped (#67)
- New ha_cluster_pacemaker_stonith_enabled metric (#68)
- Comprehensive metrics documentation
- RPM package spec and systemd unit are now in the repo rather than in the OBS package.
Changed
- Extensive refactoring to implement prometheus.Collector API (#66)
- Various metrics and their labels were renamed (#61, #62)
- Removed Go vendoring mode (#76)
Fixed
RPMs
https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=9
DRBD monitoring, refactoring and small improvements
breaking changes:
- all metrics now have a uniform prefix ha_cluster (#48).
Example: from corosync_quorate 1 to ha_cluster_corosync_quorate 1
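Existing queries and alert rules that use the old names need updating after this change. One way to do that is an explicit rename table applied to whole identifiers, sketched below. Only the rename shown in the example above is included; other metrics may not have been renamed by simple prefixing, so the table should be extended from the exporter's metrics documentation rather than guessed. `RENAMES` and `migrate_query` are hypothetical names.

```python
import re

# Only the rename documented in these notes. Extend after checking the
# exporter's metrics documentation; do not assume every metric was prefixed.
RENAMES = {"corosync_quorate": "ha_cluster_corosync_quorate"}

IDENTIFIER_RE = re.compile(r'[a-zA-Z_:][a-zA-Z0-9_:]*')


def migrate_query(query, renames=RENAMES):
    """Replace whole-identifier occurrences of old metric names in a query string."""
    def swap(match):
        word = match.group(0)
        return renames.get(word, word)
    return IDENTIFIER_RE.sub(swap, query)


print(migrate_query("min(corosync_quorate) == 0"))
```

Matching whole identifiers keeps already-migrated names untouched, but the sketch does not understand query syntax: a label name or string literal that happens to equal an old metric name would also be rewritten.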
new metrics:
- DRBD local disk resource monitoring metric (#26)
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-0",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg2",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg3",role="Secondary",volume="0"} 1
- Monitor remote DRBD disk resource
ha_cluster_drbd_resources_remote_connection{peer_disk_state="dunknown",peer_node_id="0",peer_role="Unknown",resource_name="drbd_passive",volume="0"} 1
Improvements:
- pacemaker metrics refactored into a separate file, with unit tests implemented (#45)
- introduced leveled logging (#48)
- introduced error wrapping (#48)
- better error handling: don't panic, just skip the metric and print the error in the logs. (#23)
openSUSE Linux Packages:
openSUSE packages are available at:
https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter
Contributors:
Thanks a lot to all the contributors who made this release possible:
@stefanotorresi, @ashleyprimo, @mbiagetti, @sathia27
SBD metrics
Features:
- add SBD device status metric (#19)
The new metric exposes and monitors all SBD devices in the cluster.
If an SBD device is unhealthy the metric is set to 0; if it is healthy it is set to 1.
new metrics:
# HELP cluster_sbd_device_status cluster sbd status for each SBD device. 1 is healthy device, 0 is not
# TYPE cluster_sbd_device_status gauge
cluster_sbd_device_status{device_name="/dev/vdc"} 1
cluster_sbd_device_status{device_name="/dev/vdb"} 0
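Given the 0/1 convention above, a check for unhealthy devices only needs to collect the device_name label of every sample whose value is 0. The following is a minimal sketch over raw exporter output, using the metric name from this release; `unhealthy_sbd_devices` is a hypothetical helper.

```python
import re

# Matches samples of the SBD status metric as named in this release.
SBD_RE = re.compile(r'^cluster_sbd_device_status\{device_name="([^"]*)"\}\s+(\S+)')


def unhealthy_sbd_devices(exposition_text):
    """Return the device names whose status sample is 0 (unhealthy)."""
    devices = []
    for line in exposition_text.splitlines():
        match = SBD_RE.match(line.strip())
        if match and float(match.group(2)) == 0:
            devices.append(match.group(1))
    return devices


sample = """\
# TYPE cluster_sbd_device_status gauge
cluster_sbd_device_status{device_name="/dev/vdc"} 1
cluster_sbd_device_status{device_name="/dev/vdb"} 0
"""
print(unhealthy_sbd_devices(sample))
```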
Corosync Quorum and RingErrors Metrics
Features:
- add corosync metrics about ring errors
- add corosync quorum metrics.
tools:
- enabled CI and tests
NEW Metrics:
# HELP corosync_quorate shows if the cluster is quorate. 1 cluster is quorate, 0 not
# TYPE corosync_quorate gauge
corosync_quorate 1
# HELP corosync_quorum cluster quorum information
# TYPE corosync_quorum gauge
corosync_quorum{type="expected_votes"} 2
corosync_quorum{type="highest_expected"} 2
corosync_quorum{type="quorum"} 1
corosync_quorum{type="total_votes"} 2
# HELP corosync_ring_errors_total Total number of ring errors in corosync
# TYPE corosync_ring_errors_total gauge
corosync_ring_errors_total 0
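The corosync_quorum samples can be cross-checked against corosync_quorate. The sketch below collects the per-type vote counts and compares total_votes with quorum. It assumes the usual reading that a partition is quorate when its total votes reach the quorum threshold; corosync_quorate remains the authoritative signal. `quorum_votes` and `has_quorum` are hypothetical helpers.

```python
import re

# Matches samples of the quorum metric as named in this release.
QUORUM_RE = re.compile(r'^corosync_quorum\{type="([^"]*)"\}\s+(\S+)')


def quorum_votes(exposition_text):
    """Collect corosync_quorum samples into a {type: value} dict."""
    votes = {}
    for line in exposition_text.splitlines():
        match = QUORUM_RE.match(line.strip())
        if match:
            votes[match.group(1)] = float(match.group(2))
    return votes


def has_quorum(votes):
    """Assumption: quorate means total_votes has reached the quorum threshold."""
    return votes["total_votes"] >= votes["quorum"]


sample = """\
corosync_quorum{type="expected_votes"} 2
corosync_quorum{type="highest_expected"} 2
corosync_quorum{type="quorum"} 1
corosync_quorum{type="total_votes"} 2
"""
print(has_quorum(quorum_votes(sample)))
```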
Metrics total:
# HELP cluster_node_resources metric inherent per node resources
# TYPE cluster_node_resources gauge
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_ip_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_saphana_prd_hdb00",role="master",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana02",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana02",resource_name="stonith-sbd",role="started",status="active"} 1
# HELP cluster_nodes cluster nodes metrics for all of them
# TYPE cluster_nodes gauge
cluster_nodes{node="dma-dog-hana01",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana01",type="member"} 1
cluster_nodes{node="dma-dog-hana01",type="online"} 1
cluster_nodes{node="dma-dog-hana02",type="dc"} 1
cluster_nodes{node="dma-dog-hana02",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana02",type="member"} 1
cluster_nodes{node="dma-dog-hana02",type="online"} 1
# HELP cluster_nodes_configured_total Number of nodes configured in ha cluster
# TYPE cluster_nodes_configured_total gauge
cluster_nodes_configured_total 2
# HELP cluster_resources_configured_total Number of total configured resources in ha cluster
# TYPE cluster_resources_configured_total gauge
cluster_resources_configured_total 6
# HELP corosync_quorate shows if the cluster is quorate. 1 cluster is quorate, 0 not
# TYPE corosync_quorate gauge
corosync_quorate 1
# HELP corosync_quorum cluster quorum information
# TYPE corosync_quorum gauge
corosync_quorum{type="expected_votes"} 2
corosync_quorum{type="highest_expected"} 2
corosync_quorum{type="quorum"} 1
corosync_quorum{type="total_votes"} 2
# HELP corosync_ring_errors_total Total number of ring errors in corosync
# TYPE corosync_ring_errors_total gauge
corosync_ring_errors_total 0
Metrics optimisation
Features:
- reintroduce configured metrics (MalloZup@20d1e8a)
- move managed to a label of the metric (MalloZup@36d8cb6)
Example metrics:
# HELP cluster_node_resources metric inherent per node resources
# TYPE cluster_node_resources gauge
cluster_node_resources{managed="false",node="dma-dog-hana01",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_ip_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="stonith-sbd",role="started",status="active"} 1
# HELP cluster_nodes cluster nodes metrics for all of them
# TYPE cluster_nodes gauge
cluster_nodes{node="dma-dog-hana01",type="dc"} 1
cluster_nodes{node="dma-dog-hana01",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana01",type="member"} 1
cluster_nodes{node="dma-dog-hana01",type="online"} 1
# HELP cluster_nodes_configured_total Number of nodes configured in ha cluster
# TYPE cluster_nodes_configured_total gauge
cluster_nodes_configured_total 1
# HELP cluster_resources_configured_total Number of total configured resources in ha cluster
# TYPE cluster_resources_configured_total gauge
cluster_resources_configured_total 5
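With managed exposed as a label, unmanaged resources can be found by filtering on that label alone. The following is a minimal sketch over raw exporter output, using the metric and label names from the example above; `unmanaged_resources` is a hypothetical helper.

```python
import re

# Matches samples of the per-node resource metric as named in this release.
LINE_RE = re.compile(r'^cluster_node_resources\{(.*)\}\s+(\S+)\s*$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unmanaged_resources(exposition_text):
    """Return (node, resource_name) pairs for samples labelled managed="false"."""
    found = []
    for line in exposition_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, other metrics
        labels = dict(LABEL_RE.findall(match.group(1)))
        if labels.get("managed") == "false":
            found.append((labels.get("node"), labels.get("resource_name")))
    return found


sample = """\
cluster_node_resources{managed="false",node="dma-dog-hana01",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_ip_prd_hdb00",role="started",status="active"} 1
"""
print(unmanaged_resources(sample))
```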