
Releases: ClusterLabs/ha_cluster_exporter

1.0.0-beta

04 Feb 10:16
b57378e
Pre-release

Changed

  • BC Break - Metrics timestamps are now opt-in and disabled by default. The old behaviour can be kept via the --enable-timestamps CLI flag / config option. (#118)
  • BC Break - Default TCP listening port changed from 9002 to 9964. The old behaviour can be kept via the --port CLI flag / config option. (#122)
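
With timestamps now opt-in, a scraped sample line may or may not carry a trailing timestamp. In the Prometheus text exposition format, a sample can be followed by an optional integer timestamp in milliseconds since the epoch. The sketch below shows one way a consumer could accept both forms. The parse_sample helper and the timestamp value are hypothetical and not part of the exporter.

```python
import re

# Sample line: name, optional {labels}, value, optional timestamp (ms since epoch).
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)

def parse_sample(line):
    """Return (name, value, timestamp_ms or None) for one sample line.

    Hypothetical helper for illustration only.
    """
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    ts = m.group("ts")
    return m.group("name"), float(m.group("value")), int(ts) if ts is not None else None

# Default behaviour from this release on: no timestamp on the line.
without_ts = parse_sample("ha_cluster_corosync_quorate 1")
# With --enable-timestamps: a trailing millisecond timestamp (illustrative value).
with_ts = parse_sample("ha_cluster_corosync_quorate 1 1580811360000")
```

Returning None for a missing timestamp lets the caller fall back to the scrape time.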

0.4.0

13 Dec 15:20
bc84e82
Pre-release

Added

  • Added more new DRBD metrics (#106 #108)

Changed

  • OBS builds don't reuse the binaries built in Travis anymore, due to compliance requirements (#105)

Fixed

Removed

  • Useless warning about missing DRBD split-brain notifications directory (#110)

0.3.0

28 Nov 12:40
a8d58b7
Pre-release

Added

  • Brand new ha_cluster_drbd_split_brain metric (#100)
  • Travis "Build & release" job automatically publishes built assets to GH releases and deploys tags in OBS. (#102)

Changed

  • BC Break - Refactor Pacemaker location constraints metric (#99)

Fixed

  • DRBD default path now corresponds to the DRBD packages of the distros we use and support (#101)

Configuration and various fixes

15 Nov 11:35
c5f4bd1
Pre-release

Added

  • Configuration mechanism via file and CLI flags (#91)
  • SBD devices total count metric ha_cluster_sbd_devices_total (#92)

Changed

  • BC break: rename some labels to make aggregated promql queries possible (#94)

Fixed

  • Paths to external tools are no longer hardcoded (#87)
  • BC break: ha_cluster_pacemaker_config_last_change now uses its value as the last config timestamp, rather than its timestamp itself (#89)
  • ha_cluster_drbd_connections_sync number format (#86)

RPMs

https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=18

New Metrics (DRBD, Pacemaker etc)

31 Oct 09:05
b2f09e1
Pre-release

Added

CLI

  • address flag to bind to specific addresses (#82)
  • level flag to change the log level (#82)

Metrics

  • ha_cluster_pacemaker_fail_count and ha_cluster_pacemaker_migration_threshold (#71)
  • ha_cluster_pacemaker_config_last_change to track if the cluster CIB changes (#80)
  • ha_cluster_drbd_connections_sync to track sync percentage of DRBD connections (#75)
  • ha_cluster_pacemaker_constraints to track resource constraints (#84)

Other

  • a root HTML landing page in the HTTP listener (#73)

RPMs

https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=17

Refactorings, docs and new metrics

30 Oct 16:49
78703c2
Pre-release

Added

  • All the metrics are now timestamped (#67)
  • New ha_cluster_pacemaker_stonith_enabled metric (#68)
  • Comprehensive metrics documentation
  • RPM package spec and systemd unit are now in the repo rather than in the OBS package.

Changed

  • Extensive refactoring to implement prometheus.Collector API (#66)
  • Various metrics and their labels were renamed (#61, #62)
  • Removed Go vendoring mode (#76)

Fixed

  • Metrics no longer need to be reset (#60)
  • TravisCI configuration (#74)

RPMs

https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter?rev=9

DRBD monitoring, refactoring and small improvements

18 Oct 10:24

Breaking changes:

  • All metrics now have a uniform prefix, ha_cluster (#48).
    Example: corosync_quorate 1 becomes ha_cluster_corosync_quorate 1

new metrics:

    1. DRBD local disk resource monitoring (#26)
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-0",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="1-single-1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg2",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg3",role="Secondary",volume="0"} 1
    2. Remote DRBD disk resource monitoring
ha_cluster_drbd_resources_remote_connection{peer_disk_state="dunknown",peer_node_id="0",peer_role="Unknown",resource_name="drbd_passive",volume="0"} 1
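
The DRBD samples above carry their state in labels, so a consumer usually filters on label values rather than on the sample value. The sketch below lists local resources whose disk_state is not "uptodate". The helper name, the "inconsistent" sample and treating "uptodate" as the only healthy state are assumptions made for illustration.

```python
import re

# Matches key="value" pairs inside the braces of a sample line.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def drbd_not_uptodate(text):
    """Return (resource_name, volume, disk_state) for local disks not 'uptodate'.

    Hypothetical helper; which states count as healthy is an assumption.
    """
    flagged = []
    for line in text.splitlines():
        line = line.strip()
        # The trailing "{" keeps ha_cluster_drbd_resources_remote_connection out.
        if not line.startswith("ha_cluster_drbd_resource{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        if labels.get("disk_state") != "uptodate":
            flagged.append(
                (labels.get("resource_name"), labels.get("volume"), labels.get("disk_state"))
            )
    return flagged

# The second line is an invented sample used only to exercise the filter.
scrape = '''
ha_cluster_drbd_resource{disk_state="uptodate",resource_name="vg1",role="Secondary",volume="0"} 1
ha_cluster_drbd_resource{disk_state="inconsistent",resource_name="vg2",role="Secondary",volume="0"} 1
ha_cluster_drbd_resources_remote_connection{peer_disk_state="dunknown",peer_node_id="0",peer_role="Unknown",resource_name="drbd_passive",volume="0"} 1
'''
flagged = drbd_not_uptodate(scrape)
```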

Improvements:

  • Pacemaker metrics refactored into a separate file, with unit tests added (#45)

  • introduced leveled logging (#48)

  • introduced error wrapping (#48)

  • improved error handling: the exporter no longer panics; it skips the failing metric and logs the error (#23)

openSUSE Linux Packages:

openSUSE packages are available at:
https://build.opensuse.org/package/show/server:monitoring/prometheus-ha_cluster_exporter

Contributors:

Thanks a lot to all the contributors who made this release possible:
@stefanotorresi, @ashleyprimo, @mbiagetti, @sathia27

SBD metrics

02 Oct 14:37
52785eb
Pre-release

Features:

  • add SBD device status metric (#19)

The new metric exposes and monitors all SBD devices in the cluster.

The metric is set to 1 for a healthy SBD device and to 0 for an unhealthy one.

new metrics:

# HELP cluster_sbd_device_status cluster sbd status for each SBD device. 1 is healthy device, 0 is not
# TYPE cluster_sbd_device_status gauge
cluster_sbd_device_status{device_name="/dev/vdc"} 1
cluster_sbd_device_status{device_name="/dev/vdb"} 0
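
Since a value of 0 marks an unhealthy device, a scrape can be checked by collecting the device_name of every sample whose value is 0. A minimal sketch using the sample above and the metric name as published in this release; the unhealthy_sbd_devices helper is hypothetical.

```python
import re

# One SBD status sample: cluster_sbd_device_status{device_name="..."} <value>
SBD_RE = re.compile(r'^cluster_sbd_device_status\{device_name="([^"]*)"\}\s+(\S+)')

def unhealthy_sbd_devices(text):
    """Return the device names whose status value is 0 (hypothetical helper)."""
    bad = []
    for line in text.splitlines():
        # HELP and TYPE lines start with "#" and never match the pattern.
        m = SBD_RE.match(line.strip())
        if m and float(m.group(2)) == 0:
            bad.append(m.group(1))
    return bad

scrape = '''
# HELP cluster_sbd_device_status cluster sbd status for each SBD device. 1 is healthy device, 0 is not
# TYPE cluster_sbd_device_status gauge
cluster_sbd_device_status{device_name="/dev/vdc"} 1
cluster_sbd_device_status{device_name="/dev/vdb"} 0
'''
bad_devices = unhealthy_sbd_devices(scrape)  # ["/dev/vdb"]
```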

Corosync Quorum and RingErrors Metrics

30 Sep 13:49
d9c4995
Pre-release

Features:

  • add corosync metrics about ring errors
  • add corosync quorum metrics.

tools:

  • enabled CI and tests

NEW Metrics:

# HELP corosync_quorate shows if the cluster is quorate. 1 cluster is quorate, 0 not
# TYPE corosync_quorate gauge
corosync_quorate 1
# HELP corosync_quorum cluster quorum information
# TYPE corosync_quorum gauge
corosync_quorum{type="expected_votes"} 2
corosync_quorum{type="highest_expected"} 2
corosync_quorum{type="quorum"} 1
corosync_quorum{type="total_votes"} 2
# HELP corosync_ring_errors_total Total number of ring errors in corosync
# TYPE corosync_ring_errors_total gauge
corosync_ring_errors_total 0

Metrics total:

# HELP cluster_node_resources metric inherent per node resources
# TYPE cluster_node_resources gauge
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_ip_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_saphana_prd_hdb00",role="master",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana02",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana02",resource_name="stonith-sbd",role="started",status="active"} 1
# HELP cluster_nodes cluster nodes metrics for all of them
# TYPE cluster_nodes gauge
cluster_nodes{node="dma-dog-hana01",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana01",type="member"} 1
cluster_nodes{node="dma-dog-hana01",type="online"} 1
cluster_nodes{node="dma-dog-hana02",type="dc"} 1
cluster_nodes{node="dma-dog-hana02",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana02",type="member"} 1
cluster_nodes{node="dma-dog-hana02",type="online"} 1
# HELP cluster_nodes_configured_total Number of nodes configured in ha cluster
# TYPE cluster_nodes_configured_total gauge
cluster_nodes_configured_total 2
# HELP cluster_resources_configured_total Number of total configured resources in ha cluster
# TYPE cluster_resources_configured_total gauge
cluster_resources_configured_total 6
# HELP corosync_quorate shows if the cluster is quorate. 1 cluster is quorate, 0 not
# TYPE corosync_quorate gauge
corosync_quorate 1
# HELP corosync_quorum cluster quorum information
# TYPE corosync_quorum gauge
corosync_quorum{type="expected_votes"} 2
corosync_quorum{type="highest_expected"} 2
corosync_quorum{type="quorum"} 1
corosync_quorum{type="total_votes"} 2
# HELP corosync_ring_errors_total Total number of ring errors in corosync
# TYPE corosync_ring_errors_total gauge
corosync_ring_errors_total 0
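
The cluster_nodes metric emits one sample per node and type, so per-node state has to be reassembled from several lines. The sketch below groups the type labels by node and picks out the node carrying type="dc" (the Pacemaker designated controller). The node_types helper is hypothetical, and reading a value of 1 as "this type applies to the node" is an assumption based on the samples above.

```python
import re
from collections import defaultdict

# The "\{" right after the name keeps cluster_nodes_configured_total out.
NODE_RE = re.compile(r'^cluster_nodes\{node="([^"]*)",type="([^"]*)"\}\s+(\S+)')

def node_types(text):
    """Map each node name to the set of type labels reported with value 1.

    Hypothetical helper for illustration only.
    """
    nodes = defaultdict(set)
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if m and float(m.group(3)) == 1:
            nodes[m.group(1)].add(m.group(2))
    return dict(nodes)

scrape = '''
cluster_nodes{node="dma-dog-hana01",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana01",type="member"} 1
cluster_nodes{node="dma-dog-hana01",type="online"} 1
cluster_nodes{node="dma-dog-hana02",type="dc"} 1
cluster_nodes{node="dma-dog-hana02",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana02",type="member"} 1
cluster_nodes{node="dma-dog-hana02",type="online"} 1
cluster_nodes_configured_total 2
'''
by_node = node_types(scrape)
dc_nodes = sorted(n for n, types in by_node.items() if "dc" in types)
```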

Metrics optimisation

18 Sep 15:34
Pre-release

Features:

Example metrics:

# HELP cluster_node_resources metric inherent per node resources
# TYPE cluster_node_resources gauge
cluster_node_resources{managed="false",node="dma-dog-hana01",resource_name="rsc_saphanatopology_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="rsc_ip_prd_hdb00",role="started",status="active"} 1
cluster_node_resources{managed="true",node="dma-dog-hana01",resource_name="stonith-sbd",role="started",status="active"} 1
# HELP cluster_nodes cluster nodes metrics for all of them
# TYPE cluster_nodes gauge
cluster_nodes{node="dma-dog-hana01",type="dc"} 1
cluster_nodes{node="dma-dog-hana01",type="expected_up"} 1
cluster_nodes{node="dma-dog-hana01",type="member"} 1
cluster_nodes{node="dma-dog-hana01",type="online"} 1
# HELP cluster_nodes_configured_total Number of nodes configured in ha cluster
# TYPE cluster_nodes_configured_total gauge
cluster_nodes_configured_total 1
# HELP cluster_resources_configured_total Number of total configured resources in ha cluster
# TYPE cluster_resources_configured_total gauge
cluster_resources_configured_total 5