What did you do?
Hello, I am new to Consul and am trying to understand why the consul_up metric continuously fluctuates between up and down even though all services are running well (all Consul nodes are healthy and all pods are running). We have an alert that fires when the average of consul_up over the past 5 minutes drops below 90%: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90.
What did you expect to see?
We expect consul_up to stay constant at 1.
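To make the alert condition concrete, here is a minimal Python sketch of what the expression evaluates, assuming avg_over_time is simply the arithmetic mean of the consul_up samples scraped inside the 5-minute window. The function names, the 30-second scrape interval, and the sample values are hypothetical and only for illustration; they are not taken from our actual configuration.

```python
from typing import Sequence


def avg_over_time(samples: Sequence[float]) -> float:
    """Arithmetic mean of the samples in a window.

    Simplification: PromQL returns no result for an empty window,
    whereas this sketch raises an error instead.
    """
    if not samples:
        raise ValueError("window contains no samples")
    return sum(samples) / len(samples)


def alert_fires(samples: Sequence[float], threshold_pct: float = 90.0) -> bool:
    """Mirror of: (avg_over_time(consul_up[5m]) * 100) < 90."""
    return avg_over_time(samples) * 100 < threshold_pct


# Hypothetical 5-minute windows, assuming a 30s scrape interval (10 samples).
steady = [1] * 10                          # consul_up constant at 1
flapping = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # consul_up up in 6 of 10 scrapes

print(alert_fires(steady))    # False
print(alert_fires(flapping))  # True
```

Under these assumptions, a window in which consul_up is 0 for a large share of scrapes falls well below the 90% threshold, which would be consistent with the alert firing as often as it does.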
What did you see instead? Under which circumstances?
Instead, consul_up continuously fluctuates between 1 (up) and 0 (down). As a result, our alert fires often even though all Consul health checks are passing (a Consul support engineer verified this).
I have attached all images and log files explaining the issue.
Environment
Prod
System information:
Linux 5.8.0-1041-aws x86_64
consul_exporter version:
0.7.1
Consul version:
Consul v1.8.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Prometheus version:
prometheus, version 2.28.1 (branch: HEAD, revision: b0944590a1c9a6b35dc5a696869f75f422b107a1)
nikashnarula changed the title from "Consul Fails to Query Service Health - consul_up is down ~60% of time" to "Consul Fails to Query Service Health - consul_up is down ~40% of time" on Dec 16, 2022
Prometheus configuration file:
prometheus_config.txt
Logs:
prometheus_consul_exporter_logs.txt
prometheus_logs.txt