Consul Fails to Query Service Health - consul_up is down ~40% of time #255

nikashnarula · 2022-12-16T21:53:35Z

What did you do?
Hello, I am new to Consul and am trying to understand why the consul_up metric continuously fluctuates between up and down even though all services are running well (all Consul nodes are healthy and all pods are running). We have an alert that fires when the average of consul_up over the past 5 minutes drops below 90%: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90.
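To make the arithmetic behind that alert expression concrete, here is a minimal sketch that evaluates it by hand. It assumes a 15-second scrape interval (not stated in this issue), so a 5-minute window holds 20 samples. The sample values and the function names are made up for illustration.

```python
def avg_over_time(samples):
    """Mean of the samples in the window, analogous to PromQL's avg_over_time."""
    return sum(samples) / len(samples)


def alert_fires(samples, threshold_percent=90):
    """Mirror of the expression: (avg_over_time(consul_up[5m]) * 100) < 90."""
    return avg_over_time(samples) * 100 < threshold_percent


# Hypothetical 5-minute window at an assumed 15s scrape interval: 20 samples.
# consul_up was 0 in 8 of the 20 scrapes, i.e. down about 40% of the time.
window = [1] * 12 + [0] * 8

print(avg_over_time(window) * 100)  # percentage of scrapes where consul_up was 1
print(alert_fires(window))          # whether the alert condition holds
```

Under these assumptions, more than two failed scrapes in any 5-minute window pull the average below 90%, which matches the alert firing often when consul_up is 0 for roughly 40% of scrapes.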

What did you expect to see?
We expect to see consul_up give a value of 1 and be constant.

What did you see instead? Under which circumstances?
Instead, we see consul_up continuously fluctuating between 1 (up) and 0 (down). As a result, our alert fires often even though all Consul health checks are spotless (we had a Consul support engineer verify this).
I have attached the images and log files that show the issue.

Environment
Prod

System information:
Linux 5.8.0-1041-aws x86_64

consul_exporter version:
0.7.1

Consul version:
Consul v1.8.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Prometheus version:
prometheus, version 2.28.1 (branch: HEAD, revision: b0944590a1c9a6b35dc5a696869f75f422b107a1)

Prometheus configuration file:
prometheus_config.txt

Logs:
prometheus_consul_exporter_logs.txt
prometheus_logs.txt

nikashnarula · 2022-12-20T17:56:19Z

Hi,
Any update on this from the consul_exporter team?

iamPrakhar · 2023-09-01T10:23:39Z

We are facing this issue as well. Is there any consul_exporter setting that we should change?

Thanks

nikashnarula changed the title ~~Consul Fails to Query Service Health - consul_up is down ~60% of time~~ Consul Fails to Query Service Health - consul_up is down ~40% of time Dec 16, 2022
