Consul Fails to Query Service Health - consul_up is down ~40% of time #255

Open
nikashnarula opened this issue Dec 16, 2022 · 2 comments
@nikashnarula

What did you do?
Hello, I am new to Consul and am trying to understand why the consul_up metric continuously fluctuates between up and down even though everything is running well (all Consul nodes are healthy and all pods are running). We have an alert that fires when the average of consul_up over the past 5 minutes drops below 90%: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90.
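To make the alert arithmetic concrete, here is a minimal Python sketch of how the expression above behaves on a flapping series. It is only an illustration, not how Prometheus is implemented: the 15-second scrape interval (20 samples per 5-minute window) and the sample values are assumptions, and the helper names `avg_over_time` and `alert_fires` are hypothetical.

```python
def avg_over_time(samples):
    """Arithmetic mean of the samples in the window, like PromQL's avg_over_time."""
    return sum(samples) / len(samples)


def alert_fires(samples, threshold_percent=90.0):
    """Mirror of the alert expression: (avg_over_time(consul_up[5m]) * 100) < 90."""
    return avg_over_time(samples) * 100 < threshold_percent


# Assumed 15s scrape interval -> 20 samples in a 5-minute window (illustrative data).
flapping = [1, 1, 0, 1, 0] * 4   # 12 samples up, 8 down: consul_up is 0 for 40% of the window
steady = [1] * 20                # consul_up is 1 for the whole window

print(alert_fires(flapping))  # True: the average is 60%, below the 90% threshold
print(alert_fires(steady))    # False: the average is 100%
```

With this threshold, more than 10% of the scrapes in a window reporting consul_up as 0 is enough to trigger the alert, so a metric that is down about 40% of the time fires almost continuously.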

What did you expect to see?
We expect consul_up to stay constant at 1.

What did you see instead? Under which circumstances?
Instead, consul_up continuously fluctuates between 1 (up) and 0 (down). As a result, our alert fires often even though all Consul health checks are passing (we had a Consul support engineer verify this).
I have attached images and log files that show the issue.

Attached images: consul_nodes_health, consul_uptime_graph, consul_uptime_value

Environment
Prod

  • System information:

    Linux 5.8.0-1041-aws x86_64

  • consul_exporter version:

    0.7.1

  • Consul version:

    Consul v1.8.0
    Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible
    agents)

  • Prometheus version:

    prometheus, version 2.28.1 (branch: HEAD, revision: b0944590a1c9a6b35dc5a696869f75f422b107a1)

  • Prometheus configuration file:

    prometheus_config.txt

  • Logs:
    prometheus_consul_exporter_logs.txt
    prometheus_logs.txt

nikashnarula changed the title from "Consul Fails to Query Service Health - consul_up is down ~60% of time" to "Consul Fails to Query Service Health - consul_up is down ~40% of time" on Dec 16, 2022
@nikashnarula

Hi,
Is there any update on this from the consul_exporter team?

@iamPrakhar

We are facing this issue as well. Is there any consul_exporter setting that we should change?

Thanks
