frequent timeout with many sites to monitor #100

Open
osallou opened this issue Oct 2, 2018 · 1 comment


osallou commented Oct 2, 2018

Hi,
I have a setup with ~100 sites to monitor in config.yml.
The software works, but I get lots of incidents/outages on sites with the error "net/http: request canceled (Client.Timeout exceeded while awaiting headers)".

I tested one of the sites alone in the config (with a different config.yml but no same server) and it always shows the site as up and running, with no timeouts or errors.

So the problem seems to appear when the monitor gets "too many" sites to monitor.


osallou commented Oct 2, 2018

After some testing:

It appears that at the client.Do call, requests stay pending (they are handled one after the other), and the elapsed time keeps increasing.
Displaying the lag shows that it gets higher with each monitor handled.

It seems that the HTTP timeout is set for all monitors and starts at each tick, while the request itself has not been sent yet (no real concurrency, requests are done one after the other). This leads to a timeout for every request that is handled after the timeout value has elapsed, and the lag value is also wrong (it accumulates the response time of all previous requests).
So either you have 1 CPU (GOMAXPROCS) per monitor and everything is fine, or you get wrong data (and wrong checks) with too many monitors.
