Expose a health check endpoint #643

Open
mwasilew2 opened this issue Nov 27, 2024 · 3 comments · May be fixed by #644

Comments

@mwasilew2

It would be great if the collector exposed a health check that could be used to determine whether the process is healthy.

We run this collector in ECS. For some reason, it got stuck and stopped sending diagnostics upstream, and we had to restart it manually to get it working again. If the collector exposed a health check endpoint, we could reference it in the ECS task definition, so that ECS would detect a stuck collector and replace the task automatically.

I imagine that a health check endpoint would be useful for other deployment methods as well.

@mwasilew2 mwasilew2 changed the title Add support for a health check Expose a health check endpoint Nov 27, 2024
mwasilew2 added a commit to mwasilew2/collector that referenced this issue Dec 2, 2024
This exposes a basic health check server. Internally, the health check handler doesn't do anything complex.
The idea is to just check if the process is responding.

Signed-off-by: Michal Wasilewski <[email protected]>
@mwasilew2 mwasilew2 linked a pull request Dec 2, 2024 that will close this issue
@seanlinsley
Member

Hi @mwasilew2, we recently fixed an issue in #637 that might have been what you were seeing. Could you upgrade to version 0.63.0 and confirm whether the collector still gets stuck?

@msakrejda
Contributor

msakrejda commented Dec 2, 2024

In addition to that, unfortunately, given the current architecture of the collector, I'm not sure there are good places for a health check to hook into to make a meaningful assertion about the state of the process. As I commented on #644, I don't think a basic health check would have helped, and I don't see an easy path to a more advanced health check without significant re-architecting. Thoughts?

@mwasilew2
Author

> we recently fixed an issue in #637 which might have been what you were seeing. Could you upgrade to version 0.63.0 and confirm if the collector is still getting stuck?

Thanks a lot, I'll bump our deployed version. That said, the current version had been running fine for weeks before it got stuck, so the problem may not reappear soon, if at all.

> I don't think a basic health check would have helped, and I don't see an easy path to a more advanced health check without significant re-architecting

That makes sense. I left a comment in #644; I'm OK with closing it too.
