The admin dashboard and the available metrics APIs (like /admin/metrics) are useful for investigating the current status of the synchronizer, but they are not very useful for monitoring the system over time. They offer no historical view of things like queue size or lambda values, and they require you to remember to check the dashboard.
It would be good to have a way to pipe these metrics out of the synchronizer, and into a tool designed for monitoring and alerting, such as Datadog or Prometheus. We would like to monitor the queue size and trigger alerts based on queue size or bad lambda values.
I can think of a few good options for this. What do others think?
Adding a StatsD client metric exporter to the synchronizer (like the ones built by Etsy or Datadog). I know it's easy to pass StatsD metrics into Datadog; I'm not sure about Prometheus.
Creating a custom Datadog integration, which would probably call the existing admin REST API and convert the results to Datadog metrics. The downside is that it requires separate maintenance and is Datadog-specific, but it might provide a better experience for customers using Datadog, like us.
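To make the StatsD option concrete, here is a minimal sketch of what emitting a gauge could look like. It is not the synchronizer's code: the metric name, the tag, and the agent address (`127.0.0.1:8125`, the conventional StatsD port) are assumptions for illustration. The `|#key:value` tag suffix is the DogStatsD extension, so plain StatsD servers may not accept it.

```python
import socket
from typing import Dict, Optional


def format_gauge(name: str, value: float, tags: Optional[Dict[str, str]] = None) -> str:
    """Build a StatsD gauge line, e.g. 'my.metric:42|g'.

    Tags are appended using the DogStatsD '|#k:v,k:v' extension.
    """
    line = f"{name}:{value}|g"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_gauge(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP to a StatsD agent (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name and tag, chosen only for this example.
    send_gauge(format_gauge("synchronizer.events.queue_size", 42, {"env": "prod"}))
```

Because the transport is UDP, a missing or slow agent would not block the synchronizer, which is the main appeal of this option.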
Hi Russel, you can use /admin/events/queueSize and /admin/impressions/queueSize. If the size keeps growing, the synchronizer is not keeping up with the incoming impressions/events. This is where the lambda value is calculated in the admin dashboard.
You can send this value to Datadog.
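As a rough sketch of that suggestion, a small poller could read both endpoints and flag a queue that keeps growing. The base URL and the response shape are assumptions, since the thread does not specify them: the parser below accepts either a bare integer or a JSON object with a hypothetical `queueSize` field, and should be adjusted to whatever the endpoints actually return.

```python
import json
import urllib.request
from typing import List

# Assumption: address where the synchronizer's admin API is reachable.
BASE_URL = "http://localhost:3010"

ENDPOINTS = {
    "events": "/admin/events/queueSize",
    "impressions": "/admin/impressions/queueSize",
}


def parse_queue_size(body: str) -> int:
    """Parse a response body into an integer queue size.

    Assumed shapes: a bare integer ('42') or an object ('{"queueSize": 42}').
    """
    data = json.loads(body)
    if isinstance(data, dict):
        return int(data["queueSize"])
    return int(data)


def fetch_queue_size(kind: str, base_url: str = BASE_URL) -> int:
    """Fetch the current queue size for 'events' or 'impressions'."""
    with urllib.request.urlopen(base_url + ENDPOINTS[kind], timeout=5) as resp:
        return parse_queue_size(resp.read().decode("utf-8"))


def is_growing(samples: List[int]) -> bool:
    """Return True when every sample is strictly larger than the one before."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Calling `fetch_queue_size` on a schedule and forwarding each value as a gauge would give the historical view; `is_growing` is only a local illustration of the "queue is not draining" condition that an alerting tool would normally evaluate itself.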