Integrate OpenTelemetry and Prometheus #106
Merged
Two new util crates are introduced:
- `observability` - common things for logs, tracing, and metrics
- `graceful-shutdown` - mini-crate for signal handling

Graceful shutdown was implemented for `api-server`. Currently it only waits for pending HTTP requests to finish, not other protocols.

Bunyan log format was removed - instead the apps will auto-detect the environment:
- if `stderr` points to the TTY (developer mode), the app will use the pretty text format
- otherwise it will use `tracing`'s built-in json format

Better defaults for HTTP tracing were configured: logs show span start/end (note the method and route span fields) along with `HTTP request` and `HTTP response` events that include full URI, headers, and latency.

When the `OTLP_ENDPOINT` env var is provided, an OpenTelemetry layer will be configured:
- `trace_id` will appear in the root span, allowing us to link logs to traces in Grafana
- traces are sent over `grpc` to the OTEL collector

New system endpoints are proposed:
- `/system/health?type={liveness,readiness,startup}` - using k8s semantics
- `/system/metrics` - for Prometheus metrics

Order of middlewares was modified to keep `/system/health` outside of the tracing middleware, so that health checks do not produce too much log spam.

No-op health endpoints were added to the `api-server` and `oracle-provider` apps.

Prometheus metrics were added to the `oracle-provider` app as a test of the integration.
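
For reviewers, the environment auto-detection described above (pretty logs when `stderr` is a TTY, JSON otherwise, and an OpenTelemetry layer only when `OTLP_ENDPOINT` is set) can be sketched roughly as follows. This is a std-only illustration, not the code in the `observability` crate: `LogFormat`, `ObservabilityConfig`, `build_config`, and `config_from_env` are hypothetical names, and treating an empty `OTLP_ENDPOINT` as unset is an assumption of the sketch.

```rust
use std::io::IsTerminal;

/// Output formats described in the PR: pretty text for developers, JSON otherwise.
#[derive(Debug, PartialEq, Eq)]
enum LogFormat {
    Pretty,
    Json,
}

/// Hypothetical startup settings; not the actual `observability` crate API.
#[derive(Debug, PartialEq, Eq)]
struct ObservabilityConfig {
    log_format: LogFormat,
    /// `Some(endpoint)` means an OpenTelemetry layer should be added.
    otlp_endpoint: Option<String>,
}

/// Pure decision logic, kept separate from the environment so it is easy to test.
fn build_config(stderr_is_tty: bool, otlp_endpoint: Option<String>) -> ObservabilityConfig {
    let log_format = if stderr_is_tty {
        LogFormat::Pretty
    } else {
        LogFormat::Json
    };
    // Assumption of this sketch: an empty value counts as "not provided".
    let otlp_endpoint = otlp_endpoint.filter(|e| !e.trim().is_empty());
    ObservabilityConfig {
        log_format,
        otlp_endpoint,
    }
}

/// Reads the real environment: stderr TTY status and the `OTLP_ENDPOINT` variable.
fn config_from_env() -> ObservabilityConfig {
    build_config(
        std::io::stderr().is_terminal(),
        std::env::var("OTLP_ENDPOINT").ok(),
    )
}
```

An app would call `config_from_env()` once at startup and branch on the result when assembling its subscriber layers. Keeping the decision in `build_config` means both modes can be exercised in tests without a real terminal.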
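
To illustrate the proposed `/system/health?type={liveness,readiness,startup}` contract, here is a minimal std-only sketch of parsing the `type` query parameter and answering with a no-op status. The names `ProbeType`, `parse_probe_type`, and `health_status` are hypothetical, and returning 400 for a missing or unknown type is an assumption of this sketch rather than behavior stated in the PR.

```rust
/// Probe kinds accepted by `/system/health?type=...`, following Kubernetes naming.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ProbeType {
    Liveness,
    Readiness,
    Startup,
}

/// Extracts the probe type from a raw query string such as `type=readiness`.
/// Returns `None` when the parameter is missing or has an unknown value.
fn parse_probe_type(query: &str) -> Option<ProbeType> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "type")
        .and_then(|(_, value)| match value {
            "liveness" => Some(ProbeType::Liveness),
            "readiness" => Some(ProbeType::Readiness),
            "startup" => Some(ProbeType::Startup),
            _ => None,
        })
}

/// No-op health check: every known probe type is reported healthy (200).
/// Assumption of this sketch: anything else is rejected with 400.
fn health_status(query: &str) -> u16 {
    match parse_probe_type(query) {
        Some(_) => 200,
        None => 400,
    }
}
```

Modelling the probe type as an enum leaves room for each app to later attach real checks per probe (for example, a dependency check for readiness) without changing the URL shape.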