Integrate OpenTelemetry and Prometheus #106

Merged: 1 commit, Jul 23, 2024

sergiimk
Member

Two new util crates are introduced:

  • observability: common utilities for logs, tracing, and metrics
  • graceful-shutdown: a mini-crate for signal handling

Graceful shutdown was implemented for api-server. Currently it only waits for pending HTTP requests to finish, not for other protocols.
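
To illustrate the "wait for pending requests" behaviour, here is a minimal std-only model. It is not the code from this PR: the `RequestDrain` type and its methods are hypothetical names, and a real server would more likely rely on its HTTP framework's shutdown hook plus an OS signal handler.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical bookkeeping: a shutdown flag plus a count of in-flight requests.
#[derive(Default)]
struct Inner {
    shutting_down: bool,
    in_flight: usize,
}

/// Hypothetical helper that refuses new requests after shutdown starts
/// and lets the caller block until the pending ones have finished.
#[derive(Default)]
pub struct RequestDrain {
    inner: Mutex<Inner>,
    idle: Condvar,
}

impl RequestDrain {
    /// Registers a new request; returns false once shutdown has begun.
    pub fn try_begin_request(&self) -> bool {
        let mut guard = self.inner.lock().unwrap();
        if guard.shutting_down {
            return false;
        }
        guard.in_flight += 1;
        true
    }

    /// Marks a request as finished and wakes the waiter when none remain.
    pub fn end_request(&self) {
        let mut guard = self.inner.lock().unwrap();
        guard.in_flight -= 1;
        if guard.in_flight == 0 {
            self.idle.notify_all();
        }
    }

    /// Called from the signal-handling path: stop accepting work, then
    /// block until every in-flight request has called `end_request`.
    pub fn shutdown_and_wait(&self) {
        let mut guard = self.inner.lock().unwrap();
        guard.shutting_down = true;
        while guard.in_flight > 0 {
            guard = self.idle.wait(guard).unwrap();
        }
    }
}
```

In this model the signal handler would call `shutdown_and_wait()` and exit the process once it returns. Keeping the flag and the counter under one mutex avoids the race where a request starts just after the drain check.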

The Bunyan log format was removed. Instead, the apps auto-detect the environment:

  • if stderr points to a TTY (developer mode), they use a pretty text format (see the example below)
  • otherwise they use tracing's built-in JSON format
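
The detection rule above can be sketched with the standard library's `IsTerminal` trait (Rust 1.70+). The `LogFormat` enum and both function names are assumptions made for illustration; the crate's actual API may differ.

```rust
use std::io::IsTerminal;

/// Hypothetical enum naming the two output formats described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFormat {
    /// Human-readable multi-line text for developers.
    Pretty,
    /// Machine-readable JSON for log collectors.
    Json,
}

/// Pure decision function, kept separate so it can be unit-tested.
pub fn select_format(stderr_is_tty: bool) -> LogFormat {
    if stderr_is_tty {
        LogFormat::Pretty
    } else {
        LogFormat::Json
    }
}

/// Inspects the real stderr handle of the current process.
pub fn detect_format() -> LogFormat {
    select_format(std::io::stderr().is_terminal())
}
```

Under this sketch, running the app in a terminal yields `LogFormat::Pretty`, while redirecting stderr to a file or pipe (for example `2> app.log`) yields `LogFormat::Json`.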

Better defaults for HTTP tracing were configured. The example below shows span start/end (note the method and route span fields) along with HTTP request and HTTP response events that include the full URI, headers, and latency:

  2024-07-23T03:54:18.107150Z  INFO observability::axum: new
    at src/utils/observability/src/axum.rs:86 on tokio-runtime-worker
    in observability::axum::http_request with method: GET, route: /:account/:dataset

  2024-07-23T03:54:18.107294Z  INFO observability::axum: HTTP request, uri: /foo/bar?some=param, version: HTTP/1.1, headers: {"accept-encoding": "gzip, deflate, br", "user-agent": "xh/0.22.2", "connection": "keep-alive", "accept": "*/*", "host": "localhost:3003"}
    at src/utils/observability/src/axum.rs:33 on tokio-runtime-worker
    in observability::axum::http_request with method: GET, route: /:account/:dataset

  2024-07-23T03:54:18.107595Z  INFO observability::axum: HTTP response, status: 405, headers: {"content-length": "0", "access-control-allow-origin": "*", "vary": "origin", "vary": "access-control-request-method", "vary": "access-control-request-headers"}, latency: 0 ms
    at src/utils/observability/src/axum.rs:54 on tokio-runtime-worker
    in observability::axum::http_request with method: GET, route: /:account/:dataset

  2024-07-23T03:54:18.107806Z  INFO observability::axum: close, time.busy: 389µs, time.idle: 272µs
    at src/utils/observability/src/axum.rs:86 on tokio-runtime-worker
    in observability::axum::http_request with method: GET, route: /:account/:dataset

When the OTLP_ENDPOINT env var is provided, an OpenTelemetry layer will be configured:

  • trace_id will appear in the root span, allowing us to link logs to traces in Grafana
  • Traces will be sent via gRPC to the OTEL collector
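
The opt-in behaviour can be pictured as below. Only the `OTLP_ENDPOINT` variable name comes from this PR. The `TraceExport` enum and the function names are hypothetical, and treating an empty value as "disabled" is an assumption.

```rust
use std::env;

/// Hypothetical result of reading the configuration.
#[derive(Debug, PartialEq, Eq)]
pub enum TraceExport {
    /// No endpoint configured: only local logging is set up.
    Disabled,
    /// Endpoint configured: an OpenTelemetry layer would be added.
    Otlp { endpoint: String },
}

/// Pure helper: decides from an optional raw value.
/// Assumption: blank values are treated the same as an unset variable.
pub fn trace_export_from(value: Option<String>) -> TraceExport {
    match value.map(|v| v.trim().to_string()) {
        Some(endpoint) if !endpoint.is_empty() => TraceExport::Otlp { endpoint },
        _ => TraceExport::Disabled,
    }
}

/// Reads the real environment of the current process.
pub fn trace_export_from_env() -> TraceExport {
    trace_export_from(env::var("OTLP_ENDPOINT").ok())
}
```

Because the layer is added only when the variable is set, a build with this change can be deployed before the collector exists and switched on later.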

New system endpoints are proposed:

  • /system/health?type={liveness,readiness,startup} - using k8s semantics
  • /system/metrics for Prometheus metrics
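
As a rough sketch of how the `type` query parameter could be interpreted, the code below parses it into an enum. The `HealthCheckType` name and the helper function are assumptions. What the endpoint does when `type` is missing or unknown is not stated in this PR, so the sketch leaves that decision to the caller.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the three k8s probe kinds listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthCheckType {
    Liveness,
    Readiness,
    Startup,
}

impl FromStr for HealthCheckType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "liveness" => Ok(Self::Liveness),
            "readiness" => Ok(Self::Readiness),
            "startup" => Ok(Self::Startup),
            other => Err(format!("unknown health check type: {other}")),
        }
    }
}

/// Finds the `type` parameter in a raw query string such as `type=liveness`.
/// Returns `None` when the parameter is absent, and `Some(Err(..))` when it
/// is present but not one of the three known values.
pub fn health_type_from_query(query: &str) -> Option<Result<HealthCheckType, String>> {
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(key, _)| *key == "type")
        .map(|(_, value)| value.parse())
}
```

A k8s probe would then target, for example, `/system/health?type=readiness`, which this helper maps to `HealthCheckType::Readiness`.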

The order of middlewares was changed so that /system/health sits outside the tracing middleware and does not flood the logs with probe requests.
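
Why ordering matters can be shown with a toy router. It imitates the documented behaviour of axum's `Router::layer`, which wraps only the routes registered before the call. The `Router` type here is a std-only stand-in, not axum, and whether this PR uses exactly this mechanism is an assumption.

```rust
/// Toy router: each entry is (path, wrapped_by_tracing).
#[derive(Default)]
pub struct Router {
    routes: Vec<(String, bool)>,
}

impl Router {
    /// Registers a route; it starts without any middleware.
    pub fn route(mut self, path: &str) -> Self {
        self.routes.push((path.to_string(), false));
        self
    }

    /// Applies the tracing middleware to every route registered so far.
    /// Routes added after this call are not affected.
    pub fn with_tracing(mut self) -> Self {
        for route in &mut self.routes {
            route.1 = true;
        }
        self
    }

    /// Reports whether a registered path is wrapped by tracing.
    pub fn is_traced(&self, path: &str) -> Option<bool> {
        self.routes
            .iter()
            .find(|(p, _)| p.as_str() == path)
            .map(|(_, traced)| *traced)
    }
}

/// Application routes first, then tracing, then the health endpoint,
/// so probe requests never open a span.
pub fn build_router() -> Router {
    Router::default()
        .route("/:account/:dataset")
        .with_tracing()
        .route("/system/health")
}
```

With this ordering, `build_router().is_traced("/system/health")` is `Some(false)`, while the dataset route is still traced.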

No-op health endpoints were added to the api-server and oracle-provider apps.

Prometheus metrics were added to the oracle-provider app as a test of the integration.
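
For readers unfamiliar with what /system/metrics returns, the sketch below renders a single counter in the Prometheus text exposition format. It is a hand-rolled, std-only illustration. The `Counter` type and the metric name used in the note are hypothetical, and the PR itself presumably uses a Prometheus client crate instead.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical monotonically increasing counter.
pub struct Counter {
    name: &'static str,
    help: &'static str,
    value: AtomicU64,
}

impl Counter {
    /// Creates a counter starting at zero.
    pub const fn new(name: &'static str, help: &'static str) -> Self {
        Self { name, help, value: AtomicU64::new(0) }
    }

    /// Increments the counter by one.
    pub fn inc(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }

    /// Renders the counter in the Prometheus text exposition format:
    /// a HELP line, a TYPE line, then the sample itself.
    pub fn render(&self) -> String {
        let (name, help) = (self.name, self.help);
        let value = self.value.load(Ordering::Relaxed);
        format!("# HELP {name} {help}\n# TYPE {name} counter\n{name} {value}\n")
    }
}
```

A handler for /system/metrics would concatenate the `render()` output of every registered metric. For a counter named `requests_total` that has been incremented twice, the final line of the body is `requests_total 2`.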

@sergiimk sergiimk force-pushed the feature/observability branch from 9453a60 to e6da15e Compare July 23, 2024 04:12
Member

@s373r s373r left a comment

Added several minor questions

src/app/oracle-provider/src/provider.rs (resolved)
src/utils/observability/src/axum.rs (resolved)
src/utils/observability/src/axum.rs (resolved)
src/utils/observability/src/config.rs (outdated, resolved)
@zaychenko-sergei
Contributor

As I understood, this PR will have to wait for Hauki to configure the endpoint?

@sergiimk
Member Author

> As I understood, this PR will have to wait for Hauki to configure the endpoint?

@zaychenko-sergei, not really - we can deploy the new version and turn on OTLP_ENDPOINT later.

@sergiimk sergiimk force-pushed the feature/observability branch 2 times, most recently from 9f25f7f to c73aaa5 Compare July 23, 2024 17:39
@sergiimk sergiimk force-pushed the feature/observability branch from c73aaa5 to efd12d2 Compare July 23, 2024 17:53
@sergiimk sergiimk merged commit efd12d2 into master Jul 23, 2024
3 checks passed
@sergiimk sergiimk deleted the feature/observability branch July 23, 2024 18:07
3 participants