Traceability of Proposal flow #3429

pinebit · 2024-12-18T15:03:01Z

🎯 Problem to be solved

To better understand the timing breakdown of the entire proposal duty flow in real clusters (both test and production), we aim to visualize the time spent on tasks such as querying the beacon node (BN), achieving consensus, and other processes for specific duties or slots. This data will help us investigate missed proposals more quickly and, if necessary, adjust consensus timings or shift focus to other significant contributors to delays.

Charon currently has initial support for Jaeger, which has not been widely used for debugging or production purposes due to the high telemetry traffic it generates. The proposed solution is to start using Grafana Tempo (alongside Prometheus and Loki) as the server for collecting telemetry events in production, while narrowing the tracing scope to the Propose duty only, to minimize traffic.
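A minimal sketch of how the tracing scope could be narrowed to the Propose duty, assuming tracing is gated by a per-duty allow-list read from a comma-separated configuration value. The `DutyType` values, `parseDutyFilter`, and the configuration format are hypothetical illustrations, not Charon's actual types or CLI flags. In an OpenTelemetry setup the same decision could live in a custom sampler or at the call sites that start spans.

```go
package main

import (
	"fmt"
	"strings"
)

// DutyType is a hypothetical stand-in for Charon's duty types;
// the real names and values in the codebase may differ.
type DutyType string

const (
	DutyProposer   DutyType = "proposer"
	DutyAttester   DutyType = "attester"
	DutyAggregator DutyType = "aggregator"
)

// dutyFilter records which duty types are allowed to emit spans.
type dutyFilter map[DutyType]bool

// parseDutyFilter builds a filter from a comma-separated list such as
// "proposer". Entries are trimmed and lower-cased; an empty list
// enables tracing for no duty at all.
func parseDutyFilter(list string) dutyFilter {
	f := dutyFilter{}
	for _, s := range strings.Split(list, ",") {
		s = strings.ToLower(strings.TrimSpace(s))
		if s != "" {
			f[DutyType(s)] = true
		}
	}
	return f
}

// Enabled reports whether spans should be recorded for the given duty.
func (f dutyFilter) Enabled(d DutyType) bool {
	return f[d]
}

func main() {
	f := parseDutyFilter("proposer")
	for _, d := range []DutyType{DutyProposer, DutyAttester, DutyAggregator} {
		fmt.Printf("%s traced: %v\n", d, f.Enabled(d))
	}
}
```

With the value `proposer`, only Propose spans would be recorded; every other duty would skip span creation and therefore add no telemetry traffic.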

Under this ticket, we need to revisit the existing Jaeger-specific code and CLI flags, making them universal by adopting the OpenTelemetry library. This library is flexible enough to support tracing with Jaeger, Tempo, and other protocols. Additionally, we will need to eliminate or disable most of the existing tracing for other duties or HTTP calls, or make it conditional.
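Making tracing conditional is commonly done by leaving call sites unchanged and choosing the tracer implementation once at startup. The sketch below shows that pattern in plain Go with minimal, hypothetical `Tracer` and `Span` interfaces; it is not the OpenTelemetry API, although the OpenTelemetry Go API provides a no-op tracer provider intended for the same purpose. The recording tracer here keeps span names in memory, standing in for whatever exporter (Jaeger, Tempo, or another backend) would be configured.

```go
package main

import (
	"fmt"
	"sync"
)

// Span is a minimal, hypothetical span handle.
type Span interface{ End() }

// Tracer starts spans; call sites depend only on this interface.
type Tracer interface{ Start(name string) Span }

// noopTracer is used when tracing is disabled: call sites stay in place
// but nothing is recorded or exported.
type noopTracer struct{}
type noopSpan struct{}

func (noopTracer) Start(string) Span { return noopSpan{} }
func (noopSpan) End()                {}

// recordingTracer keeps the names of finished spans in memory; a real
// setup would hand finished spans to an exporter instead.
type recordingTracer struct {
	mu       sync.Mutex
	finished []string
}

type recordingSpan struct {
	t    *recordingTracer
	name string
}

func (t *recordingTracer) Start(name string) Span {
	return &recordingSpan{t: t, name: name}
}

func (s *recordingSpan) End() {
	s.t.mu.Lock()
	defer s.t.mu.Unlock()
	s.t.finished = append(s.t.finished, s.name)
}

// newTracer picks the implementation once, based on configuration.
func newTracer(enabled bool) Tracer {
	if enabled {
		return &recordingTracer{}
	}
	return noopTracer{}
}

func main() {
	t := newTracer(true)
	t.Start("propose").End()
	if rt, ok := t.(*recordingTracer); ok {
		fmt.Println("finished spans:", rt.finished)
	}
}
```

Because the choice is made once in `newTracer`, a disabled tracer costs only a method call on an empty struct per span, and nothing is buffered or sent over the network.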

🛠️ Proposed solution

Change the existing Jaeger support to work with Tempo (ideally in a backend-agnostic way).
Ensure the existing tracing calls are disabled or removed.
Ensure the entire Propose flow is fully covered with sufficient spans to give us the full picture on timings.
Work with Platform team to set up a Tempo server instance for our clients.
Update *CDVN to include a Tempo instance and set the Charon CLI flags to use it.
Test with Kurtosis and Canary clusters.

🧪 Tests

Tested by new automated unit/integration/smoke tests
Manually tested on core team/canary/test clusters
Manually tested on local compose simnet

pinebit added the proposal, protocol, and Protocol Team tickets labels on Dec 18, 2024

pinebit mentioned this issue on Dec 19, 2024 in app: eth2wrap latency logging #3417 (Open)

pinebit added this to the v1.3.0 milestone Dec 19, 2024
