Traceability of Proposal flow #3429

Open
9 tasks
pinebit opened this issue Dec 18, 2024 · 0 comments
Labels
proposal protocol Protocol Team tickets

pinebit commented Dec 18, 2024

🎯 Problem to be solved

To better understand the timing breakdown of the entire proposal duty flow in real clusters (both test and production), we aim to visualize the time spent on each step, such as querying the beacon node (BN) and reaching consensus, for specific duties or slots. This data will help us investigate missed proposals more quickly and, if necessary, adjust consensus timings or shift focus to other significant contributors to delays.

Charon currently has initial support for Jaeger, which has not been widely used for debugging or production purposes due to the high telemetry traffic it generates. The proposed solution is to start using Grafana Tempo (alongside Prometheus and Loki) as the server for collecting telemetry events in production, while narrowing the tracing scope to the Propose duty only, to minimize traffic.

Under this ticket, we need to revisit the existing Jaeger-specific code and CLI flags and make them backend-agnostic by adopting the OpenTelemetry library. This library is flexible enough to support tracing with Jaeger, Tempo, and other backends. Additionally, we will need to remove or disable most of the existing tracing for other duties and HTTP calls, or make it conditional.
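
One way to make tracing conditional is to gate span creation on the duty type. The sketch below is a minimal, standard-library-only illustration of that idea, not Charon's actual code: `DutyType`, `TraceFilter`, `ParseTraceFilter`, and the comma-separated filter value are all hypothetical names introduced here, and a real implementation would more likely plug this decision into the OpenTelemetry SDK's sampling hook.

```go
package main

import "strings"

// DutyType is a hypothetical stand-in for the duty type used by the node.
type DutyType string

const (
	DutyProposer DutyType = "proposer"
	DutyAttester DutyType = "attester"
)

// TraceFilter holds the set of duty types for which spans are recorded.
type TraceFilter struct {
	enabled map[DutyType]bool
}

// ParseTraceFilter builds a filter from a comma-separated list of duty
// names, for example the value of a (hypothetical) CLI flag.
// Empty entries are ignored, so an empty string enables nothing.
func ParseTraceFilter(csv string) TraceFilter {
	f := TraceFilter{enabled: make(map[DutyType]bool)}
	for _, name := range strings.Split(csv, ",") {
		name = strings.TrimSpace(name)
		if name != "" {
			f.enabled[DutyType(name)] = true
		}
	}
	return f
}

// ShouldTrace reports whether spans should be recorded for the given duty.
func (f TraceFilter) ShouldTrace(d DutyType) bool {
	return f.enabled[d]
}
```

With `ParseTraceFilter("proposer")`, only the Propose duty would emit spans, which keeps telemetry traffic low while leaving room to enable other duties later without code changes.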

🛠️ Proposed solution

  • Change the existing Jaeger support to work with Tempo (preferably in a backend-agnostic way).
  • Ensure the existing tracing calls for other duties and HTTP calls are disabled, removed, or made conditional.
  • Ensure the entire Propose flow is covered with enough spans to give us the full picture of timings.
  • Work with the Platform team to set up a Tempo server instance for our clients.
  • Change *CDVN to include a Tempo instance and the Charon CLI flags needed to use it.
  • Test with Kurtosis and Canary clusters.
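
To illustrate what "sufficient spans" buys us, here is a rough sketch of how the spans recorded for one slot could be reduced to a per-step timing breakdown. It is an assumption-laden illustration only: the `Span` type, the step names, and the `Breakdown`, `Slowest`, and `Millis` helpers are hypothetical, and in practice Tempo and Grafana would do this aggregation from the exported traces.

```go
package main

import (
	"sort"
	"time"
)

// Span is a hypothetical record of one timed step in a slot's proposal
// flow. Offsets are measured from the start of the slot.
type Span struct {
	Name  string
	Start time.Duration
	End   time.Duration
}

// Millis converts a millisecond count to a time.Duration.
func Millis(n int64) time.Duration {
	return time.Duration(n) * time.Millisecond
}

// Breakdown sums the duration of all spans that share a step name.
func Breakdown(spans []Span) map[string]time.Duration {
	out := make(map[string]time.Duration)
	for _, s := range spans {
		out[s.Name] += s.End - s.Start
	}
	return out
}

// Slowest returns the step with the largest total duration.
// Ties are broken alphabetically; an empty breakdown yields "".
func Slowest(b map[string]time.Duration) string {
	names := make([]string, 0, len(b))
	for n := range b {
		names = append(names, n)
	}
	sort.Strings(names)
	best := ""
	for i, n := range names {
		if i == 0 || b[n] > b[best] {
			best = n
		}
	}
	return best
}
```

For a slot with spans named, say, `fetch_bn` and `consensus`, `Slowest(Breakdown(spans))` points at the step worth investigating first when a proposal is missed.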

🧪 Tests

  • Tested by new automated unit/integration/smoke tests
  • Manually tested on core team/canary/test clusters
  • Manually tested on local compose simnet
@pinebit pinebit added proposal protocol Protocol Team tickets labels Dec 18, 2024
@pinebit pinebit added this to the v1.3.0 milestone Dec 19, 2024