feat(rpc): Sync endpoint #443

HermanObst · 2024-12-26T18:51:58Z

Is there an existing issue?

I have searched the existing issues

Motivation

A dedicated syncing status endpoint is essential for production deployments for several reasons:

Health Monitoring

Provides real-time status of node synchronization
Enables monitoring systems to detect out-of-sync nodes quickly
Helps maintain overall system reliability

Load Balancing & Orchestration

Allows Kubernetes to make intelligent routing decisions
Facilitates automatic traffic routing to fully synced nodes
Supports graceful handling of nodes that fall behind

Production Readiness

While not critical for local development
Essential for production environments where uptime and data consistency are crucial
Enables automated failover mechanisms

Request

When running Madara in full node mode, we need an endpoint that indicates whether the node is currently syncing or not. Similar functionality exists in Juno, where this signal is used for orchestration logic in Kubernetes environments.

Solution

Add a new endpoint that returns the node's current syncing status, similar to Juno's implementation. This should be integrated with Madara's existing API infrastructure.

Are you willing to help with this request?

Yes!

jbcaron · 2024-12-26T22:26:43Z

Looking at this feature request for a dedicated syncing status endpoint, I have a question: Wouldn't the existing starknet_syncing endpoint already fulfill these requirements?

The endpoint already returns either:

false when the node is not syncing
a detailed SyncStatus struct when syncing is in progress, containing:

pub struct SyncStatus<F> {
    /// The hash of the current block being synchronized
    pub current_block_hash: BlockHash<F>,
    /// The number (height) of the current block being synchronized
    pub current_block_num: BlockNumber,
    /// The hash of the estimated highest block to be synchronized
    pub highest_block_hash: BlockHash<F>,
    /// The number (height) of the estimated highest block to be synchronized
    pub highest_block_num: BlockNumber,
    /// The hash of the block from which the sync started
    pub starting_block_hash: BlockHash<F>,
    /// The number (height) of the block from which the sync started
    pub starting_block_num: BlockNumber,
}

This seems to provide all the necessary information for health monitoring, load balancing and production orchestration. Could you clarify what additional functionality would be needed beyond what this endpoint currently offers?

Mohiiit · 2024-12-27T08:57:37Z

Hey, noticed something about the syncing logic that seems a bit off:

pub async fn syncing(starknet: &Starknet) -> StarknetRpcResult<SyncingStatus<Felt>> {
    // obtain best seen (highest) block number
    let Some(current_block_info) = starknet
        .backend
        .get_block_info(&BlockId::Tag(BlockTag::Latest))
        .or_internal_server_error("Error getting latest block")?
    else {
        return Ok(SyncingStatus::NotSyncing); // TODO: This doesn't really make sense? This can only happen when there are no block in the db at all.
    };
    let current_block_info =
        current_block_info.as_nonpending().ok_or_internal_server_error("Latest block cannot be pending")?;
    let starting_block_num = 0; // TODO(rpc): fix this // starknet.starting_block;
    let starting_block_info = starknet.get_block_info(&BlockId::Number(starting_block_num))?;
    let starting_block_info =
        starting_block_info.as_nonpending().ok_or_internal_server_error("Block cannot be pending")?;
    let starting_block_hash = starting_block_info.block_hash;
    let current_block_num = current_block_info.header.block_number;
    let current_block_hash = current_block_info.block_hash;
    Ok(SyncingStatus::Syncing(SyncStatus {
        starting_block_num,
        starting_block_hash,
        highest_block_num: current_block_num, // TODO(merge): is this correct?
        highest_block_hash: current_block_hash,
        current_block_num,
        current_block_hash,
    }))
}

Shouldn't we be checking with the feeder-gateway here? Like, the highest_block_num should really be coming from the feeder-gateway response, right?

And yeah, while this could tell us if a Madara full node is syncing or not, there's another thing for k8s clusters - we'd mainly be looking at status codes. So we need an endpoint that tells us straight up if it's syncing or not. Maybe return 200 when it's syncing, or something else if it's not.

Or we could flip it - 200 when it's synced, different status code when it's not. This is super important because when we're auto-scaling to handle lots of requests, we'll be spinning up more Madara nodes as full nodes. Before we start routing traffic to any new node, we need to know if it's actually ready to handle requests.

HermanObst mentioned this issue Dec 26, 2024

feat(tracking): Madara core #413

Open

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rpc): Sync endpoint #443

feat(rpc): Sync endpoint #443

HermanObst commented Dec 26, 2024

jbcaron commented Dec 26, 2024

Mohiiit commented Dec 27, 2024

feat(rpc): Sync endpoint #443

feat(rpc): Sync endpoint #443

Comments

HermanObst commented Dec 26, 2024

Is there an existing issue?

Motivation

Request

Solution

Are you willing to help with this request?

jbcaron commented Dec 26, 2024

Mohiiit commented Dec 27, 2024