Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rpc): Sync endpoint #443

Open
1 task done
Tracked by #413
HermanObst opened this issue Dec 26, 2024 · 2 comments
Open
1 task done
Tracked by #413

feat(rpc): Sync endpoint #443

HermanObst opened this issue Dec 26, 2024 · 2 comments

Comments

@HermanObst
Copy link

Is there an existing issue?

  • I have searched the existing issues

Motivation

A dedicated syncing status endpoint is essential for production deployments for several reasons:

Health Monitoring

Provides real-time status of node synchronization
Enables monitoring systems to detect out-of-sync nodes quickly
Helps maintain overall system reliability

Load Balancing & Orchestration

Allows Kubernetes to make intelligent routing decisions
Facilitates automatic traffic routing to fully synced nodes
Supports graceful handling of nodes that fall behind

Production Readiness

While not critical for local development
Essential for production environments where uptime and data consistency are crucial
Enables automated failover mechanisms

Request

When running Madara in full node mode, we need an endpoint that indicates whether the node is currently syncing or not. Similar functionality exists in Juno, where this signal is used for orchestration logic in Kubernetes environments.

Solution

Add a new endpoint that returns the node's current syncing status, similar to Juno's implementation. This should be integrated with Madara's existing API infrastructure.

Are you willing to help with this request?

Yes!

@jbcaron
Copy link
Member

jbcaron commented Dec 26, 2024

Looking at this feature request for a dedicated syncing status endpoint, I have a question: Wouldn't the existing starknet_syncing endpoint already fulfill these requirements?

The endpoint already returns either:

  • false when the node is not syncing
  • a detailed SyncStatus struct when syncing is in progress, containing:
pub struct SyncStatus<F> {
    /// The hash of the current block being synchronized
    pub current_block_hash: BlockHash<F>,
    /// The number (height) of the current block being synchronized
    pub current_block_num: BlockNumber,
    /// The hash of the estimated highest block to be synchronized
    pub highest_block_hash: BlockHash<F>,
    /// The number (height) of the estimated highest block to be synchronized
    pub highest_block_num: BlockNumber,
    /// The hash of the block from which the sync started
    pub starting_block_hash: BlockHash<F>,
    /// The number (height) of the block from which the sync started
    pub starting_block_num: BlockNumber,
}

This seems to provide all the necessary information for health monitoring, load balancing and production orchestration. Could you clarify what additional functionality would be needed beyond what this endpoint currently offers?

@Mohiiit
Copy link
Contributor

Mohiiit commented Dec 27, 2024

Hey, noticed something about the syncing logic that seems a bit off:

pub async fn syncing(starknet: &Starknet) -> StarknetRpcResult<SyncingStatus<Felt>> {
    // obtain best seen (highest) block number
    let Some(current_block_info) = starknet
        .backend
        .get_block_info(&BlockId::Tag(BlockTag::Latest))
        .or_internal_server_error("Error getting latest block")?
    else {
        return Ok(SyncingStatus::NotSyncing); // TODO: This doesn't really make sense? This can only happen when there are no block in the db at all.
    };
    let current_block_info =
        current_block_info.as_nonpending().ok_or_internal_server_error("Latest block cannot be pending")?;
    let starting_block_num = 0; // TODO(rpc): fix this // starknet.starting_block;
    let starting_block_info = starknet.get_block_info(&BlockId::Number(starting_block_num))?;
    let starting_block_info =
        starting_block_info.as_nonpending().ok_or_internal_server_error("Block cannot be pending")?;
    let starting_block_hash = starting_block_info.block_hash;
    let current_block_num = current_block_info.header.block_number;
    let current_block_hash = current_block_info.block_hash;
    Ok(SyncingStatus::Syncing(SyncStatus {
        starting_block_num,
        starting_block_hash,
        highest_block_num: current_block_num, // TODO(merge): is this correct?
        highest_block_hash: current_block_hash,
        current_block_num,
        current_block_hash,
    }))
}

Shouldn't we be checking with the feeder-gateway here? Like, the highest_block_num should really be coming from the feeder-gateway response, right?

And yeah, while this could tell us if a Madara full node is syncing or not, there's another thing for k8s clusters - we'd mainly be looking at status codes. So we need an endpoint that tells us straight up if it's syncing or not. Maybe return 200 when it's syncing, or something else if it's not.

Or we could flip it - 200 when it's synced, different status code when it's not. This is super important because when we're auto-scaling to handle lots of requests, we'll be spinning up more Madara nodes as full nodes. Before we start routing traffic to any new node, we need to know if it's actually ready to handle requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants