feat(l1): fix snap sync + add healing #1505

Open · fmoletta wants to merge 397 commits into main

Conversation

@fmoletta (Contributor) commented Dec 13, 2024

Motivation
Fix snap sync logic:
Instead of rebuilding every block's state via snap, we select a pivot block (sync head - 64), fetch its state via snap, and then execute all blocks after it (see the sketch below).
Add a healing phase.
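
A rough sketch of that flow (illustrative only; BlockHeader, fetch_state_via_snap and execute_block are placeholder names, not the actual ethrex APIs):

const MIN_FULL_BLOCKS: usize = 64;

struct BlockHeader {
    number: u64,
    state_root: [u8; 32],
}

// Placeholder: download the account/storage ranges for this state root via
// snap, then run the healing phase to repair any trie nodes that changed
// while the ranges were being downloaded.
fn fetch_state_via_snap(_state_root: [u8; 32]) -> Result<(), String> {
    Ok(())
}

// Placeholder: fetch the block body and fully execute the block.
fn execute_block(_header: &BlockHeader) -> Result<(), String> {
    Ok(())
}

// Pick the pivot (sync head - 64 when enough headers exist), snap-sync its
// state, then execute every block after it. Assumes headers is non-empty and
// ordered from oldest to sync head.
fn snap_sync(headers: &[BlockHeader]) -> Result<(), String> {
    let pivot_idx = if headers.len() > MIN_FULL_BLOCKS {
        headers.len() - MIN_FULL_BLOCKS
    } else {
        headers.len() - 1
    };
    fetch_state_via_snap(headers[pivot_idx].state_root)?;
    for header in &headers[pivot_idx + 1..] {
        execute_block(header)?;
    }
    Ok(())
}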

Missing from this PR:

  • Reorg handling
  • Fetching receipts

Description

Closes #1455

@fmoletta fmoletta marked this pull request as ready for review December 17, 2024 20:47
@fmoletta fmoletta requested a review from a team as a code owner December 17, 2024 20:47
/// Returns the nodes or None if:
/// - There are no available peers (the node just started up or was rejected by all other nodes)
/// - The response timed out
/// - The response was empty or not valid
Collaborator

Haven't read the implementation for this, but it's not clear what happens if some nodes are found and others have issues (partial success). Does it return None or the nodes that are valid? Seems like the comment should make this clear

Contributor Author

If a node is invalid, then the whole response is considered invalid; there are no partial successes.
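
For illustration, a minimal sketch of that all-or-nothing contract (the is_valid closure stands in for the real per-node hash check; this is not the actual ethrex code):

// Returns Some(nodes) only if every node passes validation; an empty or
// partially invalid response is discarded as a whole, yielding None.
fn validate_response<T>(nodes: Vec<T>, is_valid: impl Fn(&T) -> bool) -> Option<Vec<T>> {
    if nodes.is_empty() || nodes.iter().any(|node| !is_valid(node)) {
        return None;
    }
    Some(nodes)
}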

let mut pivot_idx = if all_block_headers.len() > MIN_FULL_BLOCKS {
all_block_headers.len() - MIN_FULL_BLOCKS
} else {
all_block_headers.len() - 1
Collaborator

It's a bit weird to do snap sync if there are fewer than 64 total blocks. Can't we default to full sync here, or would this increase complexity? If it's not trivial I would not do it.

Contributor Author

I'd say this is a very rare edge case; we only perform snap sync if:
A) We are just starting up the node, in which case it is very rare that the chain we sync to has fewer than 64 blocks outside of a test case.
B) A re-org threw us below the latest pivot, in which case it is also very rare that the new canonical chain has 64 blocks or fewer.

// If the pivot became stale, set a further pivot and try again
if stale_pivot && pivot_idx != all_block_headers.len() - 1 {
warn!("Stale pivot, switching to newer head");
pivot_idx = all_block_headers.len() - 1;
@mpaulucci (Collaborator) commented Dec 19, 2024

Shouldn't you update all_block_headers before choosing a new pivot? I imagine the pivot could go stale after some time has passed, by which point there would be a lot more headers.

Collaborator

Why are you choosing pivot_idx = all_block_headers.len() - 1? It will only move the pivot forward by 63 blocks.

Contributor Author

We are not fetching any more headers, as we don't know about any later block aside from the sync root; the only way to sync further would be to wait for the next fork choice update.

store.set_canonical_block(header.number, hash)?;
store.add_block_header(hash, header)?;
let mut pivot_idx = if all_block_headers.len() > MIN_FULL_BLOCKS {
all_block_headers.len() - MIN_FULL_BLOCKS
Collaborator

Does it make sense to keep all block headers in memory? For mainnet that would be ~21M headers. If it's a first implementation it's fine; we can optimize later.

Collaborator

But maybe I would just do something like pivot_idx = head_block.number - MIN_FULL_BLOCKS

Contributor Author

It made sense for small tests, as we could store the full block once we had the body too, but yes, we wouldn't be able to do this with the full Ethereum state.
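
As a rough illustration of the suggestion above (pivot_number is a hypothetical helper, not the actual ethrex code), the pivot can be derived from the head block number alone, so the full header list never has to be kept in memory:

const MIN_FULL_BLOCKS: u64 = 64;

// Pivot = head - MIN_FULL_BLOCKS, saturating at genesis for chains shorter
// than 64 blocks; only the sync head's number is needed.
fn pivot_number(head_number: u64) -> u64 {
    head_number.saturating_sub(MIN_FULL_BLOCKS)
}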

result?;
if stale_pivot {
warn!("Stale pivot, aborting sync");
return Ok(());
Collaborator

Return an error here?

Contributor Author

I looked at the error handling in geth and they don't return an error from the main sync process.
The sync process happens on its own; there is no one we need to reply to with the status of the sync, only the owner/user of the node itself (via tracing), so it doesn't make much sense to return an error unless we want to shut down the node entirely.
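
A minimal sketch of that shape (assuming tokio and tracing, with sync_cycle as a stand-in for the real sync loop; not the actual ethrex code):

// The sync runs as a detached task: failures are reported via tracing and the
// task simply ends; the next fork choice update can start a new cycle.
fn start_sync(sync_head: [u8; 32]) {
    tokio::spawn(async move {
        if let Err(e) = sync_cycle(sync_head).await {
            tracing::warn!("Sync cycle failed: {}", e);
        }
    });
}

// Stand-in for the real header download + snap + healing + execution phases.
async fn sync_cycle(_sync_head: [u8; 32]) -> Result<(), String> {
    Ok(())
}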

.zip(all_block_bodies.into_iter()),
) {
if header.number <= pivot_number {
store.set_canonical_block(header.number, hash)?;
Collaborator

I think the only one calling set_canonical_block should be the apply forkchoice function.

Contributor Author

The problem is that we have already left the fork choice function by the time the sync ends.
Syncs are only triggered by forkChoiceUpdates, so I think it makes sense to consider the sync root as the new head of the canonical chain.
Also, the ethereum/sync hive test expects the latest block to be set after triggering the sync with a single forkChoiceUpdate.

store.add_block(Block::new(header, body))?;
} else {
store.set_canonical_block(header.number, hash)?;
store.update_latest_block_number(header.number)?;
Collaborator

Idem: this is the job of the apply forkchoice update function; here we should just store the block.
