-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(l1): fix snap sync + add healing #1505
base: main
Are you sure you want to change the base?
Conversation
…to eth-get-receipts-msg
/// Returns the nodes or None if: | ||
/// - There are no available peers (the node just started up or was rejected by all other nodes) | ||
/// - The response timed out | ||
/// - The response was empty or not valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read the implementation for this, but it's not clear what happens if some nodes are found and others have issues (partial success). Does it return None
or the nodes that are valid? Seems like the comment should make this clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a node is invalid then the whole response would be invalid, there are no partial successes.
let mut pivot_idx = if all_block_headers.len() > MIN_FULL_BLOCKS { | ||
all_block_headers.len() - MIN_FULL_BLOCKS | ||
} else { | ||
all_block_headers.len() - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit weird to do snap sync if there is less than 64 total blocks. Can't we default to full sync here or would this increase complexity? If it's not trivial I would not do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Id say this is a very rare edge case, we only perform snap sync if:
A) We are just starting up the node, in which case it is very rare that the chain we sync to would have less than 64 blocks in a non-test case
B) A re-org threw us below the latest pivot, in which case it would also be very rare that the new canonical chain has only 64 blocks or less
// If the pivot became stale, set a further pivot and try again | ||
if stale_pivot && pivot_idx != all_block_headers.len() - 1 { | ||
warn!("Stale pivot, switching to newer head"); | ||
pivot_idx = all_block_headers.len() - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you update all_block_headers
before choosing a new pivot? I imagine the state_pivot
could happen after some time and there are a lot more headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you choosing pivot_idx = all_block_headers.len() - 1;
? It will only move forward by 63 blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not fetching any more headers as we don't know about any later block aside from the sync root, the only way to sync further would be to wait for a next fork choice update.
store.set_canonical_block(header.number, hash)?; | ||
store.add_block_header(hash, header)?; | ||
let mut pivot_idx = if all_block_headers.len() > MIN_FULL_BLOCKS { | ||
all_block_headers.len() - MIN_FULL_BLOCKS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to have all block headers in memory? For mainnet it would be ~21M. If it's a first implementation it's fine, we can optimize later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But maybe I would just do something like pivot_idx = head_block.number - MIN_FULL_BLOCKS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made sense for small tests as we could store the full block once we had the body too, but yes, we wouldn't be able to do it with the full ethereum state
result?; | ||
if stale_pivot { | ||
warn!("Stale pivot, aborting sync"); | ||
return Ok(()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the error handling on geth and they don't return an error from the main sync process.
The sync process happenes on its own, there is no one we need to reply to with the status of the sync, only the owner/user of the node itself (via tracing) so it doesn't make much sense to return an error unless we want to shut down the node entirely
.zip(all_block_bodies.into_iter()), | ||
) { | ||
if header.number <= pivot_number { | ||
store.set_canonical_block(header.number, hash)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only one calling set_canonical_block
should be the apply forkchoice function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we already left the fork choice function by the time the sync ends.
Syncs are only triggered by forkChoiceUpdates so I think it makes sense to consider the sync root as the new head of the canonical chain.
Also the ethereum/sync hive test expects the latest block to be set after triggering the sync with a single forkChoiceUpdate
store.add_block(Block::new(header, body))?; | ||
} else { | ||
store.set_canonical_block(header.number, hash)?; | ||
store.update_latest_block_number(header.number)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem, this is a job of the apply forkchoice update function, here we just store the block
Motivation
Fix snap sync logic:
Instead of rebuilding all block's state via snap, we select a pivot block (sync head - 64) fetch its state via snap and then execute all blocks after it
Add Healing phase
Missing from this PR:
Description
Closes #1455