protocol: block safety index #70
base: main
Great proposal!
One aspect I don't understand is the advantage of tracking additional info like span batch bounds, on top of just the L2/L1 derivation mapping.
Thankfully, with strict batch ordering we won't have multiple buffered channels any more, but before Holocene, would we also want to track additional derivation pipeline state, such as which L1 blocks caused buffered frames/channels in the channel bank?
where a span-batch may generate pending blocks that can be reorged out
if later content of the span-batch is found invalid.

This is changing with Holocene (steady batch derivation, aka strict ordering): we do not want the complexity of having to revert data that was tentatively accepted.
So do you agree that the span batch recovery problem gets easier with Partial Span Batch Validity? If we only forward-invalidate, but not backward-invalidate in a span batch, the start of the span batch is less important as it won't be needed to reorg out on a next invalid batch.
we currently cannot be sure it was derived from a certain L1; we have to recreate the derivation state to verify.
And if the tip was part of a span-batch, we need to find the start of said span-batch.
So while we can do away with the multi-block pending-safe reorg, we still have to "find" the start of a span-batch.
If we had an index of L2-block to L1-derived-from, with span-batch bounds info,
then finding that start would be much faster.
Don't we need to recreate the derivation state on a reorg or restart anyway? How is this different from, say, a channel full of singular batches, where the reorg or node restart may have happened in the middle of deriving singular batches from the channel, so that the L2 tip is in the middle of a channel? What I mean is that we always have to recreate the derivation state in a way that makes clear which L1 block a given L2 block is derived from, and this will get easier with the set of changes we're introducing with Holocene.
This is not to say that the block safety index isn't still very useful.
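To make the "index of L2-block to L1-derived-from, with span-batch bounds info" idea from the quoted text concrete, here is a minimal Go sketch. Everything in it is an assumption for illustration: the `Entry` and `Index` types, their field names, and the choice to store the span-batch start as an L2 block number do not come from the proposal. It only shows why finding the span-batch start becomes a lookup instead of a re-derivation.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical index record; field names are illustrative only.
type Entry struct {
	L2Number      uint64 // L2 block number
	DerivedFromL1 uint64 // number of the L1 block this L2 block was derived from
	SpanStartL2   uint64 // first L2 block of the span batch containing this block
}

// Index holds entries sorted by ascending L2Number.
type Index struct {
	entries []Entry
}

// Lookup finds the entry for an L2 block number using binary search.
func (idx *Index) Lookup(l2 uint64) (Entry, bool) {
	i := sort.Search(len(idx.entries), func(i int) bool {
		return idx.entries[i].L2Number >= l2
	})
	if i < len(idx.entries) && idx.entries[i].L2Number == l2 {
		return idx.entries[i], true
	}
	return Entry{}, false
}

// SpanStart returns the entry of the first block of the span batch
// that contains the given L2 block, without re-deriving anything.
func (idx *Index) SpanStart(l2 uint64) (Entry, bool) {
	e, ok := idx.Lookup(l2)
	if !ok {
		return Entry{}, false
	}
	return idx.Lookup(e.SpanStartL2)
}

func main() {
	idx := &Index{entries: []Entry{
		{L2Number: 100, DerivedFromL1: 50, SpanStartL2: 100},
		{L2Number: 101, DerivedFromL1: 50, SpanStartL2: 100},
		{L2Number: 102, DerivedFromL1: 51, SpanStartL2: 100},
		{L2Number: 103, DerivedFromL1: 52, SpanStartL2: 103},
	}}
	if start, ok := idx.SpanStart(102); ok {
		fmt.Printf("span starts at L2 %d, derived from L1 %d\n",
			start.L2Number, start.DerivedFromL1)
	}
}
```

With forward-only invalidation, the `SpanStartL2` field may matter less, which is the point raised in the comments above; the L2-to-L1 mapping itself stays useful either way.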
This prevents these chains from entering a divergent hardfork path:
very important to keep the code-base unified and tech-debt low.

The main remaining question is how we bootstrap the data:
For how long do we actually need to maintain this db? Is there an expiry duration that satisfies all of the above cases, e.g. the FP challenge window? For most of the mentioned cases I think a soft rollout could work well enough.
we need to roll it out gradually.

This feature could either be a "soft-rollout" type of thing,
or a Holocene feature (if it turns out to be tightly integrated into the steady batch derivation work).
I don't think that we need to make it a Holocene feature. Holocene still works, like the pre-Holocene protocol, with a sync start protocol (that will actually get easier). But it's a helpful feature the moment it's there.
at 32 bytes per L1 and L2 block hash, some L1 metadata, some L2 metadata, and both local-safe and cross-safe,
each entry may be about 200 bytes. It does not compress well, as it largely contains blockhash data.

Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000`, or about 60 MB.
For what use case would we need more than ~a week of safe db history?
Storing a week of this data, at 2 second blocks, would thus be `7 * 24 * 60 * 60 / 2 * 200 = 60,480,000`, or about 60 MB.
Storing a year of this data would be around ~3 GB.
For the average node of a singular chain, I imagine that not more than a few hours of data would ever be needed to support restarts. Maybe 12 hours to support a sequencer window lapse, but I'm not sure if safety indexing is needed when you simply don't have valid blocks over a large range.
For nodes which participate in fault proofs, I imagine only 3.5 days of data is needed, except in cases where a game starts, then up to 7 days is required. Maybe we could incorporate some holding mechanism in the case of open disputes, and be aggressive otherwise.
For nodes of an interoperating chain, they theoretically need all possible chain-safety state in order to respond to Executing Message validity. However, in these cases the nodes should use an op-supervisor anyway.
All this is to say, nodes in different environments may have different retention needs, but they should always be pretty well restrained. Unless I'm missing some use cases? What node would want a year of data?
Explorers / indexers might want extended data, to show past L1<>L2 relations. But yes, 1 week should otherwise be enough. Perhaps we can add an archive-flag, where it writes the data that gets pruned to a separate DB, for archival purposes.
With clock extension, games can be in progress for longer than a week (I think 16 days is the worst case, but I may have that wrong), and you typically want to monitor them for some period after they resolve so that you have a chance to detect whether they resolved incorrectly. Currently dispute-mon and challenger monitor a default window of 28 days, so I'd suggest we keep data for at least that long by default.
On top of that: the finality deriver sub-system of the op-node can potentially be completely removed,
if finality is checked through the op-supervisor API.
There's still the case where you might run a monochain and not want the sequencer. In that situation, you'd still want some sort of finality deriver.
I'm also thinking about Alt-DA finality, where the finality is pointed at the L1 block where the commitment's challenge is over. For that we'd want a slightly more flexible representation of finality, and I think the safety index serves that well, where the Supervisor would not be suitable.
This is not really a blocker for the design itself, but worth noting a chore we should do: in various places in our docs, we talk about the statelessness of our components. That statelessness is much weaker already with things like the SafeDB, but this would definitely push us far enough that we'd want to amend documentation.
Description
Proposal to enshrine the idea of a "block safety index", such that other (existing and new) features can be built against this in a unified way.
In particular, as we enter Holocene steady-batch-derivation work, and Interop devnet 2, I believe there is a need for a shared block-safety-index feature, to reduce overall protocol complexity by unifying solutions to the same problem.