Implement PeerDAS subnet decoupling (aka custody groups) #6736
base: unstable
Conversation
Squashed commit of the following:

commit 898d05e (Merge: ffbd25e 7e0cdde) — Jimmy Chen <[email protected]>, Tue Dec 24 14:41:19 2024 +1100
    Merge branch 'unstable' into refactor-ef-tests-features

commit ffbd25e — Jimmy Chen <[email protected]>, Tue Dec 24 14:40:38 2024 +1100
    Fix `SszStatic` tests for PeerDAS: exclude eip7594 test vectors when testing Electra types.

commit aa593cf — Jimmy Chen <[email protected]>, Fri Dec 20 12:08:54 2024 +1100
    Refactor spec testing for features and simplify usage.
@dapplion @pawanjay176 I think this PR is ready for review now. The failing spec tests are new Electra tests, which Pawan and Michael are currently working on, and the DAS spec tests are all passing. Please review and let me know what you think 🙏
    let columns_per_custody_group = self
        .number_of_columns
        .safe_div(self.number_of_custody_groups)
        .expect("Custody group count must be greater than 0");
Suggested change:

    -        .expect("Custody group count must be greater than 0");
    +        .expect("number_of_custody_groups must be greater than 0");
A user may run LH with a config that has `number_of_custody_groups: 0`, and LH will then panic here. Should there be a sanity check somewhere?
Yeah, this is intentional. If they're running a custom testnet and override this value to an invalid value (`0`), I think panicking is fine?
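If a fail-fast check at config load time were preferred over a later panic, it could look roughly like the sketch below. This is an illustrative stand-alone sketch, not Lighthouse's actual code: the `SpecParams` struct and `validate_spec` function are hypothetical names, and the divisibility check is an assumed invariant of the columns-to-groups mapping.

```rust
/// Hypothetical startup-time sanity check: reject an invalid custom-testnet
/// config up front with a clear error, instead of panicking later inside the
/// sampling-size computation.
pub struct SpecParams {
    pub number_of_columns: u64,
    pub number_of_custody_groups: u64,
}

pub fn validate_spec(spec: &SpecParams) -> Result<(), String> {
    if spec.number_of_custody_groups == 0 {
        return Err("number_of_custody_groups must be greater than 0".to_string());
    }
    // Assumed invariant: columns divide evenly across custody groups.
    if spec.number_of_columns % spec.number_of_custody_groups != 0 {
        return Err("number_of_columns must be divisible by number_of_custody_groups".to_string());
    }
    Ok(())
}
```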
    let custody_column_count = columns_per_custody_group
        .safe_mul(custody_group_count)
        .expect("Computing sampling size should not overflow");
Are we sure `custody_group_count` is always checked when read from external sources? Add a comment here saying:

- the ENR getter checks it's less than X
- what about ReqResp Metadata messages?

EDIT: the Metadata value is checked in https://github.com/sigp/lighthouse/pull/6736/files#diff-bb7fd4c05488f03a20af7fd1d00ccde4438d074889ce1fc20d8e919845837001R1433-R1434
Yeah, so this method is only used for determining the sampling count for the running node, so the assumption here is that these spec values MUST be valid; otherwise we panic. We currently don't need to determine the number of samples for other nodes, and I don't foresee that we'd need this.

If we're paranoid about this, we could make this return `Result` and panic at the caller site, to prevent potential future usage of this method from unintentionally causing a panic?
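The `Result`-returning variant floated above could be sketched as follows. Names and the free-function shape are hypothetical (the real code lives in `ChainSpec` and uses `safe_div`/`safe_mul`); the point is only that zero-division and overflow become errors the caller can choose to panic on.

```rust
/// Sketch of a fallible sampling-size computation: surface the zero-division
/// and overflow cases as errors and leave the panic decision to the caller.
pub fn sampling_column_count(
    number_of_columns: u64,
    number_of_custody_groups: u64,
    custody_group_count: u64,
) -> Result<u64, String> {
    let columns_per_custody_group = number_of_columns
        .checked_div(number_of_custody_groups)
        .ok_or_else(|| "number_of_custody_groups must be greater than 0".to_string())?;
    columns_per_custody_group
        .checked_mul(custody_group_count)
        .ok_or_else(|| "computing sampling size overflowed".to_string())
}
```

A caller computing the local node's sampling size could then `.expect(...)` on the result, while any future caller handling untrusted input would propagate the error instead.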
Btw, re error handling in this PR, I have intentionally:

- handled errors when computing using a peer's `cgc` — computing peer custody should never panic.
- panicked if we're computing custody for our own node (it means an invalid `spec` config).
    @@ -555,7 +555,7 @@ impl<T: BeaconChainTypes> DataAvailabilityCheckerInner<T> {
         if custody_column_count != total_column_count {
             return ReconstructColumnsDecision::No("not required for full node");
         }
    -    if received_column_count == self.spec.number_of_columns {
    +    if received_column_count == total_column_count {
Why not `>=`?
I think it's dangerous to allow for unexpected columns, because it means we either have a duplicate or we have a column that we don't expect — which may unintentionally satisfy the custody column count. I think we need to make sure it's an exact match, given that we're only checking the count.
The code here validates for duplicates (`lighthouse/beacon_node/beacon_chain/src/data_availability_checker/overflow_lru_cache.rs`, lines 176 to 178 in e094a24):

    if !self.data_column_exists(data_column.index()) {
        self.verified_data_columns.push(data_column);
    }
Gossip validation should ensure we only get the columns we subscribe to. Do you see any valid scenario where this would be `>`?
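The duplicate guard quoted above is what makes the exact `==` count check safe: the stored count can never exceed the number of distinct expected columns. A minimal stand-alone sketch (types simplified to bare indices instead of full sidecar columns, function name hypothetical):

```rust
/// Store a verified column index only if it is not already present, so the
/// count of stored columns equals the count of distinct indices received.
pub fn insert_verified_column(verified: &mut Vec<u64>, index: u64) {
    if !verified.iter().any(|&i| i == index) {
        verified.push(index);
    }
}
```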
A few comments, else looks solid and good to go.
    -    if csc >= spec.custody_requirement && csc <= spec.data_column_sidecar_subnet_count {
    +    if cgc >= spec.custody_requirement && cgc <= spec.number_of_custody_groups {
             Ok(csc)
Can we make this line equal to the one in `compute_peer_custody_groups`? It would be easier to follow.
Good eyes! I've changed this to the same format in 73cf91a, without the `!`.

I prefer to avoid negation here for readability, whereas in `compute_peer_custody_groups` it's a bit easier to follow if we return on invalid input early — this is my personal preference though, happy to change.
Issue Addressed
#6709
This spec change moves the custody unit from subnets to custody groups, in order to decouple networking and DAS core parameters. This allows us to tweak DAS core and network parameters in isolation, i.e. changing the number of column subnets will not have a direct impact on sampling parameters and security.
Spec: ethereum/consensus-specs#3832
Proposed Changes:

- On startup, we construct `NetworkGlobals` and compute sampling subnets and sampling columns from `NodeId`. This remains the same, but the computation logic has been updated to compute these from custody groups (`node_id` => `custody_groups` => `subnets` and `columns`).
- Renamed `custody_subnet_count` (`csc`) to `custody_group_count` (`cgc`) in `MetaData` and `ENR`. (spec)
- The `sampling_size` computation logic has been updated to be based on custody groups instead of custody subnets. code (spec update)
- Moved the `sampling_size` and `custody_group_count` logic into `ChainSpec` to keep them in one place.
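The last step of the `node_id` => `custody_groups` => `columns` pipeline can be sketched as below. This is a hedged illustration of the mapping's general shape — each custody group owns a fixed set of column indices spaced `NUMBER_OF_CUSTODY_GROUPS` apart — with assumed example constants; the authoritative mapping is defined in the consensus spec and implemented in Lighthouse's `ChainSpec`.

```rust
/// Assumed example values, not the real spec config.
const NUMBER_OF_COLUMNS: u64 = 128;
const NUMBER_OF_CUSTODY_GROUPS: u64 = 64;

/// Map one custody group to its column indices: with 2 columns per group,
/// group `g` owns columns `g` and `g + NUMBER_OF_CUSTODY_GROUPS`.
pub fn columns_for_custody_group(group: u64) -> Vec<u64> {
    let columns_per_group = NUMBER_OF_COLUMNS / NUMBER_OF_CUSTODY_GROUPS;
    (0..columns_per_group)
        .map(|i| NUMBER_OF_CUSTODY_GROUPS * i + group)
        .collect()
}
```

Because the mapping is driven by `NUMBER_OF_CUSTODY_GROUPS` rather than the subnet count, changing the number of column subnets leaves a node's custodied columns unchanged — which is exactly the decoupling this PR is after.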