Implement PeerDAS subnet decoupling (aka custody groups) #6736

Open

wants to merge 6 commits into base: unstable

Conversation


@jimmygchen jimmygchen commented Dec 20, 2024

Issue Addressed

#6709

This spec change switches the custody unit from subnets to custody groups, in order to decouple networking and DAS core parameters. This allows us to tweak DAS core and network parameters in isolation, i.e. changing the number of column subnets will not have a direct impact on sampling parameters & security.

Spec: ethereum/consensus-specs#3832

Proposed Changes:

On startup, we construct NetworkGlobals and compute sampling subnets and sampling columns from the NodeId. This remains the same, but the computation logic has been updated to derive these from custody groups (node_id => custody_groups => subnets and columns); see the sketch after this list.

  • Introduce "custody groups" and compute custody subnets and columns from custody groups.
  • Change custody_subnet_count (csc) to custody_group_count (cgc) in MetaData and ENR. (spec)
  • Add a custody group to subnet mapping in PeerManager to avoid re-computing it when setting peer subnets based on their custody groups.
  • The sampling_size computation logic has been updated to be based on custody groups instead of custody subnets. code (spec update)
  • Refactored sampling_size and custody_group_count logic into ChainSpec to keep them in one place.
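For context, the derivation pipeline can be sketched roughly as below. This is an illustrative sketch only, not the Lighthouse or consensus-spec code: it uses a standard-library hasher instead of the spec's SHA-256-based group selection, and the constants are stand-ins for the corresponding ChainSpec fields.

use std::collections::BTreeSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical constants standing in for the ChainSpec fields.
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;
const NUMBER_OF_COLUMNS: u64 = 128;
const DATA_COLUMN_SIDECAR_SUBNET_COUNT: u64 = 128;

/// Derive `custody_group_count` distinct custody groups from a node id.
/// The spec hashes the node id and walks consecutive ids until enough
/// distinct groups are found; a std hasher is used here purely for illustration.
fn custody_groups(node_id: u64, custody_group_count: u64) -> BTreeSet<u64> {
    let mut groups = BTreeSet::new();
    let mut current_id = node_id;
    while (groups.len() as u64) < custody_group_count.min(NUMBER_OF_CUSTODY_GROUPS) {
        let mut hasher = DefaultHasher::new();
        current_id.hash(&mut hasher);
        groups.insert(hasher.finish() % NUMBER_OF_CUSTODY_GROUPS);
        current_id = current_id.wrapping_add(1);
    }
    groups
}

/// Columns belonging to a custody group (one per group when the two counts
/// are equal, more when NUMBER_OF_COLUMNS > NUMBER_OF_CUSTODY_GROUPS).
fn columns_for_custody_group(group: u64) -> impl Iterator<Item = u64> {
    let columns_per_group = NUMBER_OF_COLUMNS / NUMBER_OF_CUSTODY_GROUPS;
    (0..columns_per_group).map(move |i| NUMBER_OF_CUSTODY_GROUPS * i + group)
}

fn main() {
    // node_id => custody_groups => columns => subnets
    let groups = custody_groups(42, 4);
    let columns: BTreeSet<u64> = groups
        .iter()
        .flat_map(|&g| columns_for_custody_group(g))
        .collect();
    let subnets: BTreeSet<u64> = columns
        .iter()
        .map(|c| c % DATA_COLUMN_SIDECAR_SUBNET_COUNT)
        .collect();
    println!("groups: {groups:?}\ncolumns: {columns:?}\nsubnets: {subnets:?}");
}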

@jimmygchen jimmygchen changed the title from "Implement PeerDAS subnet decoupling (aka custody groups)." to "Implement PeerDAS subnet decoupling (aka custody groups)" on Dec 20, 2024
@jimmygchen jimmygchen added the das (Data Availability Sampling) and work-in-progress (PR is a work-in-progress) labels on Dec 20, 2024
Squashed commit of the following:

commit 898d05e
Merge: ffbd25e 7e0cdde
Author: Jimmy Chen <[email protected]>
Date:   Tue Dec 24 14:41:19 2024 +1100

    Merge branch 'unstable' into refactor-ef-tests-features

commit ffbd25e
Author: Jimmy Chen <[email protected]>
Date:   Tue Dec 24 14:40:38 2024 +1100

    Fix `SszStatic` tests for PeerDAS: exclude eip7594 test vectors when testing Electra types.

commit aa593cf
Author: Jimmy Chen <[email protected]>
Date:   Fri Dec 20 12:08:54 2024 +1100

    Refactor spec testing for features and simplify usage.
@jimmygchen jimmygchen added the ready-for-review (The code is ready for review) label and removed the work-in-progress (PR is a work-in-progress) label on Jan 3, 2025
@jimmygchen jimmygchen marked this pull request as ready for review January 3, 2025 02:10
@jimmygchen (Member Author)

@dapplion @pawanjay176 I think this PR is ready for review now. The failing spec tests are new electra tests, which Pawan and Michael are currently working on, and the DAS spec tests are all passing.

Please review and let me know what you think 🙏

let columns_per_custody_group = self
    .number_of_columns
    .safe_div(self.number_of_custody_groups)
    .expect("Custody group count must be greater than 0");
Collaborator

Suggested change
.expect("Custody group count must be greater than 0");
.expect("number_of_custody_groups must be greater than 0");

Collaborator

A user may run LH with a config that has number_of_custody_groups: 0, then LH will panic here. Should there be a sanity check somewhere?

Member Author

Yeah, this is intentional. If they're running a custom testnet and override this value to an invalid value (0), I think panicking is fine?


let custody_column_count = columns_per_custody_group
    .safe_mul(custody_group_count)
    .expect("Computing sampling size should not overflow");
@dapplion dapplion (Collaborator) commented Jan 7, 2025

Are we sure custody_group_count is always checked when read from external sources? Add a comment here saying:

  • ENR getter checks it's less than X
  • On ReqResp Metadata messages?

EDIT: Metadata value is checked in https://github.com/sigp/lighthouse/pull/6736/files#diff-bb7fd4c05488f03a20af7fd1d00ccde4438d074889ce1fc20d8e919845837001R1433-R1434

Member Author

Yeah, so this method is only used for determining the sampling count for the running node, so the assumption here is that these spec values MUST be valid, otherwise we panic. We currently don't need to determine the number of samples for other nodes, and I don't foresee that we'd need this.

If we're paranoid about this, we could make this return a Result and panic at the caller site, to prevent potential future usage of this method from unintentionally causing a panic?

Member Author

Btw, re error handling in this PR, I have intentionally chosen to:

  • handle errors when computing with a peer's cgc: computing peer custody should never panic.
  • panic if we're computing custody for our own node (it means the spec config is invalid).
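For illustration, a minimal sketch of that split, with hypothetical names and constants standing in for the ChainSpec fields (not the actual code in this PR):

// Hypothetical constants standing in for ChainSpec fields.
const CUSTODY_REQUIREMENT: u64 = 4;
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;

#[derive(Debug)]
enum Error {
    InvalidCustodyGroupCount(u64),
}

/// Peer-supplied cgc (e.g. from an ENR or metadata response): never panic,
/// surface an error and let the caller penalise or ignore the peer.
fn validate_peer_cgc(cgc: u64) -> Result<u64, Error> {
    if cgc >= CUSTODY_REQUIREMENT && cgc <= NUMBER_OF_CUSTODY_GROUPS {
        Ok(cgc)
    } else {
        Err(Error::InvalidCustodyGroupCount(cgc))
    }
}

/// Our own cgc is derived from the spec/config; an out-of-range value means
/// the config itself is invalid, so panicking at startup is acceptable.
fn own_custody_group_count(custody_group_count: u64) -> u64 {
    assert!(
        custody_group_count >= CUSTODY_REQUIREMENT
            && custody_group_count <= NUMBER_OF_CUSTODY_GROUPS,
        "invalid spec config: custody_group_count out of range"
    );
    custody_group_count
}

fn main() {
    // A bad peer value is rejected without panicking.
    assert!(validate_peer_cgc(0).is_err());
    // Our own value comes from the spec and is asserted valid at startup.
    println!("own cgc: {}", own_custody_group_count(CUSTODY_REQUIREMENT));
}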

@@ -555,7 +555,7 @@ impl<T: BeaconChainTypes> DataAvailabilityCheckerInner<T> {
         if custody_column_count != total_column_count {
             return ReconstructColumnsDecision::No("not required for full node");
         }
-        if received_column_count == self.spec.number_of_columns {
+        if received_column_count == total_column_count {
Collaborator

Why not >=?

Member Author

I think it's dangerous to allow for unexpected columns, because it means we either have a duplicate or we have a column that we don't expect - which may unintentionally satisfy the custody column count. I think we need to make sure it's an exact match, given that we're only checking the count.

The code here guards against duplicates:

if !self.data_column_exists(data_column.index()) {
    self.verified_data_columns.push(data_column);
}

Gossip validation should ensure we only get the columns we subscribe to.

Do you see any valid scenario where this would be `>`?
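(For illustration, a minimal sketch of that invariant; a hypothetical helper, not code from this PR:)

// With deduplicated inserts and gossip validation admitting only the columns
// we subscribe to, the received count can never legitimately exceed the total,
// so an exact-match check is the safe "all columns received" condition.
fn all_columns_received(received_column_count: usize, total_column_count: usize) -> bool {
    debug_assert!(
        received_column_count <= total_column_count,
        "duplicate or unexpected column"
    );
    received_column_count == total_column_count
}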

@dapplion
Collaborator

dapplion commented Jan 7, 2025

A few comments, else looks solid and good to go


-        if csc >= spec.custody_requirement && csc <= spec.data_column_sidecar_subnet_count {
-            Ok(csc)
+        if cgc >= spec.custody_requirement && cgc <= spec.number_of_custody_groups {
Collaborator

Can we make this line equal to the one in compute_peer_custody_groups? It would be easier to follow.

Member Author

Good eyes! I've changed this to the same format: 73cf91a, without the !.

I prefer to avoid negation here for readability, whereas in compute_peer_custody_groups it's a bit easier to follow if we return early on invalid input. This is my personal preference though, happy to change.

Labels
das (Data Availability Sampling), ready-for-review (The code is ready for review)