Implement PeerDAS subnet decoupling (aka custody groups) #6736

Open

wants to merge 6 commits into base: unstable

Conversation


@jimmygchen jimmygchen commented Dec 20, 2024

Issue Addressed

#6709

This spec change switches the custody unit from subnets to custody groups, in order to decouple networking and DAS core parameters. This allows us to tweak DAS core and network parameters in isolation, i.e. changing the number of column subnets will not have a direct impact on sampling parameters & security.

Spec: ethereum/consensus-specs#3832

Proposed Changes:

On startup, we construct NetworkGlobals and compute sampling subnets and sampling columns from the NodeId. This remains the same, but the computation logic has been updated to derive these from custody groups (node_id => custody_groups => subnets and columns); see the sketch after this list.

  • Introduce "custody groups" and compute custody subnets and columns from custody groups.
  • Change custody_subnet_count (csc) to custody_group_count (cgc) in MetaData and ENR. (spec)
  • Add a custody group to subnet mapping in PeerManager to avoid re-computing it when setting peer subnets based on their custody groups.
  • The sampling_size computation logic has been updated to be based on custody groups instead of custody subnets. code (spec update)
  • Refactored sampling_size and custody_group_count logic into ChainSpec to keep them in one place.
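For context, the derivation pipeline can be sketched roughly as below. This is an illustrative sketch only, not the Lighthouse or consensus-spec code: it uses a standard-library hasher instead of the spec's SHA-256-based group selection, and the constants are stand-ins for the corresponding ChainSpec fields.

use std::collections::BTreeSet;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical constants standing in for the ChainSpec fields.
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;
const NUMBER_OF_COLUMNS: u64 = 128;
const DATA_COLUMN_SIDECAR_SUBNET_COUNT: u64 = 128;

/// Derive `custody_group_count` distinct custody groups from a node id.
/// The spec hashes the node id and walks consecutive ids until enough
/// distinct groups are found; a std hasher is used here purely for illustration.
fn custody_groups(node_id: u64, custody_group_count: u64) -> BTreeSet<u64> {
    let mut groups = BTreeSet::new();
    let mut current_id = node_id;
    while (groups.len() as u64) < custody_group_count.min(NUMBER_OF_CUSTODY_GROUPS) {
        let mut hasher = DefaultHasher::new();
        current_id.hash(&mut hasher);
        groups.insert(hasher.finish() % NUMBER_OF_CUSTODY_GROUPS);
        current_id = current_id.wrapping_add(1);
    }
    groups
}

/// Columns belonging to a custody group (one per group when the two counts
/// are equal, more when NUMBER_OF_COLUMNS > NUMBER_OF_CUSTODY_GROUPS).
fn columns_for_custody_group(group: u64) -> impl Iterator<Item = u64> {
    let columns_per_group = NUMBER_OF_COLUMNS / NUMBER_OF_CUSTODY_GROUPS;
    (0..columns_per_group).map(move |i| NUMBER_OF_CUSTODY_GROUPS * i + group)
}

fn main() {
    // node_id => custody_groups => columns => subnets
    let groups = custody_groups(42, 4);
    let columns: BTreeSet<u64> = groups
        .iter()
        .flat_map(|&g| columns_for_custody_group(g))
        .collect();
    let subnets: BTreeSet<u64> = columns
        .iter()
        .map(|c| c % DATA_COLUMN_SIDECAR_SUBNET_COUNT)
        .collect();
    println!("groups: {groups:?}\ncolumns: {columns:?}\nsubnets: {subnets:?}");
}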

@jimmygchen jimmygchen changed the title from "Implement PeerDAS subnet decoupling (aka custody groups)." to "Implement PeerDAS subnet decoupling (aka custody groups)" on Dec 20, 2024
@jimmygchen jimmygchen added the das (Data Availability Sampling) and work-in-progress (PR is a work-in-progress) labels on Dec 20, 2024
Squashed commit of the following:

commit 898d05e
Merge: ffbd25e 7e0cdde
Author: Jimmy Chen <[email protected]>
Date:   Tue Dec 24 14:41:19 2024 +1100

    Merge branch 'unstable' into refactor-ef-tests-features

commit ffbd25e
Author: Jimmy Chen <[email protected]>
Date:   Tue Dec 24 14:40:38 2024 +1100

    Fix `SszStatic` tests for PeerDAS: exclude eip7594 test vectors when testing Electra types.

commit aa593cf
Author: Jimmy Chen <[email protected]>
Date:   Fri Dec 20 12:08:54 2024 +1100

    Refactor spec testing for features and simplify usage.
@jimmygchen jimmygchen added the ready-for-review (The code is ready for review) label and removed the work-in-progress (PR is a work-in-progress) label on Jan 3, 2025
@jimmygchen jimmygchen marked this pull request as ready for review January 3, 2025 02:10
@jimmygchen (Member Author)

@dapplion @pawanjay176 I think this PR is ready for review now. The failing spec tests are new electra tests, which Pawan and Michael are currently working on, and the DAS spec tests are all passing.

Please review and let me know what you think 🙏

let columns_per_custody_group = self
    .number_of_columns
    .safe_div(self.number_of_custody_groups)
    .expect("Custody group count must be greater than 0");
Collaborator

Suggested change
.expect("Custody group count must be greater than 0");
.expect("number_of_custody_groups must be greater than 0");

Collaborator

A user may run LH with a config that has number_of_custody_groups: 0, then LH will panic here. Should there be a sanity check somewhere?

Member Author

Yeah, this is intentional. If they're running a custom testnet and override this value to an invalid value (0), I think panicking is fine?


let custody_column_count = columns_per_custody_group
    .safe_mul(custody_group_count)
    .expect("Computing sampling size should not overflow");
@dapplion dapplion (Collaborator) commented Jan 7, 2025

Are we sure custody_group_count is always checked when read from external sources? Add a comment here saying:

  • ENR getter checks it's less than X
  • On ReqResp Metadata messages?

EDIT: Metadata value is checked in https://github.com/sigp/lighthouse/pull/6736/files#diff-bb7fd4c05488f03a20af7fd1d00ccde4438d074889ce1fc20d8e919845837001R1433-R1434

Member Author

Yeah, so this method is only used for determining the sampling count for the running node, so the assumption here is that these spec values MUST be valid, otherwise we panic. We currently don't need to determine the number of samples for other nodes, and I don't foresee that we'd need this.

If we're paranoid about this, we could make this return a Result and panic at the caller site, to prevent potential future usage of this method from unintentionally causing a panic?

Member Author

Btw, re error handling in this PR, I have intentionally chosen to:

  • handle errors when computing with a peer's cgc: computing peer custody should never panic.
  • panic if we're computing custody for our own node (it means the spec config is invalid).
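For illustration, a minimal sketch of that split, with hypothetical names and constants standing in for the ChainSpec fields (not the actual code in this PR):

// Hypothetical constants standing in for ChainSpec fields.
const CUSTODY_REQUIREMENT: u64 = 4;
const NUMBER_OF_CUSTODY_GROUPS: u64 = 128;

#[derive(Debug)]
enum Error {
    InvalidCustodyGroupCount(u64),
}

/// Peer-supplied cgc (e.g. from an ENR or metadata response): never panic,
/// surface an error and let the caller penalise or ignore the peer.
fn validate_peer_cgc(cgc: u64) -> Result<u64, Error> {
    if cgc >= CUSTODY_REQUIREMENT && cgc <= NUMBER_OF_CUSTODY_GROUPS {
        Ok(cgc)
    } else {
        Err(Error::InvalidCustodyGroupCount(cgc))
    }
}

/// Our own cgc is derived from the spec/config; an out-of-range value means
/// the config itself is invalid, so panicking at startup is acceptable.
fn own_custody_group_count(custody_group_count: u64) -> u64 {
    assert!(
        custody_group_count >= CUSTODY_REQUIREMENT
            && custody_group_count <= NUMBER_OF_CUSTODY_GROUPS,
        "invalid spec config: custody_group_count out of range"
    );
    custody_group_count
}

fn main() {
    // A bad peer value is rejected without panicking.
    assert!(validate_peer_cgc(0).is_err());
    // Our own value comes from the spec and is asserted valid at startup.
    println!("own cgc: {}", own_custody_group_count(CUSTODY_REQUIREMENT));
}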

@@ -555,7 +555,7 @@ impl<T: BeaconChainTypes> DataAvailabilityCheckerInner<T> {
         if custody_column_count != total_column_count {
             return ReconstructColumnsDecision::No("not required for full node");
         }
-        if received_column_count == self.spec.number_of_columns {
+        if received_column_count == total_column_count {
Collaborator

Why not >=?

Member Author

I think it's dangerous to allow for unexpected columns, because it means we either have a duplicate or we have a column that we don't expect - which may unintentionally satisfy the custody column count. I think we need to make sure it's an exact match, given that we're only checking the count.

The code here guards against duplicates:

if !self.data_column_exists(data_column.index()) {
    self.verified_data_columns.push(data_column);
}

Gossip validation should ensure we only get the columns we subscribe to.

Do you see any valid scenario where this would be `>`?
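(For illustration, a minimal sketch of that invariant; a hypothetical helper, not code from this PR:)

// With deduplicated inserts and gossip validation admitting only the columns
// we subscribe to, the received count can never legitimately exceed the total,
// so an exact-match check is the safe "all columns received" condition.
fn all_columns_received(received_column_count: usize, total_column_count: usize) -> bool {
    debug_assert!(
        received_column_count <= total_column_count,
        "duplicate or unexpected column"
    );
    received_column_count == total_column_count
}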

@dapplion
Collaborator

dapplion commented Jan 7, 2025

A few comments, else looks solid and good to go


-        if csc >= spec.custody_requirement && csc <= spec.data_column_sidecar_subnet_count {
-            Ok(csc)
+        if cgc >= spec.custody_requirement && cgc <= spec.number_of_custody_groups {
Collaborator

Can we make this line equal to the one in compute_peer_custody_groups? It would be easier to follow.

Member Author

Good eyes! I've changed this to the same format: 73cf91a, without the !.

I prefer to avoid negation here for readability, whereas in compute_peer_custody_groups it's a bit easier to follow if we return early on invalid input. This is my personal preference though, happy to change.

Labels
das (Data Availability Sampling), ready-for-review (The code is ready for review)