## What ❔

Add tests that run multiple Twins scenarios with a random number of replicas as twins.

- [x] Add `Network::Twins` type with a partition schedule
- [x] Add `run_nodes_twins` to execute the test with arbitrary network partitions and potentially multiple ports per validator
- [x] Buffer messages that can't be delivered and deliver them some time later (potentially unless superseded by a newer message from the same source to the same target)
- [x] Simulate the gossiping of finalized blocks
- [x] Run test with no twins and no partitions
- [x] Run test with no twins but random partitions
- [x] Run test with twins and random partitions

### Investigation

```shell
cargo nextest run -p zksync_consensus_bft twins_network --no-capture
```

Initially the test failed when partitions were introduced. I'll try to understand whether this is because the leader got isolated and an important message got lost. I would like to understand whether eventual delivery is an absolute requirement even if all partitions are at least as large as the quorum size.

🔍 I think the test fails because it waits for all nodes to have persisted a certain number of blocks. If a proposal carrying the payload is missed because a partition prevented its delivery, the machinery instantiated by the test has no mechanism to procure it later, and those nodes are stuck.

🔧 I implemented a message stashing mechanism in the mock network, but it still doesn't finalise blocks 👀

🔍 The problem seems to be that my unstashing mechanism only kicked in when node A tried to send a new message to node B and was no longer blocked. If A didn't try to send, the stashed messages to B were never delivered, which causes longer and longer timeouts because nobody is making progress. Meanwhile C, for example, can already be in a new view; if we see that, we could conclude that A-to-B should be unstashed as well, even if there are no messages from A to B in that round.

🔧 I'll try testing after merging #119, which should trigger unstashes in each round.

🔍 It's still failing after adding the replica-to-replica broadcast. For example, replica A is isolated in round 1 and doesn't get a LeaderCommit. In the next round replica B is isolated, and A gets all the missing messages from round 1 plus the new LeaderPrepare, but it doesn't respond with a ReplicaCommit because it doesn't have the block from round 1 and therefore cannot even store proposal 2. The consensus relies on the external gossiping mechanism to eventually propagate the FinalBlock to it; until then the node is stuck. I need to simulate the effect of gossiping in the test.

🔧 I implemented a simulated gossip using the following mechanism: if one node is about to send or unstash a message to another, the two are in a gossip relationship, the message contains a CommitQC, and the sender has the finalized block for the committed height, then the block is inserted directly into the target's store.

🔍 The tests now work without twins, but fail with 1 or 2 twins, albeit not in every scenario.

🔧 Changed the simulated gossip to push all ancestors of a finalized block into the target block store, not just the block referenced by the CommitQC in the latest message. This simulates the ability of the target to fetch all missing blocks.

## Why ❔

To check that the consensus does not fail as long as the number of twins does not exceed the tolerable number of Byzantine nodes.
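The stashing and simulated gossip described in the investigation notes can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `MockNet`, `Msg`, `connected`, `send`, `unstash` and `deliver` are hypothetical names, blocks are reduced to bare heights, and every pair of nodes is assumed to be in a gossip relationship.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

type NodeId = u32; // hypothetical node identifier
type Height = u64; // hypothetical block height; blocks are reduced to heights here

/// Hypothetical message carrying only the fields this sketch needs.
#[derive(Clone, Debug, PartialEq)]
struct Msg {
    view: u64,
    /// Height certified by a CommitQC carried in the message, if any.
    commit_qc: Option<Height>,
}

/// Hypothetical mock network: per-node block stores plus a stash of undelivered messages.
#[derive(Default)]
struct MockNet {
    /// Finalized block heights each node has persisted.
    stores: HashMap<NodeId, BTreeSet<Height>>,
    /// At most one stashed message per (source, target); a newer one supersedes it.
    stash: BTreeMap<(NodeId, NodeId), Msg>,
    /// Messages that actually reached their target, in delivery order.
    delivered: Vec<(NodeId, NodeId, Msg)>,
}

/// Two nodes can talk if some partition contains both of them.
fn connected(partitions: &[BTreeSet<NodeId>], a: NodeId, b: NodeId) -> bool {
    partitions.iter().any(|p| p.contains(&a) && p.contains(&b))
}

impl MockNet {
    /// Deliver `msg` if the link is up, otherwise stash it for later.
    fn send(&mut self, partitions: &[BTreeSet<NodeId>], from: NodeId, to: NodeId, msg: Msg) {
        if connected(partitions, from, to) {
            self.deliver(from, to, msg);
        } else {
            self.stash.insert((from, to), msg);
        }
    }

    /// Meant to be called in every round: flush each stashed message whose link
    /// has healed, even if the sender has nothing new to send to that target.
    fn unstash(&mut self, partitions: &[BTreeSet<NodeId>]) {
        let ready: Vec<(NodeId, NodeId)> = self
            .stash
            .keys()
            .copied()
            .filter(|&(from, to)| connected(partitions, from, to))
            .collect();
        for (from, to) in ready {
            if let Some(msg) = self.stash.remove(&(from, to)) {
                self.deliver(from, to, msg);
            }
        }
    }

    /// Simulated gossip: if the message carries a CommitQC and the sender holds
    /// the block at that height, copy that block and all of the sender's earlier
    /// blocks into the target store before handing over the message.
    fn deliver(&mut self, from: NodeId, to: NodeId, msg: Msg) {
        if let Some(height) = msg.commit_qc {
            let sender_blocks = self.stores.get(&from).cloned().unwrap_or_default();
            if sender_blocks.contains(&height) {
                let target = self.stores.entry(to).or_default();
                target.extend(sender_blocks.range(..=height).copied());
            }
        }
        self.delivered.push((from, to, msg));
    }
}
```

Keeping a single stashed message per (source, target) pair is one way to model "superseded by a newer message". Calling `unstash` once per round, independently of `send`, addresses the case where A has nothing new to send to B after the partition heals.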
Showing 8 changed files with 713 additions and 73 deletions.