BFT-465: Twins tests #117
Merged
aakoshh force-pushed the bft-465-twins-test branch from ec75578 to ae67132 (May 28, 2024 08:24)
aakoshh force-pushed the bft-465-twins-test branch from 93202ea to 43fb874 (May 28, 2024 13:13)
aakoshh force-pushed the bft-465-twins-test branch from 5e1db08 to 6360c12 (May 28, 2024 20:02)
aakoshh force-pushed the bft-465-twins-test branch from 58d42d9 to 9c06ad4 (May 29, 2024 10:31)
aakoshh force-pushed the bft-465-twins-test branch from d926088 to 76cfc9c (May 30, 2024 12:25)
aakoshh force-pushed the bft-465-twins-test branch from 76cfc9c to 78b81f2 (May 30, 2024 13:40)
aakoshh force-pushed the bft-465-twins-test branch from a818fb8 to ea10b1d (May 30, 2024 18:10)
pompon0 reviewed Jun 3, 2024
brunoffranca approved these changes Jun 7, 2024
Very nice work! As a side-note, really appreciate the quality of the documentation in the code and PR.
pompon0 approved these changes Jun 10, 2024
What ❔
Add tests that run multiple Twins scenarios with a random number of replicas as twins.
- Network::Twins type with a partition schedule
- run_nodes_twins to execute the test with arbitrary network partitions and potentially multiple ports per validator
Investigation
Initially the test failed when partitions were introduced. I'll try to understand whether this is because the leader got isolated and an important message was lost. I would also like to understand whether eventual delivery is an absolute requirement even if all partitions are at least as large as the quorum size.
🔍 I think the test fails because it waits for all nodes to have persisted a certain number of blocks. If a node misses a proposal with the payload because a partition prevented the message from being delivered, nothing in the machinery instantiated by the test can procure it later, and that node is stuck.
🔧 I implemented a message stashing mechanism in the mock network but it still doesn't finalise blocks 👀
🔍 The problem seems to be that my unstashing mechanism only kicked in when node A tried to send a new message to node B and was no longer blocked. If A didn't try to send, the messages stashed for B weren't delivered, which causes longer and longer timeouts because nobody is making progress. Meanwhile C, for example, can already be in a new view; if we see that, we could conclude that A-to-B should be unstashed as well, even if there are no messages from A to B in that round.
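To make the stashing behaviour concrete, here is a minimal sketch of a mock network that unstashes only on the next send. It is not the code in this PR: MockNetwork, NodeId, the blocked set and the String messages are all hypothetical names chosen for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical node identifier, used only in this sketch.
type NodeId = usize;

/// Mock network that holds back messages it cannot deliver across a partition.
#[derive(Default)]
struct MockNetwork {
    /// Directed links (from, to) currently cut by the partition.
    blocked: HashSet<(NodeId, NodeId)>,
    /// Messages held back per directed link, in send order.
    stashes: HashMap<(NodeId, NodeId), VecDeque<String>>,
    /// Messages that reached each node, in delivery order.
    delivered: HashMap<NodeId, Vec<String>>,
}

impl MockNetwork {
    /// Sends `msg` from `from` to `to`.
    /// If the link is blocked, the message is stashed. Otherwise anything
    /// stashed earlier on that link is delivered first, then `msg` itself.
    fn send(&mut self, from: NodeId, to: NodeId, msg: &str) {
        let link = (from, to);
        if self.blocked.contains(&link) {
            self.stashes.entry(link).or_default().push_back(msg.to_string());
            return;
        }
        let inbox = self.delivered.entry(to).or_default();
        if let Some(stash) = self.stashes.get_mut(&link) {
            // Unstashing happens only here, as a side effect of a new send.
            inbox.extend(stash.drain(..));
        }
        inbox.push(msg.to_string());
    }
}
```

In this shape, a message stashed on the A-to-B link stays stashed after the partition lifts until A sends to B again, which is the stall described above.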
🔧 I'll try testing after merging #119 which should trigger unstashes in each round.
🔍 It's still failing after adding the replica-to-replica broadcast. For example, replica A is isolated in round 1 and doesn't get a LeaderCommit. In the next round replica B is isolated, and A gets all the missing messages from round 1 plus the new LeaderPrepare, but it doesn't respond with a ReplicaCommit because it doesn't have the block from round 1 and therefore cannot even store proposal 2. The consensus relies on the external gossiping mechanism to eventually propagate the FinalBlock to it; until then the node is stuck. I need to simulate the effect of gossiping in the test.
🔧 I implemented a simulated gossip using the following mechanism: if one node is about to send/unstash a message to another and they are in a gossip relationship, and the message contains a CommitQC, and the sender has the finalized block for the committed height, then it inserts the block directly into the target store.
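The rule can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the test code itself: Message, BlockStore, simulate_gossip and the String payloads are hypothetical stand-ins, and the message is reduced to the height certified by its CommitQC, if any.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical identifiers, used only in this sketch.
type NodeId = usize;
type BlockNumber = u64;

/// Simplified stand-in for a consensus message: only the height certified by
/// an embedded CommitQC matters here.
struct Message {
    commit_qc_height: Option<BlockNumber>,
}

/// Simplified stand-in for a node's block store: finalized payloads by height.
#[derive(Default)]
struct BlockStore {
    blocks: BTreeMap<BlockNumber, String>,
}

/// Called just before `msg` is sent or unstashed from `from` to `to`.
/// Copies the committed block into `target` when all three conditions hold:
/// the nodes gossip with each other, the message carries a CommitQC, and the
/// sender has the finalized block for that height.
fn simulate_gossip(
    gossip_links: &HashSet<(NodeId, NodeId)>,
    from: NodeId,
    to: NodeId,
    msg: &Message,
    sender: &BlockStore,
    target: &mut BlockStore,
) {
    if !gossip_links.contains(&(from, to)) {
        return;
    }
    let Some(height) = msg.commit_qc_height else {
        return;
    };
    let Some(block) = sender.blocks.get(&height) else {
        return;
    };
    // Insert directly into the target store, leaving existing entries alone.
    target.blocks.entry(height).or_insert_with(|| block.clone());
}
```

Only the single block named by the CommitQC is copied in this version.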
🔍 The tests now work without twins, but fail with 1 or 2 twins, albeit not in every scenario.
🔧 Changed the simulated gossip to push all ancestors of a finalized block into the target blockstore, not just the one in the CommitQC that is in the latest message. This simulates the ability of the target to fetch all missing blocks.
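A rough sketch of the ancestor push, assuming block stores are plain maps from height to payload. The function name push_with_ancestors and the map-based stores are hypothetical and only illustrate the idea.

```rust
use std::collections::BTreeMap;

/// Hypothetical block height type, used only in this sketch.
type BlockNumber = u64;

/// Copies the block at `committed` and every earlier block the sender holds
/// into `target`, skipping heights the target already has.
/// Does nothing if the sender lacks the committed block itself.
/// Returns the newly inserted heights in ascending order.
fn push_with_ancestors(
    sender: &BTreeMap<BlockNumber, String>,
    target: &mut BTreeMap<BlockNumber, String>,
    committed: BlockNumber,
) -> Vec<BlockNumber> {
    let mut inserted = Vec::new();
    if !sender.contains_key(&committed) {
        return inserted;
    }
    // Walk the sender's blocks up to and including the committed height.
    for (&height, block) in sender.range(..=committed) {
        if !target.contains_key(&height) {
            target.insert(height, block.clone());
            inserted.push(height);
        }
    }
    inserted
}
```

Walking the whole range rather than one height is what stands in for the target fetching all of its missing blocks.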
Why ❔
To check that the consensus does not fail as long as the number of twins does not exceed the tolerable number of Byzantine nodes.