-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic QBFT Timeout #37
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,90 @@ | ||||||
| Author | Title | Category | Status | | ||||||
| -------------- | --------------------- | -------- | -------- | | ||||||
| Gal Rogozinski | QBFT timeout | Core | approved | | ||||||
|
||||||
|
||||||
**Summary** | ||||||
Changes to the constant timeouts we defined in [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md) that should improve performance. | ||||||
|
||||||
**Rational** | ||||||
Currently for all duties, we have a `quickTimeout` of 2 seconds before sending a `RoundChange` message and a `slowTimeout` of 2 minutes starting from the 8th round.The timer starts when the QBFT instance starts. This approach has 2 issues: | ||||||
|
||||||
1. Duties that require a pre-consensus stage may start their instances at different times, causing some nodes to timeout sooner than others. This will result in a greater probability of a failed first round. | ||||||
2. Different duties have different timing assumptions, which may cause some duties to execute earlier than others. | ||||||
|
||||||
|
||||||
|
||||||
**Beacon Duty Timing Assumptions** | ||||||
The Beacon chain has some timing assumptions regarding duty execution for maximizing rewards. | ||||||
|
||||||
| Duty | Timing Assumption | Dependency | Description | | ||||||
| -------------- | ----------------- | -------------- | ------------------------------------------------------------------------------------------------------------------- | | ||||||
| Proposal | < 4 Sec | --- | Attestations wait 4 seconds for a block | | ||||||
| Attester | < 4 Sec | Proposal | Attestations wait 4 seconds for a block then need to be propogated within 4 seconds for aggregators to pick them up | | ||||||
| Aggregator | < 4 Sec | Attester | Aggregators start aggregating within 8 seconds from slot start and need to propogate within 4 seconds | | ||||||
| Sync Committee | < 4 sec | Proposal | Sync Committee wait 4 seconds for a block | | ||||||
| Contribution | < 4 sec | Sync Committee | Contributions wait 8 seconds from the slot start | | ||||||
|
||||||
|
||||||
|
||||||
**Specification** | ||||||
|
||||||
From the slot start time after `baseDuration` delay wait 2 seconds before emitting a `RoundChange` message until the 6th round (inclusive). Afterwards, wait 2 minutes. | ||||||
|
||||||
```go | ||||||
var ( | ||||||
// quickTimeoutThreshold is the round after which the timeout duration increases to 2 minutes | ||||||
quickTimeoutThreshold = Round(6) | ||||||
// quickTimeout is the timeout in seconds for the first 6 rounds | ||||||
quickTimeout int64 = 2 // 2 seconds | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was trying to move away from using go libs in SIPs Also for future SIPs I want to start using python like pseudo-code There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just my 2 cents, I would say while we still use Golang might as well do it as Golang prefers to do it (as Nikita mentioned), but the idea to use pseudo-code is the right one ofc (using Python is probably not ideal, but probably better than golang for this kind of thing, as long as we keep away from using advanced Python-specific syntax), the even more important thing though is that spec "provides interface and not implementation" - I elaborate a bit on it here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's why I said "python like pseudo code" :-)
Currently spec has too many implementation details including full fledged consensus impl. |
||||||
// slowTimeout is the timeout in seconds for rounds after the first 6 | ||||||
slowTimeout int64 = 120 // 2 minutes | ||||||
) | ||||||
|
||||||
// RoundTimeout returns the unix epoch time (seconds) in which we should send a RC message | ||||||
// Called for all beacon duties other than proposals | ||||||
func RoundTimeout(round Round, height Height, baseDuration int64, network types.BeaconNetwork) int64 { | ||||||
// Calculate additional timeout in seconds based on round | ||||||
var additionalTimeout int64 | ||||||
additionalTimeout = int64(min(round, quickTimeoutThreshold)) * quickTimeout | ||||||
if round > quickTimeoutThreshold { | ||||||
slowPortion := int64(round-quickTimeoutThreshold) * slowTimeout | ||||||
additionalTimeout += slowPortion | ||||||
} | ||||||
|
||||||
// Combine base duration and additional timeout | ||||||
timeoutDuration := baseDuration + additionalTimeout | ||||||
|
||||||
// Get the unix epoch start time of the duty seconds | ||||||
dutyStartTime := network.EstimatedTimeAtSlot(phase0.Slot(height)) | ||||||
|
||||||
// Calculate the time until the duty should start plus the timeout duration | ||||||
return dutyStartTime + timeoutDuration | ||||||
} | ||||||
``` | ||||||
|
||||||
*Proposals* | ||||||
|
||||||
No changes from [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md). Doesn't rely on the slot/duty start time. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we want to change this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I guess the issue is that if precon finished at a late time the first RC message should be for a later round? Same issue with Aggregator duty (but very unlikely) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds to me like we do need to do something with proposals. , a dynamic timeout will solve an issue where some nodes might receive the block at different times (different beacon nodes) and then start and end the rounds on different times. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @moshe-blox There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. proposals won't be considered in this SIP |
||||||
|
||||||
*Attestations/Sync Committees* | ||||||
|
||||||
Once the slot starts, wait 4 seconds for a block to be produced then start the timer according to the new rules. | ||||||
|
||||||
```go | ||||||
// AttestationOrSyncCommitteeTimeout returns the unix epoch time (seconds) in which we should send a RC message | ||||||
func AttestationOrSyncCommitteeTimeout(round Round, height Height, network types.BeaconNetwork) int64 { | ||||||
return RoundTimeout(round, height, 4, network) | ||||||
} | ||||||
``` | ||||||
|
||||||
*Aggregator/Contribution* | ||||||
|
||||||
Once the slot starts, wait 8 seconds for a block to be produced then start the timer according to the new rules | ||||||
|
||||||
```go | ||||||
// AggregationOrContributionTimeout returns the unix epoch time (seconds) in which we should send a RC message | ||||||
func AggregationOrContributionTimeout(r Round, height Height, network types.BeaconNetwork) int64 { | ||||||
return RoundTimeout(r, height, 8, network) | ||||||
} | ||||||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify btw why there is a need for
slowTimeout
? Is it just purely for qbft or do we have anything in SSV node that would make use of such long timeout (I think most/all duties must finish execution much faster or just will be missed)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iurii-ssv It was introduced in https://github.com/ssvlabs/SIPs/pull/22/files, attester duties may be submitted within 32 slots/6.4m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is simply wait for a very long time if after 8 rounds you didn't reach consensus. This is because we are probably in a very high latency environment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I understand the intent, just trying to figure out how exactly it applies/fits to duties,
in particular I've also left a related comment here
yup, true
at any rate, I always thought the spec should define an interface (not an implementation) everybody wants to conform to and implement it as they see fit for their use-case/application, meaning:
qbft/timeout.go
file shouldn't import (know about) any ethereum/ssv-related stuff - currently it does import it to get the access to func(s) that tell it how long a slot lasts etc. (in fact, ideally it doesn't even need to know what a slot is - such parameters should bear names from QBFT domain and duty processor would fill these in as needed)total round count
,round 1 timeout
,round 2 timeout
, ... (seems like these values should not be constants, but rather vars - so it doesn't send the wrong message that "these values are uniform for everyone")It's not really an issue to worry about, just trying to get some consistency (and knowledge) if possible.