CosmosChainProcessor - dynamic block time targeting #847
There is a lot of subtlety around the timing here. Do you think there is a way to capture any of this behavior in tests?
I also don't think "clock drift" is the correct description of what is being modified here, because we are not trying to synchronize a local wall clock with a remote one, IIUC -- rather we are trying to match an irregular timing to an unpredictable remote state. Maybe "backoff" would be slightly more accurate?
+1, tests would help. I think this is a nice feature, especially to help with log spam. I would entertain its own package, maybe internal somewhere. I bet there are other consumers (penumbra?) that would appreciate it.
// succeeds, clock drift should be removed, but not as much as is added for the error
// case. Clock drift should also be added when checking the latest height and it has
// not yet incremented.
func (p *queryCyclePersistence) addClockDriftMs(ms int64) {
In Go, I always pause if a duration is not type time.Duration. Other scalar types are often misinterpreted. You can easily get ms from (time.Duration).Milliseconds().
Using int64 for averageBlockTimeMs and timeTrimMs resulted in the fewest conversions to/from time.Duration, but I don't have a strong opinion versus using time.Duration and accepting more conversions. Will update.
One additional question: is there ever a risk of missing a block? E.g. waiting too long?
If we wait too long, it will see that multiple blocks need to be queried and query them in sequence. It always starts at
It's a trim value that is the sum of both clock drift (comparing the block's consensus timestamp against local machine time) and the variable amount of time from then until the block is ready to be queried. So yes, maybe
Force-pushed from ff458d0 to 6e36e8b
// target ideal block query window.

// Time trim addition when a block query fails
queryFailureTimeTrimAdditionMs = 73
I would love to hear how you arrived at these.
I used prime numbers (to avoid fixed multiples) that would allow a steady backoff in the case of errors (up to a limit), and a smaller number for successes to fine tune the window when blocks are expected to be available. This allows it to find the ideal window within ~15 blocks and then remain there. Outlier scenarios like one-off blocks that take an unexpectedly long amount of time to become available will not derail the targeting.
Was just talking to @boojamya about updating clients on channels w/ very little traffic today (i.e. make sure the client gets updated at least once per trusting period) and we had a need for estimated block time. Are we persisting this anywhere? Also, this is a cool feature.
Adds a block query targeting mechanism, accounting for clock drift and the variable RPC node time from the block timestamp until blocks are ready to be queried. This removes the fixed 1 second minQueryLoopDuration, reducing queries by holding a rolling average of the delta time between blocks. This allows it to target block query times on chains with different consensus timeouts (block times). In addition, it compares the block timestamp against the timestamp when the queries were initiated, and holds a clock drift parameter for fine-tuning when the next block queries will be initiated. This reduces queries on the nodes, cleans up the logs, and captures blocks as soon as they are ready to be queried.