CosmosChainProcessor - dynamic block time targeting #847
There is a lot of subtlety around the timing here. Do you think there is a way to capture any of this behavior in tests?
I also don't think "clock drift" is the correct description of what is being modified here, because we are not trying to synchronize a local wall clock with a remote one, IIUC -- rather we are trying to match an irregular timing to an unpredictable remote state. Maybe "backoff" would be slightly more accurate?
+1, tests would help. I think this is a nice feature, especially to help with log spam. I would entertain its own package, maybe internal somewhere. I bet there are other consumers (penumbra?) that would appreciate it.
// succeeds, clock drift should be removed, but not as much as is added for the error
// case. Clock drift should also be added when checking the latest height and it has
// not yet incremented.
func (p *queryCyclePersistence) addClockDriftMs(ms int64) {
In Go, I always pause if a duration is not type time.Duration. Other scalar types are often misinterpreted. You can easily get ms from (time.Duration).Milliseconds().
Using int64 for averageBlockTimeMs and timeTrimMs resulted in the fewest conversions to/from time.Duration, but I don't have a strong opinion versus using time.Duration and accepting more conversions. Will update.
One additional question: is there ever a risk of missing a block? E.g. waiting too long?
If we wait too long, it will see that multiple blocks need to be queried and query them in sequence. It always starts at
It's a trim value that is the sum of both clock drift (comparing the block's consensus timestamp against local machine time) and the variable amount of time from then until the block is ready to be queried. So yes, maybe
Force-pushed from ff458d0 to 6e36e8b
// target ideal block query window.

// Time trim addition when a block query fails
queryFailureTimeTrimAdditionMs = 73
I would love to hear how you arrived at these.
I used prime numbers (to avoid fixed multiples) that would allow a steady backoff in the case of errors (up to a limit), and a smaller number for successes to fine tune the window when blocks are expected to be available. This allows it to find the ideal window within ~15 blocks and then remain there. Outlier scenarios like one-off blocks that take an unexpectedly long amount of time to become available will not derail the targeting.
Was just talking to @boojamya about updating clients on channels w/ very little traffic today (i.e. make sure the client gets updated at least once per trusting period) and we had a need for estimated block time. Are we persisting this anywhere? Also, this is a cool feature.
Adds a block query targeting mechanism, accounting for clock drift and the variable RPC node time from the block timestamp until blocks are ready to be queried. This removes the fixed 1 second minQueryLoopDuration, reducing queries by holding a rolling average of the delta time between blocks. This allows it to target block query times on chains with different consensus timeouts (block times). In addition, it compares the block timestamp against the timestamp when the queries were initiated, and holds a clock drift parameter for fine-tuning when the next block queries will be initiated. This reduces queries on the nodes, cleans up the logs, and captures blocks as soon as they are ready to be queried.