Tenure extends for increasing tenure budgets #5476

kantai · 2024-11-07T20:09:10Z · Maintainer

The goal of the performance improvements (#5430, #5431, #5432) is to make the stacks-node more performant right now and, in particular, to free up CPU time in the stacks-node so that even if it spends more time processing blocks, nodes can still stay in sync and remain responsive on their network interfaces (this is important for StackerDB messages to propagate, for signers and miners to stay in sync, etc.).

Once these improvements are in place, the tenure budget can be safely increased. This can (and should) be done without consensus changes, simply by having the miner issue a tenure extend and the signer set approve it.

The basic idea is to have the miner thread time the length of its tenure, along with a configuration setting that tells it when it should try to perform a tenure extend. The signer set will similarly hold timers (measured from when they last signed off on a block proposal which spent some amount of block budget: the timer isn’t reset when they sign a block with just transfers, but it is reset if they process a block with contract-calls, e.g.), and when the timer expires, they would allow a tenure extend.

This would still allow the “spikiness” in budget consumption that we have today (the spikiness issue is somewhat orthogonal and would be addressed by #5433), but the budget itself would be higher, and the timing of extends would enforce some metering of the spikes (so that contract-call budgets would be reset, e.g., every 5 minutes, or whatever the timeout is set to, rather than every Bitcoin block). During the initial rollout, this timeout will need to be set conservatively, but it could be made more aggressive through configuration changes in miners and signers.

hstove · 2024-11-08T01:15:59Z · Maintainer

The signer set will similarly hold timers (measured from when they last signed off on a block proposal which spent some amount of block budget: the timer isn’t reset when they sign a block with just transfers, but it is reset if they process a block with contract-calls, e.g.), and when the timer expires, they would allow a tenure extend.

Can you expand on this - specifically, why would the timer only reset when a block with contract calls is processed? Wouldn't that encourage spikiness? My assumption would be that this timer resets when a block includes a TenureExtend.

kantai · 2024-11-08T17:42:44Z · Maintainer Author

Can you expand on this - specifically, why would the timer only reset when a block with contract calls is processed? Wouldn't that encourage spikiness? My assumption would be that this timer resets when a block includes a TenureExtend.

Yes, I think you’re right, the timer should start when a tenure begins (or the extension begins: basically the timer should start whenever there's a tenure change payload).

However, the signers must also do some metering according to the block evaluation time. Right now, it's the case that the tenure budget is often expended with just a few seconds of evaluation, but the cost tracker is an imperfect (and pessimistic) estimator of runtime. Things like cache locality in the MARF definitely impact block evaluation time, so the signers should take that into account. This is simple enough for them to do naively by tracking the wall clock time spent processing block proposals.

I think the way to do this is to "bump" the budget timer by the amount of time they spend processing proposals during the tenure: if a proposal takes them 1 minute to evaluate, they bump the budget timer by 1 minute (so if they would have allowed the budget to be reset at time t, they will instead allow the reset at time t + 60s).
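A minimal sketch of that bump rule, in Rust since the stacks-node is a Rust codebase. `BudgetTimer` and its methods are hypothetical names, not stacks-node APIs, and the timer is assumed to start when a tenure change payload is seen, as suggested above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical signer-side timer: a budget reset (tenure extend) is
/// allowed once `allowed_at` has been reached.
pub struct BudgetTimer {
    allowed_at: Instant,
}

impl BudgetTimer {
    /// Start the timer when a tenure change payload is seen.
    pub fn start(now: Instant, extend_period: Duration) -> Self {
        BudgetTimer {
            allowed_at: now + extend_period,
        }
    }

    /// Push the reset point later by the wall-clock time spent evaluating a proposal.
    pub fn bump(&mut self, evaluation_time: Duration) {
        self.allowed_at += evaluation_time;
    }

    /// Whether this signer would accept a tenure extend at `now`.
    pub fn extend_allowed(&self, now: Instant) -> bool {
        now >= self.allowed_at
    }
}
```

With a 300s period, a single 60s evaluation moves the earliest accepted extend from t + 300s to t + 360s.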

jferrant · 2024-11-08T21:39:18Z · Maintainer

Could you have signers track the last time they saw a tenure change payload rather than a contract call? I ask because the tenure change payload is always guaranteed to be the very first transaction in the block, so it might be easier to track than the last block with a contract call.

obycode · 2024-11-11T19:09:43Z · Maintainer

I had not thought about it this way, but I like the idea of factoring in the actual block processing time. To put it another way, we can think of it as the signer saying, "once I have seen X minutes of downtime, I will allow a tenure extend." So when the tenure starts or extends, I start a countdown at X minutes. When I get a block proposal, I pause the countdown, process the block, send my signature, and resume the countdown. When my countdown reaches 0, I will allow a tenure extension in the next block I process.

One simple way to synchronize between miners and signers could be for the signer to include a flag in its block signature message indicating to the miner that it is ready for a tenure extend. When the miner sees that enough signers have set this flag, it can go ahead and issue one in its next block.
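The miner side of that flag idea could be as small as a weighted count. In this sketch, `SignerResponse`, its `ready_for_extend` field, and `should_issue_extend` are hypothetical; signer weights stand in for signing power, and the threshold is a parameter (the 70% figure used later in this thread would be `70`).

```rust
/// Hypothetical view of a signer's latest block response: `weight` is the
/// signer's share of signing power and `ready_for_extend` is the proposed flag.
pub struct SignerResponse {
    pub weight: u32,
    pub ready_for_extend: bool,
}

/// True once signers that set the flag hold at least `threshold_pct` percent
/// of `total_weight`. Integer math avoids floating-point rounding at the boundary.
pub fn should_issue_extend(responses: &[SignerResponse], total_weight: u32, threshold_pct: u32) -> bool {
    let ready: u64 = responses
        .iter()
        .filter(|r| r.ready_for_extend)
        .map(|r| u64::from(r.weight))
        .sum();
    ready * 100 >= u64::from(total_weight) * u64::from(threshold_pct)
}
```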

hstove · 2024-11-12T21:17:19Z · Maintainer

Assuming we do some measurement of wall-clock time on block processing, I just want to note that we should have the node track this and return it to the signer in the HTTP response to the block proposal. If the signer tries to track this, we can end up with too much variance from other sources of latency.

aldur · 2024-11-14T18:04:48Z · Collaborator

@obycode, to enlist the help of @hstove and @jferrant, can you split this into smaller issues that they can handle in parallel?

obycode · 2024-11-15T17:44:18Z · Maintainer

EDIT (by @aldur): See below for an updated design; this one is left for historical reference.

Here are my initial thoughts for the design:

Overview

The task here is to allow a miner to extend its tenure based on time since the last tenure extension. The signers decide when a miner is allowed to extend, so we need some mechanism to communicate this between the miner and the signers. I propose adding a field, extend_countdown, to the BlockResponse message that a signer sends to a miner as a result of a block proposal. The value of extend_countdown is the number of seconds of idle time that must pass before the signer will allow a tenure extension. The miner can track these countdowns from all signers and decide when to extend its tenure based on when it thinks it can get 70% of the signers to approve it.

Signer Details

The signer configuration will specify a tenure extend time period. The first version of this to go live on mainnet should start off with this value defaulting to 10 minutes, to ensure minimal impact on the network. As we validate that these tenure extends do not cause any problems, we can spread the word to signers to iteratively lower this number.

When a new burn block arrives, record the current time, idle_start, and initialize an idle_countdown counter to the configured duration. When a block proposal arrives, compute the time passed since idle_start and subtract it from idle_countdown, then begin evaluation of the proposal. Include idle_countdown in the BlockResponse for this proposal. Once the response is sent, record the current time again as idle_start. Repeat.

If a block proposal arrives that contains a TenureExtend transaction with cause IdleTimeExtension, check that the current idle_countdown is less than or equal to 0 (letting this value go negative is useful feedback to the miner). If so, process the block as usual, else, reject the block. This rejection would have a new reason code.

Miner Details

The miner needs to now keep track of the signers’ current idle time countdowns and decide when it can refresh its budget with an IdleTimeExtension. The sign coordinator can keep track of the signer countdowns as it receives BlockResponses and report back to the miner. Since the sign coordinator returns as soon as 70% approve the current block, we may need to do something different to handle tracking the countdowns from responses that come in after this threshold is reached. After each round of signing, the miner should record its estimated time to extend. It can compute this by ordering the countdown responses in ascending order, and selecting a time at which > 70% will have reached 0, then adding that to the current time and saving the value. This calculation is needed in the case where the miner is not able to mine any blocks (either because there is no budget or there are no transactions in the mempool), so it will not get any new countdown values from the signers.

Testing

Designing good integration tests for this new behavior is important. We will need to test several different scenarios:

Extend the tenure after all signers report <= 0 countdowns (active mining)
Extend the tenure after estimated countdown has expired (idle miner)
Extend the tenure when < 70% of signers have reached their idle time

obycode · 2024-11-15T21:45:14Z · Maintainer

Updated design after discussion with @jferrant and @hstove:

Overview

The task here is to allow a miner to extend its tenure based on time since the last tenure extension. The signers decide when a miner is allowed to extend, so we need some mechanism to communicate this between the miner and the signers. I propose adding a field, extend_timestamp, to the BlockResponse message that a signer sends to a miner as a result of a block proposal. The value of extend_timestamp is the wall clock time after which the signer will allow a tenure extension. The miner can track these times from all signers and decide when to extend its tenure based on when it thinks it can get 70% of the signers to approve it.

Signer Details

The signer configuration will specify a tenure extend time period. The first version of this to go live on mainnet should start off with this value defaulting to something like 5 minutes. As we validate that these tenure extends do not cause any problems, we can spread the word to signers to iteratively lower this number.

When a new burn block arrives, record the current time, idle_start, and initialize an idle_countdown counter to the configured duration. When a block proposal arrives, record the time, process_start. The block validation endpoint will validate the block and return the cost of that block. If the block has a non-zero cost, subtract (process_start - idle_start) from the idle_countdown. If it has a 0 cost, then subtract (now - idle_start) from the idle_countdown. This difference in how the idle time is computed is important to encourage miners to continue mining blocks with STX transfers after their budget is spent but before enough idle time has passed for a tenure extend.

In the BlockResponse for this proposal, include a timestamp equal to the current time plus idle_countdown. Once the response is sent, record the current time again as idle_start. Repeat with each block proposal.
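The signer-side accounting in the two paragraphs above can be sketched as a small state machine. This is an illustration under stated assumptions, not the signer implementation: `IdleTracker` and its methods are hypothetical, times are plain Unix seconds, and the response is assumed to go out at the same instant validation finishes, so `now` both stamps the response and restarts the idle period.

```rust
/// Hypothetical signer-side idle accounting for the updated design. Times are
/// Unix seconds; `idle_countdown` is signed so it can go negative, which the
/// design treats as useful feedback to the miner.
pub struct IdleTracker {
    idle_start: u64,
    idle_countdown: i64,
}

impl IdleTracker {
    /// A new burn block starts a fresh countdown of the configured period.
    pub fn on_burn_block(now: u64, extend_period_secs: i64) -> Self {
        IdleTracker { idle_start: now, idle_countdown: extend_period_secs }
    }

    /// Account for one validated proposal and return the `extend_timestamp`
    /// to put in the BlockResponse. For a non-zero-cost block, only the time
    /// before `process_start` counts as idle; for a zero-cost block (e.g. STX
    /// transfers only), the processing time counts as idle too.
    pub fn on_proposal_validated(&mut self, process_start: u64, now: u64, zero_cost: bool) -> u64 {
        let idle_end = if zero_cost { now } else { process_start };
        self.idle_countdown -= idle_end.saturating_sub(self.idle_start) as i64;
        let extend_timestamp = (now as i64 + self.idle_countdown).max(0) as u64;
        // Assumes the response is sent immediately, so `now` restarts the idle period.
        self.idle_start = now;
        extend_timestamp
    }

    pub fn idle_countdown(&self) -> i64 {
        self.idle_countdown
    }
}
```

For example, with a 300s period starting at t = 1000, a costly block processed from 1100 to 1110 leaves 200s of countdown and yields an extend_timestamp of 1310; the 10s spent processing does not count as idle time.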

We keep track of "idle" time instead of just flat wall time because it allows the signers to factor in how long it actually takes to process the blocks. This will flatten out the total processing time in scenarios where the cost budgeting is overly pessimistic, causing us to see some blocks that can spend the entire budget and be processed in 3 seconds, while others that spend the entire budget take 3 minutes to process.

If a block proposal arrives that contains a TenureExtend transaction and the tenure_consensus_hash is equal to the burn_view_consensus_hash, check that the current idle_countdown is less than or equal to 0 (letting this value go negative is useful feedback to the miner). If so, process the block as usual, else, reject the block. This rejection would have a new reason code.
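The acceptance rule above reduces to a single check. In this sketch, `ConsensusHash` is a stand-in type (a plain 20-byte array), and `ExtendRejection` and `check_idle_extend` are hypothetical names, since the design only calls for "a new reason code". Following the design, equality of the two hashes is what subjects an extend to the idle-time rule; any other extend falls through to normal validation.

```rust
/// Stand-in for a consensus hash, modeled here as a plain 20-byte array.
pub type ConsensusHash = [u8; 20];

/// Hypothetical rejection reason for an extend that arrives too early.
#[derive(Debug, PartialEq)]
pub enum ExtendRejection {
    IdleTimeNotElapsed { remaining_secs: i64 },
}

/// Idle-time check for a proposal carrying a TenureExtend. When
/// tenure_consensus_hash equals burn_view_consensus_hash, the extend is only
/// accepted once idle_countdown <= 0; otherwise this check does not apply.
pub fn check_idle_extend(
    tenure_consensus_hash: &ConsensusHash,
    burn_view_consensus_hash: &ConsensusHash,
    idle_countdown_secs: i64,
) -> Result<(), ExtendRejection> {
    if tenure_consensus_hash == burn_view_consensus_hash && idle_countdown_secs > 0 {
        return Err(ExtendRejection::IdleTimeNotElapsed { remaining_secs: idle_countdown_secs });
    }
    Ok(())
}
```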

Miner Details

The miner now needs to keep track of the signers’ current idle timestamps and decide when it can refresh its budget with a tenure extension. A new component will process the StackerDB messages as they arrive, rather than the sign coordinator processing them directly. This is important because the sign coordinator stops listening for block responses from signers as soon as it hits the 70% threshold, but the miner needs to track the idle timestamps of all signers that report them. This component will be responsible for keeping track of the signers' latest idle timestamps, queryable from the miner. It will also provide the sign coordinator with block signatures. After each round of signing, the miner should record its estimated time to extend. It can compute this by ordering the signers' reported timestamps in ascending order and selecting a time at which > 70% of the signing power will have passed their timestamp. Before each attempt to mine a block, the miner checks whether this time has passed and, if so, issues the tenure extension.
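The "estimated time to extend" computation can be sketched as a weighted walk over the sorted timestamps. The function name and the `(timestamp, weight)` tuple layout are assumptions for illustration; weights stand in for signing power and `threshold_pct` would be `70` here.

```rust
/// Given each signer's latest reported extend timestamp (Unix seconds) and
/// signing weight, return the earliest time at which strictly more than
/// `threshold_pct` percent of total signing weight has reached its timestamp.
/// Returns None if the reports received so far never cross the threshold.
pub fn estimated_extend_time(reports: &[(u64, u32)], total_weight: u32, threshold_pct: u32) -> Option<u64> {
    let mut sorted = reports.to_vec();
    sorted.sort_by_key(|&(timestamp, _)| timestamp);
    let mut accumulated: u64 = 0;
    for (timestamp, weight) in sorted {
        accumulated += u64::from(weight);
        if accumulated * 100 > u64::from(total_weight) * u64::from(threshold_pct) {
            return Some(timestamp);
        }
    }
    None
}
```

Before each mining attempt, the miner would compare the current time against the stored estimate; a `None` result means it has not yet heard from enough signing power to extend at all.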

Testing

Designing good integration tests for this new behavior is important. We will need to test several different scenarios:

Extend the tenure after all signers report idle timestamps which have passed (active mining)
Extend the tenure after estimated timestamp has passed (idle miner)
Extend the tenure when < 70% of signers have reached their idle time
Validate that STX transfer blocks do not count against the signer's downtime calculation

jcnelson Nov 18, 2024
Maintainer

We keep track of "idle" time instead of just flat wall time because it allows the signers to factor in how long it actually takes to process the blocks. This will flatten out the total processing time in scenarios where the cost budgeting is overly pessimistic, causing us to see some blocks that can spend the entire budget and be processed in 3 seconds, while others that spend the entire budget take 3 minutes to process.

If I'm reading this correctly, the goal of these timestamps is for the signer to report how much time it spends idled, right? The invariant you're going for is that a TenureExtend will only be issued if at least 70% of the signers have spent X seconds idled?

If so, then I'm trying to understand the focus on idle time over wall clock time. All signers see wall clock time advance at about the same pace (within a few milliseconds via NTP), but it's possible that signers see their local idle times increase at wildly different paces. For example, if a signer's node is particularly slow, it will report a slowly-advancing idle time relative to a signer with a fast node.

What this would mean is that the slower signers set the rate at which TenureExtends can be submitted. This, in turn, acts as a means of throttling the chain so that slow nodes have a hope of keeping up. Is that the goal?

If so, then it might be easier and just as effective to use only the wall clock time. The tenure budget limits are already computed such that the expected time to process a maximally full tenure is on the order of 30s - 1m. The per-signer idle time is useful for signers to determine when it's safe to increase, or necessary to decrease, the time between TenureExtends, but that's downstream of deciding the wall-clock time between TenureExtends. Also, signer idle time is only one factor in deciding when to issue a TenureExtend. For example, signers may want to issue a burst of TenureExtends at a (normally unsustainable) pace in order to clear a congested mempool, and would throttle back the issuance rate after it's cleared in order to meet a target resource consumption rate.

What do you all think?

jcnelson Nov 18, 2024
Maintainer

Ah, you replied to me while I was writing the above.

hstove Nov 18, 2024
Maintainer

@jcnelson I definitely had/have the same thoughts as you regarding using wall clock time. Even regardless of other arguments, it has a nice simplicity to it.

One scenario we want to avoid: let's say, in an extreme, signers will accept a tenure extend after 2 minutes of wall clock time. Hypothetically, there is zero latency, and miners always have a new block ready to propose. In this scenario, it could become the case that all 2 minutes of this time were spent on block processing.

Continuing this scenario, if essentially 100% of this time is spent on block processing, nodes could never catch up to the chain tip. They would spend 2 minutes appending new blocks that themselves were from a 2-minute window. We want to avoid this scenario.

This was the argument that convinced me of the need to track computation time. I'm curious to hear your thoughts.

This "thought experiment" also means that signers should really only be caring about the "computation time" as it relates to a node catching up to the chain tip. Other work should be disregarded. I think Brice's initial implementation does this reasonably well.

jcnelson Nov 19, 2024
Maintainer

Continuing this scenario, if essentially 100% of this time is spent on block processing, nodes could never catch up to the chain tip.

This is definitely top-of-mind for me as well, in all conversations about TenureExtend.

kantai Nov 19, 2024
Maintainer Author

If so, then I'm trying to understand the focus on idle time over wall clock time. All signers see wall clock time advance at about the same pace (within a few milliseconds via NTP)

To me, the importance of idle time is that the nodes need this idle time in order to do things other than processing blocks, and that's ultimately what is important to the signers about limiting the rate of processing. If it was just "how fast can I process blocks?", they don't really need any logic at all about accepting tenure extends, because they could never validate the proposals faster than they can process blocks.

it's possible that signers see their local idle times increase at wildly different paces. For example, if a signer's node is particularly slow, it will report a slowly-advancing idle time relative to a signer with a fast node.

Yes, and I think that's a feature here: if a signer is slower to process block proposals, they should be slower to accept tenure extends.

obycode · 2024-11-19T02:05:42Z · Maintainer

The tenure budget limits are already computed such that the expected time to process a maximally full tenure is on the order of 30s - 1m.

The goal is that the budget makes the block take roughly 30s to process, but in practice, some full blocks can process in 2 seconds while others take 2 minutes to process. By measuring the idle time and allowing the tenure extends based on that, we counteract this discrepancy in actual processing time.

jcnelson Nov 19, 2024
Maintainer

I see.

In light of @hstove's comment, do you think it would make more sense to measure compute resource usage per wall clock time? Asking because it sounds like the goal here isn't for signers to meet a target idle time; it's to get miners to use resources at an average fixed rate. The key word here is "average."

The reason I'm hung up on using idle time is because signers and miners will perceive variable processing times for a block, due to a whole host of inscrutable reasons that are out of our control. That makes it hard to arrive at a global decision on when a TenureExtend should be issued.

By contrast, the average rate at which each computational resource gets consumed is a global property -- each observer can simply sum each cost dimension across all blocks produced in the last N seconds and divide each dimension by N to get the average consumption rate. If the target consumption rate is known to all parties, then miners simply wait to issue a block with a TenureExtend until each dimension's average rate is at or below the global target. There wouldn't need to be a new subsystem for tracking it; signers could simply inform miners what it is via their BlockAccept or BlockReject messages.
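A sketch of this globally observable rate check, under assumptions: the five fields mirror Clarity's cost dimensions (runtime, read count, read length, write count, write length), but `Cost`, `average_rates`, and `extend_allowed` are hypothetical names, and the per-dimension targets (units per second) are assumed to be known to all parties, as the comment proposes.

```rust
/// Stand-in for a block's execution cost, one field per cost dimension.
#[derive(Clone, Copy, Default)]
pub struct Cost {
    pub runtime: u64,
    pub read_count: u64,
    pub read_length: u64,
    pub write_count: u64,
    pub write_length: u64,
}

impl Cost {
    fn dims(&self) -> [u64; 5] {
        [self.runtime, self.read_count, self.read_length, self.write_count, self.write_length]
    }
}

/// Sum each dimension over blocks whose timestamps fall in (now - window_secs, now]
/// and divide by the window length to get an average per-second rate.
pub fn average_rates(blocks: &[(u64, Cost)], now: u64, window_secs: u64) -> [f64; 5] {
    let window_start = now.saturating_sub(window_secs);
    let mut sums = [0u64; 5];
    for (timestamp, cost) in blocks {
        if *timestamp > window_start && *timestamp <= now {
            for (sum, value) in sums.iter_mut().zip(cost.dims()) {
                *sum += value;
            }
        }
    }
    sums.map(|s| s as f64 / window_secs as f64)
}

/// A TenureExtend is allowed once every dimension's average rate is at or
/// below its target.
pub fn extend_allowed(rates: &[f64; 5], targets: &[f64; 5]) -> bool {
    rates.iter().zip(targets.iter()).all(|(rate, target)| rate <= target)
}
```

Requiring every dimension to be at or below its target is the same as requiring the largest rate-to-target ratio across dimensions to be at most 1.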

What do you think?

obycode Nov 19, 2024
Maintainer

I like the idea, but there is still a lot of variability in the read/write access times, so the cost value that we report doesn't really tell us much about actual processing time. I'm a bit concerned about counting a block with 15,000 cache misses the same as another block with 15,000 hits. Tracking idle time solves that for us.

It could be that the technique you're suggesting is good enough and the fact that it makes the computation more deterministic could be worth the loss of accuracy.

hstove Nov 19, 2024
Maintainer

One problem I see with the approach of tracking "consumption rate per cost metric" is that it's very possible (and likely) for read count to get filled up, and then no other metrics can get consumed. We wouldn't want to "artificially" delay tenure extends in that case. So, if we went with that approach, I think we'd want to do something more like "max by (cost_dimension) (consumption_rate)". I'm still not sure that's any better than the idle time approach.

jcnelson Nov 19, 2024
Maintainer

Yeah -- signers can issue TenureExtends as soon as any resource runs out, but they'd also need to refuse to sign blocks that consume any particular dimension too quickly.

jcnelson · 2024-11-19T04:20:35Z · Maintainer

Those are fair points. Definitely agree with the idea of treating cache hits differently from cache misses when it can be done. And, that when-condition is where the gaps in my understanding lie.

Not all nodes' disk caches are created equal. Signer A could get 100% cache hits while signer B gets 0% on the same block, and the observed cache hit/miss ratio is determined by a vast set of local configuration parameters we can't control. A signer's locally-measured idle time may not reflect what a bootstrapping node encounters when they replay the block, or even reflect what the signer itself sees in the event it replays the block at a later date.

Fundamentally, the idle time approach assumes that signers' 30th percentile idle time measurements are a good enough predictor of how much validation wall-clock time other nodes will encounter on a block. This may well be the case, and I'm trying to achieve an understanding of how well-founded this assumption is. In particular, I'm trying to understand how much error there can be in the measurement (and what the bounds on that error will be). The advantage of using a globally-observed resource consumption rate in this case is that nodes at least learn the worst-case processing time.

@kantai and I have spoken about this problem in the past regarding how to discount reads that are cache hits. The conclusion we reached was that the only way to do so safely would be to make the caching strategy part of consensus, so nodes are guaranteed to avoid thrashing for want of bigger disk cache.

kantai Nov 19, 2024
Maintainer Author

A signer's locally-measured idle time may not reflect what a bootstrapping node encounters when they replay the block, or even reflect what the signer itself sees in the event it replays the block at a later date.

Right -- my proposed strategy of focusing on idle time is not actually concerned with node bootstrapping. It's focused instead on keeping the nodes at the chain tip healthy. Idle time is necessary for that, because time spent in block processing gets in the way of all the other stuff nodes must do to keep the network healthy (respond to RPC requests, manage the p2p network).

The question of node bootstrapping is somewhat orthogonal to this. That really is just a question of how fast blocks can be processed by a node, and understanding the limits there is much easier: is a 10x longer genesis sync growth rate acceptable? If so, the limit on block production would be at most 10 tenure extends per tenure, and it's just a simple check. Similarly, to be on the "safe" side, this could be set to "2" initially and scaled up in much the same way as the idle time requirement is scaled down. My last point on bootstrapping is that there are many techniques that could improve bootstrapping times which don't apply to chain tip processing; the simplest is that bootstrapping nodes can absolutely use partially trusted snapshots to support parallel block processing during initial sync.
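The "simple check" here could look like a per-tenure counter. `ExtendCap` and its methods are hypothetical names, and the cap value is a configuration choice (the comment suggests starting at 2 and scaling toward 10); a new tenure is assumed to reset the count.

```rust
/// Hypothetical per-tenure cap on tenure extends, bounding how much extra
/// block budget a single tenure can add.
pub struct ExtendCap {
    max_extends_per_tenure: u32,
    extends_this_tenure: u32,
}

impl ExtendCap {
    pub fn new(max_extends_per_tenure: u32) -> Self {
        ExtendCap { max_extends_per_tenure, extends_this_tenure: 0 }
    }

    /// A new tenure resets the count.
    pub fn on_new_tenure(&mut self) {
        self.extends_this_tenure = 0;
    }

    /// Accept and count an extend if the cap has not been reached yet.
    pub fn try_accept_extend(&mut self) -> bool {
        if self.extends_this_tenure < self.max_extends_per_tenure {
            self.extends_this_tenure += 1;
            true
        } else {
            false
        }
    }
}
```

Because this cap bounds total block production per tenure rather than anything measured locally, every node can enforce it identically, which is what makes it a separate knob from the idle-time requirement.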

jcnelson Nov 19, 2024
Maintainer

Idle time is necessary for that, because time spent in block processing gets in the way of all the other stuff nodes must do to keep the network healthy (respond to RPC requests, manage the p2p network).

I think this would be addressed by #5431 or something similar, right? Block validation and block mining do not strictly need to block other threads. Should we be designing the throttling logic under the assumption that #5431 will be addressed?
