This started as a quick bugfix but grew into a larger piece of work to simplify how the alarms and autoscaling work, since the prior implementation was confusing and hard to debug.
The original bug: scale-down based on backlog was not working at all; instead, scale-down by CPU was being applied (which is why the bug hadn't been noticed). This broke when Llama was added, because it uses ~7% CPU at rest against a 5% CPU threshold, so even with an empty backlog the service never scaled down.
In the new implementation, scaling is split between logic for the 0-1 instance range and logic for anything above 1 instance. The rules no longer overlap: backlog drives scale up/down between 0 and 1 only, while CPU/GPU utilisation drives scale up/down above 1 only. Furthermore, where possible a single alarm triggers an action in the alarm state and the opposing action in the non-alarm (OK) state; this helps ensure consistency, as there is only one rule to edit instead of two (in the end this could only be done for the backlog alarm, not the others).
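To make the split regimes concrete, here is a minimal sketch of the decision logic in plain Python. This is an illustration, not the actual alarm configuration: the function name, thresholds, and signature are all hypothetical, and the real behaviour lives in cloud alarms/policies rather than application code.

```python
def desired_action(instances: int, backlog: int,
                   cpu_pct: float, gpu_pct: float,
                   cpu_high: float = 80.0, gpu_high: float = 80.0,
                   cpu_low: float = 10.0, gpu_low: float = 10.0) -> int:
    """Return +1 (scale up), -1 (scale down), or 0 (no change).

    Hypothetical thresholds; the point is that the two regimes do not
    overlap: backlog governs the 0-1 range, CPU/GPU governs 1+.
    """
    if instances == 0:
        # 0 -> 1: only the backlog can wake the service up.
        return 1 if backlog > 0 else 0
    if instances == 1:
        if cpu_pct > cpu_high or gpu_pct > gpu_high:
            return 1   # enter the 1+ regime
        if backlog == 0:
            # Backlog alarm in OK state: scale to zero. Note CPU is
            # ignored here, so a model idling at ~7% CPU still scales down.
            return -1
        return 0
    # instances >= 2: CPU/GPU utilisation only.
    if cpu_pct > cpu_high or gpu_pct > gpu_high:
        return 1
    if cpu_pct < cpu_low and gpu_pct < gpu_low:
        return -1
    return 0
```

Under this logic the original bug scenario resolves correctly: one instance idling at ~7% CPU with an empty backlog returns -1 (scale to zero), because the 0-1 regime consults the backlog rather than the CPU threshold.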