Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the aggregator to consistently accept past data #958

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bucko909
Copy link
Contributor

@bucko909 bucko909 commented Aug 21, 2024

The discovery of #956 was due to replaying a huge swathe of past data into our aggregator instance and finding the data full of holes. I'd assumed it was due to early-expiry of old IntervalBuffer objects, but after writing this fix I found this wasn't the case. However, the behaviour is still likely incorrect when data is played over an aggregation interval. Suppose the aggregator config is:

a (60) = sum a

And the input is:

a 0 1
...
a 1 19
<aggregation timeout triggers here at time (MAX_AGGREGATION_INTERVALS + 1) * 60 or higher>
a 1 20
...
a 1 59

Then the aggregation will aggregate seconds 0-29 and output this (value 20) to the cache, then expire the interval as it's too old, then in the next timeout, aggregate seconds 31-60, output this to the cache and expire the interval again. The final value is the aggregate of seconds 20-59, which is 40 instead of the expected 60. There is no way to control when aggregation occurs, so replayed data will be inconsistent, and in this case could be any value between 1 and 60, though if the replay is faster than real-time, the most likely will be 60.

After these changes, we allow the system to keep the newest interval prior to the current time-frame, provided it's been submitted within the current time-frame. This means if old data is submitted in-order, as would be expected from a replay, the intervals will always be aggregated in a complete state at least once before expiry. This is somewhat robust to curiosities in the timestamps of the real-time submitting process (for example, if that process happens to round time down, or be pushing a prediction of future data or similar), as the number of kept intervals is based on submission time rather than interval time.

```
== warmup ==
1 2.564000169513747e-06
1000 0.0003257179996580817
10000 0.003369195001141634
100000 0.03210452299936151
1000000 0.31846129199948336
10000000 3.215345189999425

== noop ==
1 1.984999471460469e-06
1000 0.00032863300111785065
10000 0.0032671120006853016
100000 0.03219444500064128
1000000 0.32194886400066025
10000000 3.2092841119992954

== sum ==
1 1.1226000424358062e-05
1000 0.000391326000681147
10000 0.0033218999997188803
100000 0.031986795000193524
1000000 0.32090956100000767
10000000 3.2212803769998573

== fake ==
1 4.441999408300035e-06
1000 0.0003171700009261258
10000 0.0033069499986595474
100000 0.03180618599981244
1000000 0.3161616289999074
10000000 3.176653272999829

```
@bucko909 bucko909 force-pushed the aggregator-replay-past branch from 1618009 to 21be84d Compare August 21, 2024 16:54
@bucko909
Copy link
Contributor Author

Push just reset to preferred email.

@deniszh
Copy link
Member

deniszh commented Aug 24, 2024

@bucko909 : linter says that current_interval is not in use, could you please check?
Ignore Deepsource, it should be removed

```
== warmup ==
1 3.7179997889325023e-06
1000 0.00032810999982757494
10000 0.003592125000068336
100000 0.03169123900079285
1000000 0.31713802899867005
10000000 3.157578217000264

== noop ==
1 2.0410006982274354e-06
1000 0.00042683100036811084
10000 0.003553029999238788
100000 0.031913518001601915
1000000 0.31525603199952457
10000000 3.1432045359997574

== sum ==
1 9.91100023384206e-06
1000 0.00038492900057462975
10000 0.003426478999244864
100000 0.031299194999519386
1000000 0.3173040299989225
10000000 3.1426827399991453

== fake ==
1 4.47000093117822e-06
1000 0.00035055999978794716
10000 0.0035677849991770927
100000 0.031690031000835006
1000000 0.31694292799875257
10000000 3.1394117309992
```

No appreciable difference in benchmark output. If anything, it's faster.
@bucko909 bucko909 force-pushed the aggregator-replay-past branch from 21be84d to 7ea6ce9 Compare August 29, 2024 15:17
@bucko909
Copy link
Contributor Author

Yep, I think it was just a copy/paste error. Apologies for that.

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.68%. Comparing base (fdc56f6) to head (7ea6ce9).
Report is 17 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
+ Coverage   50.63%   50.68%   +0.04%     
==========================================
  Files          36       36              
  Lines        3446     3453       +7     
  Branches      535      527       -8     
==========================================
+ Hits         1745     1750       +5     
- Misses       1574     1576       +2     
  Partials      127      127              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants