ConcurrentLfu time-based expiry #516

bitfaster · 2023-11-25T07:11:30Z

Support all time-based expiry modes for ConcurrentLfu.

Extend node types to support the different access order lists as described here
- AccessOrderNode: Existing scheme is AccessOrder, but buffer array ops will be slower if node type is not sealed (due to extra type checking), therefore derive new sealed type (this will be a generic arg to ConcurrentLfu later)
- AccessOrderExpiringNode: support expire after access, just needs a time stamp
- TimeOrderNode: support both expire after write and custom modes - requires time order list + time stamp
- Think of some better names for these
Port TimerWheel from Caffeine
Define node policy interface required to expire items for ExpireAfterWrite, ExpireAfterAccess and ExpireAfter
Plug all the policy hook points into ConcurrentLfuCore
Implement an ExpireAfter policy that uses TimerWheel and IExpiryCalculator
Implement ConcurrentTLfu based on the ExpireAfter policy

BitFaster.Caching/Lfu/TimerWheel.cs

coveralls · 2023-11-25T07:35:41Z

coverage: 99.304% (+0.06%) from 99.242%
when pulling 3e24514 on users/alexpeck/lfuexpire
into 434a7be on main.

ben-manes · 2023-11-25T10:31:42Z

fwiw, I implemented expireAfterAccess and expireAfterWrite prior to having the timing wheel. Otherwise I’d likely not have bothered and made them convenience builder methods that set the expireAfter policy. Instead I didn’t redo working code and left both approaches. Just fyi so you don’t think it was a performance need or something, I simply hadn’t known how to do the variable policy originally in a way that fit my design goals.

bitfaster · 2023-11-25T21:50:23Z

Thanks for the insight, it makes total sense. I'm figuring out how to fit this together - from reading your code in Caffeine you generate a Node class with the required backing fields for each policy to minimize storage overhead.

For expireAfterAccess, since each of window, probation and protected are already in access order it seems like it will be easy to walk the existing lists from the LRU position until a non-expired item is found (I think you had suggested this last year so I had it in my mind this part is much easier to implement). This requires a node type with 1 additional field for the expiry time (so the overhead is prev+next+time).

For expireAfterWrite and expireAfter another linked list is required for time order, so I planned to define a node type with an extra prev/next ptr (so the overhead is prev+next+time+timeprev+timenext) - last night I figured out these can both use the same node class but I hadn't made the mental leap that both could use the timing wheel which would indeed be simpler.

It seems like it's still worth implementing expireAfterAccess separately to piggyback on the existing access order lists and compact the node size.

ben-manes · 2023-11-25T22:33:41Z

A caveat for using the access order lists is that the read buffer is allowed to drop events. That could delay the eviction of the expired entry if the head is pinned to a non-expired item, as this user experienced. This shouldn't typically be a problem if the size is bounded or the workload varies, but if the user unintentionally pins the entry then there is unbounded growth. Of course the lossiness of the read buffer is crucial for concurrent read performance, whereas writes are not lossy for correctness so expireAfterWrite does not suffer from this problem.

In that user's scenario the solution was to switch to the timing wheel based approach. This corrected the problem because the wheels turn independently and, if it encounters a misplaced entry, it will reschedule the entry rather than halt the expiration early. That makes it a more robust of an algorithm for malicious workloads.

I think that the vast majority of expiration should use expireAfterWrite (time-to-live) as a freshness mechanism, so I tend to view most usages of expireAfterAccess (time-to-idle) as a likely mistake. It's certainly very useful when applied correctly, just not commonly needed. That might mean that the optimization to remove those extraneous fields might not be worth your effort. When it works correctly it is very satisfying how elegant and simple it feels, so I am certainly not disfavoring it as a reasonable algorithmic choice if that's what you prefer.

BitFaster.Caching/Lfu/ConcurrentLfu.cs

BitFaster.Caching/Lfu/NodePolicy.cs

bitfaster · 2023-11-26T01:05:18Z

I hadn't anticipated that. I had thought the opposite might occur - the dropped reads would fail to update the access order+expiry time of a Node, and it would be erroneously expired due to having a stale time stamp. Most likely I'm thinking of doing something that won't actually work in practice, and I didn't figure it out yet.

Thanks for all these gems of wisdom. I'm still getting my head around the timer wheel algorithm and before I get too deep into that I need to do a few rounds of changes so I can add this cleanly.

ben-manes · 2023-11-26T02:53:54Z

I probably updated the timestamp on the request since the entry would be buffered for some unknown amount of time and figured that fuzziness of the replay order wouldn't skew it too badly.

The timer wheel is neat but confusing, and I'm not sure how much of that is inherent or my fault. I didn't have a code reference and, since I wanted to make it as fast as I could for fun, I avoid expensive division / modulus for power-of-two time spans which makes the code harder to follow. I later found other implementations and most seem to use a chain-of-responsibility by having wheel objects where items flow downward. Those seemed to be more generic / flexible, but scatter the algorithm across multiple files and perhaps do extra work. I guess at the cost of being brittle and confusing, mine is crazy fast and perfect for what I needed. I'll be interested to see if you'd approach it differently once you get comfortable with it all.

bitfaster · 2023-11-28T05:00:41Z

It's good to start out with the fast version and I expect I will learn a lot from it. Your frequency sketch code translated to C# really well and that was also a fun exercise.

I will try to get deeper into it this week, there are many questions in my mind about how to get this working, and I first need to retrofit the generic node type into ConcurrentLfu without making a mess. I need some focus time to analyze TimerWheel, it looks very interesting and I also want to read the paper you linked to in your source code for background.

BitFaster.Caching/Lfu/NodePolicy.cs

BitFaster.Caching/Lfu/TimerWheel.cs

ben-manes · 2023-12-09T04:58:00Z

BitFaster.Caching/Lfu/TimerWheel.cs

+                        if ((node.GetTimestamp() - time) < 0)
+                        {
+                            cache.Evict(node);
+                        }


fyi, prior to the loop the linked list is detached so I rescheduled the node if not expired. Presumably that could be due to a concurrent update so the reordering had not happened yet or similar scenarios. If you don't reschedule then detaching is probably unnecessary. If you do then it avoids an accidental loop, e.g. the overflow bucket (1+ weeks) might scan over items that expire far into the distant future so it would reschedule back into it. Or maybe if it reschedules at the head instead of tail that would resolves it. Just something to review and consider how you prefer handling these scenarios.

Thanks! You probably just saved me several hours of debugging - so far I had only tested superficially by replicating some of your unit tests and single stepping in the debugger to understand how it works. Now you pointed this out, it seems that I must call schedule here, otherwise nodes would always stay in the same bucket and never cascade down through the wheels. I had distracted myself trying to understand under what conditions a resurrection could occur and missed it.

I'm not sure if this is very useful to you at this point, but I don't think you have a unit test that would detect my mistake (at least in TimerWheelTest.java), since the only test you have calling advance more than once is advance_backwards.

I think I have mapped out how to fit everything together, but I need to do a lot more testing. You made a great point in that all time-based policies can be implemented via TimerWheel, so I will start out with that and add the more specialist after access policy later as an optimization.

good point! I'll look into adding an explicit unit test for this. I hope that it is implicitly covered indirectly by other tests due to the data integrity checks that are performed after a cache test passes.

I suspected you had it covered - that's effectively what I would like to do next, I will make both a higher-level functional test suite and something that is more like concurrent/stress with constant expiry and check integrity - so good to have that link for inspiration.

I ported your test case, works great! 😄

I ported your test case, works great! 😄

It was more or less a copy paste of your other test. It wasn't correct for me - since I am not using nano seconds (Duration can use different time sources with different units to get better perf on different platforms), I had to adjust it a bit to get nodes into each wheel.

I have now added a way to assert on the position and print out the wheel within the tests, next job is the schedule tests. Likely it is completely working with what I have, but I am still poking around to understand it.

BitFaster.Caching/Lfu/ConcurrentLfuCore.cs

BitFaster.Caching.UnitTests/Lfu/ConcurrentTLfuSoakTests.cs

bitfaster · 2023-12-15T11:38:05Z

Baseline

Method	Runtime	Mean	Error	StdDev	Ratio	Allocated
ConcurrentDictionary	.NET 6.0	7.128 ns	0.1347 ns	0.1497 ns	1.00	-
ConcurrentLfuBackground	.NET 6.0	21.053 ns	0.4485 ns	0.9060 ns	2.89	-
ConcurrentLfuForeround	.NET 6.0	48.797 ns	0.9918 ns	1.0185 ns	6.86	-
ConcurrentLfuThreadPool	.NET 6.0	53.980 ns	2.2947 ns	6.7658 ns	6.32	-
ConcurrentLfuNull	.NET 6.0	20.419 ns	0.4308 ns	0.4789 ns	2.87	-

ConcurrentDictionary	.NET Framework 4.8	9.301 ns	0.2144 ns	0.2633 ns	1.00	-
ConcurrentLfuBackground	.NET Framework 4.8	35.731 ns	0.7128 ns	0.9014 ns	3.84	-
ConcurrentLfuForeround	.NET Framework 4.8	72.231 ns	1.3833 ns	1.4206 ns	7.77	-
ConcurrentLfuThreadPool	.NET Framework 4.8	33.419 ns	0.6971 ns	1.3093 ns	3.60	-
ConcurrentLfuNull	.NET Framework 4.8	24.960 ns	0.5225 ns	0.5132 ns	2.68	-

With time-based expiry changes:

Method	Runtime	Mean	Error	StdDev	Ratio	Allocated
ConcurrentDictionary	.NET 6.0	7.309 ns	0.1417 ns	0.1842 ns	1.00	-
ConcurrentLfuBackground	.NET 6.0	21.333 ns	0.4566 ns	0.9926 ns	2.94	-
ConcurrentLfuForeround	.NET 6.0	49.709 ns	0.6833 ns	0.6058 ns	6.75	-
ConcurrentLfuThreadPool	.NET 6.0	34.427 ns	0.6164 ns	0.8841 ns	4.71	-
ConcurrentLfuNull	.NET 6.0	20.406 ns	0.3131 ns	0.2929 ns	2.77	-

ConcurrentDictionary	.NET Framework 4.8	11.454 ns	0.2267 ns	0.2784 ns	1.00	-
ConcurrentLfuBackground	.NET Framework 4.8	37.182 ns	0.7228 ns	0.7733 ns	3.25	-
ConcurrentLfuForeround	.NET Framework 4.8	73.309 ns	1.4388 ns	1.5992 ns	6.41	-
ConcurrentLfuThreadPool	.NET Framework 4.8	51.505 ns	3.6956 ns	10.8964 ns	4.09	-
ConcurrentLfuNull	.NET Framework 4.8	26.443 ns	0.5488 ns	0.5636 ns	2.30	-

bitfaster · 2024-01-27T00:38:20Z

Last piece missing is to defend against frequent updates saturating the write queue. E.g. if the item is updated within 1 second or whatever, don't enqueue/reschedule.

bitfaster · 2024-01-27T00:49:34Z

BitFaster.Caching/Lfu/NodePolicy.cs

+        {
+            var currentExpiry = node.TimeToExpire - current;
+            node.TimeToExpire = current + expiryCalculator.GetExpireAfterRead(node.Key, node.Value, currentExpiry);
+            wheel.Reschedule(node);


Need a guard here to avoid rescheduling the same node into the same bucket if it was read twice or more, or if it was very recently read.

bitfaster · 2024-01-27T00:50:18Z

BitFaster.Caching/Lfu/NodePolicy.cs

+            {
+                var currentExpiry = node.TimeToExpire - current;
+                node.TimeToExpire = current + expiryCalculator.GetExpireAfterUpdate(node.Key, node.Value, currentExpiry);
+                wheel.Reschedule(node);


Need a guard here to avoid rescheduling the same node into the same bucket if it was updated twice or more, or if it was very recently changed.

The simplest cheap thing is to compare currentExpiry to node.TimeToExpire, and only reschedule if expiry has changed. But we can still execute the expiryCalculator many times for the same item, in the same maintenance cycle.

One strategy is to store the time scheduled in each node (either count each time maintenance runs, or store the timestamp), and only re-schedule when the time has changed (or has changed by more than some delta like 15ms, the worst-case clock resolution).

Both AccessOrderNode and TimeOrderNode have 2 bytes of padding when the key and value are a reference type, so using this padding we could count 65,535 update cycles without increasing node memory size. Storing an additional long would give more accuracy but will increase size by 8 bytes.

If a node is not rescheduled (assuming approximate time using the method above and a node is not touched for 65,535 update cycles) and has expired, it will remain in the timer wheel until it is accessed or the bucket it is in is expired. That is only a problem if the expiry calculator can generate an expiry far in the future, then a much shorter expiry for the same item. Not the common case, but something someone might do.

There is scope to combine the bools wasAccessed, wasDeleted and the int Position enum (1 byte + 1 byte + 4 bytes) - this could be packed into to single byte. Position should most likely be switched - this will eliminate the padding.

fyi, I might be misunderstanding.

If the entry was close to expiration and its lifetime was then extended (e.g. by a write), how would the approaches behave? If the call to the expiryCalculator happens during a maintenance cycle then there is a delay, so would the entry expire (and be visibly absent) or would it be resurrected? The latter could be confusing, especially if a load was triggered that could now be abandoned or would it replace the present value?

The naive but less confusing approach is to perform the expiryCalculator during the caller's operation to set the timestamp when the node's expires. Then on every maintenance cycle the accessed nodes would be reordered, possibly unnecessarily. That's hopefully ignorable since the cost is a cheap O(1) reordering.

Like you mentioned, there is a flaw of a missed reordering to shrink the expiry time. I think that could happen either in either approach. It would only happen to due GetExpireAfterRead shrinking the duration because those events can be discarded by the buffer, whereas the GetExpireAfterUpdate won't as writes cannot be lossy. I think its an okay scenario to ignore and rely on size eviction if a problem. A fix could be to promote the read event to instead submit into the write buffer if the duration shrinks by a significant amount. Then it would rarely take the penalty of a write buffer promotion and handle this edge case, so hopefully users wouldn't encounter it often enough to notice.

I got back to this at last :) But now it's not fresh in my mind.

I have probably diverged somewhat from your code - this is a summary:

I deferred calling expiryCalculator so that it happens during maintenance. That way lookups only need to compare expiry time to current time, in addition to writing to the buffer. My thinking was to favor optimizing lookups at all costs (I realize this is into the hair-splitting zone). This also means that only 1 thread is ever updating the expiry time, because maintenance is already protected by a lock.

Each time maintenance runs, the current time is evaluated only once, and all items in the read and write buffers will reflect the timestamp for that maintenance call. This assumes that the delay between reading/writing an item and maintenance running is very small.

Now, the read/write buffer could contain a hundred instances of the same node, with the access timestamp. Assuming the expiry calculator produces an identical expiry for the same inputs, this would mean needlessly doing the same computation hundreds of times.

In summary, I was thinking about how to defend against the read buffer having a high number of duplicates and optimizing this busy work. Just as you wrote your reply I was studying your code to see if you had a similar guard - I guessed it would be in the onAccess method here.

Another simple thing I considered is just storing a pointer to the last node I processed during maintenance, so if you get a bunch in a row it would collapse them and do expiry calc + reschedule fewer times.

Good point that the shrinking expiry time is anyway possible for dropped reads, I was too deep into the impact of this hypothetical optimization to realize it already occurs. I feel like it's a benign case because it falls back to a misprediction/cache pollution, rather than something bad like unbounded growth.

I'm fairly rusty on this all too and happy to learn when you diverge.

I didn't optimize for duplicates and decided that the cost was too small to be worth optimizing. There is some hashing, pointer swaps, etc. which we tried to make very cheap. Then if I had to coalesce duplicates into a single sketch increment, for example, that likely means some extra hashing for building an intermediate multiset. Since each read buffer has a max of 16 elements * NCPUs, that's 256 read buffer entries on a 16-core machine per maintenance run. I think that I tried a naive version and didn't see any difference in a benchmark, which made sense since lossy buffers avoid a client-side penalty, so I couldn't convince myself it would be beneficial and dropped the idea.

I made a fix here #603, my update code path can have a similar problem where we might not write the updated node to the write buffer, but that can only occur under load.

The overhead of the time APIs in .NET is inconsistent across operating systems - I currently have user space calls on Windows/Linux and a system call on MacOS (because unfortunately all the built in .NET time APIs are mapped to system calls on Mac).

I got some focus time to look at this properly, now I fully understand the consequences of updating expiry during maintenance:

Any entry dropped when the buffer is full (either a read or update) will have a stale expiry time.

Maintenance must run after read & update without being deferred. For write heavy workloads, this doesn't change much. For read heavy workloads with infrequent cache activity, maintenance will run on every read increasing overhead.

The fix I did yesterday only addresses part of 2. Handling 1 & 2 with this design will increase complexity. With hindsight updating expiry as part of the lookup as you have done is the better tradeoff, I will fix this completely next weekend.

I hesitated with this feature back in December because I had some gaps, then came back recently and did stress tests where this wasn't apparent. Thanks for pointing me in the right direction, I will get it fixed.

Your other option is to honor the entry as expiring early, so it’s still linearizable on queries by not reappearing. That might mean a slightly lower hit rate for a slightly faster read. I’d likely prefer the hit rate but it seems rare enough to be personal preference.

I agree - I would prefer better hit rate at the expense of a tiny per read overhead. Perf aside, fixing this also makes it much more intuitive.

This is now fixed in v2.5.1. Thanks again for the heads up.

wheel+nodes

518bc9f

bitfaster marked this pull request as draft November 25, 2023 07:11

github-advanced-security bot found potential problems Nov 25, 2023

View reviewed changes

fix typo

d7207d8

64bit

51d8b34

bitfaster mentioned this pull request Nov 25, 2023

Add TrailingZeroCount and LongCeilingPowerOfTwo #517

Merged

Alex Peck added 2 commits November 25, 2023 14:57

merge

acd21e4

+ConcurrentLfuCore

deb2ecf

github-advanced-security bot found potential problems Nov 26, 2023

View reviewed changes

BitFaster.Caching/Lfu/ConcurrentLfu.cs Fixed Show fixed Hide fixed

BitFaster.Caching/Lfu/ConcurrentLfu.cs Fixed Show fixed Hide fixed

BitFaster.Caching/Lfu/NodePolicy.cs Fixed Show fixed Hide fixed

BitFaster.Caching/Lfu/NodePolicy.cs Fixed Show fixed Hide fixed

Alex Peck added 3 commits November 25, 2023 16:08

undo bitops

6cfa3cc

undo bitops

5c8c38c

node

31eb6b7

bitfaster mentioned this pull request Dec 2, 2023

Extract ConcurrentLfuCore with generic node and policy #520

Merged

merge

1b7f5ce

github-advanced-security bot found potential problems Dec 3, 2023

View reviewed changes

BitFaster.Caching/Lfu/NodePolicy.cs Fixed Show fixed Hide fixed

cleanup merge

73cdbae

github-advanced-security bot found potential problems Dec 3, 2023

View reviewed changes

BitFaster.Caching/Lfu/NodePolicy.cs Fixed Show fixed Hide fixed

Alex Peck added 2 commits December 3, 2023 01:29

simplify generics

1fc15fa

outline tests

4b35e07

github-advanced-security bot found potential problems Dec 4, 2023

View reviewed changes

BitFaster.Caching/Lfu/TimerWheel.cs Fixed Show fixed Hide fixed

BitFaster.Caching/Lfu/TimerWheel.cs Fixed Show fixed Hide fixed

Alex Peck added 3 commits December 8, 2023 16:52

more tests

dc49f73

comments

ff36c1a

merge

728f14a

ben-manes reviewed Dec 9, 2023

View reviewed changes

Alex Peck added 3 commits December 9, 2023 13:13

merge

97cc90a

nullability static analysis

e4a81fc

schedule

a7ba4c8

ben-manes added a commit to ben-manes/caffeine that referenced this pull request Dec 10, 2023

added test case from bitfaster/BitFaster.Caching#516 (comment)

7bfba42

ben-manes added a commit to ben-manes/caffeine that referenced this pull request Dec 10, 2023

added test case from bitfaster/BitFaster.Caching#516 (comment)

f172a2c

Alex Peck added 5 commits December 11, 2023 19:06

assert wheel pos

4634517

port all tests

d486add

test e2e

52ee735

policy

5950e1a

cleanup

2e3056a

github-advanced-security bot found potential problems Dec 14, 2023

View reviewed changes

BitFaster.Caching/Lfu/ConcurrentLfuCore.cs Fixed Show fixed Hide fixed

BitFaster.Caching.UnitTests/Lfu/ConcurrentTLfuSoakTests.cs Dismissed Show dismissed Hide dismissed

Alex Peck added 2 commits December 13, 2023 22:03

test coverage

94bcf44

explicit interface impl

462e515

Alex Peck added 2 commits December 15, 2023 03:42

rem comment

50cb404

Merge branch 'main' into users/alexpeck/lfuexpire

1f64737

bitfaster marked this pull request as ready for review December 15, 2023 11:42

Merge branch 'main' into users/alexpeck/lfuexpire

44ebb9b

bitfaster commented Jan 27, 2024

View reviewed changes

ben-manes mentioned this pull request Apr 28, 2024

About expiration performance maypok86/otter#90

Closed

bitfaster and others added 4 commits May 6, 2024 12:56

Merge branch 'main' into users/alexpeck/lfuexpire

24fcff6

Merge branch 'main' into users/alexpeck/lfuexpire

769d9c1

merge

3196417

mem layout

3e24514

bitfaster merged commit de72842 into main May 8, 2024
13 checks passed

bitfaster deleted the users/alexpeck/lfuexpire branch May 10, 2024 06:38

bitfaster mentioned this pull request Jun 1, 2024

LFU after access fails to update expiry when read buffer is not full #603

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ConcurrentLfu time-based expiry #516

ConcurrentLfu time-based expiry #516

bitfaster commented Nov 25, 2023 •

edited

Loading

coveralls commented Nov 25, 2023 •

edited

Loading

ben-manes commented Nov 25, 2023

bitfaster commented Nov 25, 2023

ben-manes commented Nov 25, 2023

bitfaster commented Nov 26, 2023

ben-manes commented Nov 26, 2023 •

edited

Loading

bitfaster commented Nov 28, 2023

ben-manes Dec 9, 2023

bitfaster Dec 9, 2023

ben-manes Dec 9, 2023

bitfaster Dec 9, 2023

ben-manes Dec 10, 2023

bitfaster Dec 12, 2023

bitfaster commented Dec 15, 2023

bitfaster commented Jan 27, 2024

bitfaster Jan 27, 2024 •

edited

Loading

bitfaster Jan 27, 2024

bitfaster May 7, 2024

ben-manes May 7, 2024

bitfaster May 7, 2024

ben-manes May 7, 2024

bitfaster Jun 1, 2024

bitfaster Jun 2, 2024

ben-manes Jun 3, 2024

bitfaster Jun 5, 2024

bitfaster Jun 9, 2024

ConcurrentLfu time-based expiry #516

ConcurrentLfu time-based expiry #516

Conversation

bitfaster commented Nov 25, 2023 • edited Loading

coveralls commented Nov 25, 2023 • edited Loading

ben-manes commented Nov 25, 2023

bitfaster commented Nov 25, 2023

ben-manes commented Nov 25, 2023

bitfaster commented Nov 26, 2023

ben-manes commented Nov 26, 2023 • edited Loading

bitfaster commented Nov 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bitfaster commented Dec 15, 2023

bitfaster commented Jan 27, 2024

bitfaster Jan 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bitfaster commented Nov 25, 2023 •

edited

Loading

coveralls commented Nov 25, 2023 •

edited

Loading

ben-manes commented Nov 26, 2023 •

edited

Loading

bitfaster Jan 27, 2024 •

edited

Loading