Skip to content

ConcurrentLru

Alex Peck edited this page Sep 20, 2022 · 44 revisions

The ConcurrentLru replacement policy is a pseudo LRU similar to the TuQ segmented LRU algorithm found in OpenBSD (code) and memcached (docs, PR). ConcurrentLru has no background thread performing maintenance, LRU item order is updated only on cache write and is very low overhead. Cache hits do not alter item order, so continuous high hit rate on the same 'hot' item is very close in performance to a ConcurrentDictionary lookup.

ConcurrentLru approximates LRU item order using three FIFO queues. All queue operations are O(1) lock free to give the best possible performance when used concurrently by multiple threads. Items flow through the queues like this:

When an item is fetched, it is first added to the hot queue. When the hot queue is full and an item is fetched, the last item in the hot queue is moved to either warm or cold depending on whether it has been accessed since it was added. The same process is followed for the warm and cold queues. Each time an item is fetched, it is marked as touched. When the queues are cycled, touched items in the warm queue and cold queues are moved back to the head of warm. Untouched items in warm are moved to cold. Untouched items in cold are evicted.

Capacity and discarded items

No items are discarded until capacity is reached. This makes it straightforward to reason about maximum item count/memory usage, but remember that items will not be released if they are not accessed - no item is evicted until the number of unique items fetched exceeds capacity, at which point items are cycled through the queues until evicted. Therefore no active item in ConcurrentLru is subject to Garbage Collection.

TotalCapacity = HotCapacity + WarmCapacity + ColdCapacity

Scanning

A scan is a sequence of one-time access requests. LRUs perform badly when presented with a sequential scan of every key when the number of keys is larger than the size of the cache.

The warm queue in ConcurrentLru provides some resistance to scanning when mixed into something like a Zipf distributed access pattern. However, if the access pattern is a pure sequential scan, and the scan count is greater than 2 * (capacity/3), cache miss rate will be 100%.

This is because every cached item is evicted before it is accessed for the second time. Below is a worked example to illustrate how this happens step by step.

Let's say we create a ConcurrentLru with capacity = 3, and fetch 3 keys in sequence twice in a row.

var lru = new ConcurrentLru<int, int>(3);

// sequential pass 1
for (int i = 0; i < 3; i++)
{
    lru.GetOrAdd(i, k => k);
}

// sequential pass 2
for (int i = 0; i < 3; i++)
{
    lru.GetOrAdd(i, k => k);
}

Initially, the hot (H), warm (W) and cold (C) queues are empty and each has a capacity of 1 item:

H[], W[], C[]

On the first sequential pass the internal state of the queues is as follows at each step:

i = 0 H[0], W[], C[]
i = 1 H[1], W[], C[0]
i = 2 H[2], W[], C[1]

Note that item 0 was evicted when fetching item 2. On the second sequential pass there is no step where the LRU already contains the key that is requested:

i = 0 H[0], W[], C[2]
i = 1 H[1], W[], C[0]
i = 2 H[2], W[], C[1]

When the capacity is greater the same problem still occurs when all keys are fetched in an identical sequence - fetching new items pushes the old items sequentially through the hot and cold queues until they are evicted. The same problem affects the classic implementation of LRU based on linked list order when the number of scanned items is equal to the LRU size, so this is a general weakness of LRU algorithms.

In most practical applications, a sequential scan is not the primary access pattern, so this is not a significant limitation. MemoryCache given the same access pattern will likely load all values into memory. This results in a different and potentially worse problem: unbounded memory growth and potential total failure due to out of memory.