
Latency

Alex Peck edited this page Mar 11, 2023 · 12 revisions

In these benchmarks, a cache miss is essentially free. These tests exist purely to compare the raw execution speed of the cache bookkeeping code. In a real setting, where a cache miss is presumably quite expensive, the relative overhead of the cache will be very small.
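As a rough worked example of that last point, the sketch below estimates what fraction of the average request time goes to cache bookkeeping. It uses the ConcurrentLru .NET 6.0 mean reported on this page (16.768 ns); the 1 ms miss cost, the 90% hit rate, and the function name are assumptions chosen purely for illustration.

```python
def bookkeeping_share(lookup_ns, miss_cost_ns, hit_rate):
    """Fraction of the average request time spent in cache bookkeeping."""
    # Every request pays the lookup; only misses pay the cost of producing the value.
    avg_miss_ns = (1.0 - hit_rate) * miss_cost_ns
    return lookup_ns / (lookup_ns + avg_miss_ns)


# 16.768 ns is the measured ConcurrentLru mean; the other inputs are hypothetical.
share = bookkeeping_share(lookup_ns=16.768, miss_cost_ns=1_000_000, hit_rate=0.9)
print(f"bookkeeping share of average request time: {share:.4%}")
```

With inputs of this order the bookkeeping share stays well below 0.1%, so differences of a few nanoseconds between implementations only matter when misses are cheap or hit rates are very high.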

DISCLAIMER: Always measure performance in the context of your application. The results provided here are intended as a guide.

The benchmarks run under BenchmarkDotNet and are therefore single-threaded. Both ConcurrentLru and ConcurrentLfu scale well with concurrent workloads. The relative ranking of the cache implementations is stable across .NET Framework/Core/5/6 and across the CPU architectures available in Azure (e.g. Intel Skylake, AMD Zen). Absolute performance can vary.

Raw Lookup speed

In this test the same items are fetched repeatedly and no items are evicted. This is representative of a high hit rate scenario with a small number of hot items. A ConcurrentDictionary lookup is used to establish a baseline.

  • The ConcurrentLru family does not move items in its queues on a pure cache hit; it only marks the item as accessed, and is therefore very fast. This test highlights the difference in performance between FastConcurrentLru, ConcurrentLru, a FastConcurrentLru with atomic GetOrAdd (AtomicFastLru) and ConcurrentTLru. All share the same underlying cache algorithm but have successively more features enabled.
  • ConcurrentLfu performs a dictionary lookup and logs the read in the read buffer. It has been configured with the BackgroundThreadScheduler in this test.
  • ClassicLru must maintain item order, so it splices the fetched item to the head (MRU position) of its internal linked list on every hit.
  • Both the older runtime MemoryCache and the newer extensions MemoryCache are tested. Since the key is an integer, the runtime cache must convert it to a string and the extensions version must box it, hence the memory allocations.
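
To illustrate why the work done on a hit dominates this test, here is a minimal sketch of the two strategies described in the bullets above, written in Python for brevity. It is not the library's code: ClassicLruSketch and MarkOnHitSketch are hypothetical names, and the second class uses a single-queue second-chance policy as a simplified stand-in for the ConcurrentLru family's queues.

```python
from collections import OrderedDict, deque


class ClassicLruSketch:
    """Strict LRU: every hit reorders the recency list (capacity must be >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # iteration order doubles as recency order

    def get_or_add(self, key, factory):
        if key in self.items:
            self.items.move_to_end(key)  # hit: splice the entry to the MRU end
            return self.items[key]
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # miss: evict the LRU entry first
        value = factory(key)
        self.items[key] = value
        return value


class MarkOnHitSketch:
    """Hit path only sets a flag; ordering work is deferred until eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}     # key -> [value, accessed_flag]
        self.queue = deque()  # insertion order, never touched on a hit

    def get_or_add(self, key, factory):
        entry = self.entries.get(key)
        if entry is not None:
            entry[1] = True  # hit: mark as accessed, nothing is moved
            return entry[0]
        if len(self.entries) >= self.capacity:
            self._evict_one()
        value = factory(key)
        self.entries[key] = [value, False]
        self.queue.append(key)
        return value

    def _evict_one(self):
        # Second chance: accessed entries are recycled with the flag cleared,
        # the first entry found without the flag is evicted.
        while True:
            key = self.queue.popleft()
            entry = self.entries[key]
            if entry[1]:
                entry[1] = False
                self.queue.append(key)
            else:
                del self.entries[key]
                return


for cache in (ClassicLruSketch(2), MarkOnHitSketch(2)):
    for key in (1, 2, 1, 3):
        cache.get_or_add(key, str)
```

In both sketches key 1 survives the insertion of key 3 because it was read after key 2 was added. The difference is where the cost lands: the classic version reorders its list on every hit, while the mark-on-hit version writes one flag per hit and pays for reordering only when it evicts.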

Tabular Benchmark Data
| Method | Runtime | Mean | StdDev | Ratio | Allocated |
|---|---|---|---|---|---|
| ConcurrentDictionary | .NET 6.0 | 7.414 ns | 0.2003 ns | 1.00 | - |
| FastConcurrentLru | .NET 6.0 | 10.132 ns | 0.1374 ns | 1.36 | - |
| ConcurrentLru | .NET 6.0 | 16.768 ns | 0.3520 ns | 2.26 | - |
| AtomicFastLru | .NET 6.0 | 20.682 ns | 0.3037 ns | 2.78 | - |
| FastConcurrentTLru | .NET 6.0 | 12.261 ns | 0.1368 ns | 1.65 | - |
| ConcurrentTLru | .NET 6.0 | 18.277 ns | 0.4559 ns | 2.46 | - |
| ConcurrentLfu | .NET 6.0 | 30.369 ns | 0.9932 ns | 4.11 | - |
| ClassicLru | .NET 6.0 | 49.094 ns | 0.5863 ns | 6.59 | - |
| RuntimeMemoryCacheGet | .NET 6.0 | 114.814 ns | 1.3774 ns | 15.41 | 32 B |
| ExtensionsMemoryCacheGet | .NET 6.0 | 64.554 ns | 3.0551 ns | 8.57 | 24 B |
| ConcurrentDictionary | .NET Framework 4.8 | 14.250 ns | 0.2224 ns | 1.00 | - |
| FastConcurrentLru | .NET Framework 4.8 | 15.120 ns | 0.3588 ns | 1.07 | - |
| ConcurrentLru | .NET Framework 4.8 | 23.980 ns | 0.5742 ns | 1.69 | - |
| AtomicFastLru | .NET Framework 4.8 | 38.954 ns | 0.5826 ns | 2.74 | - |
| FastConcurrentTLru | .NET Framework 4.8 | 45.314 ns | 0.8630 ns | 3.19 | - |
| ConcurrentTLru | .NET Framework 4.8 | 48.843 ns | 0.4182 ns | 3.42 | - |
| ConcurrentLfu | .NET Framework 4.8 | 54.933 ns | 0.9048 ns | 3.86 | - |
| ClassicLru | .NET Framework 4.8 | 60.762 ns | 1.1949 ns | 4.27 | - |
| RuntimeMemoryCacheGet | .NET Framework 4.8 | 288.522 ns | 3.1063 ns | 20.26 | 32 B |
| ExtensionsMemoryCacheGet | .NET Framework 4.8 | 123.936 ns | 1.0675 ns | 8.70 | 24 B |
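
The Ratio column expresses each method's mean relative to the ConcurrentDictionary baseline for the same runtime. The sketch below recomputes it from a few of the .NET 6.0 means above; the helper name is made up for illustration. The published values can differ in the last digit, since BenchmarkDotNet works from the raw measurements rather than from the rounded means shown here.

```python
# Means in nanoseconds, copied from the .NET 6.0 rows of the table.
means_ns = {
    "ConcurrentDictionary": 7.414,  # baseline
    "FastConcurrentLru": 10.132,
    "ConcurrentLru": 16.768,
    "ClassicLru": 49.094,
}


def ratios_to_baseline(means, baseline):
    """Divide every mean by the baseline method's mean."""
    base = means[baseline]
    return {name: mean / base for name, mean in means.items()}


ratios = ratios_to_baseline(means_ns, "ConcurrentDictionary")
for name, ratio in ratios.items():
    print(f"{name:22s} {ratio:.2f}")
```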

Lookup keys with a Zipf distribution

Take 1000 samples from a Zipfian distribution over a set of N keys and use the sampled keys to look up values in the cache. If there are N items, the probability of accessing an item numbered i or less is approximately (i / N)^(1 - s).

s = 0.86 (yields approx 80/20 distribution)
N = 500
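
The sampling step can be sketched as follows. This assumes the textbook Zipf weighting, where key i is drawn with probability proportional to 1 / i^s; the sampler used by the actual benchmark is not shown on this page, and the function names and the seed are illustrative only.

```python
import random


def zipf_weights(n, s):
    """Unnormalised Zipf weights for keys 1..n: weight(i) = 1 / i**s."""
    return [1.0 / i ** s for i in range(1, n + 1)]


def sample_keys(n, s, count, seed=0):
    """Draw `count` keys from 1..n according to the Zipf weights."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return rng.choices(range(1, n + 1), weights=zipf_weights(n, s), k=count)


N, S = 500, 0.86
keys = sample_keys(N, S, 1000)

# Share of the probability mass carried by the hottest 20% of keys.
weights = zipf_weights(N, S)
head_share = sum(weights[: N // 5]) / sum(weights)
print(f"hottest 20% of keys carry {head_share:.0%} of the probability mass")
```

The printed share is computed from the exact discrete weights, so for a small key space it can differ from the continuous 80/20 approximation.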

Cache size = N / 10 (so the cache can hold 10% of the total set). ConcurrentLru has approximately the same computational overhead as a standard LRU in this single-threaded test.

| Method | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|
| ClassicLru | 175.7 ns | 2.75 ns | 2.43 ns | 1.00 | 0.00 |
| FastConcurrentLru | 180.2 ns | 2.55 ns | 2.26 ns | 1.03 | 0.02 |
| ConcurrentLru | 189.1 ns | 3.14 ns | 2.94 ns | 1.08 | 0.03 |
| FastConcurrentTLru | 261.4 ns | 4.53 ns | 4.01 ns | 1.49 | 0.04 |
| ConcurrentTLru | 266.1 ns | 3.96 ns | 3.51 ns | 1.51 | 0.03 |