I have only checked for regressions when making changes that might affect performance. It is somewhat time consuming to set up because a stable benchmark requires a dedicated server-class machine with no stray background processes or noisy neighbors. The main reason, however, is that I don't think it would be helpful, and it communicates the wrong message when judging performance.

The concurrent benchmarks are meant to help answer whether the cache satisfies your performance budget and whether it scales with your hardware. Once it has reached the goal of being fast enough, how much faster it gets quickly diminishes in importance compared to the actual application work. For example, 150M vs 200M reads/s is a difference of 1.67ns, which is negligible and therefore effectively equivalent. It is cases like a synchronized LinkedHashMap stuck at ~10M/s that can become a bottleneck due to lock contention. Similarly, a read/write lock might look fast if only reads are measured, but in a mixed workload the lock's throughput drops as writers obtain exclusive access. Therefore only a handful of benchmarks are provided, to show that Caffeine shouldn't be a bottleneck, that it scales with you, and that reads are not negatively impacted by writes.

The focus for system performance then shifts to the hit rate, where maximizing it reduces the average latency. While the hit penalty is a few nanoseconds, the miss penalty is measured in milliseconds. The efficiency benchmarks show a variety of workload types, and the ability to adapt if the pattern changes, in order to indicate that the cache will stay competitive at the highest hit rates. Since an eviction policy makes predictions that are not always correct, we can't win every round, but we can be robustly competitive across a large variety of workloads without any significant losses.
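To make the hit-rate point concrete, here is a back-of-the-envelope calculation. The 10ns hit penalty and 5ms miss penalty are illustrative numbers of my own, not measurements, and the class and method names are hypothetical:

```java
// Illustrative only: a ~10ns hit penalty vs a ~5ms miss penalty (hypothetical numbers).
public class HitRateLatency {

    /** Expected latency = hitRate * hitPenalty + missRate * missPenalty. */
    static double avgLatencyNanos(double hitRate, double hitNanos, double missNanos) {
        return hitRate * hitNanos + (1 - hitRate) * missNanos;
    }

    public static void main(String[] args) {
        double hitNanos = 10;          // a few nanoseconds to serve from cache
        double missNanos = 5_000_000;  // 5 ms to fetch from the backing source
        // Raising the hit rate from 95% to 99% cuts the average latency roughly 5x,
        // dwarfing any nanosecond-level difference in the hit penalty itself.
        System.out.printf("95%% hit rate: %.1f ns avg%n", avgLatencyNanos(0.95, hitNanos, missNanos));
        System.out.printf("99%% hit rate: %.1f ns avg%n", avgLatencyNanos(0.99, hitNanos, missNanos));
    }
}
```

The miss penalty dominates so completely that a small hit-rate improvement outweighs any plausible difference in read throughput between fast caches.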
Since the workload changes over the lifetime of the cache, we don't want users to manually tune anything except the maximum size, so adaptivity is important for the set-and-forget reality of development.

Lastly, from a design perspective we chose only amortized O(1) algorithms. A benchmark with a small cache can hide a bottleneck that appears once the cache is larger, and we want to avoid any surprising performance regressions. Similarly, we also have to mitigate adversarial conditions like hash flooding. For Caffeine this means adding a small penalty on reads so that writes can remain O(1), which is acceptable as long as we continue to satisfy our prior performance requirements. That means figuring out a satisfactory algorithm is required before adding a feature, such as our use of a timer wheel for expiration, whereas others used a logarithmic priority queue or polluted the cache with dead entries. We don't want frustrating problems like GC pause times or being surprised by a quicksort-style quadratic complexity scenario.

I apologize if that was too long. I hope it explains why I show a few meaningful benchmarks and discuss the internal algorithms. I try not to focus attention on metrics that I believe won't help end users and could lead them astray. Once your options greatly surpass your performance needs, your decision should be based on other factors like the community, API flexibility, maturity, development activity, etc. Thankfully, in the large Java ecosystem this means that you have multiple good choices.
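As an illustration of the timer wheel idea, here is a minimal single-level sketch using only the standard library. This is my own simplification, not Caffeine's actual implementation (which uses a hierarchical wheel); it shows why scheduling and per-tick expiry are both O(1), unlike a priority queue's O(log n) operations:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A minimal single-level timer wheel sketch: a circular array of buckets,
// where each tick advances the hand by one slot and drains that bucket.
public class SimpleTimerWheel {
    private final Deque<Runnable>[] buckets;
    private int currentTick;

    @SuppressWarnings("unchecked")
    SimpleTimerWheel(int wheelSize) {
        buckets = new Deque[wheelSize];
        for (int i = 0; i < wheelSize; i++) {
            buckets[i] = new ArrayDeque<>();
        }
    }

    /** O(1): schedules a task to fire ticksFromNow ticks ahead (must fit in one rotation). */
    void schedule(int ticksFromNow, Runnable task) {
        if (ticksFromNow <= 0 || ticksFromNow >= buckets.length) {
            throw new IllegalArgumentException("delay must fit within one wheel rotation");
        }
        buckets[(currentTick + ticksFromNow) % buckets.length].add(task);
    }

    /** O(1) amortized: advances the clock by one tick and runs everything that expired. */
    void advance() {
        currentTick = (currentTick + 1) % buckets.length;
        Deque<Runnable> bucket = buckets[currentTick];
        for (Runnable task; (task = bucket.poll()) != null; ) {
            task.run();
        }
    }
}
```

A real implementation must also handle delays longer than one rotation (via multiple wheel levels or by cascading), but the core point stands: expiry cost is bounded per tick rather than growing with the number of pending entries.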
The benchmarks provided for Caffeine are impressive to look at, and the comparison with other cache implementations is interesting. Is there data available on the progression of performance between different releases of Caffeine? E.g., how, generally, does the performance of Caffeine 2.7.0 compare to 2.9.3?