v2 contracts #74
The approximate API of the builder:

```go
type Builder[K comparable, V any] interface {
	// It seems better to separate cost and size after all.
	// In the experience of ristretto and communication with otter users,
	// people don't take well to specifying both with a single parameter.
	//
	// Also, with this API we will be able to prohibit the simultaneous use
	// of the MaximumSize and Cost functions.
	MaximumSize(maximumSize int) Builder[K, V]
	MaximumCost(maximumCost int) Builder[K, V]
	CollectStats() Builder[K, V]
	InitialCapacity(initialCapacity int) Builder[K, V]
	Cost(costFunc func(key K, value V) uint32) Builder[K, V]
	DeletionListener(deletionListener func(key K, value V, cause DeletionCause)) Builder[K, V]
	// We are slightly reducing the cognitive load on the user.
	TTL(ttl time.Duration) Builder[K, V]
	// It seems that a variable ttl is almost never required, and
	// the user should have at least some rule by which to calculate the ttl.
	//
	// In principle, Set(key, value, ttl) can be moved to Extension.
	//
	// It also becomes more difficult to use when it is not needed. :)
	VarTTL(ttlFunc func(key K, value V) uint32) Builder[K, V]
	Build() (Cache[K, V], error)
}

func NewBuilder[K comparable, V any]() Builder[K, V] {
	...
}
```
```go
type Cache[K comparable, V any] interface {
	Has(key K) bool
	Get(key K) (V, bool)
	Set(key K, value V) bool
	// ... the existing value?
	// I also want to return the previous value, but then how do we account
	// for rejection due to size?
	SetIfAbsent(key K, value V) bool
	Delete(key K)
	DeleteByFunc(f func(key K, value V) bool)
	Range(f func(key K, value V) bool)
	Clear()
	Close()
	Size() int
	Stats() Stats
	Extension() Extension[K, V]
}
```
I'd vote for changing the terms. TTL is a pretty well-known term, but what "live" means depends on the implementation. Some use "time since creation" whereas others use "time since last write". Or sometimes people will say TTL as the time for the client receiving it, like an HTTP expiry duration header. So I think the acronym could be misinterpreted vs a more generic term.
It may be worth no longer returning a bool on rejection. Then the API becomes a little simpler. Still, otter guarantees that it can reject an item only because of the eviction policy, and not, like ristretto, because of contention.

```go
type Cache[K comparable, V any] interface {
	Set(key K, value V)
	SetIfAbsent(key K, value V) (V, bool)
}
```
It doesn't really matter to me here. It can be replaced with weight.

Oh, that's true. I have not yet seen any Go implementation that uses "expire after last access", except goburrow, but it explicitly separates this from "expire after create/last write". But if we separate the different versions by …
I need to answer one question: can these functions be used together on a single cache instance? It seems to me that this is absolutely useless, but if there is some use case, then there will be a very unpleasant problem with the expiration policy with a variable ttl. And in fact, a similar problem is very likely to exist in

```go
type Builder[K comparable, V any] interface {
	ExpireAfterCreate(duration time.Duration) Builder[K, V]
	ExpireAfterWrite(duration time.Duration) Builder[K, V]
	ExpireAfterAccess(duration time.Duration) Builder[K, V]
	ExpireAfter(/* what??? */) Builder[K, V]
}
```

It seems that there is a need to use an interface with several methods. But …

Ideally, we would avoid using the same interface for all three functions. Could it be something like this?

```go
type Builder[K comparable, V any] interface {
	ExpireAfter(opts ...ExpiryOption[K, V]) Builder[K, V]
}

type ExpiryOption[K comparable, V any] func(*expiry[K, V])
```

But this again looks doubtful.

Yes, a goroutine is used for the worker. In general, the use of …
They shouldn't be. Guava/Caffeine allowed it, prior to variable support, out of naivety, as it seemed to have little cost, but I'd stop allowing multiple expiration policies if possible. Usually users seem to set both …
Right, I have an expiry interface with create, update, read methods. Ideally it would be on the builder for easiest discovery, where all of the options funnel into variable expiration. You can see the discussion in ben-manes/caffeine#1499, as users often made mistakes with the interface and of course it was pretty verbose. If I could start anew then I'd make it convenient and fail at runtime if multiple were selected:

```java
Caffeine.newBuilder() // user can choose one
    .expireAfterCreate(Duration.ofMinutes(5))
    .expireAfterCreate((K key, V value) -> duration)
    .expireAfterWrite(Duration.ofMinutes(5))
    .expireAfterWrite((K key, V value) -> duration)
    .expireAfterAccess(Duration.ofMinutes(5))
    .expireAfterAccess((K key, V value) -> duration)
    .expireAfter(new Expiry...)
```

For backwards compatibility I went with static factories on the interface so they can do one of:

```java
.expireAfter(Expiry.creating((Key key, Graph graph) -> duration))
.expireAfter(Expiry.writing((Key key, Graph graph) -> duration))
.expireAfter(Expiry.accessing((Key key, Graph graph) -> duration))
```

They can define their own …

Does that need to be long-lived? I use a state machine to schedule the maintenance worker, so only one should be in-flight. This submits a task to an …

I'd avoid finalizers, as at least in Java they are considered a misfeature and are being deprecated for removal. However, Java offers other options like weak and phantom references that cover the cleanup needs better, without penalizing the GC or quirks like object resurrection. Maybe they're cleaner in Go.
Great, then this simplifies the situation a bit.

```java
.expireAfterCreate(Duration.ofMinutes(5))
.expireAfterCreate((K key, V value) -> duration)
```

There is a small problem: there is no method overloading in Go, and that even makes sense. If we can do without … What about this?

Actually, no. That's largely why I asked about removing goroutines, but most likely I didn't formulate it very clearly. It would be cool to abandon this, but so far I don't see the benefit for users, unlike adding …

In fact, finalizers are incredibly often used when working with cgo. Moreover, there are simply no other good ways to clean up resources other than finalizers and …
Yeah, LoadingCache is really complicated.

```go
type LoadingCache[K comparable, V any] interface {
	...
	Get(ctx context.Context, key K) (V, error)
	BulkGet(ctx context.Context, keys []K) (map[K]V, error)
	// extension?
	Extension() LoadingExtension[K, V]
}

type LoadingExtension[K comparable, V any] interface {
	GetQuietly(ctx context.Context, key K) (V, error)
	GetEntry(ctx context.Context, key K) (Entry[K, V], error)
	GetEntryQuietly(ctx context.Context, key K) (Entry[K, V], error)
}

// :(
type SingleLoader[K comparable, V any] interface {
	Load(ctx context.Context, key K) (V, error)
}

type BulkLoader[K comparable, V any] interface {
	BulkLoad(ctx context.Context, keys []K) (map[K]V, error)
}

type SingleBulkLoader[K comparable, V any] interface {
	SingleLoader[K, V]
	BulkLoader[K, V]
}

type Builder[K comparable, V any] interface {
	// We try to cast the loader to BulkLoader inside;
	// otherwise we use the standard implementation of BulkLoad.
	BuildLoading(loader SingleLoader[K, V]) (LoadingCache[K, V], error)
}
```
This seems more obvious; I'm not sure about Go idioms or whether that is a common naming style.

Unfortunately, Go API design is outside my expertise. I was able to use Java features to layer it in a way that reduces user complexity, e.g. using default methods. The context wasn't needed since Java is typically paired with dependency injection and request-scoped providers, e.g. to get the logged-in user bound to that thread. If not, then the user likely will use a loading …
The main problem is that …

Since it changes from request to request, it should also be used in the loader. And the idea of casting the interface is actually often used in the standard library, for example here. I need to think about it, but at least the loss of timeouts, spans, and information in the logs looks very critical.
It sounds like:

```go
func GetIfPresent(key K) (V, bool)
func Get(ctx context.Context, key K, mappingFunction func(K) V) (V, error)
func GetAll(ctx context.Context, keys []K, mappingFunction func([]K) map[K]V) (map[K]V, error)
```

if …
To be honest, I didn't really understand :(. Why do I need to specify the loader every time I use the method?

```go
type (
	LoaderFunc[K comparable, V any]     func(ctx context.Context, key K) (V, error)
	BulkLoaderFunc[K comparable, V any] func(ctx context.Context, keys []K) (map[K]V, error)
)

type Cache[K comparable, V any] interface {
	Has(key K) bool
	Get(key K) (V, bool)
	BulkGet(keys []K) (map[K]V, bool)
	// We try to use the loader passed when creating the cache instance
	// if nothing was passed to Load.
	// But what if the loader has not been passed anywhere?
	Load(ctx context.Context, key K, loader ...LoaderFunc[K, V]) (V, error)
	BulkLoad(ctx context.Context, keys []K, bulkLoader ...BulkLoaderFunc[K, V]) (map[K]V, error)
}
```

Or we can even use a more canonical `Option`. Unfortunately, this is one allocation per request.

```go
Load(ctx context.Context, key K, opts ...LoadOption[K, V]) (V, error)
BulkLoad(ctx context.Context, keys []K, opts ...LoadOption[K, V]) (map[K]V, error)
```

But what to do with the methods in the …? It is also interesting what needs to be done to allow the use of an explicit ttl. We can hide this in the …

@proost For me, the methods in … For the user, Load only automatically fetches data from an external source if nothing is found. Although with the integration of the APIs, the line is really blurred.
Also, if we use `WithMaximumSize[K, V](1000)` … I used the builder to create a more typed cache and to add …
When I was trying to make … In terms of the API: …

That is, if the second returned value is false, users will expect the first returned value to be the V already in the cache. However, in case of rejection there may be no such V in the cache, and users have no way to tell. If we want to keep the rejection behavior, I think a slightly more complicated API is needed, such as …
I guess it has something to do with developer experience. Many times it is hard to prepare the loader function at initialization. Or the loader function may need some conditional dependencies that can't be determined at initialization.
Since the entry hangs around until GC'd anyway, it didn't seem too awful to me to handle the rejection within the eviction policy logic instead of at the map insertion. The add/update tasks shuffle an entry that is too large toward immediate eviction, to avoid flushing the cache. Then it is handled however the entry was added (load, put, replace), it's linearizable, and simple to understand. I think the only immediate rejection that I do is for AsyncCache when inserting a completed and failed future value, but that's quite minor since it has the callback handler otherwise if it fails after insert.
@rueian Yes, it seems like a good solution.

This is true, but I was referring specifically to the requirement of specifying a loader. That is, instead of

```go
Load(ctx context.Context, key K, loader LoaderFunc[K, V]) (V, error)
```

it seems better to handle the loader optionally:

```go
Load(ctx context.Context, key K, loader ...LoaderFunc[K, V]) (V, error)
```

But that's very likely a quirk picked up from ChatGPT. Something bothers me much more, though: what should we do if a loader was passed neither in the builder nor explicitly in the function? It seems easiest to just return some predefined error.
In the case of …:

```go
cache, err := otter.NewBuilder[int, int]().
	MaximumSize(1000).
	ExpireAfterWrite(time.Hour).
	CollectStats().
	Loader(func(ctx context.Context, key K) (V, error) {
		...
	}).
	BulkLoader(func(ctx context.Context, keys []K) (map[K]V, error) {
		...
	}).
	BuildLoading()
```
JCache is an alternative standardized Java API, which I don't like, but it might be a helpful reference. They have a more powerful loader called …
To be honest, it looks overloaded. Actually, I quite like the alternative with options, but I'm still not completely sure about this decision and I want to hear someone else's opinion. Of course, this approach has drawbacks, but in my opinion it looks quite good. The only thing that bothers me a little is the use of …

**Option**

```go
type GetOption[K comparable, V any] func(*getOptions[K, V])

func WithRecordStats[K comparable, V any](recordStats bool) GetOption[K, V] {
	return func(o *getOptions[K, V]) {
		o.recordStats = recordStats
	}
}

func WithQuietly[K comparable, V any]() GetOption[K, V] {
	return func(o *getOptions[K, V]) {
		o.isQuietly = true
	}
}

func WithLoader[K comparable, V any](loader func(ctx context.Context, key K) (V, error)) GetOption[K, V] {
	return func(o *getOptions[K, V]) {
		o.loader = loader
	}
}

type SetOption[K comparable, V any] func(*setOptions[K, V])

func WithTTL[K comparable, V any](duration time.Duration) SetOption[K, V] {
	return func(so *setOptions[K, V]) {
		so.ttl = duration
	}
}
```

**Cache**

```go
type basicCache[K comparable, V any] interface {
	Set(key K, value V, opts ...SetOption[K, V])
	SetIfAbsent(key K, value V, opts ...SetOption[K, V]) (V, bool)
	Delete(key K)
	DeleteByFunc(f func(key K, value V) bool)
	Range(f func(key K, value V) bool)
	Clear()
	Size() int
	Stats() Stats
}

type Cache[K comparable, V any] interface {
	basicCache[K, V]
	Has(key K, opts ...GetOption[K, V]) bool
	Get(key K, opts ...GetOption[K, V]) (V, bool)
	GetEntry(key K, opts ...GetOption[K, V]) (Entry[K, V], bool)
	BulkGet(keys []K, opts ...GetOption[K, V]) (map[K]V, bool)
}

type LoadingCache[K comparable, V any] interface {
	basicCache[K, V]
	Get(ctx context.Context, key K, opts ...GetOption[K, V]) (V, error)
	GetEntry(ctx context.Context, key K, opts ...GetOption[K, V]) (Entry[K, V], error)
	BulkGet(ctx context.Context, keys []K, opts ...GetOption[K, V]) (map[K]V, error)
}
```

**Builder**

```go
type Builder[K comparable, V any] interface {
	MaximumSize(maximumSize int) Builder[K, V]
	MaximumWeight(maximumWeight int) Builder[K, V]
	CollectStats() Builder[K, V]
	InitialCapacity(initialCapacity int) Builder[K, V]
	Weigher(weigher func(key K, value V) uint32) Builder[K, V]
	DeletionListener(deletionListener func(key K, value V, cause DeletionCause)) Builder[K, V]
	ExpireAfterCreate(duration time.Duration) Builder[K, V]
	ExplicitExpireAfterCreate() Builder[K, V]
	ExpireAfterWrite(duration time.Duration) Builder[K, V]
	ExplicitExpireAfterWrite() Builder[K, V]
	ExpireAfterAccess(duration time.Duration) Builder[K, V]
	ExplicitExpireAfterAccess() Builder[K, V]
	Loader(loader func(ctx context.Context, key K) (V, error)) Builder[K, V]
	BulkLoader(bulkLoader func(ctx context.Context, keys []K) (map[K]V, error)) Builder[K, V]
	Build() (Cache[K, V], error)
	BuildLoading() (LoadingCache[K, V], error)
}
```

**Examples**

```go
// get quietly
value, ok := cache.Get(key, otter.WithQuietly[K, V]())

// explicit ttl
cache, err := otter.NewBuilder[int, int]().
	MaximumSize(1000).
	ExplicitExpireAfterWrite().
	CollectStats().
	Build()
...
cache.Set(key, value, otter.WithTTL[int, int](time.Hour))

// loader
loader := ...
cache.Get(ctx, key, otter.WithLoader(loader))
```
How about letting the loader return the TTL as well? I think, in many cases, the TTL can only be known when loading the entry.

```go
Loader(loader func(ctx context.Context, key K) (V, time.Duration, error)) Builder[K, V]
BulkLoader(bulkLoader func(ctx context.Context, keys []K) (map[K]WithTTL[V], error)) Builder[K, V]
```
Oh, I can hardly imagine cases in which the ttl is not known in advance. It seems that in such cases network requests are not needed and a function like this in the builder is enough:

```go
type Builder[K comparable, V any] interface {
	ExpireAfterCreateFunc(f func(key K, value V) time.Duration) Builder[K, V]
}
```

And when the ttl is returned from the loader, it is unlikely that the user will understand what to return as the ttl when it is not used.
Okay, it seems that such an API should be good enough. So I can start developing. :)
Was just reading this thread and saw that you want to go with the "Go options" style but don't want to pay the cost in allocations. You can allocate the options on the stack in basically the same manner and save the allocation. It might fit your use case, it might not, but I threw up an example for you. Happy coding.

Haha, it really works. I tried to allocate on the stack and pass a pointer, but it leaked onto the heap and I didn't try to override the escape analysis decision somehow. Thank you very much! I have almost no free time for otter right now, but I hope there will be more of it later and I will release v2 anyway. :(

A trick I came up with a while ago. I was working on something where I had spent so much time getting rid of allocations that allocating the options seemed a bad idea. And I get the time thing, I have a lot of that problem. Cheers!
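The stack-allocation trick being discussed can be sketched like this: apply the option funcs to a local struct value through a pointer that never escapes the function, so escape analysis can keep the options on the stack. `getOptions` and `WithQuietly` are hypothetical names modeled on the proposal above.

```go
package main

import "fmt"

type getOptions struct {
	quietly     bool
	recordStats bool
}

type GetOption func(*getOptions)

func WithQuietly() GetOption {
	return func(o *getOptions) { o.quietly = true }
}

// get applies the options to a local struct value. Since the pointer to
// opts does not escape get, the struct can stay on the stack and no
// per-call heap allocation is needed for the options.
func get(key string, optFns ...GetOption) (quietly bool) {
	var opts getOptions // stack-allocated options value
	for _, apply := range optFns {
		apply(&opts)
	}
	return opts.quietly
}

func main() {
	fmt.Println(get("key"))                // defaults
	fmt.Println(get("key", WithQuietly())) // option applied
}
```

Whether the value actually stays on the stack depends on escape analysis (verifiable with `go build -gcflags=-m`); closures that retain the pointer would force it to the heap.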
sturdyc might be a helpful reference. It has loading, request coalescing ("buffering"), fallback caching ("passthrough"), and negative caching ("non-existent"), but lacks good eviction/expiration. Their internal inFlight is a future and the missing record is an optional, which is how Caffeine lets callers do these things. Given that that's not idiomatic in Go, their API does seem nice (to a non-Go developer).
## Thoughts about Otter v2

Disclaimer: I'm sorry for such a long message. I have noted questionable (relatively) points with …

### Basic concepts and features

I would like to achieve the following goals in v2: …

### Builder vs Functional options

Functional options are much more common in Go than builders and are considered more canonical by many people, but I absolutely don't like how they look with generics.

Functional options:

```go
cache, err := otter.New[string, string](100,
	WithCollectStats[string, string](),
	WithTTL[string, string](time.Hour),
	WithInitialCapacity[string, string](100),
)
```

vs builder:

```go
cache, err := otter.MustBuilder[string, string](100).
	CollectStats().
	TTL(time.Hour).
	InitialCapacity(100).
	Build()
```

So for now, I've still decided to use the builder to construct the cache.

### Capacity

The user is currently obliged to specify the cache capacity immediately in the builder. After that, he can also specify a function computing the cost of an item; the sum of all costs must be <= the capacity specified in the builder. By default, the cost of any item is 1. The API currently looks like this:

```go
cache, err := otter.MustBuilder[string, string](int(10_000_000)).
	Cost(func(key string, value string) uint32 {
		return uint32(len(key) + len(value))
	}).
	InitialCapacity(int(1000)).
	Build()
```

It should become something like this. A cache whose capacity is a number of items:

```go
cache, err := otter.NewBuilder[string, string]().
	MaximumSize(uint64(10_000)).
	Build()
```

A cache whose capacity is the maximum sum of entry weights:

```go
cache, err := otter.NewBuilder[string, string]().
	MaximumWeight(uint64(10_000_000)).
	Weigher(func(key string, value string) uint32 {
		return uint32(len(key) + len(value))
	}).
	InitialCapacity(int(1000)).
	Build()
```
### DeletionListener

The user may want to do something when an entry is deleted. It's currently implemented like this:

```go
// DeletionCause the cause why a cached entry was deleted.
type DeletionCause uint8

const (
	// Explicit the entry was manually deleted by the user.
	Explicit DeletionCause = iota
	// Replaced the entry itself was not actually deleted, but its value was replaced by the user.
	Replaced
	// Size the entry was evicted due to size constraints.
	Size
	// Expired the entry's expiration timestamp has passed.
	Expired
)

type Builder[K comparable, V any] interface {
	DeletionListener(deletionListener func(key K, value V, cause DeletionCause)) Builder[K, V]
}
```

I like that such an API spares the user a huge number of callbacks, but the user has to carefully choose the causes he needs. From what I've seen, in most cases the user is only interested in "was the entry deleted automatically or not?". So it might be worth adding a method like this:

```go
func (dc DeletionCause) WasEvicted() bool {
	switch dc {
	case Size, Expired:
		return true
	default:
		return false
	}
}
```

### Stats

Currently, the user cannot specify a custom implementation of the statistics collector in the builder. He can only get a snapshot of the statistics from the cache.

```go
type Stats interface {
	Hits() int64
	Misses() int64
	Ratio() float64
	RejectedSets() int64
	EvictedCount() int64
	EvictedWeight() int64
}

...

cache, err := otter.MustBuilder[string, string](10_000).
	CollectStats().
	Build()

stats := cache.Stats()
```

I think this API option is good:

```go
type Stats interface {
	Hits() uint64
	Misses() uint64
	HitRatio() float64
	MissRatio() float64
	RejectedSets() uint64
	Evictions() uint64
	EvictionWeight() uint64
}

// general contract
type StatsCollector interface {
	CollectHits(count int)
	CollectMisses(count int)
	CollectEviction(weight uint32, cause DeletionCause)
	Snapshot() Stats
}

// optional
type LoadingStatsCollector interface {
	CollectLoadSuccess(loadDuration time.Duration)
	CollectLoadFailure(loadDuration time.Duration)
}

type Builder[K comparable, V any] interface {
	// if no collector is specified, then we use the default one.
	// if one collector is specified, then we use it.
	// if more than one collector is specified, then we return an error.
	CollectStats(statsCollector ...StatsCollector) Builder[K, V]
}

...

// basic usage example
cache, err := otter.NewBuilder[string, string]().
	MaximumSize(10_000).
	// enable the stats collector with concurrent counters
	CollectStats().
	Build()

// with custom collector
prometheusAdapter := ...
cache, err := otter.NewBuilder[string, string]().
	MaximumSize(10_000).
	CollectStats(prometheusAdapter).
	Build()
```

It may be worth allowing multiple collectors to be specified, or not requiring the implementation of a … But I really like this API, which is more obvious to the user:

```go
type Builder[K comparable, V any] interface {
	CollectStats(statsCollector StatsCollector) Builder[K, V]
}

// usage example
cache, err := otter.NewBuilder[string, string]().
	MaximumSize(10_000).
	CollectStats(otter.NewStatsCounter()).
	Build()
```

### Close

Ideally, I would like the experience of using an in-memory on-heap cache to match the experience of using hashmaps as much as possible, but since otter uses background goroutines, otter has to have the … It would be great to get rid of this method, but the options for achieving this are very unpleasant.
So I guess I'll leave …

### Expiration

Almost all cache libraries in Go allow you to specify a TTL for entries, but all have problems: …

It is proposed to switch from the term …

```go
type Builder[K comparable, V any] interface {
	ExpireAfterCreate(duration time.Duration) Builder[K, V]
	// useful if the ttl is specified in an http header,
	// or to simulate probabilistic early expiration.
	ExpireAfterCreateFunc(f func(key K, value V) time.Duration) Builder[K, V]
	// write = create + update
	ExpireAfterWrite(duration time.Duration) Builder[K, V]
	ExpireAfterWriteFunc(f func(key K, value V) time.Duration) Builder[K, V]
	// access = read + write
	ExpireAfterAccess(duration time.Duration) Builder[K, V]
	ExpireAfterAccessFunc(f func(key K, value V) time.Duration) Builder[K, V]
}
```

The user can select only one of the expiration methods. In the first iteration, it's planned to add expiration only for …

We can try to do something like this based on chaining:

```go
type Builder[K comparable, V any] interface {
	ExpireAfter() ExpireAfterBuilder[K, V]
}

type ExpireAfterBuilder[K comparable, V any] interface {
	Create(duration time.Duration) Builder[K, V]
	CreateFunc(f func(key K, value V) time.Duration) Builder[K, V]
	Write(duration time.Duration) Builder[K, V]
	WriteFunc(f func(key K, value V) time.Duration) Builder[K, V]
	Access(duration time.Duration) Builder[K, V]
	AccessFunc(f func(key K, value V) time.Duration) Builder[K, V]
	Read(duration time.Duration) Builder[K, V]
	ReadFunc(f func(key K, value V) time.Duration) Builder[K, V]
	Update(duration time.Duration) Builder[K, V]
	UpdateFunc(f func(key K, value V) time.Duration) Builder[K, V]
}
```

But I have a strong desire to force users to always pass the function:

```go
type Builder[K comparable, V any] interface {
	ExpireAfter() ExpireAfterBuilder[K, V]
}

type ExpireAfterBuilder[K comparable, V any] interface {
	Create(f func(key K, value V) time.Duration) Builder[K, V]
	Write(f func(key K, value V) time.Duration) Builder[K, V]
	Access(f func(key K, value V) time.Duration) Builder[K, V]
	Read(f func(key K, value V) time.Duration) Builder[K, V]
	Update(f func(key K, value V) time.Duration) Builder[K, V]
}

...

// usage example
cache, err := otter.NewBuilder[string, string]().
	MaximumWeight(10_000_000).
	Weigher(func(key string, value string) uint32 {
		return uint32(len(key) + len(value))
	}).
	CollectStats(prometheusAdapter).
	ExpireAfter().Write(func(key string, value string) time.Duration {
		return time.Hour
	}).
	Build()
```

P.S. We can still do something like this. Maybe it's even better than the rest. Anyway, I like it better. :)

```go
type ExpiryStrategy uint32

const (
	ExpireAfterCreate ExpiryStrategy = iota
	ExpireAfterWrite
	ExpireAfterAccess
	ExpireAfterRead
	ExpireAfterUpdate
)

type Builder[K comparable, V any] interface {
	Expire(strategy ExpiryStrategy, duration time.Duration) Builder[K, V]
	ExpireFunc(strategy ExpiryStrategy, f func(key K, value V) time.Duration) Builder[K, V]
}

...

// usage example
cache, err := otter.NewBuilder[string, string]().
	MaximumSize(10_000).
	Expire(otter.ExpireAfterWrite, time.Hour).
	Build()
```

### Loader

The main innovation of v2 should be the ability to specify a loader, which lets the user fetch an entry from a slow resource only once instead of sending several identical requests. This usually greatly improves tail latency. There are several libraries in Go that let the user specify loaders, but only for getting a single entry. Requests are usually made in batches, though, and the user wants to be able to fetch a whole batch at once. This is what a typical loader that libraries ask for looks like:

```go
func(key K) (V, error) {
}
```

Some notes about the wishlist: …
```go
package otter

import "context"

type Loader[K comparable, V any] interface {
	Load(ctx context.Context, key K) (V, error)
}

type LoaderFunc[K comparable, V any] func(ctx context.Context, key K) (V, error)

func (lf LoaderFunc[K, V]) Load(ctx context.Context, key K) (V, error) {
	return lf(ctx, key)
}

type BulkLoader[K comparable, V any] interface {
	BulkLoad(ctx context.Context, keys []K) (map[K]V, error)
}

type BulkLoaderFunc[K comparable, V any] func(ctx context.Context, keys []K) (map[K]V, error)

func (blf BulkLoaderFunc[K, V]) BulkLoad(ctx context.Context, keys []K) (map[K]V, error) {
	return blf(ctx, keys)
}

type Cache[K comparable, V any] interface {
	Load(ctx context.Context, loader Loader[K, V], key K) (V, error)
	BulkLoad(ctx context.Context, bulkLoader BulkLoader[K, V], keys []K) (map[K]V, error)
}
```

It's expected that the loader will be implemented once during cache initialization, but the user still has the ability to change the behavior using closures.

P.S. I still decided to abandon the implementation of the loader via …

```go
// goroutine 1
loader := func(ctx context.Context, key string) (string, error) {
	return db.Get(ctx, key)
}
value, err := cache.Load(ctx, loader, key1)
...
// goroutine 2
db.Update(key1)
cache.Delete(key1)
```

It's expected that there will be no entry with key1 in the cache afterwards, but with a naive implementation using … there is a high probability that the entry will remain in the cache.

### Logger

When calling the loader and using other future features, the user will sometimes need to log various errors, so a contract for the logger is needed. The plan is to use the slog API, but with context. The user can simply ignore the context when implementing the logger.

```go
type Logger interface {
	Warn(ctx context.Context, msg string, args ...any)
	Error(ctx context.Context, msg string, args ...any)
}

type Builder[K comparable, V any] interface {
	Logger(logger Logger) Builder[K, V]
}
```

Which logger should I use by default? I suspect it's better to make at least one based on slog.

### Extension && Map

Sometimes it may happen that the user cannot solve his problem using the available methods. In this case, …
Also, these structures will allow us to hide a huge set of functions from users; otherwise the cache API will become simply huge. So far, the plan is to do something like this:

```go
type Cache[K comparable, V any] interface {
	AsMap() Map[K, V]
	Extension() Extension[K, V]
}

type Map[K comparable, V any] interface {
	SetIfAbsent(key K, value V) (V, bool)
	DeleteFunc(f func(key K, value V) bool)
	Range(f func(key K, value V) bool)
}

type Extension[K comparable, V any] interface {
	GetEntry(key K) (Entry[K, V], bool)
	GetQuietly(key K) (V, bool)
	GetEntryQuietly(key K) (Entry[K, V], bool)
	Expiry() ExpiryExtension[K, V]
	Eviction() EvictionExtension[K, V]
}

// immutable
type Entry[K comparable, V any] interface {
	Key() K
	Value() V
	Weight() uint32
	ExpiresAt() int64
	ExpiresAfter() time.Duration
	HasExpired() bool
}

type ExpiryExtension[K comparable, V any] interface {
	// Useful if the user cannot use a function to dynamically calculate the TTL.
	Set(key K, value V, duration time.Duration) (V, bool)
	SetIfAbsent(key K, value V, duration time.Duration) (V, bool)
	// Should help those who aren't satisfied with the API provided by the loader.
	SetExpiresAfter(key K, duration time.Duration) bool
}

type EvictionExtension[K comparable, V any] interface {
	GetCapacity() uint64
	SetCapacity(capacity uint64) bool
}
```

### Packages

If I put some of the code into small (sometimes even too small) packages, it seems that the API can be simplified a bit, but I'm not sure that many people will like it. Instead of …

```go
package deletion

// Cause the cause why a cached entry was deleted.
type Cause uint8

const (
	// Explicit the entry was manually deleted by the user.
	Explicit Cause = iota + 1
	// Replaced the entry itself was not actually deleted, but its value was replaced by the user.
	Replaced
	// Size the entry was evicted due to size constraints.
	Size
	// Expired the entry's expiration timestamp has passed.
	Expired
)

func (c Cause) String() string {
	switch c {
	case Explicit:
		return "Explicit"
	case Replaced:
		return "Replaced"
	case Size:
		return "Size"
	case Expired:
		return "Expired"
	default:
		panic("unknown deletion cause")
	}
}

func (c Cause) WasEvicted() bool {
	switch c {
	case Size, Expired:
		return true
	default:
		return false
	}
}
```

Instead of …
package stats
import (
"time"
"github.com/maypok86/otter/v2/deletion"
)
type Collector interface {
CollectHits(count int)
CollectMisses(count int)
CollectEviction(weight uint32, cause deletion.Cause)
Snapshot() Stats
}
type LoadCollector interface {
CollectLoadSuccess(loadDuration time.Duration)
CollectLoadFailure(loadDuration time.Duration)
}
type Stats struct {
...
}
type counter struct {
...
}
func NewCounter() Collector {
} Instead of package expiry
type Strategy uint8
const (
AfterCreate Strategy = iota + 1
AfterWrite
AfterAccess
AfterRead
AfterUpdate
) Then we can get a pretty nice API. package main
import (
"context"
"fmt"
"time"
"unsafe"
"github.com/maypok86/otter/v2"
"github.com/maypok86/otter/v2/deletion"
"github.com/maypok86/otter/v2/expiry"
"github.com/maypok86/otter/v2/stats"
)
func main() {
ctx := context.Background()
loader, err := ...
logger, err := ...
cache, err := otter.NewBuilder[int, int]().
MaximumWeight(100 * 1024 * 1024).
Weigher(func(key int, value int) uint32 {
return uint32(unsafe.Sizeof(key) + unsafe.Sizeof(value))
}).
DeletionListener(func(key int, value int, cause deletion.Cause) {
fmt.Println(key, value, cause)
}).
Expire(expiry.AfterAccess, time.Hour).
CollectStats(stats.NewCounter()).
Logger(logger).
Build()
key := 1
value, err := cache.Load(ctx, loader, key)
cache.Extension().Expiry().Set(key, value, time.Minute)
cache.AsMap().SetIfAbsent(key, value)
cache.AsMap().DeleteFunc(func(key int, value int) bool {
return true
})
} Or a simpler example. package main
import (
"time"
"github.com/maypok86/otter/v2"
"github.com/maypok86/otter/v2/expiry"
"github.com/maypok86/otter/v2/stats"
)
func main() {
cache, err := otter.NewBuilder[int, int]().
MaximumSize(10_000).
Expire(expiry.AfterAccess, time.Hour).
CollectStats(stats.NewCounter()).
Build()
key := 1
value := 2
cache.Set(key, value)
value, ok := cache.Get(key)
cache.AsMap().DeleteFunc(func(key int, value int) bool {
return true
})
}

It seems that the only question is the increased number of imports.
The most embarrassing dilemmas for me are
P.S. Refreshing will most likely go to the next version after v2. I'll post the sketches later, but I will need comments from Ben (there may be problems with implementation and use cases) and someone from the Go world, because there is a chance to touch on ideological things in the API. |
Sorry I can't give much advice beyond saying your general direction looks good. I always review the Bumper-Sticker API Design rules before cementing a public API, hoping that I got more right than wrong. Writing unit tests helps me use them and feel their pain, though sometimes just wanting to get it done has caused a few small mistakes where I should have known better.
|
Builder vs functional options: indeed, generics make that ugly. At first I got caught in the same mental trap you are in, because you are used to seeing it designed in a certain way (as many other people are, in fact). But that new form is only required "if and only if" you try to modify the generic type. You only need to add a level of abstraction to get back to clean options with almost zero cost. So, instead of modifying your generic

Here is an example: https://go.dev/play/p/BOAX-1a9rih

For the user, that should come out cleaner than the Builder method.
About refreshing and caching asynchronous computations

Let's start with the fundamental problem first.

Future/Promise in Go

They simply don't exist in Go, and any attempts by people to write them based on channels are usually pelted with tomatoes. The reason is that channels are already the

For example,

Okay, then why not just keep the channels in the cache for this? Because you can get data from the channel only once. After that, reads from the channel will block, or return a zero value if the channel is closed. That is, there is simply no point in caching channels, and some kind of wrapper is needed.

Then what should the asynchronous computing type have?

type Result[V any] struct {
isDone atomic.Bool
done chan struct{}
cancel context.CancelFunc
value V
err error
}
func NewResult[K comparable, V any](ctx context.Context, key K, call func(ctx context.Context, key K) (V, error)) *Result[V] {
ctx, cancel := context.WithCancel(ctx)
r := &Result[V]{
done: make(chan struct{}, 1),
cancel: cancel,
}
go func() {
r.value, r.err = call(ctx, key)
if err := ctx.Err(); err != nil && r.err == nil {
r.err = err
}
r.done <- struct{}{}
close(r.done)
r.isDone.Store(true)
}()
return r
}
// Done is needed so that Result can be used in select.
func (r *Result[V]) Done() <-chan struct{} {
return r.done
}
// Get should be blocked until the computation of the results is complete.
func (r *Result[V]) Get() (V, error) {
if r.isDone.Load() {
return r.value, r.err
}
<-r.done
return r.value, r.err
}
func (r *Result[V]) Value() V {
value, _ := r.Get()
return value
}
func (r *Result[V]) Error() error {
_, err := r.Get()
return err
}
// Cancel is needed so that the user can cancel the computation.
func (r *Result[V]) Cancel() {
r.cancel()
}

As a result, we get quite a pleasant type. And now about what I don't like.

Thanks to the asynchronous computing type, we can ask the user to implement a custom reloader:

type AsyncReloader[K comparable, V any] interface {
AsyncReload(ctx context.Context, key K) (*Result[V], error)
}
type Reloader[K comparable, V any] interface {
Reload(ctx context.Context, key K) (V, error)
}

Is the

Also, as Ben pointed out, bulkLoader can be implemented based on

But I have big questions about such a cache in Go.
I actually don't like what caffeine does with
Refreshing

In fact, Ben has already written almost everything about it. I was only interested in options other than

P.S. It seems that I came up with a linearizable implementation with
Yes, but unfortunately, the options must also be generic.

type options[K comparable, V any] struct {
maximumSize uint64
maximumWeight uint64
weigher func(key K, value V) uint32
deletionListener func(key K, value V, cause DeletionCause)
...
}
Because

type StatsCollector interface {
CollectHits(count int)
CollectMisses(count int)
CollectEviction(weight uint32, cause DeletionCause)
Snapshot() Stats
} |
You can still do it without instantiating options as generic, but it does mean you have to use

Basically, deletionListener becomes
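One possible shape of this approach, sketched with hypothetical names (`WithDeletionListener`, `New`), not otter's actual API: the options struct stays non-generic by storing the callback as `any`, and the generic constructor re-asserts it. The price is that a mismatched listener only fails at runtime.

```go
package main

import "fmt"

// DeletionCause stands in for otter's deletion cause type.
type DeletionCause int

const Explicit DeletionCause = iota

// options is deliberately non-generic: the typed callback is erased
// to any here and re-asserted inside the generic constructor.
type options struct {
	maximumSize      int
	deletionListener any
}

type Option func(*options)

func WithMaximumSize(n int) Option {
	return func(o *options) { o.maximumSize = n }
}

// WithDeletionListener accepts a typed listener but stores it untyped.
func WithDeletionListener[K comparable, V any](f func(K, V, DeletionCause)) Option {
	return func(o *options) { o.deletionListener = f }
}

type Cache[K comparable, V any] struct {
	maximumSize      int
	deletionListener func(K, V, DeletionCause)
}

func New[K comparable, V any](opts ...Option) (*Cache[K, V], error) {
	var o options
	for _, opt := range opts {
		opt(&o)
	}
	c := &Cache[K, V]{maximumSize: o.maximumSize}
	if o.deletionListener != nil {
		// A mismatch between the listener's type parameters and the
		// cache's is only caught here, at runtime: the cost of erasure.
		f, ok := o.deletionListener.(func(K, V, DeletionCause))
		if !ok {
			return nil, fmt.Errorf("deletion listener has wrong type %T", o.deletionListener)
		}
		c.deletionListener = f
	}
	return c, nil
}

func main() {
	c, err := New[int, string](
		WithMaximumSize(100),
		WithDeletionListener(func(key int, value string, cause DeletionCause) {
			fmt.Println("deleted", key, value, cause)
		}),
	)
	fmt.Println(c.maximumSize, err)
}
```

This is the trade-off discussed below: the API reads cleanly, but the compile-time type safety of the builder is traded for a runtime check.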
Caffeine calls
In a refresh scenario on a synchronous cache, the future value is held in a secondary map while in-flight and upon completion it updates the cache if still valid. In an AsyncCache it holds the future as the mapped value since the returned entry might be canceled, used in an atomic asMap operator (where equality is by identity), generating a new future to wrap the completed value per read adds GC overhead, and likely many other quirks to juggle. I suspect most cases of AsyncCache are fairly small and larger caches use the synchronous one simply by nature of their relative use-cases, and that the synchronous Cache is much more common. ConcurrentHashMap and Guava's Cache both manage the underlying hash table and use an intermediate entry type while the load is in-flight (
The author of ConcurrentHashMap had ideas for transitioning to per-item computes to avoid this bucket issue, but I don't think anything has or will be done there. I think because the number of buckets increases with capacity so it is easy to reduce the chance of collisions, it could be complex rework, he is a busy department chair (and perhaps nearing retirement), and most users who have that problem seem fine with losing linearization to instead load within a future value. That's all to say that I get the benefit of blaming the JDK so users don't expect me to deal with it myself.
Oh yeah! That was by a user request who wanted it in their metrics (for whatever reason). But they could also simulate it by the removal listener so it was just a nice to have refinement (deprecated and removed the older variants). Originally it was merely |
Okay, I need to try to implement

If anyone is interested, then I really don't want to expose

So far, I've started with edits to the statistics collection, and it would be interesting to hear someone's opinion. I would like to make collecting most of the statistics optional; that is, the builder will accept only a minimal collector:

type Builder[K comparable, V any] interface {
CollectStats(statsCollector Collector) Builder[K, V]
}
type Collector interface {
CollectHits(count int)
CollectMisses(count int)
Snapshot() Stats
}
type EvictionCollector interface {
CollectEviction(weight uint32)
}
type RejectedSetsCollector interface {
CollectRejectedSets(count int)
}
type LoadCollector interface {
CollectLoadSuccess(loadTime time.Duration)
CollectLoadFailure(loadTime time.Duration)
}
func (c *Cache[K, V]) Stats() Stats {
return c.collector.Snapshot()
}

Here's what confuses me. Does the cache need the

I ask this because it is very likely that the user will not like most of the data in

P.S. So far, I've decided to use the builder. I don't want to lose type safety just for the sake of functional options.
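A sketch of how the cache could treat the extra collectors as optional capabilities discovered by type assertion at build time; the interfaces are trimmed from the ones above (Snapshot omitted) and the concrete collector types are made up.

```go
package main

import "fmt"

// Collector is the minimal interface the builder would require
// (Snapshot omitted to keep the sketch short).
type Collector interface {
	CollectHits(count int)
	CollectMisses(count int)
}

// EvictionCollector is an optional capability, discovered at build time.
type EvictionCollector interface {
	CollectEviction(weight uint32)
}

// noopEvictionCollector keeps the hot path free of nil checks.
type noopEvictionCollector struct{}

func (noopEvictionCollector) CollectEviction(weight uint32) {}

type Cache struct {
	collector         Collector
	evictionCollector EvictionCollector
}

func NewCache(c Collector) *Cache {
	cache := &Cache{collector: c, evictionCollector: noopEvictionCollector{}}
	// If the user's collector also implements the optional interface, use it.
	if ec, ok := c.(EvictionCollector); ok {
		cache.evictionCollector = ec
	}
	return cache
}

// fullCollector implements both the required and the optional interface.
type fullCollector struct {
	hits, misses  int
	evictedWeight uint32
}

func (f *fullCollector) CollectHits(count int)         { f.hits += count }
func (f *fullCollector) CollectMisses(count int)       { f.misses += count }
func (f *fullCollector) CollectEviction(weight uint32) { f.evictedWeight += weight }

// hitsOnly implements just the required interface.
type hitsOnly struct{}

func (hitsOnly) CollectHits(count int)   {}
func (hitsOnly) CollectMisses(count int) {}

func main() {
	fc := &fullCollector{}
	NewCache(fc).evictionCollector.CollectEviction(10)
	fmt.Println(fc.evictedWeight) // the optional collector received the event

	_, ok := NewCache(hitsOnly{}).evictionCollector.(noopEvictionCollector)
	fmt.Println(ok) // hitsOnly fell back to the no-op
}
```

With this shape, users who don't care about eviction or load statistics pay nothing for them, while richer collectors opt in just by implementing the extra methods.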
fwiw, my approach was to capture stats that only the cache knows or that are universally desirable, then defer to the user to add their own instrumentation, e.g. puts or explicit removes. The few stats mean less CPU/memory overhead and bikeshedding, and a few disappointed users who dislike writing a few lines of extra code. Guava's

I don't know what the "default implementation" would do if there was no way to capture those metrics from
I rather meant that the default implementation (

import (
"time"
"github.com/maypok86/otter/v2"
"github.com/maypok86/otter/v2/stats"
)
type AwesomeCache struct {
cache *otter.Cache[string, string]
statsCounter *stats.Counter
}
func NewAwesomeCache(maximumSize uint64, ttl time.Duration) (*AwesomeCache, error) {
statsCounter := stats.NewCounter()
cache, err := otter.NewBuilder[string, string]().
MaximumSize(maximumSize).
ExpireAfterWrite(ttl).
CollectStats(statsCounter).
Build()
if err != nil {
return nil, err
}
return &AwesomeCache{
cache: cache,
statsCounter: statsCounter,
}, nil
}
func (ac *AwesomeCache) Get(key string) (string, bool) {
return ac.cache.Get(key)
}
func (ac *AwesomeCache) Stats() stats.Stats {
return ac.statsCounter.Snapshot()
}
This should cause less confusion because 'cost' is often used in the context of latency/miss penalty when fetching an item. See #74 (comment) for more details.
@maypok86 Thank you for your efforts. Overall it looks good. I'm not a native English speaker, so maybe I am wrong. Does Current
I couldn't think of a better name. Perhaps
When some metrics are changed. Maybe an example of calls to

func (c *Cache[K, V]) Get(key K) (V, bool) {
value, ok := c.hashmap.Get(key)
if !ok {
c.statsCollector.CollectMisses(1)
return zeroValue[V](), false
}
c.statsCollector.CollectHits(1)
return value, true
}
I'm not a native speaker either, so I don't know :). I don't really like

Ideally, the causes should be associated with a cache entry (

I'm not sure it's worth finding fault with very much, because there are quite a lot of controversial places in the deletion API.
type Builder[K comparable, V any] interface {
DeletionHandler(deletionHandler func(e DeletionEvent[K, V])) Builder[K, V]
}
type DeletionEvent[K comparable, V any] struct {
Key K
Value V
Cause DeletionCause
}
func (e DeletionEvent[K, V]) WasEvicted() bool {
return e.Cause.WasEvicted()
}
type Builder[K comparable, V any] interface {
AtomicDeletionHandler(deletionHandler func(e DeletionEvent[K, V])) Builder[K, V]
}
// or change the names and do something like that
type Builder[K comparable, V any] interface {
OnDeletion(deletionHandler func(e DeletionEvent[K, V])) Builder[K, V]
OnAtomicDeletion(deletionHandler func(e DeletionEvent[K, V])) Builder[K, V]
}
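The event-struct variant also composes nicely in handlers. A self-contained sketch, with DeletionCause and DeletionEvent condensed from the snippets above and a made-up handler showing how WasEvicted lets the user route events:

```go
package main

import "fmt"

// DeletionCause and DeletionEvent are condensed from the proposal above.
type DeletionCause int

const (
	Explicit DeletionCause = iota
	Replaced
	Size
	Expired
)

func (c DeletionCause) WasEvicted() bool {
	return c == Size || c == Expired
}

type DeletionEvent[K comparable, V any] struct {
	Key   K
	Value V
	Cause DeletionCause
}

func (e DeletionEvent[K, V]) WasEvicted() bool {
	return e.Cause.WasEvicted()
}

func main() {
	// A handler as it might be passed to OnDeletion: route evictions to
	// metrics and explicit deletions to a log (the routing is made up).
	handler := func(e DeletionEvent[int, string]) {
		if e.WasEvicted() {
			fmt.Println("evicted:", e.Key)
		} else {
			fmt.Println("deleted:", e.Key)
		}
	}
	handler(DeletionEvent[int, string]{Key: 1, Value: "a", Cause: Size})
	handler(DeletionEvent[int, string]{Key: 2, Value: "b", Cause: Explicit})
}
```

A nice property of the struct form is that new fields (e.g. a timestamp) can be added later without breaking existing handler signatures, which a plain `func(key, value, cause)` callback cannot do.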
fwiw, here's the historical evolution,
This is what we did in Guava's Cache, where it was the
Caffeine's AsyncCache made this tricky because it could be an explicit removal of an in-flight future value. It doesn't make sense to block to wait for the future to complete nor is it desirable to disallow that listener combination. Since an in-flight future is not evictable, I narrowed the atomic case down to
An alternative name might be
|
Just random thoughts about new contracts...