Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add refresh after(Expiry) #360

Closed
anuraaga opened this issue Oct 31, 2019 · 9 comments
Closed

Add refresh after(Expiry) #360

anuraaga opened this issue Oct 31, 2019 · 9 comments

Comments

@anuraaga
Copy link

Currently it's possible to set a value-based exipry for expireAfter but refresh only supports a cache-wide constant from what I can tell. Would it be possible to support value-based refresh too?

@ben-manes
Copy link
Owner

I think the next big sprint will be on improving refresh, such as for this capability (#272 asks for this as well). However I can't offer a time-frame as this project is unrelated to my work, and I tend to have bursts of energy when frustrated at not being able to code for too long.

Currently refreshAfterWrite piggy-backs on expireAfterWrite timestamp. Some juggling in the generated Node classes is used to map to the minimal fields, as we want to minimize both jar and runtime bloat. Ideally expireAfterAccess and expireAfterWrite would be implemented natively via expireAfter(expiry), but they predate that feature. Guava allowed both to be enabled at once, which we replicate, and I believe blocks us from backing it by expireAfter. Either we determine it is safe or break compatibility, or a new field is required. If the latter, then we'd still want to juggle to only do so when absolutely necessary.

The minor left-over is if we can bikeshed a name for this configuration class, as we probably don't want to overload Expiry though it might be identical.

@anuraaga
Copy link
Author

Thanks for the detailed explanation and sorry for not finding that dupe. Will go ahead and close this issue since it's already there. No worries about the timeframe :)

@ben-manes
Copy link
Owner

As a reminder to myself,

I think the answer to the above question of codegen is to simply disallow refreshAfter + expireAfterWrite + expireAfterAccess. We generate up to 2 timestamps and should not introduce a 3rd. If we disallow that unusual combination for this new feature then it won't negatively impact anyone, avoids runtime / jar bloat, and is backwards compatible.

@ben-manes
Copy link
Owner

Consolidating into #504

@krishna81m
Copy link

krishna81m commented Aug 12, 2021

@ben-manes, we were considering migrating from Guava to spread eviction based refresh times by adding random buffer to TTL per entry in expireAfterCreate(), however we cache very large entries that can take seconds to load.

We can use caffeines refreshAfterWrite() every few minutes to continue access to old data while fetching new ones. If many records are refreshed at the same time as most are cached on start, can refreshAfterWrite() be throttled to control number of refreshes at a time async without impacting other refreshes initiated when their entry is queried like a separate fixed pool executor? Otherwise, we need to spread the refresh times per entry like the OP is asking to avoid contention while fetching. Also, would refresh/expiration times be reset if we throw an exception to keep the old value.

If we use staggered TTLs, is there a way to access expired key and "old value" to return the old value if loader refresh fails? Can create a separate thread for this.

@ben-manes
Copy link
Owner

A refresh invokes CacheLoader#asyncReload(key, oldValue, executor), which delegates to CacheLoader#reload(key, oldValue), which delegates to CacheLoader#load(key). You can override at any step to take further control, such as using a different executor for the asynchronous load.

Some users have implemented coalescing cache loaders (see contrib, #537). The intent is to gather multiple independent loads over a short time period (maybe up to a batch size) and then perform a bulk load. Most often only reloads make sense for coalescing rather than load-on-miss to avoid impacting user-facing latencies.

A reload triggered by refreshAfterWrite is done by inspecting the write time and acquiring a soft lock. If the reload throws an exception and fails then that timestamp is not updated. If the oldValue is returned then the timestamp is updated.

Depending on your needs, a victim cache may be useful. An expired entry can be moved into that and a cache loader might resurrect from it. That may not always be elegant, but it's a simple workaround when useful.

@krishna81m
Copy link

Thanks @ben-manes , let me look into overloading these methods. Will try passing a custom executor but some refreshes may be dropped if it reaches the executor buffer limit and time reset as number of entries to be refreshed could be huge. And this could repeat impacting some entries. Coalescing Cache Loader looks interesting but complicated for our case, we just need to refresh one at a time.

Is it possible to get a hook into refreshAfterWrite() inspecting write time to spread the refreshes then or insert with a custom writeTime?

Unfortunately victim cache may not work for us as our data will soon go beyond heap size and will need space based LRU eviction as well.

One work around until refresh supports value based refresh we intended to do was check writeTime() on read and refresh a Set ourselves in a separate Executor to stagger and throttle number of refreshes before expiry and expireAfterWrite() handles the rarely used ones in the worst case.

@ben-manes
Copy link
Owner

There isn't any other hook during refreshIfNeeded to coerce the writeTime.

You can inspect the metadata in Policy.FixedRefresh#FixedRefresh(key), but not modify it. If you manage refresh externally, I suppose that might be helpful (or else track the timestamp yourself).

I'm not sure when per-entry refresh support will be added. A hang-up is what that interface name and definition should be. Otherwise it's a lot of grunt work to get the entry codegen setup, integrate, write unit tests, etc.

@krishna81m
Copy link

krishna81m commented Aug 16, 2021

Thanks @ben-manes, our POC to refresh ourselves (discussed above) based on key seems to be working. It would have been nicer if we piggy backed a CaffeineCache method to force refresh() automatically instead of a separate get and put.

Will replace this with per-entry refresh afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants