Cache strategy to reduce calls to the people API #447

vincent-zurczak · 2021-05-03T14:15:43Z

With Inbox, all the email authors are searched through the people API.
It occurs when the page opens up or when we scroll and retrieve older emails. This API is used to find people names and avatars.

It is not clear at the moment if the people API will remain and if there will be common services. But the short-term deadlines require to improve the current implementation, and using the browser's local storage to store information would save some requests. When there are thousands of users, saving invocations makes a change.

The idea is that inbox should save a map with user names and their avatar's address. If the author of an email is not found locally, then it will be asked to the people API and added into the cache. Even users that are not known by the people API (i.e. they are not part of the domain, work for another company, ...) should be added in the local cache.

Cache entries should have a time-to-live (3 days seems reasonable).

chamerling · 2021-05-03T18:17:47Z

Some comments:

Better than caching just user name and avatar URL, just cache the Person object
The call is probably made by the email item display in the email list, and so people API calls are made again and again when loading the mailbox and when the user scrolls down
There are probably other calls when opening an email

The ESN already provides a Cache service, but it caches objects in memory with a TTL. When the item is not available in cache or when cache expires, it will call a function which resolve with a resource (the person in our case).

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

There are pretty good libraries which abstracts storage and support queries, one of them is dexie (IndexedDB wrapper, I already used it in a side project, pretty cool to use). You can use whatever you want, but please do not write your own serializer (ie do not JSON.stringify stuff yourself, this is time consuming and useless), use a library which does things for you.

chibenwa · 2021-05-04T01:34:18Z

Please specify the calls that we want to cache...

GET -> A caching strategy can likely be very effective as there will be a lot of commonly shared entities resulting in a lot of hits, and a lot of mongo reads avoided.
I am more circonspect regarding SEARCH. SEARCH patterns have a higher variability, overall reducing chances of a HIT (if we base ourselves on the searched string as a cache entry). Is auto-complete in the scope @vincent-zurczak ?

Anyway we should plan a way to review the usefulness of such a cache, and closely monitor hit rates. I imagine we need applicative metrics to monitor this...

Also, a cache partially solves the problem upon search. Suppose that I search bt that auto-completes in btellier and btallendier. Now imagine that only btellier is cached, and not btallendier. Reads to the People API will be needed to get the missing search results. The only thing that can be done is optimistically return the hits we know for sure first. Which is not possible with an HTTP transport mechanism.

To be fair, a good idea would be streaming the results (eg over webSocket). Each sources of the people API would stream its search results at its own pace, and we would only need to add a search result provider for the PEOPLE CACHE. That way, we would no longer be impacted by the slowest result

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

👍

Arsnael · 2021-05-04T03:20:06Z

@chibenwa : I think initially @vincent-zurczak seemed quite clear on the scope of the issue. By search, he meant the retrieve of people's profiles to match for example the author email with his name and avatar that you can see when in your message list in inbox, this part:

He is not talking about the search search function :)

Which means we have the mail author, we know what user it is, it should be easy to cache and retrieve instead of constantly calling the API, which is what Inbox seems to do for now.

chibenwa · 2021-05-04T03:35:53Z

True but I remember the audit pin-pointed also the people search API

Arsnael · 2021-05-04T04:01:24Z

True but I remember the audit pin-pointed also the people search API

That should be an other issue I believe then, as it is much more complex as you explained above

vincent-zurczak · 2021-05-04T07:13:43Z

There are several actions undertaken to improve the people API.
We even discussed its removal or replacement. Fore sure, a streaming approach would be better and a final target. But the deadline is not so far and there are lots of things to do. The biggest issue is who is responsible for this API? Inbox? Calendar?

Right now, it is used by both.
And LinShare and Twake could also be users of this API. Streaming goes along with a (server-side) cache approach, because even with streaming, we would not handle the load IMO. That would be for feature team stamped "common services".

Like @Arsnael said, this issue is about improving user information display (users are retrieved one by one), not searching users by general criteria. We are not redesigning this API at this stage. :)

chamerling · 2021-05-04T07:15:59Z

OK, we do not speak about search here, we speak about the People resolve endpoint (https://host:port/api/people/resolve/emailaddress/[email protected]) which is called for each element of the email list.
Such call can be easily cached, because we do not care if the user changed its avatar and if we update its avatar in the email list some hours/days after.

chibenwa · 2021-05-04T08:37:30Z

Then I completly buy

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

There are pretty good libraries which abstracts storage and support queries, one of them is dexie (IndexedDB wrapper, I already used it in a side project, pretty cool to use). You can use whatever you want, but please do not write your own serializer (ie do not JSON.stringify stuff yourself, this is time consuming and useless), use a library which does things for you.

Sorry for mixing topics @chamerling , I only saw the other ticket after...

fabienmoyon added the bug Something isn't working label May 31, 2021

fabienmoyon added this to the Polish sprint milestone May 31, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache strategy to reduce calls to the people API #447

Cache strategy to reduce calls to the people API #447

vincent-zurczak commented May 3, 2021

chamerling commented May 3, 2021

chibenwa commented May 4, 2021

Arsnael commented May 4, 2021 •

edited

Loading

chibenwa commented May 4, 2021

Arsnael commented May 4, 2021

vincent-zurczak commented May 4, 2021

chamerling commented May 4, 2021

chibenwa commented May 4, 2021

Cache strategy to reduce calls to the people API #447

Cache strategy to reduce calls to the people API #447

Comments

vincent-zurczak commented May 3, 2021

chamerling commented May 3, 2021

chibenwa commented May 4, 2021

Arsnael commented May 4, 2021 • edited Loading

chibenwa commented May 4, 2021

Arsnael commented May 4, 2021

vincent-zurczak commented May 4, 2021

chamerling commented May 4, 2021

chibenwa commented May 4, 2021

Arsnael commented May 4, 2021 •

edited

Loading