Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache strategy to reduce calls to the people API #447

Open
vincent-zurczak opened this issue May 3, 2021 · 8 comments
Open

Cache strategy to reduce calls to the people API #447

vincent-zurczak opened this issue May 3, 2021 · 8 comments
Labels
bug Something isn't working
Milestone

Comments

@vincent-zurczak
Copy link

With Inbox, all the email authors are searched through the people API.
It occurs when the page opens up or when we scroll and retrieve older emails. This API is used to find people names and avatars.

It is not clear at the moment if the people API will remain and if there will be common services. But the short-term deadlines require to improve the current implementation, and using the browser's local storage to store information would save some requests. When there are thousands of users, saving invocations makes a change.

The idea is that inbox should save a map with user names and their avatar's address. If the author of an email is not found locally, then it will be asked to the people API and added into the cache. Even users that are not known by the people API (i.e. they are not part of the domain, work for another company, ...) should be added in the local cache.

Cache entries should have a time-to-live (3 days seems reasonable).

@chamerling
Copy link
Contributor

Some comments:

  • Better than caching just user name and avatar URL, just cache the Person object
  • The call is probably made by the email item display in the email list, and so people API calls are made again and again when loading the mailbox and when the user scrolls down
  • There are probably other calls when opening an email

The ESN already provides a Cache service, but it caches objects in memory with a TTL. When the item is not available in cache or when cache expires, it will call a function which resolve with a resource (the person in our case).

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

There are pretty good libraries which abstracts storage and support queries, one of them is dexie (IndexedDB wrapper, I already used it in a side project, pretty cool to use). You can use whatever you want, but please do not write your own serializer (ie do not JSON.stringify stuff yourself, this is time consuming and useless), use a library which does things for you.

@chibenwa
Copy link
Member

chibenwa commented May 4, 2021

Please specify the calls that we want to cache...

  • GET -> A caching strategy can likely be very effective as there will be a lot of commonly shared entities resulting in a lot of hits, and a lot of mongo reads avoided.

  • I am more circonspect regarding SEARCH. SEARCH patterns have a higher variability, overall reducing chances of a HIT (if we base ourselves on the searched string as a cache entry). Is auto-complete in the scope @vincent-zurczak ?

Anyway we should plan a way to review the usefulness of such a cache, and closely monitor hit rates. I imagine we need applicative metrics to monitor this...

Also, a cache partially solves the problem upon search. Suppose that I search bt that auto-completes in btellier and btallendier. Now imagine that only btellier is cached, and not btallendier. Reads to the People API will be needed to get the missing search results. The only thing that can be done is optimistically return the hits we know for sure first. Which is not possible with an HTTP transport mechanism.

To be fair, a good idea would be streaming the results (eg over webSocket). Each sources of the people API would stream its search results at its own pace, and we would only need to add a search result provider for the PEOPLE CACHE. That way, we would no longer be impacted by the slowest result

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

👍

@Arsnael
Copy link
Member

Arsnael commented May 4, 2021

@chibenwa : I think initially @vincent-zurczak seemed quite clear on the scope of the issue. By search, he meant the retrieve of people's profiles to match for example the author email with his name and avatar that you can see when in your message list in inbox, this part:
screenshot-openpaas linagora com-2021 05 04-10_18_26

He is not talking about the search search function :)

Which means we have the mail author, we know what user it is, it should be easy to cache and retrieve instead of constantly calling the API, which is what Inbox seems to do for now.

@chibenwa
Copy link
Member

chibenwa commented May 4, 2021

True but I remember the audit pin-pointed also the people search API

@Arsnael
Copy link
Member

Arsnael commented May 4, 2021

True but I remember the audit pin-pointed also the people search API

That should be an other issue I believe then, as it is much more complex as you explained above

@vincent-zurczak
Copy link
Author

There are several actions undertaken to improve the people API.
We even discussed its removal or replacement. Fore sure, a streaming approach would be better and a final target. But the deadline is not so far and there are lots of things to do. The biggest issue is who is responsible for this API? Inbox? Calendar?

Right now, it is used by both.
And LinShare and Twake could also be users of this API. Streaming goes along with a (server-side) cache approach, because even with streaming, we would not handle the load IMO. That would be for feature team stamped "common services".

Like @Arsnael said, this issue is about improving user information display (users are retrieved one by one), not searching users by general criteria. We are not redesigning this API at this stage. :)

@chamerling
Copy link
Contributor

OK, we do not speak about search here, we speak about the People resolve endpoint (https://host:port/api/people/resolve/emailaddress/[email protected]) which is called for each element of the email list.
Such call can be easily cached, because we do not care if the user changed its avatar and if we update its avatar in the email list some hours/days after.

@chibenwa
Copy link
Member

chibenwa commented May 4, 2021

Then I completly buy

I suggest that the ESN provides a new Cache storage based on some browser storage, which means that we can keep the same Cache API but choose to use the in memory one, or the storage based one.

There are pretty good libraries which abstracts storage and support queries, one of them is dexie (IndexedDB wrapper, I already used it in a side project, pretty cool to use). You can use whatever you want, but please do not write your own serializer (ie do not JSON.stringify stuff yourself, this is time consuming and useless), use a library which does things for you.

Sorry for mixing topics @chamerling , I only saw the other ticket after...

@fabienmoyon fabienmoyon added the bug Something isn't working label May 31, 2021
@fabienmoyon fabienmoyon added this to the Polish sprint milestone May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants