RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866

orta · 2017-12-19T18:57:52Z

This was an idea @yuki24 mentioned in our graphql lunch & learn.

Right now we ask for total counts for pagination so that we can know about before or afters:

export const parseRelayOptions = options => {
  const { limit: size, offset } = getPagingParameters(options)
  const page = (size + offset) / size
  const gravityArgs = omit(options, ["first", "after", "last", "before"])
  return Object.assign({}, { page, size, offset }, gravityArgs)
}

Perhaps we can move this to instead get x + 1 ( or x -1 ) to handle knowing if there's another page?

alloy · 2017-12-19T19:35:33Z

I like the idea in general, but also very much in the case of our infrastructure where counts seem inherently untrustworthy, e.g. a count returned from ES can still include works that are since unpublished and will get filtered by Gravity before sending to the client. I don’t particularly like that we have APIs that return inconsistent data, but if we can avoid it all together then that definitely has my preference 👍

I guess the one question one could ask is if we’d prefer to avoid the count issue by overfetching vs somehow ensuring consistent counts /cc @joeyAghion

mzikherman · 2017-12-20T01:39:31Z

On the one hand, we really should just be able to return consistent counts. In the specific case of ES filtering out works that are unpublished after they're returned, ES itself should already be doing that, and the counts should be correct. I believe the guard was put in place in the event of an ES delay in indexing (which can definitely happen, the cluster is slow/down, there's a backup in the delayed queue, etc.).

However, that was added prior to any sort of caching to the ES-backed endpoints, which all now have configurable (50 minute) caches. This should pretty much alleviate that likelihood of an inconsistent count. The original intent would have been to unpublish a work, and be able to refresh an artist filter (for example) and see that work disappear right away. We just don't support that degree of real-timeness anymore, we've definitely set out expectations that unpublishing a work will result in a small delay from it disappearing off the site. So we might even remove that guard, and rely on ES being updated in a reasonable timeline (and always being consistent itself).

But, I see the need for a pattern we might follow in dealing with inconsistent counts, if those are a fact of life.

How about something like: https://github.com/artsy/reaction/blob/1eb550f09786197cff90b250800b8964d5422ba9/src/Components/Gene/GeneArtworksContent.tsx#L27-L35

This checks if we get back less than what we asked for, while still having a hasNextPage: true (based on counts). If so, we stop pagination.

I think this logic can go upstream, and Metaphysics can know to return a blank cursor, and set hasNextPage: false. This would conform to the GraphQL connection spec, and should totally work fine with Relay clients too.

alloy · 2017-12-20T07:11:57Z

Interesting. I agree that the pattern implemented in the end client at least makes a lot of sense and the one additional spinner the user might get to see isn’t that bad, but I’m not yet sure I understand how we’d have MP do that determination consistently. MP wouldn’t know to return hasNextPage: false If the actual count is exactly dividable by the page size until it would try to fetch the next page, meaning we’d still need to also have this check in the end-client, right?

Seeing as we apperantly could be getting closer to correct counts, maybe we should just keep doing that with proper checks in the end client?

mzikherman · 2017-12-20T15:07:15Z

MP wouldn’t know to return hasNextPage: false If the actual count is exactly dividable by the page size until it would try to fetch the next page, meaning we’d still need to also have this check in the end-client, right?

I don't think so. Basically all our backing APIs use page/size or size/offset, which are integers. So, barring gaps in the data or other anomalies, if there are more pages left, you should always get back exactly what size you requested. Since MP knows what the size it sent over, it can just check the length of the data returned. If it's less than the requested size, it can just set hasNextPage: false and null the cursor.

Does that make sense? I think that's doable and consistent in MP, but it could use a POC perhaps.

alloy · 2017-12-20T16:21:15Z

Oh I see, yeah yeah I think you’re on the right track. Let’s give this a try And if this pans out it’s time to refactor and have our own little connection definition lib that does this and other things we commonly do.

yuki24 · 2017-12-20T20:01:02Z

MP wouldn’t know to return hasNextPage: false If the actual count is exactly dividable by the page size until it would try to fetch the next page, meaning we’d still need to also have this check in the end-client, right?

This is correct. For example, if the total count is 100 and the page size is 20 (== "the actual count is exactly dividable by the page size"), the last page would have 20 results and incorrectly returns hasNextPage: true, which is why the +1 addition is important. See this test in kaminari.

orta · 2017-12-21T16:47:44Z

I just chatted with @mzikherman - I forgot to note in the OP that a key point of this is to avoid requesting the total count. 👍

alloy · 2017-12-21T16:50:57Z

And what’s the main reason(s) to avoid requesting a total count?

alloy · 2017-12-21T16:52:13Z

But yeah, reading my own comment with a fresh mind, I come back to my original comment and agree with @yuki24 that this couldn’t work if the actual available amount is exactly dividable by page size.

mzikherman · 2017-12-22T15:44:04Z

Ah whoops! I didn't understand the initial premise here- that we dont want to be requesting total counts all the time, and this is an alternative.

My suggestion was purely as a backup to also having counts- it would simply work to stop paginating early if there's an inconsistency. But it absolutely still relies on having the total count. In the example @yuki24 posed, if we werent fetching total counts, this scheme would yield one extra empty pagination call. I was thinking we would use total counts (which in this example would yield hasNextPage: false), and only in the case where the total count implies hasNextPage: true would we check our response.

@alloy the reason for not requesting counts is: its not 'free' to request them (more work by the database), and we wind up requesting it on every page. I think it's probably a micro-optimization to not request them, but for purposes of this discussion let's assume we want to avoid requesting them if not needed.

In that case I'm totally down with @yuki24 suggestion (always request 1 more than what was asked, use that to determine hasNextPage, and make sure to drop that extra element when resolving).

alloy · 2017-12-22T19:10:52Z

The main reason I asked for a motivation is that, assuming it’s indeed meant as optimization, I for one don’t know if it’s more costly to return a count or an extra record of data.

orta assigned alloy, yuki24 and mzikherman Dec 19, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866

RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866

orta commented Dec 19, 2017 •

edited

Loading

alloy commented Dec 19, 2017

mzikherman commented Dec 20, 2017

alloy commented Dec 20, 2017

mzikherman commented Dec 20, 2017

alloy commented Dec 20, 2017

yuki24 commented Dec 20, 2017 •

edited

Loading

orta commented Dec 21, 2017

alloy commented Dec 21, 2017

alloy commented Dec 21, 2017

mzikherman commented Dec 22, 2017

alloy commented Dec 22, 2017

RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866

RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866

Comments

orta commented Dec 19, 2017 • edited Loading

alloy commented Dec 19, 2017

mzikherman commented Dec 20, 2017

alloy commented Dec 20, 2017

mzikherman commented Dec 20, 2017

alloy commented Dec 20, 2017

yuki24 commented Dec 20, 2017 • edited Loading

orta commented Dec 21, 2017

alloy commented Dec 21, 2017

alloy commented Dec 21, 2017

mzikherman commented Dec 22, 2017

alloy commented Dec 22, 2017

orta commented Dec 19, 2017 •

edited

Loading

yuki24 commented Dec 20, 2017 •

edited

Loading