-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Update metaphysics to not grab total counts but to grab x + 1 items #866
Comments
I like the idea in general, but also very much in the case of our infrastructure where counts seem inherently untrustworthy, e.g. a count returned from ES can still include works that are since unpublished and will get filtered by Gravity before sending to the client. I don’t particularly like that we have APIs that return inconsistent data, but if we can avoid it all together then that definitely has my preference 👍 I guess the one question one could ask is if we’d prefer to avoid the count issue by overfetching vs somehow ensuring consistent counts /cc @joeyAghion |
On the one hand, we really should just be able to return consistent counts. In the specific case of ES filtering out works that are unpublished after they're returned, ES itself should already be doing that, and the counts should be correct. I believe the guard was put in place in the event of an ES delay in indexing (which can definitely happen, the cluster is slow/down, there's a backup in the delayed queue, etc.). However, that was added prior to any sort of caching to the ES-backed endpoints, which all now have configurable (50 minute) caches. This should pretty much alleviate that likelihood of an inconsistent count. The original intent would have been to unpublish a work, and be able to refresh an artist filter (for example) and see that work disappear right away. We just don't support that degree of real-timeness anymore, we've definitely set out expectations that unpublishing a work will result in a small delay from it disappearing off the site. So we might even remove that guard, and rely on ES being updated in a reasonable timeline (and always being consistent itself). But, I see the need for a pattern we might follow in dealing with inconsistent counts, if those are a fact of life. How about something like: https://github.com/artsy/reaction/blob/1eb550f09786197cff90b250800b8964d5422ba9/src/Components/Gene/GeneArtworksContent.tsx#L27-L35 This checks if we get back less than what we asked for, while still having a I think this logic can go upstream, and Metaphysics can know to return a blank cursor, and set |
Interesting. I agree that the pattern implemented in the end client at least makes a lot of sense and the one additional spinner the user might get to see isn’t that bad, but I’m not yet sure I understand how we’d have MP do that determination consistently. MP wouldn’t know to return Seeing as we apperantly could be getting closer to correct counts, maybe we should just keep doing that with proper checks in the end client? |
I don't think so. Basically all our backing APIs use Does that make sense? I think that's doable and consistent in MP, but it could use a POC perhaps. |
Oh I see, yeah yeah I think you’re on the right track. Let’s give this a try And if this pans out it’s time to refactor and have our own little connection definition lib that does this and other things we commonly do. |
This is correct. For example, if the total count is 100 and the page size is 20 (== "the actual count is exactly dividable by the page size"), the last page would have 20 results and incorrectly returns |
I just chatted with @mzikherman - I forgot to note in the OP that a key point of this is to avoid requesting the total count. 👍 |
And what’s the main reason(s) to avoid requesting a total count? |
But yeah, reading my own comment with a fresh mind, I come back to my original comment and agree with @yuki24 that this couldn’t work if the actual available amount is exactly dividable by page size. |
Ah whoops! I didn't understand the initial premise here- that we dont want to be requesting total counts all the time, and this is an alternative. My suggestion was purely as a backup to also having counts- it would simply work to stop paginating early if there's an inconsistency. But it absolutely still relies on having the total count. In the example @yuki24 posed, if we werent fetching total counts, this scheme would yield one extra empty pagination call. I was thinking we would use total counts (which in this example would yield @alloy the reason for not requesting counts is: its not 'free' to request them (more work by the database), and we wind up requesting it on every page. I think it's probably a micro-optimization to not request them, but for purposes of this discussion let's assume we want to avoid requesting them if not needed. In that case I'm totally down with @yuki24 suggestion (always request 1 more than what was asked, use that to determine |
The main reason I asked for a motivation is that, assuming it’s indeed meant as optimization, I for one don’t know if it’s more costly to return a count or an extra record of data. |
This was an idea @yuki24 mentioned in our graphql lunch & learn.
Right now we ask for total counts for pagination so that we can know about before or afters:
here
Perhaps we can move this to instead get x + 1 ( or x -1 ) to handle knowing if there's another page?
The text was updated successfully, but these errors were encountered: