How should we implement subsetting in gRPC? #6370
We recently added a configuration knob to shuffle the order of addresses used by pick_first. Please see: https://github.com/grpc/proposal/blob/master/A62-pick-first.md
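For reference, the A62 knob can be enabled from a service config; a minimal sketch in Go follows. The JSON field name comes from the gRFC's proto-to-JSON mapping, so double-check it against your gRPC version.

```go
// Minimal sketch: a service config that enables pick_first address
// shuffling per gRFC A62. The field name follows the gRFC's JSON
// mapping; verify against your grpc-go version. Pass it to the client
// via grpc.WithDefaultServiceConfig (or serve it from your resolver).
package example

const pickFirstShuffleConfig = `{
  "loadBalancingConfig": [
    {"pick_first": {"shuffleAddressList": true}}
  ]
}`
```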
We've had a bunch of internal discussions about subsetting. At the moment, we're using an approach that involves doing the subsetting on the management server (i.e., sending a different EDS resource to each client). That does cause some problems for xDS resource cacheability, but since we don't (yet?) have any case where we need to cache the EDS resources, it's a workable approach. In the past, we have talked about building an LB policy to do subsetting on the client side, but we haven't pursued it.

The thing that generally seems hard about subsetting is making it stable as the set of endpoints changes (e.g., when k8s pods are created or destroyed via auto-scaling), so that you don't cause a bunch of unnecessary connection churn on the clients. In the existing xDS ecosystem, that tends to be a bit harder on the client side, since there is no unique identifier for each endpoint instance that you can use in the subsetting algorithm to avoid this kind of churn.
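To make the churn point concrete, here is a hypothetical sketch (not part of any gRPC library) of per-client subsetting via rendezvous hashing. The key property: if endpoints are ranked by a stable identifier rather than by address, an endpoint that moves to a new address keeps the same rank, so the set of clients assigned to it doesn't change.

```go
// Hypothetical sketch of rendezvous (highest-random-weight) subsetting.
// Each client ranks every endpoint by hash(clientID, endpointKey) and
// keeps the top n. If endpointKey is a stable identifier (e.g. a pod
// name), an endpoint changing its address keeps the same rank, so the
// same clients keep connecting to it; if endpointKey is the address,
// every move reshuffles subsets and causes connection churn.
package subset

import (
	"hash/fnv"
	"sort"
)

type Endpoint struct {
	Key  string // stable identifier, e.g. pod name
	Addr string // current address, may change over time
}

func score(clientID, endpointKey string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(clientID))
	h.Write([]byte{0})
	h.Write([]byte(endpointKey))
	return h.Sum64()
}

// Pick returns the n endpoints this client should connect to.
func Pick(clientID string, endpoints []Endpoint, n int) []Endpoint {
	ranked := append([]Endpoint(nil), endpoints...)
	sort.Slice(ranked, func(i, j int) bool {
		return score(clientID, ranked[i].Key) > score(clientID, ranked[j].Key)
	})
	if n > len(ranked) {
		n = len(ranked)
	}
	return ranked[:n]
}
```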
This won't help, as we already have randomized DNS responses per client. Still, with pick_first some of our users reported a big imbalance on the server side, especially during server rollouts. I guess we can make it better by setting a low MaxConnectionAge, but that generates unnecessary connection churn and doesn't fix the issue entirely.
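For context, MaxConnectionAge is a server-side keepalive option in grpc-go; a minimal sketch with purely illustrative durations:

```go
// Sketch: force periodic connection re-establishment on the server so
// that pick_first clients get re-balanced over time. The durations are
// illustrative, not recommendations.
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newServer() *grpc.Server {
	return grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      5 * time.Minute,  // close connections after ~5 minutes
		MaxConnectionAgeGrace: 30 * time.Second, // allow in-flight RPCs to finish
	}))
}
```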
Thanks @markdroth, this makes sense. I still don't fully understand the part about the unique identifier, however. Something like Twitter aperture is exactly what we need, and I think I understand what you want from the xDS API here. If we come up with an alternative Envoy PR that implements your suggestions and modifies only the xDS API, plus a gRFC for an aperture implementation and an actual implementation in Go, would that be something you could potentially accept? Alternatively, we can follow your suggestion and implement the subsetting logic on the management server, but in my opinion the complexity of doing it on the server is the same as doing it on the client. However, if we do it on the client we can benefit from community support. We can also combine it with ORCA to achieve even better load distribution.
What I mean by a unique identifier is a name for a given endpoint that remains constant even if that endpoint moves to a different address (e.g., if a machine in a k8s cluster fails and the pods that were running on that machine get moved to a different machine). Internally, we have some subsetting algorithms that use that kind of unique identifier to avoid unnecessary connection churn in cases like that, because the endpoint changing addresses will not result in changing which clients are assigned to that endpoint. These algorithms can also handle dynamic adjustment of the subset when some of the initially chosen endpoints are down, without breaking load distribution. I don't think you can get those properties with Twitter aperture subsetting -- although, to be fair, you can't get some of it with subsetting on the management server either.

In any case, I have no objection to supporting something like Twitter aperture subsetting in xDS or in gRPC. In principle, I think it's totally reasonable to have a "parent" LB policy that performs subsetting, and I think people can experiment with a variety of such policies that provide different subsetting algorithms, some of which may be better in some situations than others.

I think it's a shame that the Envoy PR you mentioned didn't move forward, but I wasn't able to get the contributor to understand what I was asking for, which was just a clean separation of the subsetting piece from the actual load balancing piece. I just don't think it's a good idea to hard-code the two pieces together by providing them both in a single LB policy, because that's unnecessarily inflexible. In principle, there can be a variety of subsetting policies and a variety of load balancing policies, and it should be possible to mix and match them in any way that may be desirable (e.g., use aperture subsetting with WRR load balancing, or use some other subsetting algorithm with P2C load balancing).

So yes, if you would like to put forward a proposal for a "parent" LB policy that supports aperture subsetting and/or a "leaf" policy that supports P2C load balancing, I'd be happy to review. I'd suggest starting with a gRFC describing both the gRPC functionality and the proposed xDS protos. Once we have consensus on that, you can put together a PR for the xDS proto changes, which I can help review; the gRFC will probably be useful context for getting the xDS proto change through.

I hope this info is helpful. Please let me know if you have any questions.
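To illustrate the parent/leaf separation being described, here is a purely hypothetical service config shape; the `subsetting_experimental` policy name and its fields are invented for this sketch and do not exist in gRPC.

```go
// Purely illustrative: a hypothetical "parent" subsetting policy that
// delegates balancing within the chosen subset to an arbitrary child
// policy. The policy name "subsetting_experimental" and its fields are
// invented; they do not exist in gRPC today.
package example

const hypotheticalSubsettingConfig = `{
  "loadBalancingConfig": [
    {
      "subsetting_experimental": {
        "subsetSize": 20,
        "childPolicy": [{"weighted_round_robin": {}}]
      }
    }
  ]
}`
```

Swapping the child policy (e.g., for a P2C policy) would not require touching the subsetting piece, which is the mix-and-match property described above.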
@s-matyukevich: Are you satisfied with the answers to your question here? Can we close this? Thanks.
Yes, thanks for the answers! We are still discussing internally whether we are going to try to implement aperture subsetting in gRPC or proceed with some other solution. Closing this.
I created a gRFC (grpc/proposal#383) and a POC in Go (#6488).
In our organization, most teams use round_robin with client-side health checks to establish reliable connections to the servers and achieve even load distribution. This works great, but at some point it doesn't scale very well.
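In grpc-go terms, that setup roughly corresponds to something like the following sketch (the target and health-check service name are hypothetical; client-side health checking also needs the blank import of the health package):

```go
// Sketch of the described setup: round_robin with client-side health
// checking (gRFC A17). The target and service name are hypothetical.
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	_ "google.golang.org/grpc/health" // registers the client-side health check function
)

func main() {
	const svcConfig = `{
	  "loadBalancingConfig": [{"round_robin": {}}],
	  "healthCheckConfig": {"serviceName": "my.package.MyService"}
	}`

	conn, err := grpc.Dial(
		"dns:///backend.example.com:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(svcConfig),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
}
```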
The main problem with this setup is that the total number of connections grows with the product of clients and servers (every client connects to every server), so it rises sharply as the number of clients increases; for example, 10,000 clients talking to 500 servers means 5,000,000 connections. This is especially painful in grpc-go, which has issues with per-connection memory overhead. And memory isn't the only cost: additional resources are spent processing health checks and maintaining the connections at the OS level.
Here are the subsetting implementation options that we have tried or considered:
We tried the first 2 options and they work OK, but we still see some imbalance on the servers and we have to maintain the custom LB. This code is really generic -- if this is the recommended way of doing subsetting, we can create a PR and donate our subsetting LB to grpc-go. We could also combine it with some ORCA load-reporting functionality to deal with the server imbalance and provide a generic, reusable solution.
We didn't try the xDS option yet, but it will definitely increase the overhead of managing EDS keys on our management server. Ideally, we would like to reuse the same EDS response for every client and have each client pick a subset based on some parameters sent from the xDS control plane. Is that feasible? A related Envoy PR was closed.
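One way this "same EDS response for everyone, client picks its own subset from a couple of parameters" idea could work is something like the deterministic subsetting algorithm described in the Google SRE book. A hypothetical Go sketch, where the control plane only needs to hand each client its index and the subset size:

```go
// Hypothetical sketch of deterministic subsetting (similar to the
// algorithm in the "Load Balancing in the Datacenter" chapter of the
// Google SRE book): every client receives the same backend list plus
// two small parameters (its own index and the subset size), and
// computes its subset locally. Clients in the same "round" apply the
// same shuffle, so backends are covered evenly.
package subset

import "math/rand"

func DeterministicSubset(backends []string, clientIndex, subsetSize int) []string {
	subsetCount := len(backends) / subsetSize
	if subsetCount == 0 {
		return backends
	}
	round := clientIndex / subsetCount

	shuffled := append([]string(nil), backends...)
	r := rand.New(rand.NewSource(int64(round))) // same seed for every client in this round
	r.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })

	subsetID := clientIndex % subsetCount
	start := subsetID * subsetSize
	return shuffled[start : start+subsetSize]
}
```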
Are there any other options that we didn't consider? Any guidance here would be greatly appreciated.
cc @markdroth