A68: Random subsetting with rendezvous hashing LB policy #423
Conversation
A68-random-subsetting.md (outdated):

> * When the lb policy is initialized it also creates a random 32-byte long `salt` string.
> * After every resolver update the policy picks a new subset. It does this by implementing the `rendezvous hashing` algorithm:
>   * Concatenate `salt` to each address in the list.
>   * For every resulting entity, compute a [MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) hash, which produces a 128-bit output.
There is no dependency on murmur from grpc, at least in Go, as of today. You can use xxhash, which is depended upon by `ring_hash`.
Updated to use xxhash. This changes the algorithm slightly, as we can use a random pre-generated seed instead of concatenating a salt to each address. The new version is even simpler.
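For illustration, here is a minimal Go sketch of the seeded rendezvous-hashing selection discussed above. All names are hypothetical, and `xxhash.NewWithSeed` assumes github.com/cespare/xxhash/v2 v2.2.0 or later (the library `ring_hash` already depends on); this is a sketch of the idea, not the grpc-go implementation.

```go
package subsetting

import (
	"math/rand"
	"sort"

	"github.com/cespare/xxhash/v2"
)

// newSeed is drawn once, when the LB policy is initialized, and reused
// for every subsequent resolver update so the subset stays stable.
func newSeed() uint64 { return rand.Uint64() }

// selectSubset implements rendezvous (highest-random-weight) hashing:
// rank all addresses by their seeded hash and keep the subsetSize
// highest-ranked ones. Distinct clients draw distinct seeds, so they
// pick different, but individually stable, subsets.
func selectSubset(addrs []string, subsetSize int, seed uint64) []string {
	if len(addrs) <= subsetSize {
		return addrs
	}
	type ranked struct {
		addr string
		hash uint64
	}
	r := make([]ranked, 0, len(addrs))
	for _, a := range addrs {
		d := xxhash.NewWithSeed(seed)
		d.WriteString(a)
		r = append(r, ranked{addr: a, hash: d.Sum64()})
	}
	// Highest hash wins; 64-bit output makes ties vanishingly unlikely.
	sort.Slice(r, func(i, j int) bool { return r[i].hash > r[j].hash })
	out := make([]string, 0, subsetSize)
	for _, e := range r[:subsetSize] {
		out = append(out, e.addr)
	}
	return out
}
```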
A68-random-subsetting.md (outdated):

> ### Handling Parent/Resolver Updates
>
> When the resolver updates the list of addresses, or the LB config changes, the random subsetting LB will run the subsetting algorithm described above to filter the endpoint list. Then it will create a new resolver state with the filtered list of addresses and pass it to the child LB. Attributes and service config from the old resolver state will be copied to the new one.
I think you should replace addresses with endpoints to take A61 into consideration.
Done.
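As a rough illustration of that update flow, here is a hedged Go sketch. The types are hypothetical, not the grpc-go implementation, and `selectEndpoints` is sketched in a later thread below.

```go
import (
	"google.golang.org/grpc/balancer"
)

// subsettingBalancer wraps a single child LB policy (hypothetical shape).
type subsettingBalancer struct {
	child      balancer.Balancer
	seed       uint64 // drawn once at initialization
	subsetSize int    // from the policy's service config
}

// UpdateClientConnState runs on every resolver or config update: it filters
// the endpoint list and forwards the filtered state to the child policy.
// Copying resolver.State by value carries over Attributes and ServiceConfig.
func (b *subsettingBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
	filtered := s.ResolverState // shallow copy keeps Attributes and ServiceConfig
	filtered.Endpoints = selectEndpoints(s.ResolverState.Endpoints, b.subsetSize, b.seed)
	return b.child.UpdateClientConnState(balancer.ClientConnState{ResolverState: filtered})
}
```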
A68-random-subsetting.md (outdated):

> ## Proposal
>
> Introduce a new LB policy, `random_subsetting`. This policy selects a subset of addresses and passes them to the child LB policy. It maintains 2 important properties:
I think you need to replace `addresses` with `endpoints` to account for https://github.com/grpc/proposal/blob/master/A61-IPv4-IPv6-dualstack-backends.md, where each endpoint may have multiple addresses.
Done.
A68-random-subsetting.md
Outdated
* The policy receives a single configuration parameter: `subset_size`, which must be configured by the user. | ||
* When the lb policy is initialized it also creates a random 32-byte long `salt` string. | ||
* After every resolver update the policy picks a new subset. It does this by implementing `rendezvous hashing` algorithm: | ||
* Concatenate `salt` to each address in the list. |
You'll have to decide which address, in case the endpoint has more than one (I think you can use the first address?).
Updated the doc to use the first address.
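Under that convention, the per-endpoint ranking might look like the following Go sketch (hypothetical names; the seeded-hash idea is the same as in the earlier sketch, and `xxhash.NewWithSeed` again assumes cespare/xxhash/v2 v2.2.0+):

```go
import (
	"sort"

	"github.com/cespare/xxhash/v2"
	"google.golang.org/grpc/resolver"
)

// selectEndpoints ranks each endpoint by the seeded hash of its *first*
// address. Per gRFC A61 an endpoint may carry several addresses; the first
// one serves as the endpoint's stable identity for hashing purposes.
func selectEndpoints(endpoints []resolver.Endpoint, subsetSize int, seed uint64) []resolver.Endpoint {
	if len(endpoints) <= subsetSize {
		return endpoints
	}
	hashOf := func(ep resolver.Endpoint) uint64 {
		d := xxhash.NewWithSeed(seed)
		d.WriteString(ep.Addresses[0].Addr) // first address only
		return d.Sum64()
	}
	sorted := make([]resolver.Endpoint, len(endpoints))
	copy(sorted, endpoints)
	// Recomputing hashes inside the comparator is fine for a sketch; a real
	// implementation would precompute them once per endpoint.
	sort.Slice(sorted, func(i, j int) bool { return hashOf(sorted[i]) > hashOf(sorted[j]) })
	return sorted[:subsetSize]
}
```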
A68-random-subsetting.md (outdated):

> As described in [gRFC A52](https://github.com/grpc/proposal/blob/master/A52-xds-custom-lb-policies.md), gRPC has an LB policy registry, which maintains a list of converters. Every converter translates an xDS LB policy to the corresponding service config. To allow using the random subsetting LB policy via xDS, the only thing that needs to be done is to provide a corresponding converter function. The implementation will be trivial, as the fields in the xDS LB policy will match the fields in the service config exactly.
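Such a converter could be as small as the following Go sketch. Everything here is illustrative: the function name, parameters, and JSON field names are assumptions, not the actual grpc-go converter API.

```go
import "encoding/json"

// convertRandomSubsetting re-encodes the xDS policy fields as service
// config JSON; because the fields match one-to-one, the conversion is a
// straight re-encoding. childPolicyJSON stands in for the recursively
// converted child policy.
func convertRandomSubsetting(subsetSize uint32, childPolicyJSON json.RawMessage) (json.RawMessage, error) {
	return json.Marshal(map[string]interface{}{
		"random_subsetting": map[string]interface{}{
			"subsetSize":  subsetSize,
			"childPolicy": childPolicyJSON,
		},
	})
}
```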
A68-random-subsetting.md (outdated):

> ## Rationale
>
> ### Alternatives Considered: Deterministic subsetting
You should probably discuss the trade-offs of doing this kind of subsetting in the control plane, since it was discussed in the original proposal.
Yeah, but I posted a link to the tl;dr of that discussion. Do you think that's not enough?
I'm thinking of the option of doing random subsetting in the control plane by sending different EDS responses (with different subsets) to each data plane, or the equivalent with other resolvers. It is simple to implement with xDS and works for both Envoy and gRPC. IIRC, the main arguments for not going that route were the need to have xDS infrastructure in the first place (a big barrier for our orgs, and probably others) and the existing limitations of https://github.com/envoyproxy/go-control-plane.

This was discussed in https://github.com/grpc/proposal/pull/383/files#r1308024474.

To understand this proposal, I think users will need to understand the trade-off of doing subsetting as a balancer in each data plane rather than directly in service discovery.
Bump on this. It has been almost a month since the proposal was submitted, and none of the gRPC maintainers have commented on it yet. cc @markdroth and @ejona86, since you both reviewed the previous version of the proposal and have full context.
Sorry for the delay reviewing this! Overall, it looks very good -- my comments here are mostly minor.
One of the reasons for the delay was that we were having a bunch of discussions internally about ways to potentially reduce the number of unnecessary idle connections with policies other than pick_first, and I wanted to make sure that where we landed on that didn't conflict with this proposal. Just as an FYI, let me sketch out what we have in mind to eventually do for that, so that you know where we're heading.

The general problem that we're trying to solve is that LB policies like RR and WRR proactively maintain connections to all endpoints and therefore waste a lot of resources on idle connections when the client is idle. To deal with this, we will eventually want to add an LB policy that tracks the number of simultaneous RPCs in flight on the channel over the last N minutes and scales the number of connections accordingly. This new policy would operate similarly to the subsetting policy described in this gRFC: it would filter the set of addresses passed down to the child policy.
One of the things we were discussing was how this theoretical new policy would interact with the subsetting policy described in this gRFC. We wound up deciding that we would probably just put this theoretical new policy underneath the subsetting policy, so that the two policies can basically be independent of each other. Obviously, this will probably make the actual connection distribution a bit less even, but it will do so only in the face of idle clients, and it will not actually break the distribution imposed by the subsetting policy -- it would just eliminate some of the connections that would have been chosen by the subsetting policy but would have been idle.
Anyway, as I mentioned, this gRFC looks very good! I'd like to get reviews from @ejona86 and @dfawley as well, but I don't foresee any significant blockers.
Please let me know if you have any questions. Thanks!
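To make the idea concrete, here is a purely illustrative Go sketch of the kind of bookkeeping such a theoretical policy might need. Nothing here is part of this gRFC or any gRPC API; the names and the scaling rule are invented for illustration.

```go
import "sync"

// inflightTracker tracks the peak number of concurrent RPCs over a
// trailing window; the policy would derive a connection count from it.
type inflightTracker struct {
	mu      sync.Mutex
	current int // RPCs currently in flight
	peak    int // peak concurrency seen in the current window
}

func (t *inflightTracker) rpcStarted() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.current++
	if t.current > t.peak {
		t.peak = t.current
	}
}

func (t *inflightTracker) rpcFinished() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.current--
}

// targetConnections would be read once per window, e.g. every few minutes.
// It clamps the observed peak between a floor (to keep some warm
// connections) and a ceiling (e.g. the subset size imposed by the
// subsetting policy above it).
func (t *inflightTracker) targetConnections(minConns, maxConns int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := t.peak
	t.peak = t.current // start a new window
	switch {
	case n < minConns:
		return minConns
	case n > maxConns:
		return maxConns
	default:
		return n
	}
}
```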
A68-random-subsetting.md (outdated):

> The `random_subsetting` LB policy config will be as follows.
>
> ```proto
> message LoadBalancingConfig {
> ```
Note: Once we're happy with this gRFC, we'll also want a PR to make these changes to service_config.proto.
Sure, I'll do that.
Here is the PR grpc/grpc-proto#157
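On the implementation side, the service config would be parsed through grpc-go's `balancer.ConfigParser` interface. The sketch below is hedged: the struct shape and JSON field names are assumed from the usual proto-to-JSON mapping, not taken from the merged proto or from any grpc-go code.

```go
import (
	"encoding/json"
	"fmt"

	"google.golang.org/grpc/serviceconfig"
)

// lbConfig mirrors the service config message quoted above (assumed shape).
type lbConfig struct {
	serviceconfig.LoadBalancingConfig
	SubsetSize uint32 `json:"subsetSize"`
}

// builder would also implement balancer.Builder; only config parsing is shown.
type builder struct{}

// ParseConfig decodes and validates the JSON form of the policy config.
func (builder) ParseConfig(js json.RawMessage) (serviceconfig.LoadBalancingConfig, error) {
	var cfg lbConfig
	if err := json.Unmarshal(js, &cfg); err != nil {
		return nil, fmt.Errorf("random_subsetting: failed to parse config %s: %v", js, err)
	}
	if cfg.SubsetSize == 0 {
		return nil, fmt.Errorf("random_subsetting: subset_size must be set by the user")
	}
	return &cfg, nil
}
```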
A68-random-subsetting.md (outdated):

> Random subsetting LB won't depend on xDS in any way. People may choose to initialize it by directly providing service config. We will only provide a corresponding xDS policy wrapper to allow configuring this LB via xDS.
>
> #### Changes to xDS API
It's probably worth chatting with @wbpcode to make sure this design would also work for Envoy. I don't know of any reason it wouldn't, but might make sense to check just in case.
It is not hard to implement that in Envoy. Just go ahead.
> To deal with this, we will eventually want to add an LB policy that tracks the number of simultaneous RPCs in flight on the channel over the last N minutes and scales the number of connections accordingly.

One question for me was how similar the subsetting would be between the two load balancer policies. But the subsetting algorithm here could be used directly by the other LB policy as well. (The other LB policy could maybe just delegate to this one, changing its subset_size over time; or we extend this policy to support a dynamic subset size. All the options remain on the table with this gRFC.)

This is also the general problem that we are trying to solve with this gRFC and #430. Making the subset size dynamic would be ideal, but it seems much harder, as the subset size impacts load balancing, and server load must remain reasonably predictable. I don't know of any published work on making the number of connections dynamic (no fixed subset size) while also keeping the server load reasonably balanced. So I'm very curious where you are at with those discussions and whether you have rough ideas on how you would address load balancing.

Basically, the theory is that you would size a minimum subset, much like you are doing here. But if the client causes a lot of load, then you scale up, potentially to all backends. IIRC, we were thinking of defining load as "concurrent RPCs." Most of the discussion was spent on avoiding wildly scaling up/down (for example with bursty clients) causing damage.

I think the approach that we have in mind would probably result in slightly less ideally balanced server load than the results you've seen, specifically because the connections will no longer be perfectly balanced. But in cases where the RPC rates of different clients diverge wildly, or where RPC rates are very bursty, maintaining unnecessary idle connections can cost a significant amount of memory and some additional CPU. I think there are situations where the cost of that unnecessary overhead is high enough that it's worth taking steps to reduce it, even if it results in slightly less optimally balanced server load. Like anything else in this space, it's a bit of a balancing act (pun intended). There will definitely be cases where the approach that we have in mind won't work well, but there will be cases where it will. But we think it will be a useful tool to have in our toolbox at some point.
What are the next steps here? Can someone with the right permissions merge this?
@dfawley, Mark is the listed approver currently, but I'm fine merging this, as it seems things are in order. Since the first implementation is in Go, you may want to take a look; do you agree it seems ready to approve?
This LGTM overall, but I'm a little concerned about this comment from Mark and its impact on this gRFC: https://github.com/grpc/proposal/pull/423/files#r1676227354

Have you had this discussion yet, @s-matyukevich? Otherwise, should we wait to merge this until that's been finalized (more of a Q for @ejona86)? @markdroth did approve, so maybe he didn't see it as a blocker.
I don't really anticipate any problems on the Envoy side, but it would be good to get a sanity check from @wbpcode before merging this, just to be safe. I'll ping him offline to request his input.
Thanks for the confirmation, @wbpcode! I think that's all the open questions here, so I'll go ahead and merge. Thanks!
Replaces #383
Related to #430, which describes an LB policy that could be used in combination with random subsetting to correct the resulting server-side load imbalance.