A68: Random subsetting with rendezvous hashing LB policy #423
Conversation
A68-random-subsetting.md (outdated):

> * When the lb policy is initialized it also creates a random 32-byte long `salt` string.
> * After every resolver update the policy picks a new subset. It does this by implementing the `rendezvous hashing` algorithm:
>   * Concatenate `salt` to each address in the list.
>   * For every resulting entity, compute a [MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) hash, which produces a 128-bit output.
There is no dependency on murmur from grpc, at least in Go, as of today. You can use xxhash, which is depended upon by `ring_hash`.
Updated to use xxhash. This changes the algorithm slightly, as we can use a random pre-generated seed instead of concatenating a salt to each address. The new version is even simpler.
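For illustration, here is a minimal Go sketch of the seeded rendezvous-hashing selection discussed above. All names are hypothetical, and `xxhash.NewWithSeed` assumes github.com/cespare/xxhash/v2 v2.2.0 or later (the library `ring_hash` already depends on); this is a sketch of the idea, not the grpc-go implementation.

```go
package subsetting

import (
	"math/rand"
	"sort"

	"github.com/cespare/xxhash/v2"
)

// newSeed is drawn once, when the LB policy is initialized, and reused
// for every subsequent resolver update so the subset stays stable.
func newSeed() uint64 { return rand.Uint64() }

// selectSubset implements rendezvous (highest-random-weight) hashing:
// rank all addresses by their seeded hash and keep the subsetSize
// highest-ranked ones. Distinct clients draw distinct seeds, so they
// pick different, but individually stable, subsets.
func selectSubset(addrs []string, subsetSize int, seed uint64) []string {
	if len(addrs) <= subsetSize {
		return addrs
	}
	type ranked struct {
		addr string
		hash uint64
	}
	r := make([]ranked, 0, len(addrs))
	for _, a := range addrs {
		d := xxhash.NewWithSeed(seed)
		d.WriteString(a)
		r = append(r, ranked{addr: a, hash: d.Sum64()})
	}
	// Highest hash wins; 64-bit output makes ties vanishingly unlikely.
	sort.Slice(r, func(i, j int) bool { return r[i].hash > r[j].hash })
	out := make([]string, 0, subsetSize)
	for _, e := range r[:subsetSize] {
		out = append(out, e.addr)
	}
	return out
}
```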
A68-random-subsetting.md (outdated):

> ### Handling Parent/Resolver Updates
>
> When the resolver updates the list of addresses, or the LB config changes, the random subsetting LB will run the subsetting algorithm described above to filter the endpoint list. Then it will create a new resolver state with the filtered list of addresses and pass it to the child LB. Attributes and service config from the old resolver state will be copied to the new one.
I think you should replace addresses with endpoints to take A61 into consideration.
Done.
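As a rough illustration of that update flow, here is a hedged Go sketch. The types are hypothetical, not the grpc-go implementation, and `selectEndpoints` is sketched in a later thread below.

```go
import (
	"google.golang.org/grpc/balancer"
)

// subsettingBalancer wraps a single child LB policy (hypothetical shape).
type subsettingBalancer struct {
	child      balancer.Balancer
	seed       uint64 // drawn once at initialization
	subsetSize int    // from the policy's service config
}

// UpdateClientConnState runs on every resolver or config update: it filters
// the endpoint list and forwards the filtered state to the child policy.
// Copying resolver.State by value carries over Attributes and ServiceConfig.
func (b *subsettingBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
	filtered := s.ResolverState // shallow copy keeps Attributes and ServiceConfig
	filtered.Endpoints = selectEndpoints(s.ResolverState.Endpoints, b.subsetSize, b.seed)
	return b.child.UpdateClientConnState(balancer.ClientConnState{ResolverState: filtered})
}
```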
A68-random-subsetting.md (outdated):

> ## Proposal
>
> Introduce a new LB policy, `random_subsetting`. This policy selects a subset of addresses and passes them to the child LB policy. It maintains 2 important properties:
I think you need to replace `addresses` with `endpoints` to account for https://github.com/grpc/proposal/blob/master/A61-IPv4-IPv6-dualstack-backends.md, where each endpoint may have multiple addresses.
Done.
A68-random-subsetting.md
Outdated
* The policy receives a single configuration parameter: `subset_size`, which must be configured by the user. | ||
* When the lb policy is initialized it also creates a random 32-byte long `salt` string. | ||
* After every resolver update the policy picks a new subset. It does this by implementing `rendezvous hashing` algorithm: | ||
* Concatenate `salt` to each address in the list. |
You'll have to decide which address, in case the endpoint has more than one (I think you can use the first address?).
Updated the doc to use the first address.
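Under that convention, the per-endpoint ranking might look like the following Go sketch (hypothetical names; the seeded-hash idea is the same as in the earlier sketch, and `xxhash.NewWithSeed` again assumes cespare/xxhash/v2 v2.2.0+):

```go
import (
	"sort"

	"github.com/cespare/xxhash/v2"
	"google.golang.org/grpc/resolver"
)

// selectEndpoints ranks each endpoint by the seeded hash of its *first*
// address. Per gRFC A61 an endpoint may carry several addresses; the first
// one serves as the endpoint's stable identity for hashing purposes.
func selectEndpoints(endpoints []resolver.Endpoint, subsetSize int, seed uint64) []resolver.Endpoint {
	if len(endpoints) <= subsetSize {
		return endpoints
	}
	hashOf := func(ep resolver.Endpoint) uint64 {
		d := xxhash.NewWithSeed(seed)
		d.WriteString(ep.Addresses[0].Addr) // first address only
		return d.Sum64()
	}
	sorted := make([]resolver.Endpoint, len(endpoints))
	copy(sorted, endpoints)
	// Recomputing hashes inside the comparator is fine for a sketch; a real
	// implementation would precompute them once per endpoint.
	sort.Slice(sorted, func(i, j int) bool { return hashOf(sorted[i]) > hashOf(sorted[j]) })
	return sorted[:subsetSize]
}
```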
A68-random-subsetting.md (outdated):

> As described in [gRFC A52](https://github.com/grpc/proposal/blob/master/A52-xds-custom-lb-policies.md), gRPC has an LB policy registry, which maintains a list of converters. Every converter translates an xDS LB policy to the corresponding service config. To allow using the random subsetting LB policy via xDS, the only thing that needs to be done is to provide a corresponding converter function. The implementation will be trivial, as the fields in the xDS LB policy will match the fields in the service config exactly.
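Such a converter could be as small as the following Go sketch. Everything here is illustrative: the function name, parameters, and JSON field names are assumptions, not the actual grpc-go converter API.

```go
import "encoding/json"

// convertRandomSubsetting re-encodes the xDS policy fields as service
// config JSON; because the fields match one-to-one, the conversion is a
// straight re-encoding. childPolicyJSON stands in for the recursively
// converted child policy.
func convertRandomSubsetting(subsetSize uint32, childPolicyJSON json.RawMessage) (json.RawMessage, error) {
	return json.Marshal(map[string]interface{}{
		"random_subsetting": map[string]interface{}{
			"subsetSize":  subsetSize,
			"childPolicy": childPolicyJSON,
		},
	})
}
```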
A68-random-subsetting.md (outdated):

> ## Rationale
>
> ### Alternatives Considered: Deterministic subsetting
You should probably discuss the trade-offs of doing this kind of subsetting in the control plane, since it was discussed in the original proposal.
Yeah, but I posted a link to the tl;dr of that discussion. Do you think that's not enough?
I'm thinking of the option of doing random subsetting in the control plane by sending different EDS responses (with different subsets) to each data plane, or the equivalent with other resolvers. It is simple to implement with xDS and works for both Envoy and gRPC. IIRC, the main arguments for not going that route were the need to have xDS infrastructure in the first place (a big barrier for our orgs, and probably others) and the existing limitations of https://github.com/envoyproxy/go-control-plane.

This was discussed in https://github.com/grpc/proposal/pull/383/files#r1308024474.

To understand this proposal, I think users will need to understand the trade-off of doing subsetting as a balancer in each data plane rather than directly in service discovery.
Bump on this. It has been almost a month since the proposal was submitted, and none of the gRPC maintainers have commented on it yet. cc @markdroth and @ejona86, since you both reviewed the previous version of the proposal and have full context.
Sorry for the delay reviewing this! Overall, it looks very good -- my comments here are mostly minor.
One of the reasons for the delay was that we were having a bunch of discussions internally about ways to potentially reduce the number of unnecessary idle connections with policies other than pick_first, and I wanted to make sure that where we landed on that didn't conflict with this proposal. Just as an FYI, let me sketch out what we have in mind to eventually do for that, so that you know where we're heading.

The general problem that we're trying to solve is that LB policies like RR and WRR proactively maintain connections to all endpoints and therefore waste a lot of resources on idle connections when the client is idle. To deal with this, we will eventually want to add an LB policy that tracks the number of simultaneous RPCs in flight on the channel over the last N minutes and scales the number of connections accordingly. This new policy would operate similarly to the subsetting policy described in this gRFC: it would filter the set of addresses passed down to the child policy.
One of the things we were discussing was how this theoretical new policy would interact with the subsetting policy described in this gRFC. We wound up deciding that we would probably just put this theoretical new policy underneath the subsetting policy, so that the two policies can basically be independent of each other. Obviously, this will probably make the actual connection distribution a bit less even, but it will do so only in the face of idle clients, and it will not actually break the distribution imposed by the subsetting policy -- it would just eliminate some of the connections that would have been chosen by the subsetting policy but would have been idle.
Anyway, as I mentioned, this gRFC looks very good! I'd like to get reviews from @ejona86 and @dfawley as well, but I don't foresee any significant blockers.
Please let me know if you have any questions. Thanks!
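To make the idea concrete, here is a purely illustrative Go sketch of the kind of bookkeeping such a theoretical policy might need. Nothing here is part of this gRFC or any gRPC API; the names and the scaling rule are invented for illustration.

```go
import "sync"

// inflightTracker tracks the peak number of concurrent RPCs over a
// trailing window; the policy would derive a connection count from it.
type inflightTracker struct {
	mu      sync.Mutex
	current int // RPCs currently in flight
	peak    int // peak concurrency seen in the current window
}

func (t *inflightTracker) rpcStarted() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.current++
	if t.current > t.peak {
		t.peak = t.current
	}
}

func (t *inflightTracker) rpcFinished() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.current--
}

// targetConnections would be read once per window, e.g. every few minutes.
// It clamps the observed peak between a floor (to keep some warm
// connections) and a ceiling (e.g. the subset size imposed by the
// subsetting policy above it).
func (t *inflightTracker) targetConnections(minConns, maxConns int) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := t.peak
	t.peak = t.current // start a new window
	switch {
	case n < minConns:
		return minConns
	case n > maxConns:
		return maxConns
	default:
		return n
	}
}
```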
A68-random-subsetting.md (outdated):

> The `random_subsetting` LB policy config will be as follows.
>
> ```proto
> message LoadBalancingConfig {
> ```
Note: Once we're happy with this gRFC, we'll also want a PR to make these changes to service_config.proto.
Sure, I'll do that.
Here is the PR grpc/grpc-proto#157
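On the implementation side, the service config would be parsed through grpc-go's `balancer.ConfigParser` interface. The sketch below is hedged: the struct shape and JSON field names are assumed from the usual proto-to-JSON mapping, not taken from the merged proto or from any grpc-go code.

```go
import (
	"encoding/json"
	"fmt"

	"google.golang.org/grpc/serviceconfig"
)

// lbConfig mirrors the service config message quoted above (assumed shape).
type lbConfig struct {
	serviceconfig.LoadBalancingConfig
	SubsetSize uint32 `json:"subsetSize"`
}

// builder would also implement balancer.Builder; only config parsing is shown.
type builder struct{}

// ParseConfig decodes and validates the JSON form of the policy config.
func (builder) ParseConfig(js json.RawMessage) (serviceconfig.LoadBalancingConfig, error) {
	var cfg lbConfig
	if err := json.Unmarshal(js, &cfg); err != nil {
		return nil, fmt.Errorf("random_subsetting: failed to parse config %s: %v", js, err)
	}
	if cfg.SubsetSize == 0 {
		return nil, fmt.Errorf("random_subsetting: subset_size must be set by the user")
	}
	return &cfg, nil
}
```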
A68-random-subsetting.md (outdated):

> Random subsetting LB won't depend on xDS in any way. People may choose to initialize it by directly providing service config. We will only provide a corresponding xDS policy wrapper to allow configuring this LB via xDS.
>
> #### Changes to xDS API
It's probably worth chatting with @wbpcode to make sure this design would also work for Envoy. I don't know of any reason it wouldn't, but might make sense to check just in case.
It is not hard to implement that in Envoy. Just go ahead.
> To deal with this, we will eventually want to add an LB policy that tracks the number of simultaneous RPCs in flight on the channel over the last N minutes and scales the number of connections accordingly.

One question for me was how similar the subsetting would be between the two load balancer policies. But the subsetting algorithm here could be used directly by the other LB policy as well. (The other LB policy could maybe just delegate to this one, changing its subset_size over time; or we extend this policy to support a dynamic subset size. All the options remain on the table with this gRFC.)

This is also the general problem that we are trying to solve with this gRFC and #430. Making the subset size dynamic would be ideal, but it seems much harder, as the subset size impacts load balancing, and server load must remain reasonably predictable. I don't know of any published work on making the number of connections dynamic (no fixed subset size) while also keeping the server load reasonably balanced. So I'm very curious where you are at with those discussions and whether you have rough ideas on how you would address load balancing.

Basically, the theory is that you would size a minimum subset, much like you are doing here. But if the client causes a lot of load, then you scale up, potentially to all backends. IIRC, we were thinking of defining load as "concurrent RPCs." Most of the discussion was spent on avoiding wildly scaling up/down (for example with bursty clients) causing damage.

I think the approach that we have in mind would probably result in slightly less ideally balanced server load than the results you've seen, specifically because the connections will no longer be perfectly balanced. But in cases where the RPC rates of different clients diverge wildly, or where RPC rates are very bursty, maintaining unnecessary idle connections can cost a significant amount of memory and some additional CPU. I think there are situations where the cost of that unnecessary overhead is high enough that it's worth taking steps to reduce it, even if it results in slightly less optimally balanced server load. Like anything else in this space, it's a bit of a balancing act (pun intended). There will definitely be cases where the approach that we have in mind won't work well, but there will be cases where it will. But we think it will be a useful tool to have in our toolbox at some point.
What are the next steps here? Can someone with the right permissions merge this?
@dfawley, Mark is the listed approver currently, but I'm fine merging this, as it seems things are in order. Since the first implementation is in Go, you may want to take a look; do you agree it seems ready to approve?
This LGTM overall, but I'm a little concerned about this comment from Mark and its impact on this gRFC: https://github.com/grpc/proposal/pull/423/files#r1676227354

Have you had this discussion yet, @s-matyukevich? Otherwise, should we wait to merge this until that's been finalized (more of a Q for @ejona86)? @markdroth did approve, so maybe he didn't see it as a blocker.
I don't really anticipate any problems on the Envoy side, but it would be good to get a sanity check from @wbpcode before merging this, just to be safe. I'll ping him offline to request his input.
Thanks for the confirmation, @wbpcode! I think that's all the open questions here, so I'll go ahead and merge. Thanks!
Replaces #383
Related to #430, which describes an LB policy that could be used in combination with random subsetting to correct the resulting server-side load imbalance.