A68: Random subsetting with rendezvous hashing LB policy #423

Merged Aug 19, 2024 (19 commits). Changes shown from 11 commits. File: A68-random-subsetting.md (117 additions).
A68: Random subsetting with rendezvous hashing LB policy.
----
* Author(s): @s-matyukevich
* Approver:
* Status: Draft
* Implemented in: PoC in Go
* Last updated: 2024-04-15
* Discussion at: https://groups.google.com/g/grpc-io/c/oxNJT1GgjEg

## Abstract

Add support for the `random_subsetting` load balancing policy.

## Background

Currently, gRPC lacks a way to select a subset of the endpoints available from the resolver and load-balance requests between them. Out of the box, users have a choice between two extremes: `pick_first`, which sends all requests to one random backend, and `round_robin`, which sends requests to all available backends. `pick_first` balances connections poorly when the number of clients is not much higher than the number of servers. The problem is exacerbated during rollouts, because `pick_first` does not change its endpoint on resolver updates as long as the current subchannel remains `READY`. `round_robin` results in every server having as many open connections as there are clients, which is unnecessarily costly when there are many clients, and it makes local load-balancing decisions (such as outlier detection) less precise.

### Related Proposals:
* [gRFC A52: gRPC xDS Custom Load Balancer Configuration](https://github.com/grpc/proposal/blob/master/A52-xds-custom-lb-policies.md)

## Proposal

Introduce a new LB policy, `random_subsetting`. This policy selects a subset of addresses and passes them to the child LB policy. It maintains two important properties:
> **Reviewer (Contributor):** I think you need to replace addresses with endpoints to account for https://github.com/grpc/proposal/blob/master/A61-IPv4-IPv6-dualstack-backends.md, where each endpoint may have multiple addresses.
>
> **Author (s-matyukevich):** Done.

* The policy tries to distribute connections among servers as equally as possible. The higher the `(N_clients*subset_size)/N_servers` ratio, the closer the resulting server connection distribution is to uniform.
* The policy minimizes the amount of connection churn generated during server scale-ups by using [rendezvous hashing](https://en.wikipedia.org/wiki/Rendezvous_hashing).

### Subsetting algorithm

* The policy receives a single configuration parameter: `subset_size`, which must be configured by the user.
* When the LB policy is initialized, it also creates a random 32-byte `salt` string.
* After every resolver update, the policy picks a new subset. It does this by applying the `rendezvous hashing` algorithm (a Go sketch follows this list):
* Concatenate `salt` to each address in the list.
> **Reviewer (Contributor):** You'll have to decide which address, in case the endpoint has more than one (I think you can use the first address?).
>
> **Author (s-matyukevich):** Updated the doc to use the first address.

* For every resulting entity, compute its [MurmurHash3](https://en.wikipedia.org/wiki/MurmurHash) hash, which produces a 128-bit output.
> **Reviewer (Contributor):** There is no dependency on murmur from grpc, at least in Go, as of today. You can use xxhash, which is depended upon by ring_hash.
>
> **Author (s-matyukevich):** Updated to use xxhash. This changes the algorithm slightly, as we can use a random pre-generated seed instead of concatenating the salt to each address. The new version is even simpler.

* Sort all entities by hash.
* Pick the first `subset_size` values from the list.
* Pass the resulting subset to the child LB policy.
* If the number of addresses is less than `subset_size`, always use all available addresses.
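
Below is a minimal Go sketch of the subsetting step described above, under the assumptions of this revision (a per-client `salt` concatenated to each address, using the first address per endpoint). It is illustrative, not the actual gRPC implementation; FNV-1a from the standard library stands in for the hash function discussed here (MurmurHash3/xxhash).

```go
// Hypothetical sketch of rendezvous-hashing subsetting; not the real gRPC code.
package subsetting

import (
	"hash/fnv"
	"sort"
)

// pickSubset hashes salt+address for every address, sorts by hash, and keeps
// the first subsetSize entries, as described in the algorithm above.
func pickSubset(addresses []string, salt string, subsetSize int) []string {
	// If there are fewer addresses than subset_size, use all of them.
	if len(addresses) <= subsetSize {
		return addresses
	}
	type scored struct {
		addr string
		hash uint64
	}
	entries := make([]scored, 0, len(addresses))
	for _, addr := range addresses {
		h := fnv.New64a()
		h.Write([]byte(salt + addr)) // concatenate the salt to the address, then hash
		entries = append(entries, scored{addr: addr, hash: h.Sum64()})
	}
	// Sort all entries by hash (ties broken by address for determinism).
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].hash != entries[j].hash {
			return entries[i].hash < entries[j].hash
		}
		return entries[i].addr < entries[j].addr
	})
	subset := make([]string, 0, subsetSize)
	for _, e := range entries[:subsetSize] {
		subset = append(subset, e.addr)
	}
	return subset
}
```

Because the salt is random per client, different clients sort the same server list in different orders, which is what spreads connections across servers.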

### Characteristics of the selected algorithm

#### Uniform connection distribution on high scale

When the `(N_clients*subset_size)/N_servers` ratio is high, the resulting connection distribution between servers is close to uniform. This is because the chosen hash function is uniform, so every server has an equal probability of being chosen by every client.
Though it could be done, we don't provide any mathematical guarantees about the resulting connection distribution. Any such guarantees would be probabilistic and have limited value in practice. Instead, we provide a few samples:
* N_clients = 100, N_servers = 100, subset_size = 5
![State Diagram](A68_graphics/subsetting100-100-5.png)
* N_clients = 100, N_servers = 100, subset_size = 25
![State Diagram](A68_graphics/subsetting100-100-25.png)
* N_clients = 100, N_servers = 10, subset_size = 5
![State Diagram](A68_graphics/subsetting100-10-5.png)
* N_clients = 500, N_servers = 10, subset_size = 5
![State Diagram](A68_graphics/subsetting500-10-5.png)
* N_clients = 2000, N_servers = 10, subset_size = 5
![State Diagram](A68_graphics/subsetting2000-10-5.png)
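
For reference when reading the samples above (a derived baseline, not a guarantee): since each client opens `subset_size` connections spread approximately uniformly over `N_servers` servers, the expected number of connections per server is `(N_clients*subset_size)/N_servers`, e.g. `(2000*5)/10 = 1000` for the last sample. The plots can be compared against this mean.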

#### Low connection churn during server rollouts

The graphs in the previous section show that this holds in practice: we roll out all servers in the middle of every test, and there is no visible increase in the number of connections per server. Low connection churn during server rollouts is the primary motivation for using rendezvous hashing as the subsetting algorithm: it guarantees that if a single server is added to or removed from the address list, every client will update at most one entry in its subset. This is because the hashes of all unaffected servers remain the same, which guarantees that their relative order after sorting also remains stable. The same logic applies when multiple servers are updated.

### LB Policy Config and Parameters

The `random_subsetting` LB policy config will be as follows.

```proto
message LoadBalancingConfig {
  oneof policy {
    RandomSubsettingLbConfig random_subsetting = 21 [json_name = "random_subsetting"];
  }
}

message RandomSubsettingLbConfig {
  // subset_size indicates how many backends every client will be connected to.
  // Default is 20.
  google.protobuf.UInt32Value subset_size = 1;

  // The config for the child policy.
  repeated LoadBalancingConfig child_policy = 2;
}
```

> **Reviewer (Member):** Note: once we're happy with this gRFC, we'll also want a PR to make these changes to service_config.proto.
>
> **Author (s-matyukevich):** Sure, I'll do that.
>
> **Author (s-matyukevich):** Here is the PR: grpc/grpc-proto#157


### Handling Parent/Resolver Updates

When the resolver updates the list of addresses, or the LB config changes, the random subsetting LB will run the subsetting algorithm described above to filter the endpoint list. It will then create a new resolver state with the filtered list of addresses and pass it to the child LB. Attributes and the service config from the old resolver state will be copied to the new one.
> **Reviewer (Contributor):** I think you should replace addresses with endpoints to take A61 into consideration.
>
> **Author (s-matyukevich):** Done.
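
The following Go sketch illustrates this filter-and-forward step. The types are simplified, hypothetical stand-ins (the real balancer and resolver APIs differ per language and are richer); `pickSubset` is the helper from the earlier sketch, assumed to live in the same hypothetical package.

```go
// Hypothetical continuation of the subsetting sketch; the types below are
// simplified stand-ins, not the real gRPC balancer/resolver APIs.

// ResolverState is a stand-in for the resolver state: a list of endpoint
// addresses plus opaque attributes that are copied through unchanged.
type ResolverState struct {
	Addresses  []string
	Attributes map[string]string
}

// ChildBalancer is a stand-in for the child LB policy.
type ChildBalancer interface {
	UpdateState(ResolverState)
}

type randomSubsettingBalancer struct {
	subsetSize int
	salt       string
	child      ChildBalancer
}

// handleResolverUpdate re-runs the subsetting algorithm on every resolver or
// config update and forwards the filtered state to the child policy.
func (b *randomSubsettingBalancer) handleResolverUpdate(state ResolverState) {
	filtered := ResolverState{
		Addresses:  pickSubset(state.Addresses, b.salt, b.subsetSize),
		Attributes: state.Attributes, // attributes/service config are copied as-is
	}
	b.child.UpdateState(filtered)
}
```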


### Handling Subchannel Connectivity State Notifications

Random subsetting LB will simply forward all requests to the child LB without any additional processing. This applies to all other callbacks in the LB interface besides the one that handles resolver and config updates, which is described in the previous section. This is possible because random subsetting LB doesn't store or manage sub-connections: it acts as a simple filter on the resolver state and delegates all actual work to the child LB.

### xDS Integration

Random subsetting LB won't depend on xDS in any way. Users may choose to initialize it by directly providing the service config. We will only provide a corresponding xDS policy wrapper so that this LB can also be configured via xDS.

#### Changes to xDS API
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth chatting with @wbpcode to make sure this design would also work for Envoy. I don't know of any reason it wouldn't, but might make sense to check just in case.

Copy link

@wbpcode wbpcode Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not hard to implement that in envoy. Just go ahead.


`random_subsetting` will be added as a new LB policy.

```proto
package envoy.extensions.load_balancing_policies.random_subsetting.v3;

message RandomSubsetting {
  google.protobuf.UInt32Value subset_size = 1;
  repeated LoadBalancingConfig child_policy = 2;
}
```

As you can see, the fields in this policy match exactly the fields in the random subsetting LB service config.

#### Integration with xDS LB Policy Registry
As described in [gRFC A52](https://github.com/grpc/proposal/blob/master/A52-xds-custom-lb-policies.md), gRPC has an LB policy registry, which maintains a list of converters. Each converter translates an xDS LB policy into the corresponding service config. To allow using the random subsetting LB policy via xDS, the only thing that needs to be done is to provide a corresponding converter function. The implementation will be trivial, as the fields in the xDS LB policy match exactly the fields in the service config.
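
As an illustration, such a converter could look roughly like the hypothetical sketch below. The input struct, function name, JSON field names, and registration mechanism are all assumptions for illustration; the real converter registry and the exact service-config schema are defined elsewhere.

```go
// Hypothetical converter sketch: translates the parsed xDS RandomSubsetting
// policy into the equivalent service-config JSON. Names and signature are
// illustrative, not the actual gRPC registry API.
package converter

import (
	"encoding/json"
	"fmt"
)

// randomSubsettingProto is a simplified stand-in for the parsed
// envoy.extensions.load_balancing_policies.random_subsetting.v3.RandomSubsetting message.
type randomSubsettingProto struct {
	SubsetSize  uint32
	ChildPolicy json.RawMessage // already-converted child policy config (valid JSON)
}

// convertRandomSubsetting produces the service-config JSON for the policy by
// copying the fields one-to-one, since they match exactly.
func convertRandomSubsetting(cfg randomSubsettingProto) (json.RawMessage, error) {
	sc := map[string]interface{}{
		"subset_size":  cfg.SubsetSize,
		"child_policy": cfg.ChildPolicy,
	}
	b, err := json.Marshal(sc)
	if err != nil {
		return nil, fmt.Errorf("marshaling random_subsetting config: %w", err)
	}
	return b, nil
}
```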

## Rationale
### Alternatives Considered: Deterministic subsetting
> **Reviewer (Contributor):** You should probably discuss the trade-offs of doing this kind of subsetting in the control plane, since it was discussed in the original proposal.
>
> **Author (s-matyukevich):** Yeah, but I posted a link to the tl;dr of the discussion; do you think this is not enough?
>
> **Reviewer (Contributor):** I'm thinking of the option of doing random subsetting in the control plane by sending different EDS responses (with different subsets) to each dataplane, or the equivalent with other resolvers. It is simple to implement with xDS and works for Envoy and gRPC. IIRC the main argument for not going that route is the need to have an xDS infrastructure (this is a big barrier for our orgs, and probably others), and existing limitations of https://github.com/envoyproxy/go-control-plane. This was discussed in https://github.com/grpc/proposal/pull/383/files#r1308024474. In order to understand this proposal, I think users will need to understand the trade-off of doing it as a balancer in each data plane rather than directly in service discovery.

We explored the possibility of using deterministic subsetting in https://github.com/grpc/proposal/pull/383 and got push-back for the reasons explained [here](https://github.com/grpc/proposal/pull/383#discussion_r1334587561).

## Implementation
DataDog will provide Go and Java implementations.

Binary files added: A68_graphics/subsetting100-10-5.png, A68_graphics/subsetting100-100-25.png, A68_graphics/subsetting100-100-5.png, A68_graphics/subsetting2000-10-5.png, A68_graphics/subsetting500-10-5.png.