balancer: rewrite the consistent hashring balancer to avoid recomputations #1310

Merged
merged 1 commit into authzed:main from idlepickerhashring
May 18, 2023

Conversation

ecordell
Contributor

The previous implementation recomputed the hashring every time a subconnection moved from ready to idle or back, which happened frequently.

The new implementation keeps idle and connecting subconns in the ring and triggers a connection when one of them is selected. It also adds members to and removes them from a long-lived ring instead of recomputing the ring from scratch each time.
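To illustrate the "long-lived ring" idea, here is a minimal sketch of a hashring whose membership is edited in place. It is not the `pkg/consistent` implementation from this PR: the `ring` and `vnode` types, the FNV hash, and the `member-index` virtual node naming are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// vnode is one virtual node on the ring, owned by a member.
type vnode struct {
	hash   uint64
	member string
}

// ring is a hypothetical, simplified hashring used only for illustration.
type ring struct {
	replicationFactor int     // virtual nodes per member
	vnodes            []vnode // kept sorted by hash
}

func hashKey(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // hash.Hash writes never return an error
	return h.Sum64()
}

// Add inserts the member's virtual nodes into the existing ring.
// It assumes the member is not already present.
func (r *ring) Add(member string) {
	for i := 0; i < r.replicationFactor; i++ {
		r.vnodes = append(r.vnodes, vnode{
			hash:   hashKey(fmt.Sprintf("%s-%d", member, i)),
			member: member,
		})
	}
	sort.Slice(r.vnodes, func(i, j int) bool {
		if r.vnodes[i].hash != r.vnodes[j].hash {
			return r.vnodes[i].hash < r.vnodes[j].hash
		}
		return r.vnodes[i].member < r.vnodes[j].member
	})
}

// Remove drops only the given member's virtual nodes; the relative order
// of the remaining nodes is unchanged, so no re-sort is needed.
func (r *ring) Remove(member string) {
	kept := r.vnodes[:0]
	for _, v := range r.vnodes {
		if v.member != member {
			kept = append(kept, v)
		}
	}
	r.vnodes = kept
}

// Find returns the member owning the first virtual node at or after the
// key's hash, wrapping around to the start of the ring.
func (r *ring) Find(key string) (string, bool) {
	if len(r.vnodes) == 0 {
		return "", false
	}
	h := hashKey(key)
	i := sort.Search(len(r.vnodes), func(i int) bool { return r.vnodes[i].hash >= h })
	if i == len(r.vnodes) {
		i = 0
	}
	return r.vnodes[i].member, true
}

func main() {
	r := &ring{replicationFactor: 100}
	r.Add("10.0.0.1:50053")
	r.Add("10.0.0.2:50053")
	owner, _ := r.Find("some-request-key")
	fmt.Println("owner:", owner)

	// Only 10.0.0.1:50053 remains after this, so it owns every key.
	r.Remove("10.0.0.2:50053")
	owner, _ = r.Find("some-request-key")
	fmt.Println("owner after removal:", owner)
}
```

In this sketch a connectivity change (ready to idle, for example) never touches the ring; only `Add` and `Remove`, driven by membership changes, do.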

ReplicationFactor and Spread can now be passed in as part of the service config instead of being registered globally with the balancer.
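As a rough sketch of what passing these settings through the service config could look like: gRPC service configs select a balancer through a `loadBalancingConfig` list keyed by balancer name. The balancer name `consistent-hashring`, the JSON field names, and the `balancerConfig` and `toServiceConfigJSON` identifiers below are assumptions for illustration, not code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// balancerName is an assumed registration name for the balancer.
const balancerName = "consistent-hashring"

// balancerConfig mirrors the two settings named in the description.
// The JSON field names are assumptions.
type balancerConfig struct {
	ReplicationFactor uint16 `json:"replicationFactor,omitempty"`
	Spread            uint8  `json:"spread,omitempty"`
}

// serviceConfig is the envelope gRPC expects: a list of single-entry
// maps from balancer name to that balancer's config.
type serviceConfig struct {
	LoadBalancingConfig []map[string]balancerConfig `json:"loadBalancingConfig"`
}

// toServiceConfigJSON wraps the balancer config in a service config.
func (c balancerConfig) toServiceConfigJSON() (string, error) {
	out, err := json.Marshal(serviceConfig{
		LoadBalancingConfig: []map[string]balancerConfig{{balancerName: c}},
	})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	js, err := balancerConfig{ReplicationFactor: 100, Spread: 1}.toServiceConfigJSON()
	if err != nil {
		panic(err)
	}
	// Prints: {"loadBalancingConfig":[{"consistent-hashring":{"replicationFactor":100,"spread":1}}]}
	fmt.Println(js)
}
```

A string like this could then be supplied per client, for example through `grpc.WithDefaultServiceConfig`, rather than fixing the values when the balancer is registered.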

@github-actions github-actions bot added the area/CLI Affects the command line label May 11, 2023
@ecordell ecordell force-pushed the idlepickerhashring branch 2 times, most recently from 47996e3 to 1ca1002 Compare May 11, 2023 18:32
@ecordell ecordell changed the title from "balancer: rewrite the consistent hashring balancer to avoid recomputates" to "balancer: rewrite the consistent hashring balancer to avoid recomputations" May 11, 2023
@ecordell ecordell force-pushed the idlepickerhashring branch from 1ca1002 to 53c6a19 Compare May 12, 2023 14:56
@github-actions github-actions bot added the area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) label May 12, 2023
@ecordell ecordell force-pushed the idlepickerhashring branch from 53c6a19 to 23c2ed8 Compare May 12, 2023 15:01
@ecordell ecordell marked this pull request as ready for review May 12, 2023 15:01
@ecordell ecordell requested a review from a team May 12, 2023 15:01
@ecordell ecordell force-pushed the idlepickerhashring branch 3 times, most recently from b79aac4 to 36d5e41 Compare May 15, 2023 14:29
@josephschorr josephschorr self-requested a review May 17, 2023 23:51
@ecordell ecordell force-pushed the idlepickerhashring branch from 36d5e41 to 501590b Compare May 17, 2023 23:51
ReplicationFactor: 100,
Spread: 1,
}
var DefaultBalancerServiceConfigJSON = defaultBalancerServiceConfig.MustToServiceConfigJSON()
Member

doc comments on exported stuff

Contributor Author

can we find a lint rule for this? I feel like we started it because of lint rules but no linter complains anymore

Contributor Author

Added the config for golangci-lint

// Initialize picker to a picker that always returns
// ErrNoSubConnAvailable, because when state of a SubConn changes, we
// may call UpdateState with this picker.
bal.picker = base.NewErrPicker(balancer.ErrNoSubConnAvailable)
Member

why is this done this way vs just doing so in the struct above?

logger.Infof("parsed balancer config %s", js)

if lbCfg.ReplicationFactor == 0 {
lbCfg.ReplicationFactor = 100
Member

can we have these defaults be in constants?
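One way the suggestion could look, as a sketch only: the constant names, the `balancerConfig` type, and the `applyDefaults` helper are hypothetical. The values 100 and 1 come from the default config shown earlier in this review; defaulting `Spread` when it is zero is an assumption, since the hunk above only shows the `ReplicationFactor` check.

```go
package main

import "fmt"

// Defaults used when the service config leaves a field unset (zero).
const (
	defaultReplicationFactor = 100
	defaultSpread            = 1
)

// balancerConfig is a hypothetical stand-in for the parsed config type.
type balancerConfig struct {
	ReplicationFactor uint16
	Spread            uint8
}

// applyDefaults fills in any zero-valued field with its named default.
func (c *balancerConfig) applyDefaults() {
	if c.ReplicationFactor == 0 {
		c.ReplicationFactor = defaultReplicationFactor
	}
	if c.Spread == 0 { // assumption: Spread is defaulted the same way
		c.Spread = defaultSpread
	}
}

func main() {
	cfg := balancerConfig{Spread: 2}
	cfg.applyDefaults()
	fmt.Printf("%+v\n", cfg) // prints {ReplicationFactor:100 Spread:2}
}
```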

if len(info.ReadySCs) == 0 {
return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
func (b *ConsistentHashringBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
if logger.V(2) {
Member

do we do this kind of log level check anywhere else?

Contributor Author

this is using the grpc logging framework, not logrus

logger.Infof("consistentHashringPicker: Build called with info: %v", info)
if len(info.ReadySCs) == 0 {
return base.NewErrPicker(balancer.ErrNoSubConnAvailable)
func (b *ConsistentHashringBalancer) UpdateClientConnState(s balancer.ClientConnState) error {
Member

can you add a few more comments describing the workflow here?

b.cc.UpdateState(balancer.State{ConnectivityState: b.state, Picker: b.picker})
}

// Close is a nop because base balancer doesn't have internal state to clean up,
Member

no-op

// Note: this is testing picker behavior and not the hashring
// behavior itself, see `pkg/consistent` for tests of the hashring.
func TestConsistentHashringPickerPick(t *testing.T) {
rnd := rand.New(rand.NewSource(1))
Member

add a comment here that we are pinning the seed for below

t.Errorf("Pick() error = %v, wantErr %v", err, tt.wantErr)
return
}
if !reflect.DeepEqual(got, tt.want) {
Member

t.True(reflect.DeepEqual(got, tt.want), "Pick() got ...")?

wantErr bool
}{
{
name: "sets rf and spread",
Member

only one test case in a table-driven test?

Contributor Author

added some more

Spread: tt.spread,
}
got, err := c.ToServiceConfigJSON()
if (err != nil) != tt.wantErr {
Member

I generally prefer to do

if tt.wantErr {
  // ensure error
  return
}

// ensure non-error case

Contributor Author

I can change it; I sometimes use GoLand's test scaffolding, which writes assertions like this, and I don't always change them to more idiomatic testify

@ecordell ecordell force-pushed the idlepickerhashring branch 3 times, most recently from ef2f607 to 5bb867e Compare May 18, 2023 03:05
// ParseConfig satisfies balancer.ConfigParser and is used to parse new
// Service Config json. The results are stored on the builder so that
// subsequently built Balancers use the config.
func (b *ConsistentHashringBuilder) ParseConfig(js json.RawMessage) (serviceconfig.LoadBalancingConfig, error) {
Member

seems weird that ParseConfig stores a value, but since that's what the interface requires, guess it's okay

}
// Successful resolution; clear resolver error and ensure we return nil.
Member

; -> :

// any that have been removed since the last update are removed from the
// hashring.
addrsSet := resolver.NewAddressMap()
for _, a := range s.ResolverState.Addresses {
Member

maybe use addr?

if _, ok := addrsSet.Get(a); !ok {
b.cc.RemoveSubConn(sc)
b.subConns.Delete(a)
// Keep the state of this sc in b.scStates until sc's state becomes Shutdown.
Member

odd spacing here

return balancer.ErrBadResolverState
}

// if the overall connection sate is not in transient failure, we return
Member

sate -> state

// Pick satisfies balancer.Picker and returns a subconnection to use for a
// request based on the request info. The value stored in CtxKey is hashed
// into the hashring, and the resulting subconnection is used. Note that
// there is no fallback behavior if the subconnection; this prevents the
Member

if the subconnection... ?
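For readers following along, here is a rough sketch of the pick behavior the PR description outlines: the key selects a subconnection from the ring, an idle one is asked to connect, and there is no fallback to a different member. Everything here is hypothetical: `fakeSubConn`, `picker`, `lookup`, and the state constants stand in for the gRPC types, and returning a "no subconn available" error for a non-ready subconnection is an assumption, since the actual `Pick` body is not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
)

type connState int

const (
	stateIdle connState = iota
	stateConnecting
	stateReady
)

// fakeSubConn is a hypothetical stand-in for a gRPC subconnection.
type fakeSubConn struct {
	addr         string
	state        connState
	connectCalls int
}

// Connect records the attempt and moves the subconn to connecting.
func (sc *fakeSubConn) Connect() {
	sc.connectCalls++
	sc.state = stateConnecting
}

var errNoSubConnAvailable = errors.New("no SubConn is available")

// picker resolves a key to a subconn via lookup, which stands in for
// hashing the key into the ring.
type picker struct {
	lookup func(key string) *fakeSubConn
}

// Pick returns the subconn that owns the key. An idle subconn is asked
// to connect; a subconn that is not ready yields an error instead of
// falling back to another member.
func (p *picker) Pick(key string) (*fakeSubConn, error) {
	sc := p.lookup(key)
	if sc == nil {
		return nil, errNoSubConnAvailable
	}
	if sc.state == stateIdle {
		sc.Connect()
	}
	if sc.state != stateReady {
		return nil, errNoSubConnAvailable
	}
	return sc, nil
}

func main() {
	sc := &fakeSubConn{addr: "10.0.0.1:50053"}
	p := &picker{lookup: func(string) *fakeSubConn { return sc }}

	// First pick finds the subconn idle and triggers a connection.
	_, err := p.Pick("some-key")
	fmt.Println(err, sc.connectCalls)

	// Once the subconn reports ready, the same key resolves to it.
	sc.state = stateReady
	got, err := p.Pick("some-key")
	fmt.Println(got.addr, err)
}
```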

@ecordell ecordell force-pushed the idlepickerhashring branch from 5bb867e to 302ba2b Compare May 18, 2023 16:07
Member

@josephschorr josephschorr left a comment

LGTM

@ecordell ecordell force-pushed the idlepickerhashring branch from 302ba2b to 89100db Compare May 18, 2023 16:14
@ecordell ecordell enabled auto-merge May 18, 2023 16:14
@ecordell ecordell force-pushed the idlepickerhashring branch from 89100db to 94145c0 Compare May 18, 2023 16:20
@ecordell ecordell added this pull request to the merge queue May 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 18, 2023
@ecordell ecordell added this pull request to the merge queue May 18, 2023
Merged via the queue into authzed:main with commit 0026188 May 18, 2023
@ecordell ecordell deleted the idlepickerhashring branch May 18, 2023 17:10
@github-actions github-actions bot locked and limited conversation to collaborators May 18, 2023