-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reduce unnecessary tikvServerBusy backoff when able to try next replica #1184
Conversation
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
…-1166 Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
internal/locate/region_request.go
Outdated
// the state is changed in accessFollower.next when leader is unavailable. | ||
func (s *replicaSelector) canFallback2Follower() bool { | ||
// canSkipServerIsBusyBackoff returns true if the request can be sent to next replica and can skip ServerIsBusy backoff. | ||
func (s *replicaSelector) canSkipServerIsBusyBackoff() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can reuse this function handling region unavailable and changing it's name in the future. Not this PR's work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'm thinking perhaps the better structure is to determine whether to backoff just before the next attempt, instead of just after encountering error, so that what @zyguan tried to do in #990 becomes a unified way.
Btw I just noticed that it seems that this PR and #990 looks trying to solve the same problem 🤔 cc @zyguan
/hold |
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Nice catch. #990 introduces pending backoff for fast retry, which is great idea. After discussing with @zyguan, I introduce the pending backoff idea into this PR and unified the fast retry logic. And later, we can add more backoff to pending to do fast retry. |
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
Signed-off-by: crazycs520 <[email protected]>
…into issue-1166
Signed-off-by: crazycs520 <[email protected]>
/unhold |
Signed-off-by: crazycs520 <[email protected]>
|
||
// canFastRetry returns true if the request can be sent to next replica. | ||
func (s *replicaSelector) canFastRetry() bool { | ||
accessLeader, ok := s.state.(*accessKnownLeader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just do fast retry when tikv_client_read_timeout
is configured? It looks a little bit dangerous to use
if not fast retry
return false
return true
as most of the request would be fast retried .
Besides, maybe we could consult slow information to decide whether fast retry should work. /cc @zyguan @MyonKeminta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the original logic of canFallback2Follower is trying to do fast backoff when retry next replica, so fast backoff is not only work for tikv_client_read_timeout
is configured.
cc @you06
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the original logic of canFallback2Follower is trying to do fast backoff when retry next replica, so fast backoff is not only work for
tikv_client_read_timeout
is configured. cc @you06
@crazycs520 It the current logic the same? If so we could continue first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…-1166 Signed-off-by: crazycs520 <[email protected]>
close #1166
PR 923 introduce the skip
tikvServerBusy
backoff logic, but it it only work for stale-read, and ifserverIsBusy.EstimatedWaitMs > 0
is true, it can't skip.This PR expanded the scope of skips
tikvServerBusy
backoff logic, make non-leader read requests can skiptikvServerBusy
backoff, and ifserverIsBusy.EstimatedWaitMs > 0
is true, non-leader read requests can skip too.