-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support receving health feedback #1153
Conversation
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
fbdbfd9
to
55e9683
Compare
Signed-off-by: MyonKeminta <[email protected]>
55e9683
to
cce241a
Compare
Signed-off-by: MyonKeminta <[email protected]>
8dc5d28
to
f769b67
Compare
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
e6a2673
to
1ddc60d
Compare
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
internal/locate/region_cache.go
Outdated
c.forEachStore(func(store *Store) { | ||
store.updateSlowScoreStat() | ||
slowScoreMetrics[store.storeID] = float64(store.getSlowScore()) | ||
store.healthStatus.update() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make sure the update
run as fast as possible, otherwise storeMu
will be locked for a long time. Expensive operations need to be processed outside (maybe iterate stores by batch in the future). It's OK by now.
internal/locate/region_cache.go
Outdated
}) | ||
for store, score := range slowScoreMetrics { | ||
metrics.TiKVStoreSlowScoreGauge.WithLabelValues(strconv.FormatUint(store, 10)).Set(score) | ||
logutil.BgLogger().Info("checkAndUpdateStoreHealthStats: get health details", zap.Reflect("details", healthDetails)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this log be annoying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I added this for debugging.
@@ -108,6 +108,13 @@ type Client interface { | |||
CloseAddr(addr string) error | |||
// SendRequest sends Request. | |||
SendRequest(ctx context.Context, addr string, req *tikvrpc.Request, timeout time.Duration) (*tikvrpc.Response, error) | |||
// SetEventListener registers an event listener for the Client instance. If called more than once, the previously | |||
// set one will be replaced. | |||
SetEventListener(listener ClientEventListener) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just declare that the method is not thread safe and should be called before SendRequest
so that we can get rid of atomic.Pointer
and make related code simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I'm afraid that the assumption might be broken easily by mistake in the future...
store.recordHealthFeedback(feedback) | ||
} | ||
|
||
func (c *RegionCache) GetClientEventListener() client.ClientEventListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe document that it will be registered to the internal client of KVStore
automatically when calling tikv.NewKVStore
.
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
…eive-health-feedback
Signed-off-by: MyonKeminta <[email protected]>
@crazycs520 PTAL |
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
Signed-off-by: MyonKeminta <[email protected]>
As I noticed that the interface compatibility problem of unistore can be workarounded by |
internal/locate/region_cache.go
Outdated
tikvSlowScoreDecayRate = 20. / 60. // s^(-1), linear decaying | ||
tikvSlowScoreSlowThreshold = 80. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tikvSlowScoreDecayRate = 20. / 60. // s^(-1), linear decaying | |
tikvSlowScoreSlowThreshold = 80. | |
tikvSlowScoreDecayRate float64 = 20.0 / 60.0 // s^(-1), linear decaying | |
tikvSlowScoreSlowThreshold float64 = 80.0 |
@@ -3164,6 +3339,10 @@ func (s *Store) recordReplicaFlowsStats(destType replicaFlowsType) { | |||
atomic.AddUint64(&s.replicaFlowsStats[destType], 1) | |||
} | |||
|
|||
func (s *Store) recordHealthFeedback(feedback *tikvpb.HealthFeedback) { | |||
s.healthStatus.updateTiKVServerSideSlowScore(int64(feedback.GetSlowScore()), time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feedback.FeedbackSeqNo is never used now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not checked for now. Maybe I'd better add a comment to note it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REST LGTM
Signed-off-by: MyonKeminta <[email protected]>
return | ||
} | ||
|
||
// TODO: Try to get store status from PD here. But it's not mandatory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function would be executed inside the batchRecvLoop
, would it affect the performance if we call pdClient.func
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. This function is not expected to be run in batchRecvLoop, but should be a periodic task executed in the background.
The function that might be called inside batchRecvLoop is updateTiKVServerSideSlowScore
.
Here what we need to be careful about is the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, it's better to add comments to functions that would be used in the critical performance path.
Ref: tikv/tikv#16297