Performance/health feedback to client and the Unified Health Controller #16297
Labels: type/enhancement (the issue or PR belongs to an enhancement)
ti-chi-bot (bot) added a commit that referenced this issue on Feb 2, 2024:
…Service from PdWorker to it (#16456)
ref #16297
Add module health_controller and move SlowScore, SlowTrend, HealthService from PdWorker to it
Signed-off-by: MyonKeminta
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
This was referenced Feb 4, 2024
ti-chi-bot (bot) pushed a commit that referenced this issue on Feb 20, 2024:
…16498)
ref #16297
Support sending health feedback information to the client via BatchCommandResponse
Signed-off-by: MyonKeminta
dbsid pushed two commits to dbsid/tikv that referenced this issue on Mar 24, 2024: fork copies of tikv#16456 (Add module health_controller and move SlowScore, SlowTrend, HealthService from PdWorker to it) and tikv#16498 (Support sending health feedback information to the client via BatchCommandResponse).
This was referenced Apr 8, 2024
ti-chi-bot (bot) added a commit that referenced this issue on May 16, 2024:
…ling RPC (#17008)
close #16297
This PR makes TiKV support explicitly getting health feedback information by calling RPC. Both non-batched mode and batched mode (using BatchCommands stream) are supported. There's some special behavior when used in batched RPC mode: The BatchCommandsResponse that contains the response of getting health feedback will always have feedback information attached (in the same way as how it's attached without being explicitly requested), and the attached information and the information carried in each single GetHealthFeedbackResponse-s is the same.
Signed-off-by: MyonKeminta
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
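The batched-mode rule described in #17008 (a BatchCommandsResponse that answers a GetHealthFeedback request always carries feedback, and that feedback matches what each GetHealthFeedbackResponse in the batch returns) can be illustrated with a small Python sketch. It is not TiKV's Rust code: the `HealthFeedback` field names, the `GET_HEALTH_FEEDBACK` request marker, and the `attach_periodically` flag (standing in for whatever policy decides to attach feedback unrequested) are all assumptions for illustration. The key idea is taking one snapshot per batch and reusing it everywhere.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Hypothetical marker standing in for a real GetHealthFeedback request type.
GET_HEALTH_FEEDBACK = "GetHealthFeedback"


@dataclass(frozen=True)
class HealthFeedback:
    # Assumed fields, for illustration only.
    store_id: int
    feedback_seq_no: int
    slow_score: int


@dataclass
class BatchResponse:
    responses: List[Union[HealthFeedback, Tuple[str, str]]] = field(default_factory=list)
    # Feedback attached to the batch as a whole (None when not attached).
    health_feedback: Optional[HealthFeedback] = None


def build_batch_response(
    requests: List[str], snapshot: HealthFeedback, attach_periodically: bool
) -> BatchResponse:
    """Build one batched response from a single health snapshot."""
    resp = BatchResponse()
    explicitly_requested = False
    for req in requests:
        if req == GET_HEALTH_FEEDBACK:
            explicitly_requested = True
            # Every GetHealthFeedback in the batch answers with the same snapshot.
            resp.responses.append(snapshot)
        else:
            # Placeholder for handling an ordinary KV request.
            resp.responses.append(("ok", req))
    # Attach at batch level whenever feedback was requested (or the periodic
    # policy wants it), so both copies are always identical.
    if explicitly_requested or attach_periodically:
        resp.health_feedback = snapshot
    return resp
```

Because the snapshot is taken once per batch rather than once per sub-request, the batch-level copy and the per-request copies cannot drift apart.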
Development Task
Currently, there are several ways to evaluate whether a TiKV node is in a healthy state. They are:
All of these are managed in PdWorker.
We found that the evict-slow-store-scheduler mentioned above is quite helpful for solving the problem where a few TiKV nodes suffering from slow IO significantly degrade the whole cluster's performance. However, problems can remain when follower read is used in the cluster: follower reads don't need to be processed by the leader, so moving leaders away from a slow store doesn't stop reads from being sent to it.
To solve this problem, we want client-go to know whether each TiKV node is abnormal and to avoid sending follower read requests to the problematic ones. We are now considering sending some of the health information to the client via KV responses, so that the client has more timely and up-to-date information about the TiKV nodes' status and can adjust its replica selection policy accordingly.
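A minimal Python sketch of how a client could act on such feedback when choosing a follower-read target. Everything here is assumed for illustration and is not client-go's actual API: the `HealthFeedback` fields, the per-store sequence number used to drop out-of-order feedback, the 1..100 slow-score scale, and the hypothetical `SLOW_SCORE_THRESHOLD` of 80.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cut-off on an assumed 1..100 slow-score scale.
SLOW_SCORE_THRESHOLD = 80


@dataclass(frozen=True)
class HealthFeedback:
    # Assumed fields: reporting store, per-store sequence number, slow score.
    store_id: int
    feedback_seq_no: int
    slow_score: int


class StoreHealthCache:
    """Client-side view of the latest health feedback seen per store."""

    def __init__(self) -> None:
        self._latest: Dict[int, HealthFeedback] = {}

    def update(self, fb: HealthFeedback) -> None:
        cur = self._latest.get(fb.store_id)
        # Drop feedback older than what we already hold for this store.
        if cur is None or fb.feedback_seq_no > cur.feedback_seq_no:
            self._latest[fb.store_id] = fb

    def is_slow(self, store_id: int) -> bool:
        fb = self._latest.get(store_id)
        # A store that has not reported anything yet is assumed healthy.
        return fb is not None and fb.slow_score >= SLOW_SCORE_THRESHOLD


def pick_follower_read_store(
    candidates: List[int], cache: StoreHealthCache
) -> Optional[int]:
    """Prefer the first candidate that is not reported slow."""
    for store_id in candidates:
        if not cache.is_slow(store_id):
            return store_id
    # Every candidate looks slow: fall back to the first one instead of failing the read.
    return candidates[0] if candidates else None
```

Falling back to a slow replica rather than failing keeps reads available when the whole candidate set is degraded; a real policy might instead fall back to the leader.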
We are planning to add a component named Unified Health Controller, which will be the unified entry point for managing and accessing the health status of a TiKV node. The SlowScore, SlowTrend and gRPC HealthService mentioned above should be moved into it, while PdWorker remains responsible for updating them. The component itself should live outside PdWorker, which lets us access it from elsewhere, or add information to it that is not appropriate to update in PdWorker (e.g., readpool's stats).
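The split described above (one shared object holding health state, updated by PdWorker but readable from anywhere) can be sketched in Python as a lock-protected container with a writer side and a reader side. This is an illustrative assumption, not TiKV's health_controller module: the class and method names, the `ServingStatus` enum standing in for the gRPC HealthService status, and the free-form `extra` map for data such as readpool stats are all hypothetical.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class ServingStatus(Enum):
    # Stand-in for the status a gRPC health service would report.
    SERVING = 1
    NOT_SERVING = 2


@dataclass(frozen=True)
class HealthSnapshot:
    slow_score: float
    slow_trend: float
    serving_status: ServingStatus
    extra: Dict[str, float] = field(default_factory=dict)


class HealthController:
    """Hypothetical shared health state, living outside any single worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._slow_score = 1.0
        self._slow_trend = 0.0
        self._serving_status = ServingStatus.SERVING
        self._extra: Dict[str, float] = {}

    # Writer side: e.g. called from the PdWorker tick.
    def update_slow_score(self, score: float) -> None:
        with self._lock:
            self._slow_score = score

    def update_slow_trend(self, value: float) -> None:
        with self._lock:
            self._slow_trend = value

    def set_serving_status(self, status: ServingStatus) -> None:
        with self._lock:
            self._serving_status = status

    # Writer side for data PdWorker shouldn't own, e.g. readpool stats.
    def report_extra(self, key: str, value: float) -> None:
        with self._lock:
            self._extra[key] = value

    # Reader side: e.g. the KV service building health feedback for clients.
    def snapshot(self) -> HealthSnapshot:
        with self._lock:
            # Copy the map so later writes don't leak into this snapshot.
            return HealthSnapshot(
                self._slow_score, self._slow_trend, self._serving_status, dict(self._extra)
            )
```

Handing readers an immutable snapshot keeps the lock held only briefly and lets several components read consistent values without knowing who updates them.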
TiKV
PD
PDClient: support getting the StoreStats field in GetStore. It's already included in the RPC protocol but not returned by the Go PDClient.
client-go
Other related work in the client-go repo: tikv/client-go#1104
TiDB
The TiDB repo's dependencies need to be updated several times, but there isn't any major development task in the TiDB repo.
Ref: pingcap/tidb#51412
Next step