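After a long stress test, thread wal-raft-executor-112680774442680320_0 shows high CPU usage that never comes down

The cluster has 3 nodes (32 cores, 64 GB each), referred to here as 20, 54, and 124. The stress test runs 350,000 clients sending a 40 KB body every 10 s, and pauses for about 3 minutes every 10 to 12 hours. After roughly 2 days, the wal-raft-executor-112680774442680320_0 thread on node 20 shows high CPU usage, as does the wal-raft-executor-112680774434029568_0 thread on node 54, and neither comes back down. The basekv-range-mutator thread's CPU usage is also high and does not come down.

During this time the cluster works normally: warn.log and error.log contain no errors, and the GC log looks normal. Searching the balancer logs for these threads turns up the entries below.

Node 20 CPU screenshot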
Node 20, retain.store-fd6e1d50-7308-4146-84fd-5fa62de36212.log
2024-06-30 20:23:07.191 INFO [bg-task-executor-7] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2784, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
2024-06-30 22:08:53.690 INFO [bg-task-executor-2] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2788, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}
2024-07-01 09:55:06.882 INFO [bg-task-executor-3] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2844, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}] result: true
2024-07-02 17:55:13.775 INFO [bg-task-executor] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2856, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true
Node 54 CPU screenshot
Node 54, inbox.store-0a40673e-7e57-47d6-8fa9-e69a2305152e.log
2024-07-03 19:27:35.164 INFO [bg-task-executor] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=0a40673e-7e57-47d6-8fa9-e69a2305152e, kvRangeId=112680774434029568_0, expectedVer=3640, voters=[62837868-8a27-4d5c-9bc3-1a155fc63a66, e8a84d42-8292-489e-a241-9ce716d14e07, 0a40673e-7e57-47d6-8fa9-e69a2305152e], learner=[]}
BifroMQ
To Reproduce
Stress-test client: 350,000 clients, sending QoS 0 messages with a 40 KB body every 8.5 s, pausing for more than 3 minutes every 10 to 12 hours.
*** PUB Client ***:
Expected behavior
Logs
Configurations
OS(please complete the following information):
JVM:
Performance Related
Additional context Add any other context about the problem here.
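I could not reproduce the behavior you describe with the reproduction information you gave. Two suggestions for reference: 1) give complete steps in the issue description that reproduce the problem reliably, or 2) if the problem persists after you stop the stress test and restart, provide the complete data directories from all three nodes for diagnosis.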
Data link: https://pan.baidu.com/s/1K2gkC2vtzGz2ykbsFPYSAA?pwd=y5cg Extraction code: y5cg
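According to the relevant metric (basekv_meta_ver), your data shows that the ranges of the inbox store and the retain store have gone through several thousand management version changes, and that the replicas are not progressing in sync. The thread consuming CPU is most likely the leader repeatedly trying to synchronize. In this situation you should check whether the communication quality between nodes is a problem. Also, 3.2.1 includes some stability optimizations in the storage engine, so I recommend testing the same scenario with it.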
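As a rough cross-check of the version churn mentioned above, the balancer log lines quoted in this issue can be grouped by kvRangeId to see how far expectedVer moves between entries and how the voter set changes. The following is only a sketch: the regular expression is inferred from the log lines posted in this thread, the function names are hypothetical, and it reads log text rather than the basekv_meta_ver metric itself.

```python
import re
from collections import defaultdict

# Pattern inferred from the ChangeConfigCommand log lines quoted in this issue;
# it is an assumption about the log format, not a documented BifroMQ contract.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"kvRangeId=(?P<range>[0-9_]+), expectedVer=(?P<ver>\d+), "
    r"voters=\[(?P<voters>[^\]]*)\]"
)


def summarize_balancer_log(lines):
    """Group (timestamp, expectedVer, voter count) tuples by kvRangeId."""
    by_range = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a ChangeConfigCommand line
        voters = [v for v in m.group("voters").split(", ") if v]
        by_range[m.group("range")].append(
            (m.group("ts"), int(m.group("ver")), len(voters))
        )
    return dict(by_range)


def version_span(entries):
    """Difference between the highest and lowest expectedVer seen."""
    versions = [ver for _, ver, _ in entries]
    return max(versions) - min(versions)


# Two lines copied from the node 20 retain store log in this issue.
SAMPLE = [
    "2024-06-30 20:23:07.191 INFO [bg-task-executor-7] --- [KVRangeBalanceController.java:181] Balancer command[ReplicaCntBalancer,ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2784, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e, 542e442a-3748-4ec4-b6db-eda13ad225e6], learner=[]}] result: true",
    "2024-06-30 22:08:53.690 INFO [bg-task-executor-2] --- [KVRangeBalanceController.java:169] Balancer[ReplicaCntBalancer] run command: ChangeConfigCommand{toStore=fd6e1d50-7308-4146-84fd-5fa62de36212, kvRangeId=112680774442680320_0, expectedVer=2788, voters=[fd6e1d50-7308-4146-84fd-5fa62de36212, 7a67e104-2788-4608-a121-7b80e9dc001e], learner=[]}",
]

if __name__ == "__main__":
    for range_id, entries in summarize_balancer_log(SAMPLE).items():
        print(range_id, "version span:", version_span(entries))
        for ts, ver, voter_count in entries:
            print(" ", ts, "expectedVer =", ver, "voters =", voter_count)
```

Run against the full store logs from each node, a summary like this would show whether the voter set keeps flipping between two and three members, which is the kind of pattern worth comparing against network quality between nodes.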