<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title><![CDATA[Kafka Producer Test]]></title>
<url>/2019/11/20/Kafka-Producer-Test/</url>
<content type="html"><![CDATA[<h2 id="环境配置"><a href="#环境配置" class="headerlink" title="环境配置"></a>环境配置</h2><p>5<em>(24 core, 64G men, 1T</em>12 disk)<br>CentOS6.5<br>JDK1.8.0_91<br>kafka_2.11-1.0.0<br>brokers:<br>export KAFKA_HEAP_OPTS=”-Xmx8G -Xms8G”<br>default.replication.factor = 2<br>num.io.threads = 16<br>num.network.threads = 8<br>num.replica.fetchers = 2<br>mun.partitions = 30</p>
<h2 id="测试用例"><a href="#测试用例" class="headerlink" title="测试用例"></a>测试用例</h2><p>测试服务端变动情况下生产者的运行情况及数据表现<br>考察重试机制,批量模式<br>case 0: Preferred Replica Election<br>case 1: Reassign Partitions<br>case 2: Add Partitions<br>case 3: Broker Down<br>case 4: Add Broker</p>
<h2 id="Producer配置"><a href="#Producer配置" class="headerlink" title="Producer配置"></a>Producer配置</h2><h3 id="producer0-默认配置"><a href="#producer0-默认配置" class="headerlink" title="producer0:默认配置"></a>producer0:默认配置</h3><p>bin/kafka-producer-perf-test.sh <br>–topic test <br>–num-records 6000 –record-size 100 –throughput 100 <br>–print-metrics <br>–producer-props bootstrap.servers=10.31.33.21:6667,10.31.33.23:6667,10.31.33.25:6667</p>
<h3 id="producer1-增加重试"><a href="#producer1-增加重试" class="headerlink" title="producer1: 增加重试"></a>producer1: 增加重试</h3><p>bin/kafka-producer-perf-test.sh <br>–topic test <br>–num-records 6000 –record-size 100 –throughput 100 <br>–print-metrics <br>–producer-props bootstrap.servers=10.31.33.21:6667,10.31.33.23:6667,10.31.33.25:6667 <br>retries=3 <br>retry.backoff.ms=1000</p>
<h2 id="测试过程"><a href="#测试过程" class="headerlink" title="测试过程"></a>测试过程</h2><h3 id="case0-producer0"><a href="#case0-producer0" class="headerlink" title="case0-producer0:"></a>case0-producer0:</h3><p>现象:日志报错<br>org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.<br>结论:数据丢失</p>
<h3 id="case0-producer1"><a href="#case0-producer1" class="headerlink" title="case0-producer1:"></a>case0-producer1:</h3><p>现象:<br>[2019-11-20 13:56:51,637] WARN [Producer clientId=producer-1] Got error produce response with correlation id 560 on topic-partition test-9, retrying (2 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)<br>结论:正常</p>
<h3 id="case1-producer0"><a href="#case1-producer0" class="headerlink" title="case1-producer0:"></a>case1-producer0:</h3><p>现象:日志报错<br>[2019-11-19 15:49:00,959] WARN [Producer clientId=producer-1] Received unknown topic or partition error in produce request on partition test-11. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)<br>org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.<br>[2019-11-19 15:49:01,030] WARN [Producer clientId=producer-1] Received unknown topic or partition error in produce request on partition test-16. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)<br>org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.<br>result:数据丢失</p>
<h3 id="case1-producer1"><a href="#case1-producer1" class="headerlink" title="case1-producer1:"></a>case1-producer1:</h3><p>数据少量丢失</p>
<h3 id="case2-producer0,case2-producer1"><a href="#case2-producer0,case2-producer1" class="headerlink" title="case2-producer0,case2-producer1:"></a>case2-producer0,case2-producer1:</h3><p>需要重启producer,才能往新partition写数据</p>
<h3 id="case3-producer0"><a href="#case3-producer0" class="headerlink" title="case3-producer0:"></a>case3-producer0:</h3><p>[2019-11-20 15:16:11,922] WARN [Producer clientId=producer-1] Got error produce response with correlation id 11898 on topic-partition test-12, retrying (4 attempts left). Error: NOT_LEADER_FOR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)<br>[2019-11-20 15:16:12,234] WARN [Producer clientId=producer-1] Connection to node 1001 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)<br>丢失数据</p>
<h3 id="case3-producer1"><a href="#case3-producer1" class="headerlink" title="case3-producer1:"></a>case3-producer1:</h3><p>丢失少量数据</p>
<h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><p>尽力避免导致分区重分配的操作,暂停producer进行操作</p>
<h2 id="kafka-工具应用"><a href="#kafka-工具应用" class="headerlink" title="kafka 工具应用"></a>kafka 工具应用</h2><h3 id="分区重新分布,可用于增加、减少副本数,重新均衡分区分布;慎用,会导致producer丢失数据"><a href="#分区重新分布,可用于增加、减少副本数,重新均衡分区分布;慎用,会导致producer丢失数据" class="headerlink" title="分区重新分布,可用于增加、减少副本数,重新均衡分区分布;慎用,会导致producer丢失数据"></a>分区重新分布,可用于增加、减少副本数,重新均衡分区分布;慎用,会导致producer丢失数据</h3><p>bin/kafka-reassign-partitions.sh –zookeeper 10.31.33.29:2181 –reassignment-json-file rap.json –execute<br>{“version”:1,<br>“partitions”:[{“topic”:”test”,”partition”:0,”replicas”:[0,1,2]}]}</p>
<h3 id="触发leader选举"><a href="#触发leader选举" class="headerlink" title="触发leader选举"></a>触发leader选举</h3><p>bin/kafka-preferred-replica-election.sh –zookeeper 10.31.33.29:2181 –path-to-json-file ./pre.json<br>{<br> “partitions”:<br> [<br> {“topic”: “test”, “partition”: 0}<br> ]<br>}</p>
]]></content>
<categories>
<category> Big Data </category>
</categories>
<tags>
<tag> java </tag>
<tag> kafka </tag>
</tags>
</entry>
<entry>
<title><![CDATA[Postmortem of a REST API Application Crash]]></title>
<url>/2019/11/01/%E8%AE%B0%E4%B8%80%E6%AC%A1RESTAPI%E5%BA%94%E7%94%A8%E5%B4%A9%E6%BA%83%E5%A4%8D%E7%9B%98/</url>
<content type="html"><![CDATA[<p>记一次RESTAPI应用崩溃复盘,第一次写博,没有代码样例,没有日志截图,全凭想象。<br>同一个世界,不同的场景,本文仅代表个人经验。</p>
<a id="more"></a>
<h1 id="构架环境"><a href="#构架环境" class="headerlink" title="构架环境"></a>构架环境</h1><p>centos6.5<br>lvs<br>jdk1.8.0_91<br>tomcat9.0.20<br>spring-boot2.1.3<br>oracle11.2.0.4<br>ojdbc6<br>JAVA_OPTS=”$JAVA_OPTS<br>-Xms8g -Xmx8g -Xss256k<br>-XX:+UseParNewGC<br>-XX:+UseConcMarkSweepGC<br>-XX:CMSInitiatingOccupancyFraction=75<br>-XX:+UseCMSInitiatingOccupancyOnly<br>-XX:+DisableExplicitGC<br>-XX:+HeapDumpOnOutOfMemoryError”</p>
<h1 id="工作流程"><a href="#工作流程" class="headerlink" title="工作流程"></a>工作流程</h1><p>一台API提供HTTP服务,后端通过lvs负载访问两台oracle数据库</p>
<h1 id="场景还原"><a href="#场景还原" class="headerlink" title="场景还原"></a>场景还原</h1><p>每隔5分钟高频访问,5kQPS峰值持续1分钟;每隔一个小时有大查询持续2分钟<br>监控发现大量502错误,tomcat进程CPU使用率1000%,10分钟内服务僵死</p>
<h1 id="调整过程"><a href="#调整过程" class="headerlink" title="调整过程"></a>调整过程</h1><p>0、查看日志,发现内存溢出异常</p>
<p>1. The first thing observed was a memory problem: jstat showed the old generation completely full and continuous full GCs. The heap was increased to 16G and the collector switched to G1.</p>
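<p>The adjusted options looked roughly like this (a sketch from memory; the pause-time target is an assumption, not a recorded production value):<br>JAVA_OPTS="$JAVA_OPTS<br>-Xms16g -Xmx16g -Xss256k<br>-XX:+UseG1GC<br>-XX:MaxGCPauseMillis=200<br>-XX:+HeapDumpOnOutOfMemoryError"</p>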
<p>2. After an hour of running, memory filled up again. Investigation found a large amount of thread memory that could not be released. Looking at the Tomcat configuration, maxThreads="32000" looked suspiciously large; several rounds of testing showed each thread holding 20-25 MB of memory, so maxThreads was set to "96" for further observation.</p>
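<p>In server.xml terms the change looks like this (a sketch; only maxThreads comes from the tuning above, the port and the other attributes are assumptions):</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line"><Connector port="8080" protocol="HTTP/1.1"</span><br><span class="line">           connectionTimeout="20000"</span><br><span class="line">           acceptCount="100"</span><br><span class="line">           maxThreads="96" /></span><br></pre></td></tr></table></figure>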
<p>3. After a day the service froze again at some point. This time the logs showed large numbers of errors about failing to obtain database connections, but a code review found no leaks.<br> Inspecting the TCP layer showed few actual physical connections, while on the database side the session count had reached its max, with a large number of idle sessions waiting to time out.<br> The focus then shifted to the connection path: APP->LVS->DB. Bypassing LVS and connecting directly to the DB brought the APP back to normal, which solved the immediate problem!<br> Checking the LVS parameters: persistence_timeout 60, ipvsadm --set 120 5 60. The LVS connection hold time was too short, so LVS dropped the TCP connections while the database side still held them, waiting for its idle timeout. Each time the 5-minute query peak hit the APP, the pool found its connections unusable, discarded them, and created new ones; but since the database idle timeout was far longer than 5 minutes, the old sessions were not released. Repeating this cycle exhausted the database's sessions, which matches observation 3.</p>
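<p>Two commands that help with this kind of diagnosis (a sketch; 1521 is the default Oracle listener port, an assumption about this deployment):</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line"># count established TCP connections from the app to Oracle</span><br><span class="line">ss -tan | grep :1521 | grep -c ESTAB</span><br><span class="line"># show the LVS connection timeouts (tcp / tcpfin / udp)</span><br><span class="line">ipvsadm -l --timeout</span><br></pre></td></tr></table></figure>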
<p>4. Before the upgrade the architecture was the same APP->LVS->DB, yet there were no failures. The suspicion was that a change of connection pool caused the incident: previously C3P0, after the upgrade HikariCP. Several rounds of testing showed that C3P0 has a keep-alive mechanism: while a connection is idle it actively sends probe requests to the database, at a 60-second interval, matching the LVS connection hold time. HikariCP has no idle keep-alive; it only checks connection validity when a connection is taken from the pool.</p>
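<p>If staying on HikariCP, one common workaround (not tested in this incident, so treat it as an assumption) is to cap the pool's maxLifetime below the LVS connection hold time, so connections are retired before the load balancer silently drops them. In Spring Boot properties form:</p>
<figure class="highlight plain"><table><tr><td class="code"><pre><span class="line"># retire connections before the 60s LVS hold time (HikariCP requires at least 30000 ms)</span><br><span class="line">spring.datasource.hikari.max-lifetime=50000</span><br><span class="line">spring.datasource.hikari.minimum-idle=5</span><br><span class="line">spring.datasource.hikari.maximum-pool-size=20</span><br></pre></td></tr></table></figure>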
<p>5. Possible fixes:<br>1. Adjust the LVS timeout: our LVS is shared by several services, and it cannot set a timeout per VIP<br>2. Connect to the database directly, bypassing the LVS timeout (TCP-layer timeouts still need to be considered)<br>3. Increase the database idle timeout: the database is shared, so this would affect other services<br>4. Have the application actively probe idle connections to keep them alive; this needs pool support: c3p0, druid<br>5. Strengthen monitoring: app probes, log monitoring, database monitoring, to catch problems early from multiple angles</p>
<p>6. Further research: connection pools that support idle keep-alive include C3P0 and Alibaba Druid.<br>c3p0 passed the test: it is commonly said to perform poorly, but it was stable in this scenario</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><bean id="logDataSource"</span><br><span class="line"> class="com.mchange.v2.c3p0.ComboPooledDataSource"</span><br><span class="line"> destroy-method="close"></span><br><span class="line"> <property name="jdbcUrl" value="${url}" /></span><br><span class="line"> <property name="driverClass" value="${driverClassName}" /></span><br><span class="line"> <property name="user" value="${user}" /></span><br><span class="line"> <property name="password" value="${passwd}" /></span><br><span class="line"> <property name="maxPoolSize" value="${maxPoolSize}" /></span><br><span class="line"> <property name="minPoolSize" value="#{${maxPoolSize}/3}" /></span><br><span class="line"> <property name="acquireIncrement" value="#{${maxPoolSize}/2}" /></span><br><span class="line"> <property name="testConnectionOnCheckin" value="true" /></span><br><span class="line"> <property name="checkoutTimeout" value="120000" /></span><br><span class="line"> <property name="idleConnectionTestPeriod" value="120" /><!--定时检查空闲连接活性--></span><br><span class="line"> <property name="preferredTestQuery" value="select 1 from dual" /><!--活性探测SQL--></span><br><span class="line"></bean></span><br></pre></td></tr></table></figure>
<p>druid failed the test: the timeBetweenEvictionRunsMillis and minEvictableIdleTimeMillis parameters control fast eviction of connections, while minIdle and keepAlive handle keep-alive of idle connections.<br> Testing showed that eviction did not run at the configured interval; the observed interval was roughly 3 times the configured value.<br> Testing also showed that a few connections were neither evicted nor kept alive, until their sessions timed out on the database side,<br> so after a while the connection count seen from the database dropped below minIdle and kept shrinking,<br> while druid's own monitoring still reported minIdle connections, with KACheckCnt increasing normally.</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><bean id="dataSource" class="com.alibaba.druid.pool.DruidDataSource"</span><br><span class="line"> init-method="init" destroy-method="close"></span><br><span class="line"> <property name="url" value="${url}" /></span><br><span class="line"> <property name="username" value="${user}" /></span><br><span class="line"> <property name="password" value="${passwd}" /></span><br><span class="line"> <property name="maxActive" value="${maxPoolSize}" /></span><br><span class="line"> <property name="initialSize" value="5" /></span><br><span class="line"> <property name="minIdle" value="5" /></span><br><span class="line"> <property name="maxWait" value="60000" /></span><br><span class="line"> <property name="timeBetweenEvictionRunsMillis" value="60000" /></span><br><span class="line"> <property name="minEvictableIdleTimeMillis" value="60000" /></span><br><span class="line"> <property name="validationQuery" value="select 1 from dual" /></span><br><span class="line"> <property name="testWhileIdle" value="true" /></span><br><span class="line"> <property name="testOnBorrow" value="false" /></span><br><span class="line"> <property name="keepAlive" value="true" /></span><br><span class="line"> <property name="filters" value="stat" /></span><br><span class="line"></bean></span><br></pre></td></tr></table></figure>
]]></content>
<categories>
<category> Web Backend </category>
</categories>
<tags>
<tag> java </tag>
<tag> spring </tag>
<tag> jdbc </tag>
<tag> tomcat </tag>
</tags>
</entry>
<entry>
<title><![CDATA[Hello World]]></title>
<url>/2019/08/26/Hello_world/</url>
<content type="html"><![CDATA[<div align="center">
The opening post of this blog.
</div>
<h2 id="鸣谢"><a href="#鸣谢" class="headerlink" title="鸣谢"></a>鸣谢</h2><p><a href="https://blog.csdn.net/sinat_37781304/article/details/82729029" target="_blank" rel="noopener">教程</a><br><a href="https://github.com/hexojs/hexo" target="_blank" rel="noopener">官方</a><br><a href="https://github.com/codefine/hexo-theme-mellow" target="_blank" rel="noopener">主题</a></p>
<h2 id="写一篇博文"><a href="#写一篇博文" class="headerlink" title="写一篇博文"></a>写一篇博文</h2><h3 id="新建文章"><a href="#新建文章" class="headerlink" title="新建文章"></a>新建文章</h3><p>hexo new “Kafka Producer Test”</p>
<h3 id="编辑文章"><a href="#编辑文章" class="headerlink" title="编辑文章"></a>编辑文章</h3><p>vim source/_posts/XXX.md</p>
<h3 id="生成blog并启动本地服务"><a href="#生成blog并启动本地服务" class="headerlink" title="生成blog并启动本地服务"></a>生成blog并启动本地服务</h3><p>hexo clean && hexo g && hexo server</p>
<h3 id="部署到github"><a href="#部署到github" class="headerlink" title="部署到github"></a>部署到github</h3><p>hexo d</p>
<a id="more"></a>
<h2 id="提交文章源代码"><a href="#提交文章源代码" class="headerlink" title="提交文章源代码"></a>提交文章源代码</h2><p>将整个hexo划分为四个部分来用git管理:<br>hexo代码:npm安装,独立仓库,一般不修改<br>主题代码:fork别人的作品,作为单独的仓库<br>文章源码:作为blog库的一个分支source<br>blog产出:blog库master分支</p>
<h3 id="1、hexo主目录下-gitignore添加排除部署相关目录"><a href="#1、hexo主目录下-gitignore添加排除部署相关目录" class="headerlink" title="1、hexo主目录下.gitignore添加排除部署相关目录"></a>1、hexo主目录下.gitignore添加排除部署相关目录</h3><p>.DS_Store<br>Thumbs.db<br>db.json<br><em>.log<br>node_modules/<br>public/<br>.deploy</em>/<br>themes/</p>
<h3 id="2、提交文章源代码"><a href="#2、提交文章源代码" class="headerlink" title="2、提交文章源代码"></a>2、提交文章源代码</h3><p>git add . && git commit -m”add new blog”</p>
<h3 id="3、push文章源代码到github"><a href="#3、push文章源代码到github" class="headerlink" title="3、push文章源代码到github"></a>3、push文章源代码到github</h3><h4 id="a、关联远程仓库"><a href="#a、关联远程仓库" class="headerlink" title="a、关联远程仓库"></a>a、关联远程仓库</h4><p>git remote add github <a href="https://github.com/etansens/etansens.github.io.git" target="_blank" rel="noopener">https://github.com/etansens/etansens.github.io.git</a></p>
<h4 id="b、push到分支"><a href="#b、push到分支" class="headerlink" title="b、push到分支"></a>b、push到分支</h4><p>git push github master:source</p>
]]></content>
<categories>
<category> Blog </category>
</categories>
<tags>
<tag> hexo </tag>
<tag> mellow </tag>
</tags>
</entry>
</search>