MysqlIO.readFully問題排查分析過程

問題描述

支付清結(jié)算系統(tǒng)"外部對(duì)賬"部分任務(wù)在早上6:00~7:00時(shí)間段內(nèi)對(duì)賬速度緩慢,且經(jīng)常造成對(duì)賬統(tǒng)計(jì)結(jié)果不準(zhǔn)確問題,導(dǎo)致結(jié)轉(zhuǎn)記賬不準(zhǔn)確。

初步分析及處理過程

由于外部對(duì)賬是經(jīng)過性能優(yōu)化的,且在壓測(cè)環(huán)境表現(xiàn)良好,所以最開始初步認(rèn)為是由于數(shù)據(jù)庫性能瓶頸導(dǎo)致,初步處理方式為將線上6:00~7:00之前的慢SQL進(jìn)行了一番排查,并對(duì)涉及外部系統(tǒng)的慢SQL進(jìn)行了優(yōu)化,這個(gè)步驟理論上是可以解決此類問題,但RDS(阿里云Mysql可能存在慢SQL統(tǒng)計(jì)不準(zhǔn)確問題)。

特點(diǎn)時(shí)間段內(nèi)線程棧分析

特定時(shí)間段線程棧分析,由于SQL優(yōu)化部分并沒有解決此類問題,為了查看在早上6:00~7:00這個(gè)時(shí)間段內(nèi)程序JVM內(nèi)部到底發(fā)生了什么,所以采取了定時(shí)抓取線程棧日志的手段(具體方式為在Linux服務(wù)器添加cron任務(wù),并累計(jì)輸出日志的方式,腳本如??)。

clearjstack.sh

#!/bin/sh
DATE='/bin/date'
timestamp() {
$DATE +'%Y-%m-%d %H:%M:%S'
}
ts=`timestamp`
echo "start log $ts" >> /tmp/clearjstack.log
/usr/java/jdk1.7.0_67/bin/jstack -l `/usr/java/jdk1.7.0_67/bin/jps -lvm | grep tomcat-clear | awk '{print $1}'` >> /tmp/clearjstack.log
ts=`timestamp`
echo "end log $ts" >> /tmp/clearjstack.log

Linux下添加定時(shí)任務(wù)的方式:

//添加計(jì)劃任務(wù)
[root@123.57.28.241/10.172.233.8 tmp]cd /var/spool/cron/
[root@123.57.28.241/10.172.233.8 cron]pwd
/var/spool/cron/具體用戶名(如為root用戶)
[root@123.57.28.241/10.172.233.8 cron]vim root
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
//查看計(jì)劃任務(wù)
[root@123.57.28.241/10.172.233.8 cron]crontab -l
0 0 * * * /usr/local/script/Cutlog.sh >/dev/null 2>&1
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1

對(duì)線程棧日志分析(1)

#clearjstack.log

start log 2016-11-24 06:20:01
2016-11-24 06:20:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):

"Thread-70" prio=10 tid=0x00007f21f4022800 nid=0x12da runnable [0x00007f216cace000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <0x0000000786facbc8> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:493)
at sun.reflect.GeneratedMethodAccessor169.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
at com.sun.proxy.$Proxy84.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:54)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:115)
at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:46)
at com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor.intercept(OffsetLimitInterceptor.java:66)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:57)
at com.sun.proxy.$Proxy82.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:62)
at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
at com.sun.proxy.$Proxy29.selectOne(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:163)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:63)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
at com.sun.proxy.$Proxy87.countOrgSettleFlow(Unknown Source)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl.isExistOrgSettleFlow(OrgSettleFlowServiceImpl.java:45)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$FastClassBySpringCGLIB$$c921c9ca.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$6b2e2fc1.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078c795ed8> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078deebbf0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Thread-69" prio=10 tid=0x00007f21f401c000 nid=0x12d9 waiting for monitor entry [0x00007f216ccd1000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078defb298> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Thread-68" prio=10 tid=0x00007f21f401b000 nid=0x12d8 waiting for monitor entry [0x00007f216c7cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Locked ownable synchronizers:
- <0x000000078deec880> (a java.util.concurrent.ThreadPoolExecutor$Worker)

從上述日志分析線程Thread-68、Thread-69都為阻塞狀態(tài),且阻塞狀態(tài)

smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)

所對(duì)應(yīng)的代碼主要體現(xiàn)在對(duì)賬結(jié)果寫入部分,另外從日志

- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)

所以此時(shí)很容易覺得問題出現(xiàn)在連接池與MySQL數(shù)據(jù)庫本身的問題上。按照這個(gè)思路(JVM日志很豐富,去偽存真,往往需要嘗試與不斷試錯(cuò)),開始了對(duì)連接池本身及MySQL本身問題的排查(Google),以下為幾種Google結(jié)果鏈接及原因概述。

1)、網(wǎng)絡(luò)原因&MySQL設(shè)置&換連接池

在以上鏈接所述的原因中,重點(diǎn)是說了MySQL及網(wǎng)絡(luò)方面的原因。針對(duì)這種敘述,與運(yùn)維大哥溝通后調(diào)取了阿里云RDS MySQL實(shí)例的參數(shù)設(shè)置,如下:

show global variables like '%timeout%'

最后一個(gè)參數(shù)wait_timeout表明Mysql服務(wù)器與客戶端連接空閑自動(dòng)斷開的超時(shí)時(shí)間為24小時(shí),也就是說只有連接在與MySQL服務(wù)器連接后24小時(shí)內(nèi)無任何操作,MySQL服務(wù)器才會(huì)主動(dòng)斷開與客戶端(這里為應(yīng)用連接池中與MySQL數(shù)據(jù)庫建立的連接),然而對(duì)于應(yīng)用來說,這種可能性基本不存在,況且大多數(shù)連接池配置都有對(duì)連接是否可用以及自動(dòng)探測(cè)連接的配置。

查看應(yīng)用連接池配置:

<property name="timeBetweenEvictionRunsMillis" value="3000" />
<property name="minEvictableIdleTimeMillis" value="300000" />
<property name="validationQuery" value="SELECT 'x'" />
<property name="testWhileIdle" value="true" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="poolPreparedStatements" value="false" />
<property name="maxPoolPreparedStatementPerConnectionSize" value="-1" />

按照上述配置基本上不會(huì)出現(xiàn)連接長(zhǎng)時(shí)間與MySQL服務(wù)器無任何交互的可能性,除非連接池本身的處理存在問題(druid連接池為阿里高性能數(shù)據(jù)庫連接池),但是出于懷疑的態(tài)度,按照鏈接中的說法druid在處理長(zhǎng)時(shí)間連接等待方面可能不如c3p0好,也不排除druid本身存在問題。但出于謹(jǐn)慎角度,并沒有立刻采用對(duì)連接池進(jìn)行替換的措施,而是將焦點(diǎn)轉(zhuǎn)移到應(yīng)用服務(wù)器與MySQL服務(wù)器網(wǎng)絡(luò)連接上(也確實(shí)有人是因?yàn)榫W(wǎng)絡(luò)防火墻設(shè)置導(dǎo)致出現(xiàn)此類問題的)。

2)、網(wǎng)絡(luò)連接參數(shù)調(diào)整

由于阿里云底層網(wǎng)絡(luò)結(jié)構(gòu)并不對(duì)用戶透明,且遇到問題的應(yīng)用存在批量大數(shù)據(jù)插入的使用場(chǎng)景,確實(shí)是存在TCP緩沖區(qū)不夠的可能性,通過對(duì)MySQL官方文檔相關(guān)章節(jié)的閱讀,MySQL連接串參數(shù)中確實(shí)提供了tcpRcvBuf這樣的參數(shù),其對(duì)該參數(shù)的說明是:“connecting using TCP/IP, should the driver set SO_RCV_BUF to the given value? The default value of '0', means use the platform default value for this property)”。
于是覺得這種可能性會(huì)相對(duì)大些,所以在線上調(diào)整了連接參數(shù)而并沒有更換連接池(如若不行,可再嘗試替換連接池),參數(shù)調(diào)整如下:

jdbc:mysql://xxx.xxx.xx.xx:3306/xx?useUnicode=true&characterEncoding=UTF-8&tcpRcvBuf=1024000&autoReconnect=true&failOverReadOnly=false&connectTimeout=0

連接TCP緩沖區(qū)的大小被調(diào)整為1M。

對(duì)線程棧日志分析(2)

在4中通過對(duì)網(wǎng)絡(luò)連接參數(shù)的調(diào)整后,并沒有起到作用,于是對(duì)線程棧日志再次進(jìn)行了調(diào)取,并進(jìn)行了地毯式閱讀,終于發(fā)現(xiàn)了問題的癥結(jié):

at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$5775a83.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078b4d8e98> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x00000007880399f0> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterPara

通過對(duì)比多日問題時(shí)段的線程棧日志,發(fā)現(xiàn)都存在兩處代碼都指向同一個(gè)方法,即isExistOrgSettleFlow,于是找到代碼所對(duì)應(yīng)的方法,發(fā)現(xiàn)是由于該方法高頻訪問了一個(gè)未加相關(guān)索引的大數(shù)據(jù)表,導(dǎo)致方法本身的執(zhí)行存在阻塞及鎖住的情況,占用了大量連接資源,從而導(dǎo)致4中所看到的其他方法拿不到數(shù)據(jù)庫連接出現(xiàn)長(zhǎng)時(shí)間等待,甚至超時(shí)的情況。加上索引后,隔日觀察問題消失。

總結(jié)

  • 對(duì)陌生問題的分析,需要通過各類分析手段從而找到問題所在并進(jìn)行解決(這其中需要運(yùn)用各類分析工具Linux命令、JDK命令、網(wǎng)絡(luò)命令等等);

  • 要心存思辨之心,對(duì)傳統(tǒng)的權(quán)威的東西在敬畏的同時(shí)也要大膽的懷疑并進(jìn)行分析、嘗試,切不可畏懼不前;

  • 在編碼方面,對(duì)代碼類、方法的命名一定要合理、規(guī)范,不然日后出現(xiàn)疑難問題在排查時(shí),會(huì)給人造成錯(cuò)覺從而延長(zhǎng)解決時(shí)間;

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

  • Swift1> Swift和OC的區(qū)別1.1> Swift沒有地址/指針的概念1.2> 泛型1.3> 類型嚴(yán)謹(jǐn) 對(duì)...
    cosWriter閱讀 11,689評(píng)論 1 32
  • ORA-00001: 違反唯一約束條件 (.) 錯(cuò)誤說明:當(dāng)在唯一索引所對(duì)應(yīng)的列上鍵入重復(fù)值時(shí),會(huì)觸發(fā)此異常。 O...
    我想起個(gè)好名字閱讀 6,024評(píng)論 0 9
  • 前言 記錄個(gè)人在2017年08月的學(xué)習(xí)和總結(jié),不定期更新 2017-08-02 有序的Map HashMap是無序...
    Kevin_ZGJ閱讀 451評(píng)論 0 0
  • 1.A simple master-to-slave replication is currently being...
    Kevin關(guān)大大閱讀 6,263評(píng)論 0 3
  • 如果有人跟你說我不要這個(gè)需求,你只要這樣…這樣就可以了。 對(duì)于有些人,我覺得T在說謊,然后會(huì)提出更好的條件。 對(duì)于...
    你便是我的晴天閱讀 67評(píng)論 0 0

友情鏈接更多精彩內(nèi)容