Analyzing and Fixing a Java Thread Leak

1. Symptoms in production and initial analysis

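Recently our application's memory consumption kept growing. At first I paid little attention and simply adjusted the JVM parameters. A few days ago, though, the heap filled up and the JVM went into continuous Full GC, so there was clearly a memory leak.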

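I assumed it was some basic mistake, so my first move was to check which code the program was actually running. After jstack -l <pid>, it turned out there were more than a thousand threads, all reported as RUNNABLE but in fact sitting inside epollWait: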

"I/O dispatcher 125" #739 prio=5 os_prio=0 tid=0x0000000002394800 nid=0x1e2a runnable [0x00007f5c2125b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000007273401d0> (a sun.nio.ch.Util$2)
        - locked <0x00000007273401c0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000007273401e0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:257)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:106)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:590)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

"pool-224-thread-1" #738 prio=5 os_prio=0 tid=0x00007f5c463f4000 nid=0x1e29 runnable [0x00007f5c2024b000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x0000000727340478> (a sun.nio.ch.Util$2)
        - locked <0x0000000727340468> (a java.util.Collections$UnmodifiableSet)
        - locked <0x0000000727340488> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:342)
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:191)
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None
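
With over a thousand near-identical stacks, it helps to collapse thread names into families before reading individual traces. The following is a minimal sketch, not something from the original investigation: it runs inside the JVM, replaces every run of digits in a thread name with '#', and counts the threads per family. The class name ThreadNameHistogram and its methods are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadNameHistogram {

    /** Replaces every run of digits with '#', so "pool-224-thread-1" becomes "pool-#-thread-#". */
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    /** Counts the live threads of this JVM per normalized thread name. */
    static Map<String, Integer> histogram() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(normalize(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One line per thread family: count, then the normalized name.
        histogram().forEach((name, count) -> System.out.println(count + "\t" + name));
    }
}
```

A family such as "I/O dispatcher #" or "pool-#-thread-#" whose count keeps growing between two runs is a good candidate for the leak.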

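None of the following lines of investigation solved the problem (recorded for my own reference; readers can skip ahead):

  1. Look at the thread stacks to see whether the call site is wrong. This usually settles things, but the stacks above carry almost no information and do not point at any application code.

  2. Googled for reports of many threads stuck in epollWait and found that the symptom matches a known epoll NIO bug, so I thought I had run into some advanced problem I could not deal with. My first reaction was to check the bug log on the HttpClient site, and a recent release had indeed fixed a similar issue, but upgrading to the latest version changed nothing. On reflection it was never very likely, since our usage is quite ordinary.

  3. Ran jmap -histo <pid> to look at the objects and found that the number of InternalHttpAsyncClient instances exactly matched the number of leaked threads, so the creation and reclamation of this object was almost certainly the problem. But who was creating them?

  4. Checked the call stacks and the package of the suspicious objects: they belong to HttpClient. I reviewed every related call in our own code, and all of it looked correct.

  5. Brought out the jvisualvm profiler, which showed the leak happening but could not locate its cause.

At this point I was stumped. I happened to be busy with other things, so I set the problem aside for a few days while mulling it over. Fortunately the leak was slow, so there was no rush.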

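2. How to analyze a thread leak

The key to this problem: you must know exactly what is leaking the threads!

While Googling, it struck me that the JDK tools ought to be able to analyze references. It turned out that jhat, the Java Heap Analysis Tool, was exactly what I needed.

The approach that finally worked:

  1. jmap -F -dump:format=b,file=tomcat.bin <pid> to dump Tomcat's heap

  2. jhat -J-Xmx4g <heap dump file> to analyze the heap (note: the analysis is very CPU- and memory-intensive, so run it on a well-provisioned machine)

  3. Inspect the references of the suspicious objects. OQL works too, but clicking through the links in the web UI is enough.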
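
When attaching jmap to the process is inconvenient, a heap dump can also be taken from inside the JVM. This is a different technique from the jmap command used above, shown only as a sketch: it assumes a HotSpot-based JVM, where com.sun.management.HotSpotDiagnosticMXBean is available, and the class name HeapDumper and the temp-file location are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    /** Dumps the heap of the current JVM into a fresh temporary directory and returns the file. */
    static Path dumpToTempFile(boolean liveObjectsOnly) {
        try {
            // The target file must not exist yet; recent JDKs also require the .hprof suffix.
            Path file = Files.createTempDirectory("heapdump").resolve("app.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), liveObjectsOnly);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dumpToTempFile(true);
        System.out.println("Heap dump written to " + file);
    }
}
```

The resulting file is in the same binary format as the jmap dump, so it can be opened with the same heap analysis tools.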

3. Pinning down the cause and fixing it

The histogram of the unhealthy heap showed the following suspicious objects:

$ cat histo | grep org.apache.http. | grep 1944 | less 
 197:          1944         217728  org.apache.http.impl.nio.conn.ManagedNHttpClientConnectionImpl
 232:          1944         171072  org.apache.http.impl.nio.conn.CPool
 233:          1944         171072  org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor
 248:          1944         155520  org.apache.http.impl.nio.reactor.BaseIOReactor
 249:          1944         155520  org.apache.http.impl.nio.reactor.IOSessionImpl
 276:          1944         139968  org.apache.http.impl.nio.client.InternalHttpAsyncClient
 277:          1944         139968  org.apache.http.impl.nio.conn.CPoolEntry
 323:          1944         108864  org.apache.http.impl.nio.client.MainClientExec
 363:          1944          93312  org.apache.http.impl.nio.codecs.DefaultHttpResponseParser
 401:          1944          77760  org.apache.http.impl.nio.reactor.SessionInputBufferImpl
 402:          1944          77760  org.apache.http.impl.nio.reactor.SessionOutputBufferImpl
 403:          1944          77760  org.apache.http.nio.protocol.HttpAsyncRequestExecutor$State
 442:          1944          62208  org.apache.http.impl.cookie.DefaultCookieSpecProvider
 443:          1944          62208  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager
 444:          1944          62208  org.apache.http.nio.conn.ssl.SSLIOSessionStrategy
 445:          1944          62208  org.apache.http.nio.pool.AbstractNIOConnPool$2
 511:          1944          46656  [Lorg.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker;
 512:          1944          46656  [Lorg.apache.http.impl.nio.reactor.BaseIOReactor;
 513:          1944          46656  org.apache.http.conn.ssl.DefaultHostnameVerifier
 514:          1944          46656  org.apache.http.impl.cookie.DefaultCookieSpec
 515:          1944          46656  org.apache.http.impl.cookie.NetscapeDraftSpecProvider
 516:          1944          46656  org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1
 517:          1944          46656  org.apache.http.impl.nio.client.InternalIODispatch
 518:          1944          46656  org.apache.http.impl.nio.codecs.DefaultHttpRequestWriter
 519:          1944          46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$ConfigData
 520:          1944          46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver
 521:          1944          46656  org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalConnectionFactory
 522:          1944          46656  org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker
 523:          1944          46656  org.apache.http.nio.protocol.HttpAsyncRequestExecutor
 603:          1944          31104  org.apache.http.client.protocol.RequestExpectContinue
 604:          1944          31104  org.apache.http.conn.routing.BasicRouteDirector
 605:          1944          31104  org.apache.http.impl.auth.HttpAuthenticator
 606:          1944          31104  org.apache.http.impl.conn.DefaultRoutePlanner
 607:          1944          31104  org.apache.http.impl.cookie.IgnoreSpecProvider
 608:          1944          31104  org.apache.http.impl.nio.SessionHttpContext
 609:          1944          31104  org.apache.http.impl.nio.reactor.AbstractIOReactor$1
 610:          1944          31104  org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$DefaultThreadFactory
 611:          1944          31104  org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback
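
The grep above works once the suspicious count (1944) is already known. When it is not, one way to find such a cluster is to group the histogram rows by instance count and look for counts shared by many classes. The following is only a sketch of that idea: the class name HistoGrouper and the threshold of 10 classes are arbitrary choices, and it assumes the usual "num: #instances #bytes class name" row layout of jmap -histo.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class HistoGrouper {

    // Matches one histogram row: "num: #instances #bytes class-name".
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+\\d+\\s+(\\S+)");

    /** Maps each instance count to the classes that have exactly that many instances. */
    static Map<Long, List<String>> groupByInstanceCount(List<String> histoLines) {
        Map<Long, List<String>> groups = new TreeMap<>();
        for (String line : histoLines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                groups.computeIfAbsent(Long.parseLong(m.group(1)), k -> new ArrayList<>())
                      .add(m.group(2));
            }
            // Header, footer and any other non-row lines are skipped.
        }
        return groups;
    }

    public static void main(String[] args) {
        // Reads histogram text from standard input.
        List<String> lines = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.toList());
        groupByInstanceCount(lines).forEach((count, classes) -> {
            if (classes.size() >= 10) { // arbitrary threshold for "many classes share this count"
                System.out.println(count + " instances each: " + classes.size() + " classes");
                classes.forEach(c -> System.out.println("    " + c));
            }
        });
    }
}
```

It can be fed the saved histogram directly, for example java HistoGrouper < histo.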

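Next, find out who actually instantiated these objects. Many of them are internal fields of other objects, so the first step is to find the outermost one. This was a matter of guessing and checking, and the answer turned out to be InternalHttpAsyncClient. Opening it in jhat showed a long list of instances, and following one of them finally led to the leaking object. The same can be done with OQL: select referrers(c) from org.apache.http.impl.nio.client.InternalHttpAsyncClient c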

instance of org.apache.http.impl.nio.client.InternalHttpAsyncClient@0x932be638 (128 bytes)

Class:

class org.apache.http.impl.nio.client.InternalHttpAsyncClient

Instance data members:

References to this object:

org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1@0x932be6c8 (40 bytes) : field this$0
com.aliyun.mqs.common.http.DefaultServiceClient@0x931cc588 (32 bytes) : field httpClient

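This shows that the Aliyun MQS client created these objects. Our code looked fine at first glance, but in fact the connection was simply never closed. The problematic sample is in the Aliyun MQS documentation. The latest official documentation has switched to org.eclipse.jetty.client.HttpClient and still does not call stop() explicitly; hopefully that library does not suffer from the same issue.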

@Service
public class AliyunService implements IAliyunService {
    private static Logger logger = Logger.getLogger(AliyunService.class.getName());

    @Autowired
    private AliyunConfig aliyunConfig;

    @Override
    public void sendMessage(String content) {
        // A new client and queue reference are created on every call
        MQSClient client = new DefaultMQSClient(aliyunConfig.mqEndpoint, aliyunConfig.mqAccessId, aliyunConfig.mqAccessKey);
        String queueName = aliyunConfig.mqQueue;
        try {
            CloudQueue queue = client.getQueueRef(queueName);
            // The queue is never closed. A finally block should be added at the end
            // (with queue declared before the try so that finally can reach it):
            // finally { queue.close(); }
            Message message = new Message();
            message.setMessageBody(content);
            queue.putMessage(message);
        } catch (Exception e) {
            logger.warning(e.getMessage());
        }
    }

}

Here is the corresponding close() source from the jar provided by MQS:

public final class CloudQueue {
    private ServiceClient serviceClient;
    ...
    public void close() {
        if (this.serviceClient != null) {
            this.serviceClient.close();
        }
    }

}
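
To make the mechanism concrete, here is a self-contained sketch of the same leak pattern and its fix. It does not use the MQS SDK: FakeClient is a hypothetical stand-in for any client that starts a background I/O thread in its constructor and only stops it in close(), and the thread name "fake-io-dispatcher" is made up. CloudQueue, as shown above, is not declared AutoCloseable, so in the real code the equivalent of the try-with-resources below would be a finally block that calls queue.close().

```java
import java.util.concurrent.CountDownLatch;

public class CloseDemo {

    /** Hypothetical stand-in for a client that owns a background I/O thread. */
    static final class FakeClient implements AutoCloseable {
        private final CountDownLatch stop = new CountDownLatch(1);
        private final Thread worker;

        FakeClient() {
            worker = new Thread(() -> {
                try {
                    stop.await(); // waits until close() is called
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "fake-io-dispatcher");
            worker.setDaemon(true); // so leaked threads do not keep this demo JVM alive
            worker.start();
        }

        void send(String message) {
            // A real client would perform network I/O here.
        }

        @Override
        public void close() {
            stop.countDown(); // lets the worker thread finish
            try {
                worker.join(); // waits for the worker to terminate
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Counts the live worker threads started by FakeClient instances. */
    static long liveWorkers() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().equals("fake-io-dispatcher"))
                .count();
    }

    /** Leaky pattern: a new client per call that is never closed. */
    static void sendLeaky(String message) {
        FakeClient client = new FakeClient();
        client.send(message);
    }

    /** Fixed pattern: the client is closed even if send() throws. */
    static void sendClosed(String message) {
        try (FakeClient client = new FakeClient()) {
            client.send(message);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) sendClosed("hello");
        System.out.println("live workers after sendClosed: " + liveWorkers());
        for (int i = 0; i < 5; i++) sendLeaky("hello");
        System.out.println("live workers after sendLeaky: " + liveWorkers());
    }
}
```

Each leaky call leaves one more worker thread behind, which is the same shape as one extra "I/O dispatcher" thread per unclosed client in the production dump.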

Mystery solved! With this change in place, the problem went away.

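4. Summary

First, solving this kind of problem really does depend on knowing the JDK tools well. I did not understand jhat well enough before, which is why this approach did not occur to me at first. The next time a memory problem comes up, I will have a sharper way to attack it.

Second, I now know what a thread leak looks like. The way to fix one is still to track down the objects that own the threads: in the end, it is an object leak.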
