Connect timeouts caused by short-lived connections (an improvement our team made to a production service)
Symptoms:
A while ago, yp_fras frequently logged large numbers of "connection timed out" errors with errno 110:
691: yp_wp075v.add.bjdt.qihoo.net [2016/06/12:14:52:03] access [xxx.xxx.xxx.xxx:9869] table: yp_fras catch exception in __construct: [BadaSocket: Could not connect to xxx.xxx.xxx.xxx:9869 (Connection timed out [110])]
692: yp_wp023v.add.bjdt.qihoo.net [2016/06/12:14:52:03] access [xxx.xxx.xxx.xxx:9869] table: yp_fras catch exception in __construct: [BadaSocket: Could not connect to xxx.xxx.xxx.xxx:9869 (Connection timed out [110])]
693: yp_wp089v.add.bjdt.qihoo.net [2016/06/12:14:52:03] access [xxx.xxx.xxx.xxx:9869] table: yp_fras catch exception in __construct: [BadaSocket: Could not connect to xxx.xxx.xxx.xxx:9869 (Connection timed out [110])]
694: yp_wp080v.add.bjdt.qihoo.net [2016/06/12:14:52:07] access [xxx.xxx.xxx.xxx:9869] table: yp_fras catch exception in __construct: [BadaSocket: Could not connect to xxx.xxx.xxx.xxx:9869 (Connection timed out [110])]
1: yp_wp054v.add.bjdt.qihoo.net [2016/0
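Conclusion
The problem is caused by a large number of short-lived connections. A preliminary count shows that nearly a hundred client machines host clients that access yp_vrs in the bjdt data center, and each machine runs quite a few client processes. When these clients all hit the server at the same moment over short-lived connections, the TCP connection-setup load on the server is considerable.
The flood of short-lived connection requests overflows the ACCEPT queue of the server's listening port. The server then stops accepting new connection requests, the connection attempts fail, and the clients report "[110][connection time out]".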
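Suggested improvements
- Raise the client's connect timeout. It is currently 300 ms and could be raised to, say, 3 s. (This treats the symptom, not the cause.)
- Raise somaxconn on the server. This also treats only the symptom and can provide no more than partial relief. (The same goes for tuning other kernel network parameters: they mitigate the problem but do not solve it at the root.)
- Use a connection pool on the client so that short-lived connections become long-lived ones. (In my view this is the better approach and solves the problem once and for all.)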
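The connection-pool suggestion can be illustrated with a minimal sketch. It is not the client library used in production: the names `ConnectionPool` and `tcp_connector`, the `max_idle` limit, and the host and port passed in are all hypothetical, and the pool is not thread-safe. The point is only that a released connection is handed back out on the next request, so no new three-way handshake reaches the server's ACCEPT queue.

```python
import collections
import socket


class ConnectionPool:
    """Keep idle connections for reuse instead of reconnecting per request."""

    def __init__(self, connect, max_idle=8):
        self._connect = connect      # callable that opens a new connection
        self._max_idle = max_idle    # hypothetical cap on pooled connections
        self._idle = collections.deque()

    def acquire(self):
        # Reuse an idle connection if one exists; otherwise open a new one.
        if self._idle:
            return self._idle.pop()
        return self._connect()

    def release(self, conn, healthy=True):
        # Pool healthy connections; close broken or surplus ones.
        if healthy and len(self._idle) < self._max_idle:
            self._idle.append(conn)
        else:
            conn.close()


def tcp_connector(host, port, timeout=0.3):
    """Build a connect callable for a TCP endpoint (host and port are placeholders)."""
    return lambda: socket.create_connection((host, port), timeout=timeout)
```

A caller would create the pool once, for example `pool = ConnectionPool(tcp_connector(host, port))`, and wrap each request in `acquire()` and `release()`. A real pool would also need locking and a way to detect connections the server has closed while they sat idle.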
Analysis
On Linux, a connection arriving on a listening port passes through two queues: the SYN queue and the ACCEPT queue. When the server receives a SYN, it sends a SYN-ACK and stores the request sock in the SYN queue. Once the three-way handshake completes, the connection moves to the ACCEPT queue. The accept system call then takes it off the ACCEPT queue and hands it to the application.
Both queues have length limits, which depend on the following three parameters:
- a. the backlog argument passed to listen;
- b. the kernel parameter somaxconn (affects the ACCEPT queue);
- c. the kernel parameter tcp_max_syn_backlog (affects the SYN queue).
Our production problem was mainly caused by ACCEPT queue overflow, so the analysis here focuses on the ACCEPT queue length limit.
When listen is called, the kernel truncates the backlog argument to the system-wide somaxconn value. The relevant code from Linux 2.6.32-70:
@socket.c
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
    ......
    if ((unsigned)backlog > somaxconn)
        backlog = somaxconn;  /* truncated to somaxconn */
    ......
    err = sock->ops->listen(sock, backlog);  /* calls inet_listen below */
    ......
}
@af_inet.c
int inet_listen(struct socket *sock, int backlog)
{
    ......
    sk->sk_max_ack_backlog = backlog;
    ......
}
sk_max_ack_backlog above is the maximum length of the listening port's ACCEPT queue.
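The truncation in the listen path can be restated as a small user-space sketch. The function names `read_somaxconn` and `effective_backlog` are made up for illustration. The sketch assumes a 32-bit `unsigned` and assumes that somaxconn is readable from the usual procfs location `/proc/sys/net/core/somaxconn`.

```python
def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the current somaxconn value from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def effective_backlog(requested, somaxconn):
    """Return the backlog the kernel would store in sk_max_ack_backlog."""
    # The kernel compares (unsigned)backlog, so a negative request looks
    # like a very large number and is clamped as well.
    if (requested & 0xFFFFFFFF) > somaxconn:
        return somaxconn
    return requested


# The production setup described in this write-up: listen(8192) with somaxconn 128.
print(effective_backlog(8192, 128))  # prints 128
```

With these numbers the application asks for 8192 slots but the listening socket keeps a limit of 128, which is the mismatch the rest of the analysis relies on.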
When the volume of short-lived connections is too high and the application cannot call accept fast enough, the ACCEPT queue fills up and overflows. In that case the Linux TCP/IP stack drops the incoming SYN. The kernel comment reads: "Accept backlog is full. If we have already queued enough of warm entries in syn queue, drop request. It is better than clogging syn queue with openreqs with exponentially increasing timeout." If the client's connect timeout is too short for it to send a second SYN, it never receives the server's ACK and the connection attempt fails. The error reported is ETIMEDOUT, that is, "[110][connection time out]". The relevant code from Linux 2.6.32-70:
@tcp_ipv4.c
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
    ......
    /* inet_csk_reqsk_queue_young(sk) is the number of requests in the SYN
     * queue that have not yet completed the handshake, i.e. the number of
     * young request socks. */
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
        goto drop;  /* drop this SYN */
    ......
}
@sock.h
static inline int sk_acceptq_is_full(struct sock *sk)
{
    return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
}
In the code above, sk_acceptq_is_full(sk) checks whether the ACCEPT queue is full. The queue limit was already fixed during the listen system call, which is why changing the somaxconn kernel parameter has no effect on the ACCEPT queue limit of a port the application is already listening on. The application must be restarted to pick up the new value. If the queue is full and the SYN queue also holds young requests that have not completed the handshake (more than one, per the check above), the current connection request is dropped. If the client's connect timeout only allows it to send a single SYN, the connection fails with "[110][connection time out]".
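The interaction between the server-side drop and the client-side timeout can be sketched as follows. `should_drop_syn` restates the quoted kernel check, and `syn_send_times` estimates when a client sends its SYNs. Both names are hypothetical. The defaults are assumptions rather than values taken from this write-up: an initial retransmission timeout of 3 s (which, as far as I know, kernels of that generation used, while newer kernels use 1 s), doubling on each retry, and at most 5 retries.

```python
def should_drop_syn(ack_backlog, max_ack_backlog, young_requests):
    """Mirror the check quoted from tcp_v4_conn_request."""
    # Strictly greater: the queue counts as full only once it exceeds the limit.
    accept_queue_full = ack_backlog > max_ack_backlog
    return accept_queue_full and young_requests > 1


def syn_send_times(connect_timeout, initial_rto=3.0, max_retries=5):
    """Seconds after connect() starts at which a SYN is sent before the timeout.

    initial_rto and max_retries are assumed values, not measured ones.
    """
    times = [0.0]            # the first SYN goes out immediately
    rto = initial_rto
    next_send = rto
    while next_send < connect_timeout and len(times) <= max_retries:
        times.append(next_send)
        rto *= 2             # exponential backoff between retransmissions
        next_send += rto
    return times


print(syn_send_times(0.3))   # prints [0.0]
print(syn_send_times(3.5))   # prints [0.0, 3.0]
```

Under these assumptions a 300 ms timeout covers only the first SYN, so a single drop on the server is enough to produce the error. A longer timeout helps only if it exceeds the first retransmission interval.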
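Verification:
- Matching the production setup, set somaxconn to 128 and the listen backlog to 8192. Run a number of clients that repeatedly open a TCP connection to the server and then close it, and observe the result.
- Set somaxconn to 8192 and the listen backlog to 8192 as well, then repeat the first step.
<?php
while (true) { // connect with a 0.5 s timeout, close, repeat
    $fp = fsockopen("10.138.79.205", 8221, $errno, $errstr, 0.5);
    if ($fp !== false) {
        fclose($fp); // only close a connection that was actually opened
    }
}
?>
Above is the logic of a single client. It is very simple.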
With somaxconn set to 128:
The clients report a large number of errors:
PHP Warning: fsockopen(): unable to connect to xxxxxxxxxxx:8221 (Connection refused) in /home/wxf/sample.php on line 3
PHP Warning: fsockopen(): unable to connect to xxxxxxxxxxx:8221 (Connection refused) in /home/wxy/sample.php on line 3
PHP Warning: fsockopen(): unable to connect to xxxxxxxxxxx:8221 (Connection refused) in /home/wxy/sample.php on line 3
....
On the server:
[wxf@host ~]$ for i in {1..6}; do netstat -s | grep -i listen; echo; sleep 1; done
2436905 times the listen queue of a socket overflowed
2436905 SYNs to LISTEN sockets ignored
2436927 times the listen queue of a socket overflowed
2436927 SYNs to LISTEN sockets ignored
2436950 times the listen queue of a socket overflowed
2436950 SYNs to LISTEN sockets ignored
2436985 times the listen queue of a socket overflowed
2436985 SYNs to LISTEN sockets ignored
2436999 times the listen queue of a socket overflowed
2436999 SYNs to LISTEN sockets ignored
2437018 times the listen queue of a socket overflowed
2437018 SYNs to LISTEN sockets ignored
The output above shows that the number of dropped SYNs keeps growing.
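To put a number on that growth, the counters can be diffed between the first and last samples. The sketch below parses text in the format printed by `netstat -s | grep -i listen` above. The function names are hypothetical, and the sample string contains only the first and last readings from that output.

```python
import re


def listen_counters(text):
    """Extract the overflow and ignored-SYN counters from netstat -s output."""
    overflowed = [int(n) for n in re.findall(
        r"(\d+) times the listen queue of a socket overflowed", text)]
    ignored = [int(n) for n in re.findall(
        r"(\d+) SYNs to LISTEN sockets ignored", text)]
    return overflowed, ignored


def growth(samples):
    """Difference between the last and first sample (0 if there are none)."""
    return samples[-1] - samples[0] if samples else 0


sample = """\
2436905 times the listen queue of a socket overflowed
2436905 SYNs to LISTEN sockets ignored
2437018 times the listen queue of a socket overflowed
2437018 SYNs to LISTEN sockets ignored
"""

overflowed, ignored = listen_counters(sample)
print(growth(overflowed), growth(ignored))  # prints 113 113
```

Over the six one-second samples shown, both counters rose by 113.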
With somaxconn set to 8192:
The clients report no errors.
On the server:
[wxy@host ~]$ for i in {1..6}; do netstat -s | grep -i listen; echo ;sleep 1; done
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
2439591 times the listen queue of a socket overflowed
2439591 SYNs to LISTEN sockets ignored
The counters show that no SYNs were dropped during this period.
The verification results match both the kernel code and our expectations.