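Problem description
A k8s node shut down abnormally. After it came back up, the business Pods were redeployed, and the Pods that had been running before the shutdown were deleted. While checking the logs today, I found that the Pods that lived on this node before the shutdown had not been removed cleanly, and the node keeps logging the error shown below.
Troubleshooting
The system log /var/log/messages shows the kubelet service repeatedly reporting the following error. The message says that this node has an orphaned Pod whose data volume paths are still present on disk, which prevents the kubelet from reclaiming and cleaning up the orphaned Pod normally.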
[root@sss-010xl-n02 ~]# tail -3 /var/log/messages
Dec 12 17:50:17 sss-010xl-n02 bash[470923]: user=root,ppid=454652,from=,pwd=/var/lib/kubelet/pods,command:20211212-175006: ll
Dec 12 17:55:15 sss-010xl-n02 kubelet: E1212 17:55:15.645612 2423 kubelet_volumes.go:154] Orphaned pod "aad90ab1-2f04-11ec-b488-b4055dae3f29" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
Dec 12 17:55:15 sss-010xl-n02 kubelet: E1212 17:55:15.645612 2423 kubelet_volumes.go:154] Orphaned pod "aad90ab1-2f04-11ec-b488-b4055dae3f29" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
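When the log is noisy, it can help to pull out just the pod UIDs that the kubelet is complaining about. The following is a minimal sketch, assuming the message format shown above; the function name orphaned_pod_uids is made up for illustration.

```shell
# Print the unique pod UIDs from kubelet "Orphaned pod" log lines read on stdin.
# Assumes the message format shown in the log excerpt above.
orphaned_pod_uids() {
    grep -o 'Orphaned pod "[^"]*"' | cut -d'"' -f2 | sort -u
}

# Usage on the node (log path as used in this post):
# orphaned_pod_uids < /var/log/messages
```

Each UID printed corresponds to a directory under /var/lib/kubelet/pods.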
Using the pod UID from the log, go into the kubelet's pods directory. The directory for that UID holds the container's data, and its etc-hosts file still records the Pod name.
[root@sss-010xl-n02 ~]# cd /var/lib/kubelet/pods/
[root@sss-010xl-n02 pods]# cd aad90ab1-2f04-11ec-b488-b4055dae3f29/
[root@sss-010xl-n02 aad90ab1-2f04-11ec-b488-b4055dae3f29]# ll
total 4
drwxr-x--- 3 root root 30 Dec 10 15:54 containers
-rw-r--r-- 1 root root 230 Dec 10 15:54 etc-hosts
drwxr-x--- 3 root root 37 Dec 10 15:54 plugins
drwxr-x--- 5 root root 82 Dec 10 15:54 volumes
drwxr-x--- 3 root root 49 Dec 10 15:54 volume-subpaths
[root@sss-010xl-n02 7e1a3af8-598e-11ec-b488-b4055dae3f29]# cat etc-hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.30.128.2 sss-wanted-010xl-5945fb4885-7gz85 \\ the orphaned Pod
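Before treating a directory as orphaned, it is worth checking that its UID no longer belongs to any Pod known to the cluster. The sketch below assumes kubectl access from some machine; the helper name uid_is_live is hypothetical, and it only compares a UID against a list of UIDs supplied on stdin.

```shell
# Succeed (exit 0) if the UID given as $1 appears in the list of
# live pod UIDs read on stdin, one UID per line.
uid_is_live() {
    grep -qxF -- "$1"
}

# Usage, assuming kubectl can reach the cluster:
# kubectl get pods --all-namespaces \
#   -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}' \
#   | uid_is_live "aad90ab1-2f04-11ec-b488-b4055dae3f29" \
#   && echo "still live, do not delete" || echo "not found in cluster"
```

Matching on the UID rather than the Pod name avoids confusing the old Pod with a redeployed one that happens to share a name prefix.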
Resolution
First, using the Pod name from the etc-hosts file, I confirmed that no matching instance is still running, so the Pod's directory can simply be deleted.
[root@sss-010xl-n02 7e1a3af8-598e-11ec-b488-b4055dae3f29]# cd ..
[root@sss-010xl-n02 pods]# rm -rf 7e1a3af8-598e-11ec-b488-b4055dae3f29/
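Other blog posts point out that this method carries some risk, and I have not confirmed whether data can be lost this way. Only run it once you are sure nothing under the directory is still needed; for stateless services it is generally fine.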
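One way to reduce that risk is to check whether anything is still mounted under the Pod directory before removing it, since rm -rf would otherwise descend into a live volume. This is a minimal sketch, not a complete safety check; the function name has_active_mounts is made up here, and it assumes the Linux /proc/mounts layout where the second field is the mount point.

```shell
# Succeed (exit 0) if any mount point listed in the mounts file lies
# under the given pod directory.
# $1: pod directory, e.g. /var/lib/kubelet/pods/<uid>
# $2: mounts file to read (defaults to /proc/mounts)
has_active_mounts() {
    pod_dir=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 of each line is the mount point; match it by path prefix.
    awk -v d="$pod_dir/" 'index($2, d) == 1 { found = 1 } END { exit !found }' "$mounts_file"
}

# Usage on the node:
# if has_active_mounts /var/lib/kubelet/pods/<uid>; then
#     echo "volumes still mounted: unmount them before deleting"
# fi
```

If the check reports active mounts, unmounting them first and then deleting the directory is the more cautious order.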
Checking the log again, this error is no longer being reported.