一次MYSQL頻繁宕機(jī)的排查

背景

近期,兄弟部門(mén)反饋關(guān)于我們產(chǎn)品自帶的MYSQL數(shù)據(jù)庫(kù)的問(wèn)題。他們指出,數(shù)據(jù)庫(kù)在每天凌晨12點(diǎn)會(huì)出現(xiàn)宕機(jī)現(xiàn)象。這一現(xiàn)象也是最新一段才出現(xiàn)的,這段時(shí)間只是增加了一些自動(dòng)化程序。頻繁宕機(jī)的問(wèn)題影響了兄弟部門(mén)的工作流程,也產(chǎn)生了一定的負(fù)面影響。為了解決這一問(wèn)題,我們需要深入調(diào)查并給出解決措施。


排查過(guò)程

確認(rèn)MYSQL所在的服務(wù)器資源情況

我們了解到這段時(shí)間做的變更中提到增加了一些自動(dòng)化的程序,是不是自動(dòng)化程序中涉及到操作數(shù)據(jù)庫(kù),造成系統(tǒng)資源不足,被系統(tǒng)殺掉?于是查看各類系統(tǒng)資源。無(wú)論是磁盤(pán)還是內(nèi)存都有較大剩余,也查看了部分系統(tǒng)日志,均沒(méi)有因?yàn)橘Y源不足殺進(jìn)程的日志,所以應(yīng)該不是此原因?qū)е洛礄C(jī)。

結(jié)論:不是因?yàn)橄到y(tǒng)資源導(dǎo)致MYSQL宕機(jī)

排查MYSQL錯(cuò)誤日志

在my.ini中指定當(dāng)mysql發(fā)生故障時(shí)的的錯(cuò)誤日志地址,具體配置方式如下:

# start error log

log-error=mysql.err

mysql.err內(nèi)容如下

2024-01-08 ?9:30:46 0 [Note] InnoDB: Uses event mutexes

2024-01-08 ?9:30:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2024-01-08 ?9:30:46 0 [Note] InnoDB: Number of pools: 1

2024-01-08 ?9:30:46 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2024-01-08 ?9:30:46 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)

2024-01-08 ?9:30:46 0 [Note] InnoDB: Using Linux native AIO

2024-01-08 ?9:30:46 0 [Note] InnoDB: Initializing buffer pool, total size = 536870912, chunk size = 134217728

2024-01-08 ?9:30:46 0 [Note] InnoDB: Completed initialization of buffer pool

2024-01-08 ?9:30:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=296938092837,296938109566

2024-01-08 ?9:30:47 0 [Note] InnoDB: Transaction 57863035 was in the XA prepared state.

2024-01-08 ?9:30:47 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2024-01-08 ?9:30:47 0 [Note] InnoDB: Trx id counter is 57863036

2024-01-08 ?9:30:47 0 [Note] InnoDB: Starting final batch to recover 54726 pages from redo log.

2024-01-08 ?9:30:49 0 [Note] InnoDB: Last binlog file './mysql-bin.000078', position 111348635

2024-01-08 ?9:30:49 0 [Note] InnoDB: 128 rollback segments are active.

2024-01-08 ?9:30:49 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2024-01-08 ?9:30:49 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2024-01-08 ?9:30:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

2024-01-08 ?9:30:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2024-01-08 ?9:30:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2024-01-08 ?9:30:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2024-01-08 ?9:30:49 0 [Note] InnoDB: 10.5.11 started; log sequence number 297037605262; transaction id 57863037

2024-01-08 ?9:30:49 0 [Note] InnoDB: Loading buffer pool(s) from /home/hsrisk/disk01/see/mysql/data/ib_buffer_pool

2024-01-08 ?9:30:49 0 [Note] Plugin 'FEEDBACK' is disabled.

2024-01-08 ?9:30:49 0 [Note] Recovering after a crash using mysql-bin

2024-01-08 ?9:30:49 0 [Note] InnoDB: Buffer pool(s) load completed at 240108 ?9:30:49

2024-01-08 ?9:30:49 0 [Note] Starting crash recovery...

2024-01-08 ?9:30:49 0 [Note] InnoDB: Starting recovery for XA transactions...

2024-01-08 ?9:30:49 0 [Note] InnoDB: Transaction 57863035 in prepared state after recovery

2024-01-08 ?9:30:49 0 [Note] InnoDB: Transaction contains changes to 1 rows

2024-01-08 ?9:30:49 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2024-01-08 ?9:30:49 0 [Note] Found 1 prepared transaction(s) in InnoDB

2024-01-08 ?9:30:49 0 [Note] Crash recovery finished.

2024-01-08 ?9:30:49 0 [Note] Server socket created on IP: '::'.

2024-01-08 ?9:30:49 0 [Note] Reading of all Master_info entries succeeded

2024-01-08 ?9:30:49 0 [Note] Added new Master_info '' to hash table

2024-01-08 ?9:30:49 0 [Note] ./mysqld: ready for connections.

Version: '10.5.11-MariaDB-log' ?socket: '/tmp/see20/mysql.sock' ?port: 3306 ?MariaDB Server

這段日志是客戶因?yàn)閿?shù)據(jù)庫(kù)宕機(jī)后手動(dòng)重啟的日志。這個(gè)日志中發(fā)現(xiàn)有Recovering after a crash using mysql-bin,Starting crash recovery,Crash recovery finished 等字樣,表示數(shù)據(jù)庫(kù)之前確實(shí)是異常關(guān)閉的,但是這種異常是MYSQL SERVER預(yù)料之外的。預(yù)料之外的含義是這種異常數(shù)據(jù)庫(kù)未定義,如果是意料之中的異常的話,日志中會(huì)留下更多的信息,例如下圖,雖然沒(méi)寫(xiě)明是什么原因?qū)е碌臄?shù)據(jù)庫(kù)崩潰,但是他清楚的告訴了報(bào)錯(cuò)的地點(diǎn)以及保留了幫助定位的信息。

2023-09-17 21:36:12 0x7f384413a700 ?InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/lock/lock0lock.cc line 7014

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/

InnoDB: about forcing recovery.

230917 21:36:12 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.


To report this bug, see https://mariadb.com/kb/en/reporting-bugs


We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.


Server version: 10.3.10-MariaDB-log

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=322

max_threads=6002

thread_count=261

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 13325602 K ?bytes of memory

Hope that's ok; if not, decrease some variables in the equation.


Thread pointer: 0x7f35980009a8

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f3844139e30 thread_stack 0x49000

mysys/stacktrace.c:270(my_print_stacktrace)[0x556c2c650b4b]

sql/signal_handler.cc:168(handle_fatal_signal)[0x556c2c13bba7]

sigaction.c:0(__restore_rt)[0x7f3849863680]

:0(__GI_raise)[0x7f384866f207]

:0(__GI_abort)[0x7f38486708f8]

ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x556c2bea97b9]

lock/lock0lock.cc:7014(DeadlockChecker::check_and_resolve(ib_lock_t const*, trx_t*))[0x556c2c319ec2]

lock/lock0lock.cc:1741(lock_rec_enqueue_waiting(ib_lock_t*, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*, lock_prdt*))[0x556c2c31aaf4]

lock/lock0lock.cc:1990(lock_rec_lock(bool, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*))[0x556c2c31df48]

row/row0sel.cc:1284(sel_set_rec_lock(btr_pcur_t*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, que_thr_t*, mtr_t*))[0x556c2c3b783c]

row/row0sel.cc:5047(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x556c2c3b8a10]

handler/ha_innodb.cc:9264(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x556c2c2e11fc]

sql/handler.cc:2798(handler::ha_rnd_pos(unsigned char*, unsigned char*))[0x556c2c140c82]

sql/handler.h:3568(handler::rnd_pos_by_record(unsigned char*))[0x556c2c14855b]

sql/sql_class.h:6384(handler::ha_rnd_pos_by_record(unsigned char*))[0x556c2c219170]

sql/log_event.cc:14265(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x556c2c2194f4]

sql/log_event.cc:11514(Rows_log_event::do_apply_event(rpl_group_info*))[0x556c2c20ddcc]

sql/log_event.h:5159(Log_event::apply_event(rpl_group_info*))[0x556c2bef5c5c]

sql/slave.cc:3920(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x556c2bee83db]

sql/mysqld.h:808(thread_safe_increment64)[0x556c2c09073e]

sql/rpl_parallel.cc:1300(handle_rpl_parallel_thread)[0x556c2c0931fb]

pthread_create.c:0(start_thread)[0x7f384985bdd5]

*** stack smashing detected ***: /home/hundsun/mysql/mysql/bin/mysqld terminated

結(jié)論:是異常宕機(jī),并且該異常不是系統(tǒng)定義,無(wú)法被MYSQL SERVER所捕獲。


排查MYSQL BINLOG日志

在my.ini中指定當(dāng)開(kāi)啟binlog日志,具體配置方式如下:

log-bin=mysql-bin ?#開(kāi)啟binlog

重啟MYSQL后會(huì)創(chuàng)建如下格式的binlog文件

mysql-bin.000001

mysql-bin.000002

MYSQL生成的binlog是二進(jìn)制文件,無(wú)法直接查看。可以使用mysqlbinlog工具將binlog二進(jìn)制數(shù)據(jù)導(dǎo)出進(jìn)行查看,例如

./mysqlbinlog ../data/mysql-bin.000004 --start-datetime='2020-02-12 23:30:00' --stop-datetime='2025-02-12:23:50:00' --base64-output=DECODE-ROWS ?> 4.txt

可以按照時(shí)間維度將數(shù)據(jù)導(dǎo)出到文件中進(jìn)行查看。下面是2個(gè)binlog的導(dǎo)出數(shù)據(jù)


mysql-bin.000041這個(gè)文件有203M,最后的修改時(shí)間是2023-12-24 23:59:15。此binlog導(dǎo)出文件的末尾數(shù)據(jù)如下

#231224 23:59:15 server id 1 ?end_log_pos 83409660 CRC32 0x595aa790 Table_map: `acm`.`tb_lock` mapped to number 58

# at 83409660

#231224 23:59:15 server id 1 ?end_log_pos 83410156 CRC32 0xed3ad18a Update_rows: table id 58 flags: STMT_END_F

# Number of rows: 1

# at 83410156

#231224 23:59:15 server id 1 ?end_log_pos 83410187 CRC32 0x539c3bfa Xid = 1904727

COMMIT/*!*/;

DELIMITER ;

# End of log file

ROLLBACK /* added by mysqlbinlog */;

/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

mysql-bin.000042這個(gè)文件有1G,最后的修改時(shí)間是2023-12-25 19:03:03。此binlog導(dǎo)出文件的末尾數(shù)據(jù)如下

還有 27% 的精彩內(nèi)容
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時(shí)請(qǐng)結(jié)合常識(shí)與多方信息審慎甄別。
平臺(tái)聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點(diǎn),簡(jiǎn)書(shū)系信息發(fā)布平臺(tái),僅提供信息存儲(chǔ)服務(wù)。
禁止轉(zhuǎn)載,如需轉(zhuǎn)載請(qǐng)通過(guò)簡(jiǎn)信或評(píng)論聯(lián)系作者。
支付 ¥10.00 繼續(xù)閱讀

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容