背景
近期,兄弟部門(mén)反饋關(guān)于我們產(chǎn)品自帶的MYSQL數(shù)據(jù)庫(kù)的問(wèn)題。他們指出,數(shù)據(jù)庫(kù)在每天凌晨12點(diǎn)會(huì)出現(xiàn)宕機(jī)現(xiàn)象。這一現(xiàn)象也是最新一段才出現(xiàn)的,這段時(shí)間只是增加了一些自動(dòng)化程序。頻繁宕機(jī)的問(wèn)題影響了兄弟部門(mén)的工作流程,也產(chǎn)生了一定的負(fù)面影響。為了解決這一問(wèn)題,我們需要深入調(diào)查并給出解決措施。
排查過(guò)程
確認(rèn)MYSQL所在的服務(wù)器資源情況
我們了解到這段時(shí)間做的變更中提到增加了一些自動(dòng)化的程序,是不是自動(dòng)化程序中涉及到操作數(shù)據(jù)庫(kù),造成系統(tǒng)資源不足,被系統(tǒng)殺掉?于是查看各類系統(tǒng)資源。無(wú)論是磁盤(pán)還是內(nèi)存都有較大剩余,也查看了部分系統(tǒng)日志,均沒(méi)有因?yàn)橘Y源不足殺進(jìn)程的日志,所以應(yīng)該不是此原因?qū)е洛礄C(jī)。
結(jié)論:不是因?yàn)橄到y(tǒng)資源導(dǎo)致MYSQL宕機(jī)
排查MYSQL錯(cuò)誤日志
在my.ini中指定當(dāng)mysql發(fā)生故障時(shí)的的錯(cuò)誤日志地址,具體配置方式如下:
# start error log
log-error=mysql.err
mysql.err內(nèi)容如下
2024-01-08 ?9:30:46 0 [Note] InnoDB: Uses event mutexes
2024-01-08 ?9:30:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2024-01-08 ?9:30:46 0 [Note] InnoDB: Number of pools: 1
2024-01-08 ?9:30:46 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2024-01-08 ?9:30:46 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2024-01-08 ?9:30:46 0 [Note] InnoDB: Using Linux native AIO
2024-01-08 ?9:30:46 0 [Note] InnoDB: Initializing buffer pool, total size = 536870912, chunk size = 134217728
2024-01-08 ?9:30:46 0 [Note] InnoDB: Completed initialization of buffer pool
2024-01-08 ?9:30:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=296938092837,296938109566
2024-01-08 ?9:30:47 0 [Note] InnoDB: Transaction 57863035 was in the XA prepared state.
2024-01-08 ?9:30:47 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2024-01-08 ?9:30:47 0 [Note] InnoDB: Trx id counter is 57863036
2024-01-08 ?9:30:47 0 [Note] InnoDB: Starting final batch to recover 54726 pages from redo log.
2024-01-08 ?9:30:49 0 [Note] InnoDB: Last binlog file './mysql-bin.000078', position 111348635
2024-01-08 ?9:30:49 0 [Note] InnoDB: 128 rollback segments are active.
2024-01-08 ?9:30:49 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2024-01-08 ?9:30:49 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2024-01-08 ?9:30:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2024-01-08 ?9:30:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2024-01-08 ?9:30:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2024-01-08 ?9:30:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2024-01-08 ?9:30:49 0 [Note] InnoDB: 10.5.11 started; log sequence number 297037605262; transaction id 57863037
2024-01-08 ?9:30:49 0 [Note] InnoDB: Loading buffer pool(s) from /home/hsrisk/disk01/see/mysql/data/ib_buffer_pool
2024-01-08 ?9:30:49 0 [Note] Plugin 'FEEDBACK' is disabled.
2024-01-08 ?9:30:49 0 [Note] Recovering after a crash using mysql-bin
2024-01-08 ?9:30:49 0 [Note] InnoDB: Buffer pool(s) load completed at 240108 ?9:30:49
2024-01-08 ?9:30:49 0 [Note] Starting crash recovery...
2024-01-08 ?9:30:49 0 [Note] InnoDB: Starting recovery for XA transactions...
2024-01-08 ?9:30:49 0 [Note] InnoDB: Transaction 57863035 in prepared state after recovery
2024-01-08 ?9:30:49 0 [Note] InnoDB: Transaction contains changes to 1 rows
2024-01-08 ?9:30:49 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2024-01-08 ?9:30:49 0 [Note] Found 1 prepared transaction(s) in InnoDB
2024-01-08 ?9:30:49 0 [Note] Crash recovery finished.
2024-01-08 ?9:30:49 0 [Note] Server socket created on IP: '::'.
2024-01-08 ?9:30:49 0 [Note] Reading of all Master_info entries succeeded
2024-01-08 ?9:30:49 0 [Note] Added new Master_info '' to hash table
2024-01-08 ?9:30:49 0 [Note] ./mysqld: ready for connections.
Version: '10.5.11-MariaDB-log' ?socket: '/tmp/see20/mysql.sock' ?port: 3306 ?MariaDB Server
這段日志是客戶因?yàn)閿?shù)據(jù)庫(kù)宕機(jī)后手動(dòng)重啟的日志。這個(gè)日志中發(fā)現(xiàn)有Recovering after a crash using mysql-bin,Starting crash recovery,Crash recovery finished 等字樣,表示數(shù)據(jù)庫(kù)之前確實(shí)是異常關(guān)閉的,但是這種異常是MYSQL SERVER預(yù)料之外的。預(yù)料之外的含義是這種異常數(shù)據(jù)庫(kù)未定義,如果是意料之中的異常的話,日志中會(huì)留下更多的信息,例如下圖,雖然沒(méi)寫(xiě)明是什么原因?qū)е碌臄?shù)據(jù)庫(kù)崩潰,但是他清楚的告訴了報(bào)錯(cuò)的地點(diǎn)以及保留了幫助定位的信息。
2023-09-17 21:36:12 0x7f384413a700 ?InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/lock/lock0lock.cc line 7014
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
InnoDB: about forcing recovery.
230917 21:36:12 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.3.10-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=322
max_threads=6002
thread_count=261
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 13325602 K ?bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f35980009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f3844139e30 thread_stack 0x49000
mysys/stacktrace.c:270(my_print_stacktrace)[0x556c2c650b4b]
sql/signal_handler.cc:168(handle_fatal_signal)[0x556c2c13bba7]
sigaction.c:0(__restore_rt)[0x7f3849863680]
:0(__GI_raise)[0x7f384866f207]
:0(__GI_abort)[0x7f38486708f8]
ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x556c2bea97b9]
lock/lock0lock.cc:7014(DeadlockChecker::check_and_resolve(ib_lock_t const*, trx_t*))[0x556c2c319ec2]
lock/lock0lock.cc:1741(lock_rec_enqueue_waiting(ib_lock_t*, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*, lock_prdt*))[0x556c2c31aaf4]
lock/lock0lock.cc:1990(lock_rec_lock(bool, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*))[0x556c2c31df48]
row/row0sel.cc:1284(sel_set_rec_lock(btr_pcur_t*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, que_thr_t*, mtr_t*))[0x556c2c3b783c]
row/row0sel.cc:5047(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x556c2c3b8a10]
handler/ha_innodb.cc:9264(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x556c2c2e11fc]
sql/handler.cc:2798(handler::ha_rnd_pos(unsigned char*, unsigned char*))[0x556c2c140c82]
sql/handler.h:3568(handler::rnd_pos_by_record(unsigned char*))[0x556c2c14855b]
sql/sql_class.h:6384(handler::ha_rnd_pos_by_record(unsigned char*))[0x556c2c219170]
sql/log_event.cc:14265(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x556c2c2194f4]
sql/log_event.cc:11514(Rows_log_event::do_apply_event(rpl_group_info*))[0x556c2c20ddcc]
sql/log_event.h:5159(Log_event::apply_event(rpl_group_info*))[0x556c2bef5c5c]
sql/slave.cc:3920(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x556c2bee83db]
sql/mysqld.h:808(thread_safe_increment64)[0x556c2c09073e]
sql/rpl_parallel.cc:1300(handle_rpl_parallel_thread)[0x556c2c0931fb]
pthread_create.c:0(start_thread)[0x7f384985bdd5]
*** stack smashing detected ***: /home/hundsun/mysql/mysql/bin/mysqld terminated
結(jié)論:是異常宕機(jī),并且該異常不是系統(tǒng)定義,無(wú)法被MYSQL SERVER所捕獲。
排查MYSQL BINLOG日志
在my.ini中指定當(dāng)開(kāi)啟binlog日志,具體配置方式如下:
log-bin=mysql-bin ?#開(kāi)啟binlog
重啟MYSQL后會(huì)創(chuàng)建如下格式的binlog文件
mysql-bin.000001
mysql-bin.000002
MYSQL生成的binlog是二進(jìn)制文件,無(wú)法直接查看。可以使用mysqlbinlog工具將binlog二進(jìn)制數(shù)據(jù)導(dǎo)出進(jìn)行查看,例如
./mysqlbinlog ../data/mysql-bin.000004 --start-datetime='2020-02-12 23:30:00' --stop-datetime='2025-02-12:23:50:00' --base64-output=DECODE-ROWS ?> 4.txt
可以按照時(shí)間維度將數(shù)據(jù)導(dǎo)出到文件中進(jìn)行查看。下面是2個(gè)binlog的導(dǎo)出數(shù)據(jù)
mysql-bin.000041這個(gè)文件有203M,最后的修改時(shí)間是2023-12-24 23:59:15。此binlog導(dǎo)出文件的末尾數(shù)據(jù)如下
#231224 23:59:15 server id 1 ?end_log_pos 83409660 CRC32 0x595aa790 Table_map: `acm`.`tb_lock` mapped to number 58
# at 83409660
#231224 23:59:15 server id 1 ?end_log_pos 83410156 CRC32 0xed3ad18a Update_rows: table id 58 flags: STMT_END_F
# Number of rows: 1
# at 83410156
#231224 23:59:15 server id 1 ?end_log_pos 83410187 CRC32 0x539c3bfa Xid = 1904727
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
mysql-bin.000042這個(gè)文件有1G,最后的修改時(shí)間是2023-12-25 19:03:03。此binlog導(dǎo)出文件的末尾數(shù)據(jù)如下