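Over the past few days I ran into a bug that I could only resolve by reading the underlying library source, so here is a blog post going back over it.
The symptom: when the server handled requests, certain values were correct on the first request, but on every later request they had turned into a fixed, non-random value.
This scenario is actually fairly common. Some will say the software is carrying state.
Since the first request is correct, the program logic itself is presumably fine, and the problem must lie in some state flag, or in the lifetime of some value that may act as state.
That line of thinking led the bug hunt straight down the wrong path.
It looked like leftover state, so I checked and debugged the relevant constructors and destructors and all the code touching object lifetimes, and found nothing wrong.
Since the code is not public, I skip the debugging of the upper-layer software here and go straight to what gdb finally showed.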
[qianzichen@dev ~]$ ps -ef | grep -E '$regex...' | awk '{print $2}'
25497
[qianzichen@dev ~]$ gdb -p 25497
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 25497
...
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
...
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
...
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7f5c157fb700 (LWP 25531)]
[New Thread 0x7f5c161fc700 (LWP 25530)]
[New Thread 0x7f5c16bfd700 (LWP 25529)]
[New Thread 0x7f5c175fe700 (LWP 25528)]
[New Thread 0x7f5c17fff700 (LWP 25527)]
[New Thread 0x7f5c2cdfa700 (LWP 25526)]
[New Thread 0x7f5c2d7fb700 (LWP 25525)]
[New Thread 0x7f5c2e1fc700 (LWP 25524)]
[New Thread 0x7f5c2ebfd700 (LWP 25523)]
[New Thread 0x7f5c2f5fe700 (LWP 25522)]
[New Thread 0x7f5c2ffff700 (LWP 25521)]
[New Thread 0x7f5c48dfa700 (LWP 25520)]
[New Thread 0x7f5c497fb700 (LWP 25519)]
[New Thread 0x7f5c4a1fc700 (LWP 25518)]
[New Thread 0x7f5c4abfd700 (LWP 25517)]
[New Thread 0x7f5c4b5fe700 (LWP 25516)]
[New Thread 0x7f5c4bfff700 (LWP 25515)]
[New Thread 0x7f5c50f73700 (LWP 25514)]
[New Thread 0x7f5c51974700 (LWP 25513)]
[New Thread 0x7f5c5d3d2700 (LWP 25500)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
...
(gdb) b exit
Breakpoint 1 at 0x3ec7a35d40
(gdb) b abort
Breakpoint 2 at 0x3ec7a33f90
(gdb) b src/path/to/target_file/file.cc:...
Breakpoint 3 at 0x7f5c5ed8042d: file src/path/to/target_file/file.cc, line ....
(gdb) c
Continuing.
[Switching to Thread 0x7f5c16bfd700 (LWP 25529)]
Breakpoint 3, (omitted...)
(gdb) p ctx
$1 = {px = 0x7f5c080008e0, pn = {pi_ = 0x7f5c08001430}}
(gdb) p ctx.px.a_member_instance
$2 = {
...
too large to display, omitted...
...
}
(gdb) set print pretty on
(gdb) p ctx.px.dbg_data_
$3 = {
url_param_string = {
static npos = 18446744073709551615,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x7f5c6beb5578 "zichen"
}
},
request = 0x0,
search_context = 0x0,
xxx = {
...
yyy = {
...
}, <No data fields>},
...
},
doc_response_str = {
static npos = 18446744073709551615,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x7f5c6beb5578 "zichen"
}
},
...
too large to display, omitted...
...
}
(gdb)
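As shown above, empty strings in a vector, strings in a map, strings defined anywhere, and strings reached through other containers or access paths all have an _M_p pointing at the same address (url_param_string and doc_response_str above are both at 0x7f5c6beb5578). The content there is "zichen", the value passed to the server in the first request.
So the problem narrows down to this: c_str() of this class returns a fixed address holding a fixed value.
RTFS (Read The Friendly Source): open the C++ standard library headers of the version in use: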
[qianzichen@dev ~]$ vi /usr/local/gcc-4.8.5/include/c++/4.8.5/string
...
// You should have received a copy of the GNU General Public License and
// a copy of the GCC Runtime Library Exception along with this program;
// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
// <http://www.gnu.org/licenses/>.
/** @file include/string
* This is a Standard C++ Library header.
*/
//
// ISO C++ 14882: 21 Strings library
//
#ifndef _GLIBCXX_STRING
#define _GLIBCXX_STRING 1
#pragma GCC system_header
#include <bits/c++config.h>
#include <bits/stringfwd.h>
#include <bits/char_traits.h> // NB: In turn includes stl_algobase.h
#include <bits/allocator.h>
#include <bits/cpp_type_traits.h>
#include <bits/localefwd.h> // For operators >>, <<, and getline.
#include <bits/ostream_insert.h>
#include <bits/stl_iterator_base_types.h>
#include <bits/stl_iterator_base_funcs.h>
#include <bits/stl_iterator.h>
#include <bits/stl_function.h> // For less
#include <ext/numeric_traits.h>
#include <bits/stl_algobase.h>
#include <bits/range_access.h>
#include <bits/basic_string.h>
#include <bits/basic_string.tcc>
...
Look at stringfwd.h:
[qianzichen@dev ~]$ vi /usr/local/gcc-4.8.5/include/c++/4.8.5/bits/stringfwd.h
...
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
/**
* @defgroup strings Strings
*
* @{
*/
template<class _CharT>
struct char_traits;
template<typename _CharT, typename _Traits = char_traits<_CharT>,
typename _Alloc = allocator<_CharT> >
class basic_string;
template<> struct char_traits<char>;
/// A string of @c char
typedef basic_string<char> string;
#ifdef _GLIBCXX_USE_WCHAR_T
template<> struct char_traits<wchar_t>;
/// A string of @c wchar_t
typedef basic_string<wchar_t> wstring;
...
As shown above, string is a typedef for basic_string<char>, and basic_string is a class template.
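A quick way to confirm this without reading headers is a compile-time check. This is only a minimal sketch: it assumes a C++11 compiler (for example g++ -std=c++11), and the helper name string_is_basic_string_of_char is made up for illustration.

```cpp
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical helper: true when std::string is exactly
// std::basic_string<char> with the default traits and allocator.
constexpr bool string_is_basic_string_of_char() {
    return std::is_same<std::string,
                        std::basic_string<char, std::char_traits<char>,
                                          std::allocator<char> > >::value;
}

// Both checks are evaluated by the compiler; nothing runs at runtime.
static_assert(string_is_basic_string_of_char(),
              "std::string should be a typedef for basic_string<char>");
static_assert(std::is_same<std::string::value_type, char>::value,
              "_CharT is char for std::string");
```

If either assumption were wrong, the translation unit would simply fail to compile.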
Now look at the basic_string implementation:
[qianzichen@dev ~]$ vi /usr/local/gcc-4.8.5/include/c++/4.8.5/bits/basic_string.h
Find the c_str function:
/**
* @brief Swap contents with another string.
* @param __s String to swap with.
*
* Exchanges the contents of this string with that of @a __s in constant
* time.
*/
void
swap(basic_string& __s);
// String operations:
/**
* @brief Return const pointer to null-terminated contents.
*
* This is a handle to internal data. Do not modify or dire things may
* happen.
*/
const _CharT*
c_str() const _GLIBCXX_NOEXCEPT
{ return _M_data(); }
/**
* @brief Return const pointer to contents.
*
* This is a handle to internal data. Do not modify or dire things may
* happen.
*/
const _CharT*
data() const _GLIBCXX_NOEXCEPT
{ return _M_data(); }
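Since c_str() and data() both just return _M_data() in this excerpt, they should hand back the same pointer. Here is a small sketch to check that from the outside; the helper name c_str_matches_data is hypothetical, and it relies only on the public std::string interface.

```cpp
#include <string>

// Hypothetical helper: reports whether c_str() and data() expose the same
// buffer and whether that buffer is NUL-terminated right after the contents.
bool c_str_matches_data(const std::string& s) {
    const char* a = s.c_str();
    const char* b = s.data();
    return a == b && a[s.size()] == '\0';
}
```

Calling c_str_matches_data(std::string("zichen")) is expected to return true with this header; C++11 and later also require the two functions to be equivalent.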
Reading on:
private:
// Data Members (private):
mutable _Alloc_hider _M_dataplus;
_CharT*
_M_data() const
{ return _M_dataplus._M_p; }
_CharT*
_M_data(_CharT* __p)
{ return (_M_dataplus._M_p = __p); }
So what gets returned is the _M_p member of the _M_dataplus member. Find the _Alloc_hider struct:
...
// Use empty-base optimization: http://www.cantrip.org/emptyopt.html
struct _Alloc_hider : _Alloc
{
_Alloc_hider(_CharT* __dat, const _Alloc& __a)
: _Alloc(__a), _M_p(__dat) { }
_CharT* _M_p; // The actual data.
};
public:
...
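The __dat parameter of the _Alloc_hider constructor initializes the _M_p member. The pointee type _CharT is the character type passed to the basic_string template when string is instantiated, i.e. char.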
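The comment in the excerpt mentions the empty-base optimization, which is why _Alloc_hider inherits from _Alloc instead of holding it as a member. A minimal sketch of the effect follows, assuming a C++11 compiler for static_assert. EmptyAlloc, HiderByInheritance and HiderByMember are hypothetical stand-ins, not libstdc++ types.

```cpp
// Hypothetical stand-in for a stateless allocator such as std::allocator<char>.
struct EmptyAlloc {};

// Same shape as _Alloc_hider: the allocator is a base class, so the
// empty-base optimization lets it occupy no storage of its own.
struct HiderByInheritance : EmptyAlloc {
    HiderByInheritance(char* dat, const EmptyAlloc& a)
        : EmptyAlloc(a), p(dat) {}
    char* p;  // the actual data pointer
};

// For contrast: holding the allocator as a member costs at least one byte,
// and padding then pushes the struct beyond a single pointer.
struct HiderByMember {
    EmptyAlloc alloc;
    char* p;
};

static_assert(sizeof(HiderByInheritance) == sizeof(char*),
              "empty base adds no size");
static_assert(sizeof(HiderByMember) > sizeof(char*),
              "empty member still takes space");
```

So with the inheritance layout, a string object costs no more than the one pointer it stores.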
Now look at the basic_string constructor:
...
// NB: We overload ctors in some cases instead of using default
// arguments, per 17.4.4.4 para. 2 item 2.
/**
* @brief Default constructor creates an empty string.
*/
basic_string()
#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { }
#else
: _M_dataplus(_S_construct(size_type(), _CharT(), _Alloc()), _Alloc()){ }
#endif
...
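The default constructor has two possible member initializers. Which one does the current environment use? Pinning down the value of _GLIBCXX_FULLY_DYNAMIC_STRING directly is not straightforward. So take another route and edit the header itself, as below: inside one preprocessor branch, write a symbol that no normal compiler environment would define, such as heihei (hehe...).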
...
// NB: We overload ctors in some cases instead of using default
// arguments, per 17.4.4.4 para. 2 item 2.
/**
* @brief Default constructor creates an empty string.
*/
basic_string()
#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { }
#else
: _M_dataplus(_S_construct(size_type(), _CharT(), _Alloc()), _Alloc()){ heihei }
#endif
...
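Then write a separate unit test. It can be as simple as including <string>, but compilation has to get as far as parsing (preprocessing alone does not guarantee that this code is actually compiled).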
[qianzichen@dev ~]$ cat heihei.cc
#include <string>
[qianzichen@dev ~]$
As above, just that one line. Then compile.
[qianzichen@dev ~]$ /usr/local/gcc-4.8.5/bin/g++ heihei.cc
/usr/lib/../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
[qianzichen@dev ~]$
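The only error is the linker complaining about the missing main. The header itself compiled cleanly, so the #else branch containing heihei was never compiled, which means the first initializer is the one in use.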
...
// NB: We overload ctors in some cases instead of using default
// arguments, per 17.4.4.4 para. 2 item 2.
/**
* @brief Default constructor creates an empty string.
*/
basic_string()
#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { heihei }
#else
: _M_dataplus(_S_construct(size_type(), _CharT(), _Alloc()), _Alloc()){ }
#endif
...
If in doubt, verify the other branch as well: edit the header as above and compile again.
[qianzichen@dev ~]$ /usr/local/gcc-4.8.5/bin/g++ heihei.cc
In file included from /usr/local/gcc-4.8.5/include/c++/4.8.5/string:52:0,
from heihei.cc:1:
/usr/local/gcc-4.8.5/include/c++/4.8.5/bits/basic_string.h: In constructor ‘std::basic_string<_CharT, _Traits, _Alloc>::basic_string()’:
/usr/local/gcc-4.8.5/include/c++/4.8.5/bits/basic_string.h:439:62: error: ‘heihei’ was not declared in this scope
: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { heihei }
^
/usr/local/gcc-4.8.5/include/c++/4.8.5/bits/basic_string.h:439:69: error: expected ‘;’ before ‘}’ token
: _M_dataplus(_S_empty_rep()._M_refdata(), _Alloc()) { heihei }
^
[qianzichen@dev ~]$
As shown, this time the error is reported inside the header. That confirms that in this environment the basic_string default constructor uses the first, simpler initializer.
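A less invasive way to ask the same question is to evaluate the same #if condition in a scratch file instead of editing the installed header. This is only a sketch: the function name default_ctor_branch and the two labels it returns are made up, and it reports only what the preprocessor sees in that translation unit, so it needs the same compiler and flags as the real build.

```cpp
#include <string>  // pulls in the library's configuration macros, if any

// Hypothetical helper: names the branch of the default constructor that the
// preprocessor would select. If the macro is not defined at all, #if treats
// the unknown identifier as 0.
const char* default_ctor_branch() {
#if _GLIBCXX_FULLY_DYNAMIC_STRING == 0
    return "shared-empty-rep";
#else
    return "fully-dynamic";
#endif
}
```

Printing the result of default_ctor_branch() from a tiny main should agree with the heihei experiment above.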
_S_empty_rep()._M_refdata() is the __dat argument mentioned above.
Look at _S_empty_rep:
...
void
_M_leak_hard();
static _Rep&
_S_empty_rep()
{ return _Rep::_S_empty_rep(); }
public:
...
It returns a reference to a _Rep instance in static storage, specifically the return value of the static member function _Rep::_S_empty_rep.
Go straight to the _Rep struct:
...
struct _Rep : _Rep_base
{
// Types:
typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;
// (Public) Data members:
// The maximum number of individual char_type elements of an
...
static _Rep&
_S_empty_rep()
{
// NB: Mild hack to avoid strict-aliasing warnings. Note that
// _S_empty_rep_storage is never modified and the punning should
// be reasonably safe in this case.
void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
return *reinterpret_cast<_Rep*>(__p);
}
bool
_M_is_leaked() const
{ return this->_M_refcount < 0; }
...
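So the static function _S_empty_rep returns a reference to a _Rep instance in static storage.
Here the library developers silence the compiler's strict-aliasing warnings.
reinterpret_cast provides a low-level reinterpretation of the bit pattern of its operand. The type changes, yet the compiler gives no warning or other diagnostic: when _S_empty_rep builds its returned reference from the address of _S_empty_rep_storage, the code explicitly asserts that the conversion is legitimate. Whoever uses the returned reference then treats the object as a _Rep.
An old-style (C-style) cast such as
char *pc = (char *)ip;
has the same effect as reinterpret_cast here (assuming ip is an int*). The minimal reproduction at the end of this post also uses an old-style cast, where it strips const from the result of c_str().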
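To make the comparison concrete, here is a small sketch that performs the conversion both ways and compares the results. The helper names casts_agree and first_byte are hypothetical, and ip is assumed to be an int*.

```cpp
// Hypothetical helper: converts an int* to char* with both cast styles and
// reports whether the two results are the same pointer.
bool casts_agree(int* ip) {
    char* old_style = (char*)ip;                    // C-style cast
    char* new_style = reinterpret_cast<char*>(ip);  // explicit equivalent
    return old_style == new_style;
}

// Reads the first byte of an int through a char*. Inspecting an object's
// bytes through char* is permitted by the aliasing rules.
char first_byte(int* ip) {
    return *reinterpret_cast<char*>(ip);
}
```

The difference is one of intent: the named cast shows up in a code search and states what kind of conversion is happening, while the old-style cast silently picks whichever conversion compiles.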
The address returned is that of _S_empty_rep_storage. Look up that symbol:
...
// m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
// In addition, this implementation quarters this amount.
static const size_type _S_max_size;
static const _CharT _S_terminal;
// The following storage is init'd to 0 by the linker, resulting
// (carefully) in an empty string with one reference.
static size_type _S_empty_rep_storage[];
...
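It is an array in static storage, independent of any instance of the type. Per the comment above, this storage is initialized to 0 by the linker.
This explains why c_str() returns a fixed address with a fixed value: every default-constructed string points into this one shared static block.
Somewhere the program must have written to that address, corrupting this memory.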
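The mechanism can be modeled in a few lines. The sketch below is a deliberately simplified model and not libstdc++ code: TinyString, shared_empty_ and scribble are hypothetical names, and the 8-byte buffer size is an arbitrary choice for the example.

```cpp
#include <cstring>

// Simplified model, NOT libstdc++ code: every default-constructed TinyString
// points at one shared, zero-initialized static buffer, the way the excerpt
// above points empty strings at _S_empty_rep_storage.
class TinyString {
public:
    TinyString() : p_(shared_empty_) {}
    const char* c_str() const { return p_; }
private:
    static char shared_empty_[8];  // static storage starts out as all zeros
    char* p_;
};

char TinyString::shared_empty_[8];

// Hypothetical misuse: strips const and writes through c_str(), much like
// code that tries to reuse a string's buffer for output.
void scribble(const TinyString& s, const char* text) {
    char* raw = const_cast<char*>(s.c_str());
    std::strncpy(raw, text, 7);  // copy at most 7 characters
    raw[7] = '\0';               // keep the shared buffer terminated
}
```

After scribble(first, "zichen"), every TinyString created later reads "zichen" as well, which matches the symptom: the first request looks fine, and every later one is stuck with its value.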
With the problem pinned down, back to hunting the bug in the server.
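I defined a throwaway string as a probe and bisected my code to find the region where the bug occurs. The range eventually narrowed to the point after the request reaches the summary step. I went into the summary module and kept searching, and finally found that in one serialization output path the code took the c_str() address of a string directly and wrote through it. The author presumably wanted to reuse that buffer.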
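The throwaway probe can be wrapped in a tiny function and called at each bisection point. This is a sketch, not the code I actually used: the name empty_string_is_clean is hypothetical, and the check only means something with the reference-counted string implementation shown in this post, where empty strings share storage.

```cpp
#include <string>

// Hypothetical probe for bisecting: a freshly default-constructed string
// should be empty and its c_str() should point at a NUL byte. If an earlier
// write went through the shared empty storage, this returns false.
bool empty_string_is_clean() {
    std::string probe;
    const char* p = probe.c_str();
    return probe.empty() && p[0] == '\0';
}
```

Calling it before and after a suspect module tells you which side of that module the corruption happens on.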
After switching to a buffer owned by the program itself, the problem was solved.
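Since the real code is not public, here is only a sketch of the shape of such a fix. make_writable_copy and serialize_uppercase_first are hypothetical names, and the uppercase step merely stands in for whatever in-place patching the serializer does.

```cpp
#include <string>
#include <vector>

// Hypothetical fix sketch: copy the string's bytes (plus the terminator)
// into a buffer the caller owns, and do all writing there.
std::vector<char> make_writable_copy(const std::string& s) {
    return std::vector<char>(s.c_str(), s.c_str() + s.size() + 1);
}

// Hypothetical serialization step that needs to patch bytes in place.
std::string serialize_uppercase_first(const std::string& s) {
    std::vector<char> buf = make_writable_copy(s);
    if (!s.empty() && buf[0] >= 'a' && buf[0] <= 'z') {
        buf[0] = static_cast<char>(buf[0] - 'a' + 'A');
    }
    // buf always holds at least the terminator, so &buf[0] is valid.
    return std::string(&buf[0], s.size());
}
```

The source string is never written to, so the shared empty storage stays untouched no matter what the serializer does with its own copy.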
Minimal reproduction:
[qianzichen@dev ~]$ vi heihei.cc
#include <string>
#include <iostream>
#include <string.h>
int main() {
    // test1 must be default-constructed so that it points at the shared
    // empty storage, which is what the output below shows.
    std::string test1;
    // Undefined behavior, shown only to reproduce the bug: the cast strips
    // const and the write lands in libstdc++'s shared static storage.
    char *ptest1 = (char *)test1.c_str();
    strncpy(ptest1, "hug you", 8);
    std::cout << " ptest1 = " << ptest1 << std::endl;
    // Strings created afterwards are empty, yet they point at the same bytes.
    std::string test2;
    const char *ptest2 = test2.c_str();
    std::string test3;
    const char *ptest3 = test3.c_str();
    std::cout << " ptest2 = " << ptest2 << std::endl;
    std::cout << " ptest3 = " << ptest3 << std::endl;
    std::cout << " address of ptest1 = " << (unsigned long)ptest1 << std::endl;
    std::cout << " address of ptest2 = " << (unsigned long)ptest2 << std::endl;
    std::cout << " address of ptest3 = " << (unsigned long)ptest3 << std::endl;
    return 0;
}
Run it:
[qianzichen@dev ~]$ ./a.out
ptest1 = hug you
ptest2 = hug you
ptest3 = hug you
address of ptest1 = 261138363096
address of ptest2 = 261138363096
address of ptest3 = 261138363096
[qianzichen@dev ~]$
The output makes it plain: same address, same value.
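Looking back over the whole debugging process, the lesson is to first verify whether the premise "the software's first behavior is correct" really holds in full. Otherwise you set off in the wrong direction and easily end up in a dead end.
During development I did consider a compromise that would sidestep the problem, but leaving the core problem unsolved is not acceptable. That holds all the more in high-performance, high-concurrency scenarios. As the old line of verse goes, "do not return until Loulan is conquered."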
Linkerist
January 24, 2019, Jiuxianqiao