<h1>The perils of temporary object lifetimes in C++</h1>
<p>Roman Podoliaka's Blog (http://podoliaka.org/), by Roman Podoliaka, published 2023-11-21, http://podoliaka.org//2023/11/21/cpp-lifetimes/</p>
<p>The other day my junior colleague asked me to help him troubleshoot a failing unit test: the
actual value of a protocol buffer string field did not match<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> the expected value, and he
couldn't understand why. Unfortunately, I'm not allowed to share the code, but it can be
summarized using this contrived example:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><iostream></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string_view></span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">foo</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="s">"Hello, programmer!"</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w">    </span><span class="n">std</span><span class="o">::</span><span class="n">string_view</span><span class="w"> </span><span class="n">bar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="n">foo</span><span class="p">()};</span>
<span class="w">    </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">bar</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>A seasoned C++ programmer will immediately spot the problem, but someone who has only used a
memory-safe language like Java or Python in the past will be baffled as to why the seemingly trivial
program above may or may not print the string <code>"Hello, programmer!\n"</code> depending on the chosen
optimization level, the compiler version, the operating system, etc.</p>
<p>The issue, of course, is that <code>foo()</code> returns a <em>temporary</em> object whose lifetime ends too early:</p>
<ul>
<li>
<p>unlike <code>std::string</code>, <code>std::string_view</code> does <em>not</em> allocate and own a dynamic array of <code>char</code>;
instead, it <em>references</em> a contiguous sequence of <code>char</code> somewhere in main memory. In this case,
that sequence is a dynamic array owned by the temporary <code>std::string</code> object</p>
</li>
<li>
<p>this temporary object is destroyed at the end of the full-expression, i.e. right after the
<code>std::string_view</code> value has been initialized, and so the newly constructed <code>std::string_view</code> object
immediately points to a memory region that has already been freed and, potentially, reused for something else</p>
</li>
</ul>
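<p>A minimal sketch of two possible fixes, assuming the same <code>foo()</code> as above: either give the returned
string a name, or bind it to a <code>const</code> reference, which extends the temporary's lifetime to that of the
reference. The function names <code>via_named_owner()</code> and <code>via_lifetime_extension()</code> are made up for
this illustration; in both cases the view is only created from an object that outlives it.</p>

```cpp
#include <cassert>
#include <string>
#include <string_view>

std::string foo() {
    return "Hello, programmer!";
}

// Fix 1: store the result in a named std::string, which lives until the
// end of the enclosing scope.
std::string via_named_owner() {
    std::string owner = foo();
    std::string_view bar = owner;        // refers to owner's buffer
    assert(bar.data() == owner.data());  // the view did not copy anything
    return std::string(bar);             // copy out while owner is alive
}

// Fix 2: bind the temporary to a const reference; the temporary then
// lives as long as the reference does.
std::string via_lifetime_extension() {
    const std::string& owner = foo();
    std::string_view bar = owner;
    return std::string(bar);
}
```

<p>Both functions return a fresh <code>std::string</code> rather than the view itself, because returning <code>bar</code>
would reintroduce the same dangling reference one level up.</p>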
<p>How would a junior programmer know they made a mistake like that? Would the compiler be able to help?
I tried the latest versions of both clang++ and g++ available in Arch Linux, but neither complained
about the code above:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>clang++<span class="w"> </span>--version
clang<span class="w"> </span>version<span class="w"> </span><span class="m">16</span>.0.6
Target:<span class="w"> </span>x86_64-pc-linux-gnu
Thread<span class="w"> </span>model:<span class="w"> </span>posix
InstalledDir:<span class="w"> </span>/usr/bin
$<span class="w"> </span>clang++<span class="w"> </span>example.cpp<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-Wall<span class="w"> </span>-Werror<span class="w"> </span>-pedantic<span class="w"> </span>-std<span class="o">=</span>c++17
$
$<span class="w"> </span>g++<span class="w"> </span>--version
g++<span class="w"> </span><span class="o">(</span>GCC<span class="o">)</span><span class="w"> </span><span class="m">13</span>.2.1<span class="w"> </span><span class="m">20230801</span>
Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2023</span><span class="w"> </span>Free<span class="w"> </span>Software<span class="w"> </span>Foundation,<span class="w"> </span>Inc.
This<span class="w"> </span>is<span class="w"> </span>free<span class="w"> </span>software<span class="p">;</span><span class="w"> </span>see<span class="w"> </span>the<span class="w"> </span><span class="nb">source</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>copying<span class="w"> </span>conditions.<span class="w"> </span>There<span class="w"> </span>is<span class="w"> </span>NO
warranty<span class="p">;</span><span class="w"> </span>not<span class="w"> </span>even<span class="w"> </span><span class="k">for</span><span class="w"> </span>MERCHANTABILITY<span class="w"> </span>or<span class="w"> </span>FITNESS<span class="w"> </span>FOR<span class="w"> </span>A<span class="w"> </span>PARTICULAR<span class="w"> </span>PURPOSE.
$<span class="w"> </span>g++<span class="w"> </span>example.cpp<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-Wall<span class="w"> </span>-Werror<span class="w"> </span>-pedantic<span class="w"> </span>-std<span class="o">=</span>c++17
$
</code></pre></div>
<p>Interestingly, replacing <code>std::string_view bar = {foo()};</code> with <code>std::string_view bar = foo();</code> makes
a difference: clang++ is now able to generate a warning (which is turned into an error if you
compile the code with <code>-Werror</code>):</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>clang++<span class="w"> </span>example.cpp<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-Wall<span class="w"> </span>-Werror<span class="w"> </span>-pedantic<span class="w"> </span>-std<span class="o">=</span>c++17
example.cpp:9:28:<span class="w"> </span>error:<span class="w"> </span>object<span class="w"> </span>backing<span class="w"> </span>the<span class="w"> </span>pointer<span class="w"> </span>will<span class="w"> </span>be<span class="w"> </span>destroyed<span class="w"> </span>at<span class="w"> </span>the<span class="w"> </span>end<span class="w"> </span>of<span class="w"> </span>the<span class="w"> </span>full-expression<span class="w"> </span><span class="o">[</span>-Werror,-Wdangling-gsl<span class="o">]</span>
<span class="w"> </span>std::string_view<span class="w"> </span><span class="nv">bar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>foo<span class="o">()</span><span class="p">;</span>
<span class="w"> </span>^~~~~
<span class="m">1</span><span class="w"> </span>error<span class="w"> </span>generated.
</code></pre></div>
<p><a href="https://en.cppreference.com/w/cpp/language/initialization">C++ initialization rules</a> are mind-boggling, but
it looks like list-initialization somehow throws clang++ off, while diagnostics in g++ are even weaker.</p>
<p>Not all hope is lost, though. While it might be hard to detect this issue at <em>compile</em> time, it is certainly
possible to do so at <em>runtime</em>. <a href="https://github.com/google/sanitizers/wiki/AddressSanitizer">AddressSanitizer</a>
is a huge step forward and truly is a must-have if you write C or C++ these days. Passing <code>-fsanitize=address</code> when
compiling binaries using clang or GCC will add the necessary instrumentation to detect such errors, for example:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>clang++<span class="w"> </span>example.cpp<span class="w"> </span>-o<span class="w"> </span>example<span class="w"> </span>-Wall<span class="w"> </span>-Werror<span class="w"> </span>-pedantic<span class="w"> </span>-std<span class="o">=</span>c++17<span class="w"> </span>-g<span class="w"> </span>-Og<span class="w"> </span>-fsanitize<span class="o">=</span>address<span class="w"> </span>-fno-omit-frame-pointer
$<span class="w"> </span>./example
<span class="o">=================================================================</span>
<span class="o">==</span><span class="nv">795293</span><span class="o">==</span>ERROR:<span class="w"> </span>AddressSanitizer:<span class="w"> </span>heap-use-after-free<span class="w"> </span>on<span class="w"> </span>address<span class="w"> </span>0x603000000040<span class="w"> </span>at<span class="w"> </span>pc<span class="w"> </span>0x55a69bd7cd75<span class="w"> </span>bp<span class="w"> </span>0x7ffee39bfff0<span class="w"> </span>sp<span class="w"> </span>0x7ffee39bf7b0
READ<span class="w"> </span>of<span class="w"> </span>size<span class="w"> </span><span class="m">18</span><span class="w"> </span>at<span class="w"> </span>0x603000000040<span class="w"> </span>thread<span class="w"> </span>T0
<span class="w"> </span><span class="c1">#0 0x55a69bd7cd74 in __interceptor_fwrite.part.0 asan_interceptors.cpp.o</span>
<span class="w"> </span><span class="c1">#1 0x7f17373489e4 in std::basic_streambuf<char, std::char_traits<char>>::sputn(char const*, long) /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/streambuf:458:28</span>
<span class="w"> </span><span class="c1">#2 0x7f17373489e4 in void std::__ostream_write<char, std::char_traits<char>>(std::basic_ostream<char, std::char_traits<char>>&, char const*, long) /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:53:52</span>
<span class="w"> </span><span class="c1">#3 0x7f17373489e4 in std::basic_ostream<char, std::char_traits<char>>& std::__ostream_insert<char, std::char_traits<char>>(std::basic_ostream<char, std::char_traits<char>>&, char const*, long) /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ostream_insert.h:104:18</span>
<span class="w"> </span><span class="c1">#4 0x55a69be6466d in std::basic_ostream<char, std::char_traits<char>>& std::operator<<<char, std::char_traits<char>>(std::basic_ostream<char, std::char_traits<char>>&, std::basic_string_view<char, std::char_traits<char>>) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/string_view:762:14</span>
<span class="w"> </span><span class="c1">#5 0x55a69be6466d in main /home/malor/sandbox/example.cpp:10:15</span>
<span class="w"> </span><span class="c1">#6 0x7f1737045ccf (/usr/lib/libc.so.6+0x27ccf) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)</span>
<span class="w"> </span><span class="c1">#7 0x7f1737045d89 in __libc_start_main (/usr/lib/libc.so.6+0x27d89) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)</span>
<span class="w"> </span><span class="c1">#8 0x55a69bd2b0d4 in _start (/home/malor/sandbox/example+0x1e0d4) (BuildId: c4ea059bae95830d2bf5f9b4963447a6f11ab751)</span>
0x603000000040<span class="w"> </span>is<span class="w"> </span>located<span class="w"> </span><span class="m">0</span><span class="w"> </span>bytes<span class="w"> </span>inside<span class="w"> </span>of<span class="w"> </span><span class="m">19</span>-byte<span class="w"> </span>region<span class="w"> </span><span class="o">[</span>0x603000000040,0x603000000053<span class="o">)</span>
freed<span class="w"> </span>by<span class="w"> </span>thread<span class="w"> </span>T0<span class="w"> </span>here:
<span class="w"> </span><span class="c1">#0 0x55a69be61fba in operator delete(void*) (/home/malor/sandbox/example+0x154fba) (BuildId: c4ea059bae95830d2bf5f9b4963447a6f11ab751)</span>
<span class="w"> </span><span class="c1">#1 0x55a69be6464e in std::__new_allocator<char>::deallocate(char*, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/new_allocator.h:168:2</span>
<span class="w"> </span><span class="c1">#2 0x55a69be6464e in std::allocator<char>::deallocate(char*, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/allocator.h:210:25</span>
<span class="w"> </span><span class="c1">#3 0x55a69be6464e in std::allocator_traits<std::allocator<char>>::deallocate(std::allocator<char>&, char*, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/alloc_traits.h:516:13</span>
<span class="w"> </span><span class="c1">#4 0x55a69be6464e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_destroy(unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:289:9</span>
<span class="w"> </span><span class="c1">#5 0x55a69be6464e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_dispose() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:283:4</span>
<span class="w"> </span><span class="c1">#6 0x55a69be6464e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::~basic_string() /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:792:9</span>
<span class="w"> </span><span class="c1">#7 0x55a69be6464e in main /home/malor/sandbox/example.cpp:9:28</span>
<span class="w"> </span><span class="c1">#8 0x7f1737045ccf (/usr/lib/libc.so.6+0x27ccf) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)</span>
previously<span class="w"> </span>allocated<span class="w"> </span>by<span class="w"> </span>thread<span class="w"> </span>T0<span class="w"> </span>here:
<span class="w"> </span><span class="c1">#0 0x55a69be61522 in operator new(unsigned long) (/home/malor/sandbox/example+0x154522) (BuildId: c4ea059bae95830d2bf5f9b4963447a6f11ab751)</span>
<span class="w"> </span><span class="c1">#1 0x55a69be645b1 in std::__new_allocator<char>::allocate(unsigned long, void const*) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/new_allocator.h:147:27</span>
<span class="w"> </span><span class="c1">#2 0x55a69be645b1 in std::allocator<char>::allocate(unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/allocator.h:198:32</span>
<span class="w"> </span><span class="c1">#3 0x55a69be645b1 in std::allocator_traits<std::allocator<char>>::allocate(std::allocator<char>&, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/alloc_traits.h:482:20</span>
<span class="w"> </span><span class="c1">#4 0x55a69be645b1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_S_allocate(std::allocator<char>&, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:126:16</span>
<span class="w"> </span><span class="c1">#5 0x55a69be645b1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_create(unsigned long&, unsigned long) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.tcc:155:14</span>
<span class="w"> </span><span class="c1">#6 0x55a69be645b1 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.tcc:225:14</span>
<span class="w"> </span><span class="c1">#7 0x55a69be645b1 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&) /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/basic_string.h:639:2</span>
<span class="w"> </span><span class="c1">#8 0x55a69be645b1 in foo[abi:cxx11]() /home/malor/sandbox/example.cpp:5:12</span>
<span class="w"> </span><span class="c1">#9 0x55a69be645b1 in main /home/malor/sandbox/example.cpp:9:29</span>
<span class="w"> </span><span class="c1">#10 0x7f1737045ccf (/usr/lib/libc.so.6+0x27ccf) (BuildId: 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658)</span>
SUMMARY:<span class="w"> </span>AddressSanitizer:<span class="w"> </span>heap-use-after-free<span class="w"> </span>asan_interceptors.cpp.o<span class="w"> </span><span class="k">in</span><span class="w"> </span>__interceptor_fwrite.part.0
Shadow<span class="w"> </span>bytes<span class="w"> </span>around<span class="w"> </span>the<span class="w"> </span>buggy<span class="w"> </span>address:
<span class="w"> </span>0x602ffffffd80:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>0x602ffffffe00:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>0x602ffffffe80:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>0x602fffffff00:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>0x602fffffff80:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="nv">00</span>
<span class="o">=</span>>0x603000000000:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="o">[</span>fd<span class="o">]</span>fd<span class="w"> </span>fd<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
<span class="w"> </span>0x603000000080:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
<span class="w"> </span>0x603000000100:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
<span class="w"> </span>0x603000000180:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
<span class="w"> </span>0x603000000200:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
<span class="w"> </span>0x603000000280:<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa<span class="w"> </span>fa
Shadow<span class="w"> </span>byte<span class="w"> </span>legend<span class="w"> </span><span class="o">(</span>one<span class="w"> </span>shadow<span class="w"> </span>byte<span class="w"> </span>represents<span class="w"> </span><span class="m">8</span><span class="w"> </span>application<span class="w"> </span>bytes<span class="o">)</span>:
<span class="w"> </span>Addressable:<span class="w"> </span><span class="m">00</span>
<span class="w"> </span>Partially<span class="w"> </span>addressable:<span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">02</span><span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">04</span><span class="w"> </span><span class="m">05</span><span class="w"> </span><span class="m">06</span><span class="w"> </span><span class="m">07</span><span class="w"> </span>
<span class="w"> </span>Heap<span class="w"> </span>left<span class="w"> </span>redzone:<span class="w"> </span>fa
<span class="w"> </span>Freed<span class="w"> </span>heap<span class="w"> </span>region:<span class="w"> </span>fd
<span class="w"> </span>Stack<span class="w"> </span>left<span class="w"> </span>redzone:<span class="w"> </span>f1
<span class="w"> </span>Stack<span class="w"> </span>mid<span class="w"> </span>redzone:<span class="w"> </span>f2
<span class="w"> </span>Stack<span class="w"> </span>right<span class="w"> </span>redzone:<span class="w"> </span>f3
<span class="w"> </span>Stack<span class="w"> </span>after<span class="w"> </span><span class="k">return</span>:<span class="w"> </span>f5
<span class="w"> </span>Stack<span class="w"> </span>use<span class="w"> </span>after<span class="w"> </span>scope:<span class="w"> </span>f8
<span class="w"> </span>Global<span class="w"> </span>redzone:<span class="w"> </span>f9
<span class="w"> </span>Global<span class="w"> </span>init<span class="w"> </span>order:<span class="w"> </span>f6
<span class="w"> </span>Poisoned<span class="w"> </span>by<span class="w"> </span>user:<span class="w"> </span>f7
<span class="w"> </span>Container<span class="w"> </span>overflow:<span class="w"> </span><span class="nb">fc</span>
<span class="w"> </span>Array<span class="w"> </span>cookie:<span class="w"> </span>ac
<span class="w"> </span>Intra<span class="w"> </span>object<span class="w"> </span>redzone:<span class="w"> </span>bb
<span class="w"> </span>ASan<span class="w"> </span>internal:<span class="w"> </span>fe
<span class="w"> </span>Left<span class="w"> </span>alloca<span class="w"> </span>redzone:<span class="w"> </span>ca
<span class="w"> </span>Right<span class="w"> </span>alloca<span class="w"> </span>redzone:<span class="w"> </span><span class="nv">cb</span>
<span class="o">==</span><span class="nv">795293</span><span class="o">==</span>ABORTING
</code></pre></div>
<p>AddressSanitizer detected the issue at runtime and immediately aborted execution of the program.
Its output will contain the following information:</p>
<ul>
<li>
<p>the type of the memory error and where it happened. In this case, it's <code>heap-use-after-free</code><sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>,
triggered inside <code>operator<<()</code>, which tries to access already freed memory through a dangling pointer</p>
</li>
<li>
<p>where in the code this memory block was previously allocated and deallocated</p>
</li>
<li>
<p>AddressSanitizer's view of main memory
(<a href="https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm">read</a> about how AddressSanitizer
works to make sense of this output)</p>
</li>
</ul>
<p>(note the use of <code>-g -Og -fno-omit-frame-pointer</code> to produce detailed stack traces with source code line numbers).</p>
<p>The catch is that the faulty code path must be triggered in order for AddressSanitizer to do its job.
Hey, but that's why you should always write tests!</p>
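<p>For functions you control, one partial safeguard that does not depend on a test hitting the faulty path is to
delete the rvalue overload, so that passing a temporary fails to compile. The sketch below is an assumption-laden
illustration, not something from the original code: <code>view_of()</code> and <code>safe_length()</code> are hypothetical
names, and the trick only guards calls that go through this helper.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Viewing a named (lvalue) string is fine: the caller owns it.
inline std::string_view view_of(const std::string& s) noexcept {
    return s;
}

// Temporaries prefer this overload; since it is deleted, such calls are
// rejected at compile time instead of producing a dangling view.
std::string_view view_of(std::string&&) = delete;

std::string foo() {
    return "Hello, programmer!";
}

std::size_t safe_length() {
    std::string owner = foo();
    std::string_view bar = view_of(owner);  // OK: owner outlives bar
    assert(bar == owner);
    // std::string_view bad = view_of(foo());  // would not compile
    return bar.size();
}
```

<p>This does not stop a view from outliving a named string that is destroyed later, so it complements
AddressSanitizer rather than replacing it.</p>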
<p>It <em>is</em> amazing to me how easy it is to make a mistake like that in C++ compared to programming languages
with automatic memory management, or programming languages like Rust that can detect such mistakes at <em>compile</em> time.
This also reminds me how old I am: we now have a whole new generation of programmers for whom automatic memory
management is the norm, and those pesky C++ object lifetime rules are something of an anachronism.</p>
<div class="footnote">
<hr />
<ol>
<li id="fn:1">
<p>In addition to a test failure, there was a warning about protocol buffer string serialization detecting
an invalid UTF-8 sequence. When I pointed that out to my colleague (to hint that we were reading garbage memory
somehow), it didn't trigger any reaction. I now realize that we are living in a world where UTF-8 has officially
won and it is now universally associated with the term Unicode or even "text" -- no one cares what encoding
is used and how it works, as it's all UTF-8 these days. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>The string value in the example above was carefully chosen to trigger this specific kind of memory error.
Modern implementations of <code>std::string</code> store short strings inline, inside the <code>std::string</code> object itself
(the small string optimization), so for a temporary like this one the characters would live on the stack and the heap would not be used at all.
This does not change the outcome, though: either way <code>std::string_view</code> would be referencing a memory block that
might already be used for storing something else. AddressSanitizer would detect that as <code>stack-use-after-scope</code>. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>
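<p>The short-versus-long string distinction from the second footnote can be observed directly. The following is a
rough sketch that assumes a standard library with the small string optimization (as in current libstdc++, libc++ and MSVC);
<code>stored_inline()</code> is a hypothetical helper, and the size threshold differs between implementations.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the characters of `s` are stored inside the std::string
// object itself rather than in a separate heap allocation.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(s);
    // std::less yields a well-defined order even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), begin) && before(s.data(), end);
}

// Sanity check on a case where implementations agree: 100 characters
// cannot fit inside the string object, so they must live elsewhere.
void check_long_string() {
    assert(!stored_inline(std::string(100, 'x')));
}
```

<p>Calling <code>stored_inline()</code> on a two-character string is expected to return <code>true</code> on such
implementations, which is the case where AddressSanitizer would report <code>stack-use-after-scope</code> instead.</p>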
<h1>Debugging of CPython processes with gdb</h1>
<p>Roman Podoliaka, published 2016-04-10, http://podoliaka.org//2016/04/10/debugging-cpython-gdb/</p>
<p><a href="https://docs.python.org/3.5/library/pdb.html">pdb</a> has been, is, and probably always will be the bread and butter of Python
programmers when they need to find the root cause of a problem in their
applications, as it's a built-in and easy-to-use debugger. But there are cases
when <code>pdb</code> can't help you, e.g. if your app has got stuck somewhere and you
need to attach to a running process to find out why, without restarting it.
This is where <a href="https://www.gnu.org/software/gdb/">gdb</a> shines.</p>
<h2>Why gdb?</h2>
<p><code>gdb</code> is a general-purpose debugger that is mostly used for debugging C and
C++ applications (although it actually supports Ada, Objective-C, Pascal and more).</p>
<p>There are different reasons why a Python programmer would be interested in <code>gdb</code>
for debugging:</p>
<ul>
<li>
<p><code>gdb</code> allows one to attach to a running process without starting an app
in debug mode or modifying the app code in some way first (e.g. putting
something like <code>import rpdb; rpdb.set_trace()</code> into the code)</p>
</li>
<li>
<p><code>gdb</code> allows one to take a <a href="https://en.wikipedia.org/wiki/Core_dump">core dump</a> of a process and analyze it later.
This is useful when you don't want to keep the process stopped
while you are introspecting its state, as well as when you do <a href="https://en.wikipedia.org/wiki/Debugging#Techniques">post-mortem</a>
debugging of a process that has already failed (e.g. <a href="https://www.freedesktop.org/software/systemd/man/systemd-coredump.html">crashed</a> with a
segmentation fault)</p>
</li>
<li>
<p>most debuggers available for Python (notable exceptions are <a href="http://winpdb.org/">winpdb</a> and <a href="https://github.com/fabioz/PyDev.Debugger">pydevd</a>)
do not support switching between threads of the application being debugged. <code>gdb</code>
allows that, as well as debugging of threads created by non-Python code (e.g. in some
native library used)</p>
</li>
</ul>
<h2>Debugging of interpreted languages</h2>
<p>So what makes Python special when using <code>gdb</code>?</p>
<p>Unlike code written in programming languages like C or C++, Python code is not
compiled into a native binary for a target platform. Instead there is an
interpreter (e.g. <a href="https://en.wikipedia.org/wiki/CPython">CPython</a>, the reference implementation of Python), which
executes compiled <a href="http://security.coverity.com/blog/2014/Nov/understanding-python-bytecode.html">byte-code</a>.</p>
<p>This effectively means, that when you attach to a Python process with <code>gdb</code>,
you'll debug the interpreter instance and introspect the process state at the
interpreter level, not the application level: i.e. you will see functions and
variables of the interpreter, not of your app.</p>
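<p>To make the byte-code point concrete, here is a minimal sketch that uses the standard <code>dis</code> module to list the instructions CPython executes for a small function. It assumes Python 3.4 or newer (for <code>dis.get_instructions</code>); the function <code>add</code> and the helper <code>opnames</code> are made up for illustration, and the exact opcode names vary between CPython versions.</p>

```python
import dis


def add(a, b):
    """Hypothetical application-level function."""
    return a + b


def opnames(func):
    """Return the names of the byte-code instructions compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    # The interpreter's evaluation loop executes these instructions one by one;
    # that loop, not add() itself, is what a native debugger sees.
    for name in opnames(add):
        print(name)
```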
<p>To give you an example, let's take a look at a <code>gdb</code> backtrace of a CPython
(the most popular Python interpreter) process:</p>
<div class="codehilite"><pre><span></span><code><span class="gh">#</span>0 0x00007fcce9b2faf3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
<span class="gh">#</span>1 0x0000000000435ef8 in pyepoll_poll (self=0x7fccdf54f240, args=<optimized out>, kwds=<optimized out>) at ../Modules/selectmodule.c:1034
<span class="gh">#</span>2 0x000000000049968d in call_function (oparg=<optimized out>, pp_stack=0x7ffc20d7bfb0) at ../Python/ceval.c:4020
<span class="gh">#</span>3 PyEval_EvalFrameEx () at ../Python/ceval.c:2666
<span class="gh">#</span>4 0x0000000000499ef2 in fast_function () at ../Python/ceval.c:4106
<span class="gh">#</span>5 call_function () at ../Python/ceval.c:4041
<span class="gh">#</span>6 PyEval_EvalFrameEx () at ../Python/ceval.c:2666
</code></pre></div>
<p>and at one obtained by means of <code>traceback.extract_stack()</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="nx">lib</span><span class="o">/</span><span class="nx">python2</span><span class="m m-Double">.7</span><span class="o">/</span><span class="nx">dist</span><span class="o">-</span><span class="nx">packages</span><span class="o">/</span><span class="nx">eventlet</span><span class="o">/</span><span class="nx">greenpool</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">82</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_spawn_n_impl</span>
<span class="w"> </span><span class="err">`</span><span class="nx">func</span><span class="p">(</span><span class="o">*</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="o">**</span><span class="nx">kwargs</span><span class="p">)</span><span class="err">`</span>
<span class="o">/</span><span class="nx">opt</span><span class="o">/</span><span class="nx">stack</span><span class="o">/</span><span class="nx">neutron</span><span class="o">/</span><span class="nx">neutron</span><span class="o">/</span><span class="nx">agent</span><span class="o">/</span><span class="nx">l3</span><span class="o">/</span><span class="nx">agent</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">461</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_process_router_update</span>
<span class="w"> </span><span class="err">`</span><span class="k">for</span><span class="w"> </span><span class="nx">rp</span><span class="p">,</span><span class="w"> </span><span class="nx">update</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">_queue</span><span class="p">.</span><span class="nx">each_update_to_next_router</span><span class="p">():</span><span class="err">`</span>
<span class="o">/</span><span class="nx">opt</span><span class="o">/</span><span class="nx">stack</span><span class="o">/</span><span class="nx">neutron</span><span class="o">/</span><span class="nx">neutron</span><span class="o">/</span><span class="nx">agent</span><span class="o">/</span><span class="nx">l3</span><span class="o">/</span><span class="nx">router_processing_queue</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">154</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">each_update_to_next_router</span>
<span class="w"> </span><span class="err">`</span><span class="nx">next_update</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">_queue</span><span class="p">.</span><span class="nx">get</span><span class="p">()</span><span class="err">`</span>
<span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="nx">lib</span><span class="o">/</span><span class="nx">python2</span><span class="m m-Double">.7</span><span class="o">/</span><span class="nx">dist</span><span class="o">-</span><span class="nx">packages</span><span class="o">/</span><span class="nx">eventlet</span><span class="o">/</span><span class="nx">queue</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">313</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">get</span>
<span class="w"> </span><span class="err">`</span><span class="k">return</span><span class="w"> </span><span class="nx">waiter</span><span class="p">.</span><span class="nx">wait</span><span class="p">()</span><span class="err">`</span>
<span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="nx">lib</span><span class="o">/</span><span class="nx">python2</span><span class="m m-Double">.7</span><span class="o">/</span><span class="nx">dist</span><span class="o">-</span><span class="nx">packages</span><span class="o">/</span><span class="nx">eventlet</span><span class="o">/</span><span class="nx">queue</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">141</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">wait</span>
<span class="w"> </span><span class="err">`</span><span class="k">return</span><span class="w"> </span><span class="nx">get_hub</span><span class="p">().</span><span class="nx">switch</span><span class="p">()</span><span class="err">`</span>
<span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="nx">lib</span><span class="o">/</span><span class="nx">python2</span><span class="m m-Double">.7</span><span class="o">/</span><span class="nx">dist</span><span class="o">-</span><span class="nx">packages</span><span class="o">/</span><span class="nx">eventlet</span><span class="o">/</span><span class="nx">hubs</span><span class="o">/</span><span class="nx">hub</span><span class="p">.</span><span class="nx">py</span><span class="p">:</span><span class="mi">294</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">switch</span>
<span class="w"> </span><span class="err">`</span><span class="k">return</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">greenlet</span><span class="p">.</span><span class="nx">switch</span><span class="p">()</span><span class="err">`</span>
</code></pre></div>
<p>As is, the former is of little help when you are trying to find a problem
in your Python code: all you see is the current state of the interpreter
itself.</p>
<p>However, <a href="https://docs.python.org/2/c-api/veryhigh.html#c.PyEval_EvalFrameEx">PyEval_EvalFrameEx</a> looks interesting: it's the CPython function
that executes the bytecode of application level Python functions and, thus,
has access to their state - the very state we are usually interested in.</p>
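<p>That state lives in frame objects, which are also visible from Python itself. As a rough illustration (not what <code>gdb</code> does internally), the sketch below walks the current thread's frames with <code>sys._getframe()</code>, a CPython-specific helper, and collects the function name, line number and local variable names of each one. The functions <code>outer</code>, <code>inner</code> and <code>describe_stack</code> are hypothetical.</p>

```python
import sys


def describe_stack():
    """Collect (function name, line number, sorted local names) per frame."""
    frame = sys._getframe(1)  # start from the caller, skipping this helper
    result = []
    while frame is not None:
        result.append((frame.f_code.co_name,
                       frame.f_lineno,
                       sorted(frame.f_locals)))
        frame = frame.f_back  # move one call up the stack
    return result


def inner(value):
    doubled = value * 2  # a local variable, only here to show up in the output
    return describe_stack()


def outer():
    return inner(21)


if __name__ == "__main__":
    for name, lineno, local_names in outer():
        print("%s:%d locals=%s" % (name, lineno, local_names))
```

<p>Each entry corresponds to one application level call, which is the information we would like to recover from the outside when all we have is the interpreter's native stack.</p>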
<h2>gdb and Python</h2>
<p>Search results for <code>"gdb debug python"</code> can be confusing. The thing is that, starting
with <code>gdb</code> version 7, it has been possible to <a href="https://sourceware.org/gdb/current/onlinedocs/gdb/Python.html#Python">extend</a> the debugger with Python code, e.g.
to provide visualisations for C++ <a href="https://sourceware.org/gdb/wiki/STLSupport">STL</a> types, which are much easier to implement
in Python than in the built-in <a href="http://www.ibm.com/developerworks/aix/library/au-gdb.html">macro</a> language.</p>
<p>In order to be able to debug CPython processes and introspect the application level state,
the interpreter developers decided to extend <code>gdb</code> and wrote a <a href="https://github.com/python/cpython/blob/master/Tools/gdb/libpython.py">script</a> for that in... Python,
of course!</p>
<p>So these are two different, but related, things:</p>
<ul>
<li><code>gdb</code> versions 7+ are extensible with Python modules</li>
<li>there's a Python <code>gdb</code> extension for debugging CPython processes</li>
</ul>
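<p>To illustrate the first point, here is a minimal sketch of a <code>gdb</code> pretty-printer written in Python. It assumes a hypothetical C type <code>struct point { int x; int y; }</code> in the program being debugged; the names <code>PointPrinter</code> and <code>lookup_point_printer</code> are invented for this example. The <code>gdb</code> module can only be imported inside <code>gdb</code> itself, so the registration step is guarded.</p>

```python
try:
    import gdb  # only available when the script runs inside gdb
except ImportError:
    gdb = None


class PointPrinter(object):
    """Pretty-printer for a hypothetical C `struct point { int x; int y; }`."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # A gdb.Value supports dict-like access to struct fields.
        return "point(x=%s, y=%s)" % (self.val["x"], self.val["y"])


def lookup_point_printer(val):
    """Return a printer for `struct point` values, or None for other types."""
    if val.type.tag == "point":
        return PointPrinter(val)
    return None


if gdb is not None:
    # gdb consults the registered lookup functions when it prints a value.
    gdb.pretty_printers.append(lookup_point_printer)
```

<p>Inside <code>gdb</code>, a file like this would typically be loaded with the <code>source</code> command. The CPython extension discussed next is built on the same extension mechanism, only on a much larger scale.</p>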
<h2>Debugging Python with gdb 101</h2>
<p>First of all, you need to install <code>gdb</code>:</p>
<div class="codehilite"><pre><span></span><code># apt-get install gdb
</code></pre></div>
<p>or</p>
<div class="codehilite"><pre><span></span><code># yum install gdb
</code></pre></div>
<p>depending on the Linux distro you are using.</p>
<p>The next step is to install <a href="http://www.tutorialspoint.com/gnu_debugger/gdb_debugging_symbols.htm">debugging symbols</a> for the CPython build you have:</p>
<div class="codehilite"><pre><span></span><code># apt-get install python-dbg
</code></pre></div>
<p>or</p>
<div class="codehilite"><pre><span></span><code># yum install python-debuginfo
</code></pre></div>
<p>Some Linux distros like CentOS or RHEL ship debugging symbols <a href="http://debuginfo.centos.org/">separately</a> from
all other packages and recommend installing them like this:</p>
<div class="codehilite"><pre><span></span><code># debuginfo-install python
</code></pre></div>
<p>The installed debugging symbols will be used by the CPython <a href="https://github.com/python/cpython/blob/master/Tools/gdb/libpython.py">script</a> for <code>gdb</code>
to analyze the <code>PyEval_EvalFrameEx</code> frames (a frame is essentially a
function call and its associated state: local variables, CPU
registers, etc.) and map them to application level functions in your code.</p>
<p>Without debugging symbols this is much harder to do - <code>gdb</code> allows you to
manipulate the process memory in any way you want, but you can't easily
understand what data structures reside in what memory areas.</p>
<p>After all preparatory steps have been completed, you can give <code>gdb</code> a try. E.g.
in order to attach to a running CPython process, do:</p>
<div class="codehilite"><pre><span></span><code>gdb /usr/bin/python -p $PID
</code></pre></div>
<p>At this point you can get an application level backtrace for the current
thread (note that some frame numbers are "missing" - this is expected, as <code>gdb</code>
counts all the interpreter level frames and only some of those, the
<code>PyEval_EvalFrameEx</code> ones, are calls in application level code):</p>
<div class="codehilite"><pre><span></span><code>(gdb) py-bt
<span class="gh">#</span>4 Frame 0x1b7da60, for file /usr/lib/python2.7/sched.py, line 111, in run (self=<scheduler(timefunc=<built-in function time>, delayfunc=<built-in function sleep>, _queue=[<Event at remote 0x7fe1f8c74a10>]) at remote 0x7fe1fa086758>, q=[...], delayfunc=<built-in function sleep>, timefunc=<built-in function time>, pop=<built-in function heappop>, time=<float at remote 0x1a0a400>, priority=1, action=<function at remote 0x7fe1fa083aa0>, argument=(171657,), checked_event=<...>, now=<float at remote 0x1b8ec58>)
delayfunc(time - now)
<span class="gh">#</span>7 Frame 0x1b87e90, for file /usr/bin/dstat, line 2416, in main (interval=1, user='ubuntu', hostname='rpodolyaka-devstack', key='unit_hi', linewidth=150, plugin='page', mods=('page', 'page24'), mod='page', pluginfile='dstat_page', scheduler=<scheduler(timefunc=<built-in function time>, delayfunc=<built-in function sleep>, _queue=[<Event at remote 0x7fe1f8c74a10>]) at remote 0x7fe1fa086758>)
scheduler.run()
<span class="gh">#</span>11 Frame 0x7fe1fa0bc5c0, for file /usr/bin/dstat, line 2554, in <module> ()
main()
</code></pre></div>
<p>or find out exactly which line of the application code is currently being executed:</p>
<div class="codehilite"><pre><span></span><code>(gdb) py-list
106 pop = heapq.heappop
107 while q:
108 time, priority, action, argument = checked_event = q[0]
109 now = timefunc()
110 if now < time:
>111 delayfunc(time - now)
112 else:
113 event = pop(q)
114 # Verify that the event was not removed or altered
115 # by another thread after we last looked at q[0].
116 if event is checked_event:
</code></pre></div>
<p>or look at values of local variables:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="n">py</span><span class="o">-</span><span class="n">locals</span>
<span class="bp">self</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">scheduler</span><span class="p">(</span><span class="n">timefunc</span><span class="o">=<</span><span class="n">built</span><span class="o">-</span><span class="ow">in</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">time</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">delayfunc</span><span class="o">=<</span><span class="n">built</span><span class="o">-</span><span class="ow">in</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">sleep</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">_queue</span><span class="o">=</span><span class="p">[</span><span class="o"><</span><span class="n">Event</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x7fe1f8c74a10</span><span class="o">></span><span class="p">])</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x7fe1fa086758</span><span class="o">></span>
<span class="n">q</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="o"><</span><span class="n">Event</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x7fe1f8c74a10</span><span class="o">></span><span class="p">]</span>
<span class="n">delayfunc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">built</span><span class="o">-</span><span class="ow">in</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">sleep</span><span class="o">></span>
<span class="n">timefunc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">built</span><span class="o">-</span><span class="ow">in</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">time</span><span class="o">></span>
<span class="n">pop</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">built</span><span class="o">-</span><span class="ow">in</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="n">heappop</span><span class="o">></span>
<span class="n">time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="nb nb-Type">float</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x1a0a400</span><span class="o">></span>
<span class="n">priority</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span>
<span class="n">action</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">function</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x7fe1fa083aa0</span><span class="o">></span>
<span class="n">argument</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="mi">171657</span><span class="p">,)</span>
<span class="n">checked_event</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="n">Event</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x7fe1f8c74a10</span><span class="o">></span>
<span class="n">now</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o"><</span><span class="nb nb-Type">float</span><span class="w"> </span><span class="n">at</span><span class="w"> </span><span class="k">remote</span><span class="w"> </span><span class="mh">0x1b8ec58</span><span class="o">></span>
</code></pre></div>
<p>There are more <code>py-</code> commands provided by the CPython <a href="https://github.com/python/cpython/blob/master/Tools/gdb/libpython.py">script</a> for <code>gdb</code>.
Check out the debugging <a href="https://docs.python.org/devguide/gdb.html">guide</a> for details.</p>
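<p>The same commands can also be run non-interactively, which is handy when you just want to capture a backtrace from a misbehaving process. The sketch below is one possible way to do it, assuming that <code>gdb</code> is on <code>PATH</code>, that the CPython extension is loaded automatically for the target interpreter, and that you have permission to attach to the process. The helper names are made up for illustration.</p>

```python
import subprocess
import sys


def build_gdb_command(pid, commands=("py-bt",)):
    """Build a gdb command line that attaches to pid, runs commands and exits."""
    argv = ["gdb", "-p", str(pid), "-batch"]
    for command in commands:
        argv.extend(["-ex", command])
    return argv


def python_backtrace(pid):
    """Attach to a running CPython process and return the py-bt output."""
    return subprocess.check_output(build_gdb_command(pid),
                                   stderr=subprocess.STDOUT,
                                   universal_newlines=True)


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Usage: python this_script.py PID
    print(python_backtrace(int(sys.argv[1])))
```

<p>Keep in mind that the target process is paused while <code>gdb</code> is attached, so a quick batch run like this keeps the interruption short.</p>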
<h2>Gotchas</h2>
<p>Although the described technique should work out of the box, there are a few known
gotchas.</p>
<h2>python-dbg</h2>
<p>The <code>python-dbg</code> package in Debian and Ubuntu will not only install the
debugging symbols for <code>python</code> (which are stripped at the package build time
to save disk space), but also provide an additional CPython binary
<code>python-dbg</code>.</p>
<p>The latter is essentially a separate build of CPython (with the <code>--with-pydebug</code> flag
passed to <code>./configure</code>) with many run-time checks enabled. Generally, you don't want
to use <code>python-dbg</code> in production, as it can be (much) slower than <code>python</code>,
e.g.:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>python<span class="w"> </span>-c<span class="w"> </span><span class="s2">"print(sum(range(1, 1000000)))"</span>
<span class="m">499999500000</span>
real<span class="w"> </span>0m0.096s
user<span class="w"> </span>0m0.057s
sys<span class="w"> </span>0m0.030s
$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>python-dbg<span class="w"> </span>-c<span class="w"> </span><span class="s2">"print(sum(range(1, 1000000)))"</span>
<span class="m">499999500000</span>
<span class="o">[</span><span class="m">18318</span><span class="w"> </span>refs<span class="o">]</span>
real<span class="w"> </span>0m0.237s
user<span class="w"> </span>0m0.197s
sys<span class="w"> </span>0m0.016s
</code></pre></div>
<p>The good news is that you don't need to: it's still possible to debug the
<code>python</code> executable by means of <code>gdb</code>, as long as the corresponding debugging
symbols are installed. So <code>python-dbg</code> just adds a bit more confusion to the
CPython/gdb story - you can safely ignore its existence.</p>
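<p>If you are unsure which flavour of the interpreter a given environment is using, a quick check along these lines can help. It relies on <code>sys.gettotalrefcount()</code>, which CPython only provides in debug builds; the helper name <code>is_debug_build</code> is made up for this sketch.</p>

```python
import sys


def is_debug_build():
    """Return True when running on a CPython debug build (--with-pydebug)."""
    # sys.gettotalrefcount() is only defined in debug builds of CPython.
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    print("debug build" if is_debug_build() else "regular build")
```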
<h2>Build flags</h2>
<p>Some Linux distros build CPython passing the <code>-g0</code> or <code>-g1</code> <a href="https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html">option</a> to <code>gcc</code>:
the former produces a binary without debugging information at all, and the latter
does not allow <code>gdb</code> to get information about local variables at runtime.</p>
<p>Both of these options break the described workflow of debugging CPython processes
by means of <code>gdb</code>. The solution is to rebuild CPython with <code>-g</code> or <code>-g2</code>
(<code>2</code> is the default value when <code>-g</code> is passed).</p>
<p>Fortunately, all current versions of the major Linux distros (Ubuntu Trusty/Xenial,
Debian Jessie, CentOS/RHEL 7) ship the "correctly" built CPython.</p>
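<p>One way to see which flags your interpreter was built with is to ask <code>sysconfig</code> for the recorded <code>CFLAGS</code>. The sketch below is a rough heuristic, not an authoritative check: it only understands plain <code>-g</code> and <code>-gN</code> options (not variants such as <code>-ggdb</code>), relies on the last such option being the effective one, as it is in <code>gcc</code>, and says nothing about whether the symbols were later stripped into a separate package. The function name <code>debug_info_level</code> is hypothetical.</p>

```python
import re
import sysconfig


def debug_info_level(cflags):
    """Return the effective -g level (0-3) for a string of gcc flags."""
    level = 0  # without any -g option gcc emits no debugging information
    for flag in cflags.split():
        match = re.match(r"^-g([0-3])?$", flag)
        if match:
            # A bare -g means level 2; a later option overrides earlier ones.
            level = int(match.group(1)) if match.group(1) else 2
    return level


if __name__ == "__main__":
    # CFLAGS may be missing on some platforms, hence the fallback.
    flags = sysconfig.get_config_var("CFLAGS") or ""
    print("CFLAGS: %s" % flags)
    print("debug info level: %d" % debug_info_level(flags))
```

<p>A result of 0 or 1 would suggest the problematic builds described above, while 2 or higher should be enough for the <code>py-</code> commands to work.</p>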
<h2>Optimized out frames</h2>
<p>For introspection to work properly, it's crucial that information about
<code>PyEval_EvalFrameEx</code> arguments is preserved for each call. Depending on the
<a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html">optimization level</a> used in <code>gcc</code> when building CPython, or on the particular
compiler version used, it's possible that this information will be lost at
runtime (especially with the aggressive optimizations enabled by <code>-O3</code>). In this
case <code>gdb</code> will show you something like:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">bt</span>
<span class="err">#</span><span class="mi">0</span><span class="w"> </span><span class="mh">0x00007fdf3ca31be3</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__select_nocancel</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">sysdeps</span><span class="o">/</span><span class="nx">unix</span><span class="o">/</span><span class="nx">syscall</span><span class="o">-</span><span class="nx">template</span><span class="p">.</span><span class="nx">S</span><span class="p">:</span><span class="mi">84</span>
<span class="err">#</span><span class="mi">1</span><span class="w"> </span><span class="mh">0x00000000005d1da4</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">pysleep</span><span class="w"> </span><span class="p">(</span><span class="nx">secs</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">timemodule</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">1408</span>
<span class="err">#</span><span class="mi">2</span><span class="w"> </span><span class="nx">time_sleep</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">timemodule</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">231</span>
<span class="err">#</span><span class="mi">3</span><span class="w"> </span><span class="mh">0x00000000004f5465</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7fff62b184c0</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4637</span>
<span class="err">#</span><span class="mi">4</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3185</span>
<span class="err">#</span><span class="mi">5</span><span class="w"> </span><span class="mh">0x00000000004f5194</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">fast_function</span><span class="w"> </span><span class="p">(</span><span class="nx">nk</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">na</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">n</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7fff62b185c0</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">func</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4750</span>
<span class="err">#</span><span class="mi">6</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7fff62b185c0</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4677</span>
<span class="err">#</span><span class="mi">7</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3185</span>
<span class="err">#</span><span class="mi">8</span><span class="w"> </span><span class="mh">0x00000000004f5194</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">fast_function</span><span class="w"> </span><span class="p">(</span><span class="nx">nk</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">na</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">n</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7fff62b186c0</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">func</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4750</span>
<span class="err">#</span><span class="mi">9</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7fff62b186c0</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4677</span>
<span class="err">#</span><span class="mi">10</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3185</span>
<span class="err">#</span><span class="mi">11</span><span class="w"> </span><span class="mh">0x00000000005c5da8</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_PyEval_EvalCodeWithName</span><span class="p">.</span><span class="nx">lto_priv</span><span class="m m-Double">.1326</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3965</span>
<span class="err">#</span><span class="mi">12</span><span class="w"> </span><span class="mh">0x00000000005e9d7f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalCodeEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3986</span>
<span class="err">#</span><span class="mi">13</span><span class="w"> </span><span class="nx">PyEval_EvalCode</span><span class="w"> </span><span class="p">(</span><span class="nx">co</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">globals</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">locals</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">777</span>
<span class="err">#</span><span class="mi">14</span><span class="w"> </span><span class="mh">0x00000000005fe3d2</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">run_mod</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">970</span>
<span class="err">#</span><span class="mi">15</span><span class="w"> </span><span class="mh">0x000000000060057a</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_FileExFlags</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">923</span>
<span class="err">#</span><span class="mi">16</span><span class="w"> </span><span class="mh">0x000000000060075c</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_SimpleFileExFlags</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">396</span>
<span class="err">#</span><span class="mi">17</span><span class="w"> </span><span class="mh">0x000000000062b870</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">run_file</span><span class="w"> </span><span class="p">(</span><span class="nx">p_cf</span><span class="p">=</span><span class="mh">0x7fff62b18920</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">=</span><span class="mh">0x1733260</span><span class="w"> </span><span class="nx">L</span><span class="s">"test2.py"</span><span class="p">,</span><span class="w"> </span><span class="nx">fp</span><span class="p">=</span><span class="mh">0x1790190</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">main</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">318</span>
<span class="err">#</span><span class="mi">18</span><span class="w"> </span><span class="nx">Py_Main</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">main</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">768</span>
<span class="err">#</span><span class="mi">19</span><span class="w"> </span><span class="mh">0x00000000004cb8ef</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">main</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Programs</span><span class="o">/</span><span class="nx">python</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">69</span>
<span class="err">#</span><span class="mi">20</span><span class="w"> </span><span class="mh">0x00007fdf3c970610</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__libc_start_main</span><span class="w"> </span><span class="p">(</span><span class="nx">main</span><span class="p">=</span><span class="mh">0x4cb810</span><span class="w"> </span><span class="p"><</span><span class="nx">main</span><span class="p">>,</span><span class="w"> </span><span class="nx">argc</span><span class="p">=</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">argv</span><span class="p">=</span><span class="mh">0x7fff62b18b38</span><span class="p">,</span><span class="w"> </span><span class="nx">init</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">rtld_fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">stack_end</span><span class="p">=</span><span class="mh">0x7fff62b18b28</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">libc</span><span class="o">-</span><span class="nx">start</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">291</span>
<span class="err">#</span><span class="mi">21</span><span class="w"> </span><span class="mh">0x00000000005c9df9</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_start</span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">py</span><span class="o">-</span><span class="nx">bt</span>
<span class="nx">Traceback</span><span class="w"> </span><span class="p">(</span><span class="nx">most</span><span class="w"> </span><span class="nx">recent</span><span class="w"> </span><span class="nx">call</span><span class="w"> </span><span class="nx">first</span><span class="p">):</span>
<span class="w"> </span><span class="nx">File</span><span class="w"> </span><span class="s">"test2.py"</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">9</span><span class="p">,</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">g</span>
<span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">sleep</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
<span class="w"> </span><span class="nx">File</span><span class="w"> </span><span class="s">"test2.py"</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">f</span>
<span class="w"> </span><span class="nx">g</span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="nx">frame</span><span class="w"> </span><span class="nx">information</span><span class="w"> </span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">)</span>
</code></pre></div>
<p>That is, some application-level frames will be available and some will not.
There is little you can do at this point except rebuild CPython
with a lower optimization level, but that is often not an option in production
(not to mention that you'd be running a custom CPython build rather than the
one provided by your Linux distro).</p>
<p><strong>Update</strong>: actually, there is something you can do. The "frame information optimized out"
message essentially tells you that gdb wasn't able to figure out the location of
the <code>PyFrameObject</code> data structure in a given stack frame (DWARF debugging symbols
allow gdb to calculate the addresses of local variables and function arguments). But
it has to be somewhere; otherwise CPython would not be able to execute your Python
code.</p>
<p>On x86-64 machines the obvious place to check is the CPU registers: there are 16
general-purpose registers that compilers can use to store the values of function
call arguments and local variables.</p>
<p>The following command prints the values of all CPU registers in the selected
stack frame:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="n">info</span><span class="w"> </span><span class="n">registers</span>
<span class="n">rax</span><span class="w"> </span><span class="mh">0</span><span class="n">xfffffffffffffdfe</span><span class="w"> </span><span class="o">-</span><span class="mh">514</span>
<span class="n">rbx</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7fd7c20</span><span class="w"> </span><span class="mh">140737353972768</span>
<span class="n">rcx</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7afaff7</span><span class="w"> </span><span class="mh">140737348874231</span>
<span class="n">rdx</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">rsi</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">rdi</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">rbp</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7fd7d98</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7fd7d98</span>
<span class="n">rsp</span><span class="w"> </span><span class="mh">0</span><span class="n">x7fffffffe3c0</span><span class="w"> </span><span class="mh">0</span><span class="n">x7fffffffe3c0</span>
<span class="n">r8</span><span class="w"> </span><span class="mh">0</span><span class="n">x7fffffffe050</span><span class="w"> </span><span class="mh">140737488347216</span>
<span class="n">r9</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">r10</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">r11</span><span class="w"> </span><span class="mh">0</span><span class="n">x246</span><span class="w"> </span><span class="mh">582</span>
<span class="n">r12</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">r13</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7fae050</span><span class="w"> </span><span class="mh">140737353801808</span>
<span class="n">r14</span><span class="w"> </span><span class="mh">0</span><span class="n">x7ffff7fae050</span><span class="w"> </span><span class="mh">140737353801808</span>
<span class="n">r15</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">rip</span><span class="w"> </span><span class="mh">0</span><span class="n">x5555556468ca</span><span class="w"> </span><span class="mh">0</span><span class="n">x5555556468ca</span><span class="w"> </span><span class="o"><</span><span class="n">PyEval_EvalCodeEx</span><span class="o">+</span><span class="mh">1754</span><span class="o">></span>
<span class="n">eflags</span><span class="w"> </span><span class="mh">0</span><span class="n">x246</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="n">PF</span><span class="w"> </span><span class="n">ZF</span><span class="w"> </span><span class="n">IF</span><span class="w"> </span><span class="p">]</span>
<span class="n">cs</span><span class="w"> </span><span class="mh">0</span><span class="n">x33</span><span class="w"> </span><span class="mh">51</span>
<span class="n">ss</span><span class="w"> </span><span class="mh">0</span><span class="n">x2b</span><span class="w"> </span><span class="mh">43</span>
<span class="n">ds</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">es</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">fs</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
<span class="n">gs</span><span class="w"> </span><span class="mh">0</span><span class="n">x0</span><span class="w"> </span><span class="mh">0</span>
</code></pre></div>
<p>But these are just numbers. We need to help gdb put some meaning behind them.</p>
<p>Note that some of the numbers above clearly look like memory addresses. We can ask
gdb to interpret the value of a CPU register as a pointer to some data type. We know
that most CPython runtime data structures are <code>PyObject</code>s, which store information
about their actual type internally (e.g. the <code>->ob_type->tp_name</code> field contains the type
name encoded as a C string).</p>
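<p>As a rough illustration of "looks like a memory address", the sketch below parses a few lines
copied from the register dump above and keeps only the values that fall into a plausible
user-space address range. The helper names and the range bounds are my own assumptions (they are
not part of gdb or CPython), and passing this filter only makes a register a candidate, not a
confirmed pointer:</p>

```python
import re

# A few lines copied from the `info registers` output above.
SAMPLE = """\
rax            0xfffffffffffffdfe  -514
rbx            0x7ffff7fd7c20      140737353972768
rdx            0x0                 0
r11            0x246               582
r13            0x7ffff7fae050      140737353801808
"""

# Assumption: on x86-64 Linux, user-space addresses lie in the lower canonical
# half (below 2**47), and very small values are unlikely to be valid pointers.
MIN_ADDRESS = 0x10000
MAX_ADDRESS = 0x7FFFFFFFFFFF


def parse_registers(text):
    """Map register names to integer values (hypothetical helper)."""
    registers = {}
    for line in text.splitlines():
        match = re.match(r"(\w+)\s+(0x[0-9a-fA-F]+)", line)
        if match:
            registers[match.group(1)] = int(match.group(2), 16)
    return registers


def pointer_candidates(registers):
    """Return the names of registers whose values could be user-space pointers."""
    return [
        name
        for name, value in registers.items()
        if MIN_ADDRESS <= value <= MAX_ADDRESS
    ]


print(pointer_candidates(parse_registers(SAMPLE)))  # prints ['rbx', 'r13']
```

<p>Only the registers that survive this kind of filter are worth dereferencing; the rest
(zeros, small integers, negative error codes) can be skipped right away.</p>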
<p>So what we'll do is try to cast the value of each CPU register to <code>PyObject*</code> and
see if we can find anything useful:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="p">((</span><span class="nx">PyObject</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="err">$</span><span class="nx">rax</span><span class="p">)</span><span class="o">-></span><span class="nx">ob_type</span><span class="o">-></span><span class="nx">tp_name</span>
<span class="nx">Cannot</span><span class="w"> </span><span class="nx">access</span><span class="w"> </span><span class="nx">memory</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="mh">0xfffffffffffffe06</span>
</code></pre></div>
<p>If we give gdb a memory address that does not actually point to a <code>PyObject</code> instance,
we'll typically get an error when the pointer is dereferenced.</p>
<p>There are only so many CPU registers to check, and you can easily automate the search with
a helper gdb command similar to this one:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span> <span class="nc">LocatePyFrameObject</span><span class="p">(</span><span class="n">gdb</span><span class="o">.</span><span class="n">Command</span><span class="p">):</span>
    <span class="s1">'Locate the CPU register that contains the value of PyFrameObject* in the selected stack frame'</span>

    <span class="n">REGISTERS</span> <span class="o">=</span> <span class="p">(</span>
        <span class="c1"># x86-64 registers, that can be used for storing of local variables and function arguments</span>
        <span class="s1">'rax'</span><span class="p">,</span> <span class="s1">'rbx'</span><span class="p">,</span> <span class="s1">'rcx'</span><span class="p">,</span> <span class="s1">'rdx'</span><span class="p">,</span>
        <span class="s1">'rsi'</span><span class="p">,</span> <span class="s1">'rdi'</span><span class="p">,</span>
        <span class="s1">'rbp'</span><span class="p">,</span> <span class="s1">'rsp'</span><span class="p">,</span>
        <span class="s1">'r8'</span><span class="p">,</span> <span class="s1">'r9'</span><span class="p">,</span> <span class="s1">'r10'</span><span class="p">,</span> <span class="s1">'r11'</span><span class="p">,</span> <span class="s1">'r12'</span><span class="p">,</span> <span class="s1">'r13'</span><span class="p">,</span> <span class="s1">'r14'</span><span class="p">,</span> <span class="s1">'r15'</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">LocatePyFrameObject</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span>
            <span class="s1">'py-locate-frame'</span><span class="p">,</span>
            <span class="n">gdb</span><span class="o">.</span><span class="n">COMMAND_DATA</span><span class="p">,</span>
            <span class="n">gdb</span><span class="o">.</span><span class="n">COMPLETE_NONE</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">invoke</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">args</span><span class="p">,</span> <span class="n">from_tty</span><span class="p">):</span>
        <span class="n">gdb_type</span> <span class="o">=</span> <span class="n">PyObjectPtr</span><span class="o">.</span><span class="n">get_gdb_type</span><span class="p">()</span>
        <span class="n">frame</span> <span class="o">=</span> <span class="n">gdb</span><span class="o">.</span><span class="n">selected_frame</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">register</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">REGISTERS</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">value</span> <span class="o">=</span> <span class="n">frame</span><span class="o">.</span><span class="n">read_register</span><span class="p">(</span><span class="n">register</span><span class="p">)</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">gdb_type</span><span class="p">)</span>
                <span class="k">if</span> <span class="n">value</span><span class="p">[</span><span class="s1">'ob_type'</span><span class="p">][</span><span class="s1">'tp_name'</span><span class="p">]</span><span class="o">.</span><span class="n">string</span><span class="p">()</span> <span class="o">==</span> <span class="s1">'frame'</span><span class="p">:</span>
                    <span class="nb">print</span><span class="p">(</span><span class="n">register</span><span class="p">)</span>
                    <span class="k">return</span>
            <span class="k">except</span> <span class="n">gdb</span><span class="o">.</span><span class="n">MemoryError</span><span class="p">:</span>
                <span class="c1"># if either cast or pointer dereference fails, then it's not a valid PyFrameObjectPtr*</span>
                <span class="k">continue</span>

<span class="n">LocatePyFrameObject</span><span class="p">()</span>
</code></pre></div>
<p>For example, my CPython build stores the pointer to <code>PyFrameObject</code> in the RBX register:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">py</span><span class="o">-</span><span class="nx">locate</span><span class="o">-</span><span class="nx">frame</span>
<span class="nx">rbx</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="p">((</span><span class="nx">PyObject</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="err">$</span><span class="nx">rbx</span><span class="p">)</span><span class="o">-></span><span class="nx">ob_type</span><span class="o">-></span><span class="nx">tp_name</span>
<span class="err">$</span><span class="mi">28</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mh">0x5555557472ef</span><span class="w"> </span><span class="s">"frame"</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="p">(</span><span class="nx">PyFrameObject</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="err">$</span><span class="nx">rbx</span>
<span class="err">$</span><span class="mi">29</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">Frame</span><span class="w"> </span><span class="mh">0x7ffff7fd7c20</span><span class="p">,</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">test2</span><span class="p">.</span><span class="nx">py</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p"><</span><span class="nx">module</span><span class="p">></span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="p">(</span><span class="nx">PyObject</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="err">$</span><span class="nx">rbx</span>
<span class="err">$</span><span class="mi">30</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">Frame</span><span class="w"> </span><span class="mh">0x7ffff7fd7c20</span><span class="p">,</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">test2</span><span class="p">.</span><span class="nx">py</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p"><</span><span class="nx">module</span><span class="p">></span><span class="w"> </span><span class="p">()</span>
</code></pre></div>
<p>Note that the loaded <code>libpython-gdb.py</code> script provides pretty-printing for
the <code>PyFrameObject</code> data structure and is also able to figure out the specific
type of a given <code>PyObject</code> automatically. So even if high-level commands
like <code>py-bt</code> don't work on such stack frames, you'll be able to get the
very same information by pointing gdb to the location of <code>PyFrameObject</code>
manually.</p>
<p>Of course, manually poking at CPU registers and memory addresses is not pretty,
but it may be the only way to debug "optimized out" frames.</p>
<h2>Virtual environments and custom CPython builds</h2>
<p>When a virtual environment is used, it may appear that the extension does not work:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">bt</span>
<span class="err">#</span><span class="mi">0</span><span class="w"> </span><span class="mh">0x00007ff2df3d0be3</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__select_nocancel</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">sysdeps</span><span class="o">/</span><span class="nx">unix</span><span class="o">/</span><span class="nx">syscall</span><span class="o">-</span><span class="nx">template</span><span class="p">.</span><span class="nx">S</span><span class="p">:</span><span class="mi">84</span>
<span class="err">#</span><span class="mi">1</span><span class="w"> </span><span class="mh">0x0000000000588c4a</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">2</span><span class="w"> </span><span class="mh">0x00000000004bad9a</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">3</span><span class="w"> </span><span class="mh">0x00000000004bfd1f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">4</span><span class="w"> </span><span class="mh">0x00000000004bfd1f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">5</span><span class="w"> </span><span class="mh">0x00000000004b8556</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalCodeEx</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">6</span><span class="w"> </span><span class="mh">0x00000000004e91ef</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">7</span><span class="w"> </span><span class="mh">0x00000000004e3d92</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_FileExFlags</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">8</span><span class="w"> </span><span class="mh">0x00000000004e2646</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_SimpleFileExFlags</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">9</span><span class="w"> </span><span class="mh">0x0000000000491c23</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">Py_Main</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">10</span><span class="w"> </span><span class="mh">0x00007ff2df30f610</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__libc_start_main</span><span class="w"> </span><span class="p">(</span><span class="nx">main</span><span class="p">=</span><span class="mh">0x491670</span><span class="w"> </span><span class="p"><</span><span class="nx">main</span><span class="p">>,</span><span class="w"> </span><span class="nx">argc</span><span class="p">=</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">argv</span><span class="p">=</span><span class="mh">0x7ffc36f11cf8</span><span class="p">,</span><span class="w"> </span><span class="nx">init</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">rtld_fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">stack_end</span><span class="p">=</span><span class="mh">0x7ffc36f11ce8</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">libc</span><span class="o">-</span><span class="nx">start</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">291</span>
<span class="err">#</span><span class="mi">11</span><span class="w"> </span><span class="mh">0x000000000049159b</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_start</span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">py</span><span class="o">-</span><span class="nx">bt</span>
<span class="nx">Undefined</span><span class="w"> </span><span class="nx">command</span><span class="p">:</span><span class="w"> </span><span class="s">"py-bt"</span><span class="p">.</span><span class="w"> </span><span class="nx">Try</span><span class="w"> </span><span class="s">"help"</span><span class="p">.</span>
</code></pre></div>
<p><code>gdb</code> can still walk the native CPython stack frames, but detailed information about the
<code>PyEval_EvalCodeEx</code> calls (arguments, source locations) is not available.</p>
<p>If you scroll up the <code>gdb</code> output a bit, you'll see that <code>gdb</code> failed to find
the debugging symbols for the <code>python</code> executable:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>gdb<span class="w"> </span>-p<span class="w"> </span><span class="m">2975</span>
GNU<span class="w"> </span>gdb<span class="w"> </span><span class="o">(</span>Debian<span class="w"> </span><span class="m">7</span>.10-1+b1<span class="o">)</span><span class="w"> </span><span class="m">7</span>.10
Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2015</span><span class="w"> </span>Free<span class="w"> </span>Software<span class="w"> </span>Foundation,<span class="w"> </span>Inc.
License<span class="w"> </span>GPLv3+:<span class="w"> </span>GNU<span class="w"> </span>GPL<span class="w"> </span>version<span class="w"> </span><span class="m">3</span><span class="w"> </span>or<span class="w"> </span>later<span class="w"> </span><http://gnu.org/licenses/gpl.html>
This<span class="w"> </span>is<span class="w"> </span>free<span class="w"> </span>software:<span class="w"> </span>you<span class="w"> </span>are<span class="w"> </span>free<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>and<span class="w"> </span>redistribute<span class="w"> </span>it.
There<span class="w"> </span>is<span class="w"> </span>NO<span class="w"> </span>WARRANTY,<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>extent<span class="w"> </span>permitted<span class="w"> </span>by<span class="w"> </span>law.<span class="w"> </span>Type<span class="w"> </span><span class="s2">"show copying"</span>
and<span class="w"> </span><span class="s2">"show warranty"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>details.
This<span class="w"> </span>GDB<span class="w"> </span>was<span class="w"> </span>configured<span class="w"> </span>as<span class="w"> </span><span class="s2">"x86_64-linux-gnu"</span>.
Type<span class="w"> </span><span class="s2">"show configuration"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>configuration<span class="w"> </span>details.
For<span class="w"> </span>bug<span class="w"> </span>reporting<span class="w"> </span>instructions,<span class="w"> </span>please<span class="w"> </span>see:
<http://www.gnu.org/software/gdb/bugs/>.
Find<span class="w"> </span>the<span class="w"> </span>GDB<span class="w"> </span>manual<span class="w"> </span>and<span class="w"> </span>other<span class="w"> </span>documentation<span class="w"> </span>resources<span class="w"> </span>online<span class="w"> </span>at:
<http://www.gnu.org/software/gdb/documentation/>.
For<span class="w"> </span>help,<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"help"</span>.
Type<span class="w"> </span><span class="s2">"apropos word"</span><span class="w"> </span>to<span class="w"> </span>search<span class="w"> </span><span class="k">for</span><span class="w"> </span>commands<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="s2">"word"</span>.
Attaching<span class="w"> </span>to<span class="w"> </span>process<span class="w"> </span><span class="m">2975</span>
Reading<span class="w"> </span>symbols<span class="w"> </span>from<span class="w"> </span>/home/rpodolyaka/workspace/venvs/default/bin/python2...<span class="o">(</span>no<span class="w"> </span>debugging<span class="w"> </span>symbols<span class="w"> </span>found<span class="o">)</span>...done.
</code></pre></div>
<p>How is a virtual environment any different? Why didn't <code>gdb</code> find the debugging symbols?</p>
<p>First and foremost, the path to the <code>python</code> executable is different. Note that I
did not specify the executable file when attaching to the process. In this
case <code>gdb</code> uses the executable file of the process (i.e. the <code>/proc/$PID/exe</code>
value on Linux).</p>
<p>One of the ways to ship debugging symbols <a href="https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html">separately</a> is to put them into a well-known
directory (the default is <code>/usr/lib/debug/</code>, although it's configurable via the
<code>debug-file-directory</code> option in <code>gdb</code>). In our case <code>gdb</code> tried to load
debugging symbols from <code>/usr/lib/debug/home/rpodolyaka/workspace/venvs/default/bin/python2</code> and,
unsurprisingly, did not find anything there.</p>
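<p>To make the lookup rule concrete, here is a minimal Python sketch. The function name <code>debug_file_candidate</code> is made up for this example, and it only models the simplified rule described above (debug directory plus the absolute path of the executable); the real <code>gdb</code> lookup checks a few more locations.</p>

```python
import posixpath

# Default value of gdb's debug-file-directory on Debian-like systems.
DEFAULT_DEBUG_DIR = "/usr/lib/debug"


def debug_file_candidate(exe_path, debug_dir=DEFAULT_DEBUG_DIR):
    """Mirror the absolute path of the executable under the debug directory."""
    # Strip the leading slash so that join() does not discard debug_dir.
    return posixpath.join(debug_dir, exe_path.lstrip("/"))


# The interpreter copied into the virtual environment: nothing lives here.
print(debug_file_candidate("/home/rpodolyaka/workspace/venvs/default/bin/python2"))
# The system interpreter: this is where the distro package puts the symbols.
print(debug_file_candidate("/usr/bin/python2.7"))
```

<p>The first path is the one <code>gdb</code> tried in the session above, which is why no symbols were found.</p>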
<p>The solution is simple: explicitly specify the executable being debugged when
running <code>gdb</code>:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>gdb<span class="w"> </span>/usr/bin/python2.7<span class="w"> </span>-p<span class="w"> </span><span class="nv">$PID</span>
</code></pre></div>
<p>Thus, <code>gdb</code> will look for debugging symbols in the "right" place -
<code>/usr/lib/debug/usr/bin/python2.7</code>.</p>
<p>It's also worth mentioning that debugging symbols for a
particular executable may be identified by a unique <code>build-id</code> value stored
in the <code>.note.gnu.build-id</code> section of the <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF</a> executable. E.g. CPython on my Debian machine:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>objdump<span class="w"> </span>-s<span class="w"> </span>-j<span class="w"> </span>.note.gnu.build-id<span class="w"> </span>/usr/bin/python2.7
/usr/bin/python2.7:<span class="w"> </span>file<span class="w"> </span>format<span class="w"> </span>elf64-x86-64
Contents<span class="w"> </span>of<span class="w"> </span>section<span class="w"> </span>.note.gnu.build-id:
<span class="w"> </span><span class="m">400274</span><span class="w"> </span><span class="m">04000000</span><span class="w"> </span><span class="m">14000000</span><span class="w"> </span><span class="m">03000000</span><span class="w"> </span>474e5500<span class="w"> </span>............GNU.
<span class="w"> </span><span class="m">400284</span><span class="w"> </span>8d04a3ae<span class="w"> </span>38521cb7<span class="w"> </span>c7928e4a<span class="w"> </span>7c8b1ed3<span class="w"> </span>....8R.....J<span class="p">|</span>...
<span class="w"> </span><span class="m">400294</span><span class="w"> </span>85e763e4
</code></pre></div>
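<p>The 20 bytes that follow the <code>GNU</code> marker in this dump are the <code>build-id</code> itself. Below is a minimal sketch of decoding the note by hand, assuming the standard ELF note layout (three 32-bit little-endian words for name size, descriptor size and type, followed by the name and the descriptor, each padded to 4 bytes). The helper name <code>parse_build_id_note</code> is hypothetical.</p>

```python
NT_GNU_BUILD_ID = 3  # note type used for GNU build-id notes


def parse_build_id_note(note):
    """Return the build-id as a hex string from raw .note.gnu.build-id bytes."""
    namesz = int.from_bytes(note[0:4], "little")
    descsz = int.from_bytes(note[4:8], "little")
    note_type = int.from_bytes(note[8:12], "little")
    name = note[12:12 + namesz]
    if name != b"GNU\x00" or note_type != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build-id note")
    # The name field is padded to a multiple of 4 bytes.
    desc_start = 12 + (namesz + 3) // 4 * 4
    return note[desc_start:desc_start + descsz].hex()


# The section contents exactly as printed by objdump above.
raw = bytes.fromhex(
    "04000000 14000000 03000000 474e5500"
    "8d04a3ae 38521cb7 c7928e4a 7c8b1ed3"
    "85e763e4"
)
print(parse_build_id_note(raw))
```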
<p>In this case <code>gdb</code> will look for debugging symbols using the <code>build-id</code> value:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>gdb<span class="w"> </span>/usr/bin/python2.7
GNU<span class="w"> </span>gdb<span class="w"> </span><span class="o">(</span>Debian<span class="w"> </span><span class="m">7</span>.10-1+b1<span class="o">)</span><span class="w"> </span><span class="m">7</span>.10
Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2015</span><span class="w"> </span>Free<span class="w"> </span>Software<span class="w"> </span>Foundation,<span class="w"> </span>Inc.
License<span class="w"> </span>GPLv3+:<span class="w"> </span>GNU<span class="w"> </span>GPL<span class="w"> </span>version<span class="w"> </span><span class="m">3</span><span class="w"> </span>or<span class="w"> </span>later<span class="w"> </span><http://gnu.org/licenses/gpl.html>
This<span class="w"> </span>is<span class="w"> </span>free<span class="w"> </span>software:<span class="w"> </span>you<span class="w"> </span>are<span class="w"> </span>free<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>and<span class="w"> </span>redistribute<span class="w"> </span>it.
There<span class="w"> </span>is<span class="w"> </span>NO<span class="w"> </span>WARRANTY,<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>extent<span class="w"> </span>permitted<span class="w"> </span>by<span class="w"> </span>law.<span class="w"> </span>Type<span class="w"> </span><span class="s2">"show copying"</span>
and<span class="w"> </span><span class="s2">"show warranty"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>details.
This<span class="w"> </span>GDB<span class="w"> </span>was<span class="w"> </span>configured<span class="w"> </span>as<span class="w"> </span><span class="s2">"x86_64-linux-gnu"</span>.
Type<span class="w"> </span><span class="s2">"show configuration"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>configuration<span class="w"> </span>details.
For<span class="w"> </span>bug<span class="w"> </span>reporting<span class="w"> </span>instructions,<span class="w"> </span>please<span class="w"> </span>see:
<http://www.gnu.org/software/gdb/bugs/>.
Find<span class="w"> </span>the<span class="w"> </span>GDB<span class="w"> </span>manual<span class="w"> </span>and<span class="w"> </span>other<span class="w"> </span>documentation<span class="w"> </span>resources<span class="w"> </span>online<span class="w"> </span>at:
<http://www.gnu.org/software/gdb/documentation/>.
For<span class="w"> </span>help,<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"help"</span>.
Type<span class="w"> </span><span class="s2">"apropos word"</span><span class="w"> </span>to<span class="w"> </span>search<span class="w"> </span><span class="k">for</span><span class="w"> </span>commands<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="s2">"word"</span>...
Reading<span class="w"> </span>symbols<span class="w"> </span>from<span class="w"> </span>/usr/bin/python2.7...Reading<span class="w"> </span>symbols<span class="w"> </span>from<span class="w"> </span>/usr/lib/debug/.build-id/8d/04a3ae38521cb7c7928e4a7c8b1ed385e763e4.debug...done.
<span class="k">done</span>.
</code></pre></div>
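<p>The <code>.debug</code> path in the output above is derived directly from the <code>build-id</code>: the first byte (two hex digits) becomes a directory name and the rest becomes the file name. A small sketch of that mapping, with the hypothetical helper <code>build_id_debug_path</code> and the default debug directory assumed:</p>

```python
import posixpath


def build_id_debug_path(build_id, debug_dir="/usr/lib/debug"):
    """Map a hex build-id to the location of its separate debug file."""
    build_id = build_id.lower()
    return posixpath.join(
        debug_dir, ".build-id", build_id[:2], build_id[2:] + ".debug"
    )


print(build_id_debug_path("8d04a3ae38521cb7c7928e4a7c8b1ed385e763e4"))
```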
<p>Looking up symbols by <code>build-id</code> has a nice implication: it no longer matters what the
executable is called or where it is located. <code>virtualenv</code> just creates a copy of the specified
interpreter executable, so both executables, the one in <code>/usr/bin/</code> and the one in your
virtual environment, will use the very same debugging symbols:</p>
<div class="codehilite"><pre><span></span><code>$<span class="w"> </span>gdb<span class="w"> </span>-p<span class="w"> </span><span class="m">11150</span>
GNU<span class="w"> </span>gdb<span class="w"> </span><span class="o">(</span>Debian<span class="w"> </span><span class="m">7</span>.10-1+b1<span class="o">)</span><span class="w"> </span><span class="m">7</span>.10
Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2015</span><span class="w"> </span>Free<span class="w"> </span>Software<span class="w"> </span>Foundation,<span class="w"> </span>Inc.
License<span class="w"> </span>GPLv3+:<span class="w"> </span>GNU<span class="w"> </span>GPL<span class="w"> </span>version<span class="w"> </span><span class="m">3</span><span class="w"> </span>or<span class="w"> </span>later<span class="w"> </span><http://gnu.org/licenses/gpl.html>
This<span class="w"> </span>is<span class="w"> </span>free<span class="w"> </span>software:<span class="w"> </span>you<span class="w"> </span>are<span class="w"> </span>free<span class="w"> </span>to<span class="w"> </span>change<span class="w"> </span>and<span class="w"> </span>redistribute<span class="w"> </span>it.
There<span class="w"> </span>is<span class="w"> </span>NO<span class="w"> </span>WARRANTY,<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>extent<span class="w"> </span>permitted<span class="w"> </span>by<span class="w"> </span>law.<span class="w"> </span>Type<span class="w"> </span><span class="s2">"show copying"</span>
and<span class="w"> </span><span class="s2">"show warranty"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>details.
This<span class="w"> </span>GDB<span class="w"> </span>was<span class="w"> </span>configured<span class="w"> </span>as<span class="w"> </span><span class="s2">"x86_64-linux-gnu"</span>.
Type<span class="w"> </span><span class="s2">"show configuration"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>configuration<span class="w"> </span>details.
For<span class="w"> </span>bug<span class="w"> </span>reporting<span class="w"> </span>instructions,<span class="w"> </span>please<span class="w"> </span>see:
<http://www.gnu.org/software/gdb/bugs/>.
Find<span class="w"> </span>the<span class="w"> </span>GDB<span class="w"> </span>manual<span class="w"> </span>and<span class="w"> </span>other<span class="w"> </span>documentation<span class="w"> </span>resources<span class="w"> </span>online<span class="w"> </span>at:
<http://www.gnu.org/software/gdb/documentation/>.
For<span class="w"> </span>help,<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s2">"help"</span>.
Type<span class="w"> </span><span class="s2">"apropos word"</span><span class="w"> </span>to<span class="w"> </span>search<span class="w"> </span><span class="k">for</span><span class="w"> </span>commands<span class="w"> </span>related<span class="w"> </span>to<span class="w"> </span><span class="s2">"word"</span>.
Attaching<span class="w"> </span>to<span class="w"> </span>process<span class="w"> </span><span class="m">11150</span>
Reading<span class="w"> </span>symbols<span class="w"> </span>from<span class="w"> </span>/home/rpodolyaka/sandbox/testvenv/bin/python2.7...Reading<span class="w"> </span>symbols<span class="w"> </span>from
/usr/lib/debug/.build-id/8d/04a3ae38521cb7c7928e4a7c8b1ed385e763e4.debug...done.
$<span class="w"> </span>ls<span class="w"> </span>-la<span class="w"> </span>/proc/11150/exe
lrwxrwxrwx<span class="w"> </span><span class="m">1</span><span class="w"> </span>rpodolyaka<span class="w"> </span>rpodolyaka<span class="w"> </span><span class="m">0</span><span class="w"> </span>Apr<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="m">15</span>:18<span class="w"> </span>/proc/11150/exe<span class="w"> </span>-><span class="w"> </span>/home/rpodolyaka/sandbox/testvenv/bin/python2.7
</code></pre></div>
<p>The first problem is solved: the <code>bt</code> output now looks much nicer, but the <code>py-bt</code> command is still
undefined:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">bt</span>
<span class="err">#</span><span class="mi">0</span><span class="w"> </span><span class="mh">0x00007f3e95083be3</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__select_nocancel</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">sysdeps</span><span class="o">/</span><span class="nx">unix</span><span class="o">/</span><span class="nx">syscall</span><span class="o">-</span><span class="nx">template</span><span class="p">.</span><span class="nx">S</span><span class="p">:</span><span class="mi">84</span>
<span class="err">#</span><span class="mi">1</span><span class="w"> </span><span class="mh">0x0000000000594a59</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">floatsleep</span><span class="w"> </span><span class="p">(</span><span class="nx">secs</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">timemodule</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">948</span>
<span class="err">#</span><span class="mi">2</span><span class="w"> </span><span class="nx">time_sleep</span><span class="p">.</span><span class="nx">lto_priv</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">timemodule</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">206</span>
<span class="err">#</span><span class="mi">3</span><span class="w"> </span><span class="mh">0x00000000004c524a</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7ffefb5045b0</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4350</span>
<span class="err">#</span><span class="mi">4</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">2987</span>
<span class="err">#</span><span class="mi">5</span><span class="w"> </span><span class="mh">0x00000000004ca95f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">fast_function</span><span class="w"> </span><span class="p">(</span><span class="nx">nk</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">na</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">n</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7ffefb504700</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">func</span><span class="p">=</span><span class="mh">0x7f3e95f78c80</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4435</span>
<span class="err">#</span><span class="mi">6</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7ffefb504700</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4370</span>
<span class="err">#</span><span class="mi">7</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">2987</span>
<span class="err">#</span><span class="mi">8</span><span class="w"> </span><span class="mh">0x00000000004ca95f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">fast_function</span><span class="w"> </span><span class="p">(</span><span class="nx">nk</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">na</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">n</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7ffefb504850</span><span class="p">,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">func</span><span class="p">=</span><span class="mh">0x7f3e95f78c08</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4435</span>
<span class="err">#</span><span class="mi">9</span><span class="w"> </span><span class="nx">call_function</span><span class="w"> </span><span class="p">(</span><span class="nx">oparg</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">pp_stack</span><span class="p">=</span><span class="mh">0x7ffefb504850</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">4370</span>
<span class="err">#</span><span class="mi">10</span><span class="w"> </span><span class="nx">PyEval_EvalFrameEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">2987</span>
<span class="err">#</span><span class="mi">11</span><span class="w"> </span><span class="mh">0x00000000004c32e5</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalCodeEx</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">3582</span>
<span class="err">#</span><span class="mi">12</span><span class="w"> </span><span class="mh">0x00000000004c3089</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyEval_EvalCode</span><span class="w"> </span><span class="p">(</span><span class="nx">co</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">globals</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">locals</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">ceval</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">669</span>
<span class="err">#</span><span class="mi">13</span><span class="w"> </span><span class="mh">0x00000000004f263f</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">run_mod</span><span class="p">.</span><span class="nx">lto_priv</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">1376</span>
<span class="err">#</span><span class="mi">14</span><span class="w"> </span><span class="mh">0x00000000004ecf52</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_FileExFlags</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">1362</span>
<span class="err">#</span><span class="mi">15</span><span class="w"> </span><span class="mh">0x00000000004eb6d1</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">PyRun_SimpleFileExFlags</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Python</span><span class="o">/</span><span class="nx">pythonrun</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">948</span>
<span class="err">#</span><span class="mi">16</span><span class="w"> </span><span class="mh">0x000000000049e2d8</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">Py_Main</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="p">..</span><span class="o">/</span><span class="nx">Modules</span><span class="o">/</span><span class="nx">main</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">640</span>
<span class="err">#</span><span class="mi">17</span><span class="w"> </span><span class="mh">0x00007f3e94fc2610</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">__libc_start_main</span><span class="w"> </span><span class="p">(</span><span class="nx">main</span><span class="p">=</span><span class="mh">0x49dc00</span><span class="w"> </span><span class="p"><</span><span class="nx">main</span><span class="p">>,</span><span class="w"> </span><span class="nx">argc</span><span class="p">=</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">argv</span><span class="p">=</span><span class="mh">0x7ffefb504c98</span><span class="p">,</span><span class="w"> </span><span class="nx">init</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span>
<span class="w"> </span><span class="nx">rtld_fini</span><span class="p">=<</span><span class="nx">optimized</span><span class="w"> </span><span class="nx">out</span><span class="p">>,</span><span class="w"> </span><span class="nx">stack_end</span><span class="p">=</span><span class="mh">0x7ffefb504c88</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">libc</span><span class="o">-</span><span class="nx">start</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">291</span>
<span class="err">#</span><span class="mi">18</span><span class="w"> </span><span class="mh">0x000000000049db29</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">_start</span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">py</span><span class="o">-</span><span class="nx">bt</span>
<span class="nx">Undefined</span><span class="w"> </span><span class="nx">command</span><span class="p">:</span><span class="w"> </span><span class="s">"py-bt"</span><span class="p">.</span><span class="w"> </span><span class="nx">Try</span><span class="w"> </span><span class="s">"help"</span><span class="p">.</span>
</code></pre></div>
<p>Once again, this is caused by the fact that the <code>python</code> binary in a virtual
environment has a different path. By default, <code>gdb</code> will try to <a href="https://sourceware.org/gdb/onlinedocs/gdb/Python-Auto_002dloading.html#set%20auto%2dload%20python%2dscripts">auto-load</a>
Python extensions for the particular object file being debugged, if they exist.
Specifically, <code>gdb</code> will look for <code>objfile-gdb.py</code> and try to <code>source</code> it on
start:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="n">info</span><span class="w"> </span><span class="n">auto</span><span class="o">-</span><span class="nb">load</span>
<span class="n">gdb</span><span class="o">-</span><span class="n">scripts</span><span class="p">:</span><span class="w"> </span><span class="n">No</span><span class="w"> </span><span class="n">auto</span><span class="o">-</span><span class="nb">load</span><span class="w"> </span><span class="n">scripts</span><span class="o">.</span>
<span class="n">libthread</span><span class="o">-</span><span class="n">db</span><span class="p">:</span><span class="w"> </span><span class="n">No</span><span class="w"> </span><span class="n">auto</span><span class="o">-</span><span class="n">loaded</span><span class="w"> </span><span class="n">libthread</span><span class="o">-</span><span class="n">db</span><span class="o">.</span>
<span class="n">local</span><span class="o">-</span><span class="n">gdbinit</span><span class="p">:</span><span class="w"> </span><span class="n">Local</span><span class="w"> </span><span class="o">.</span><span class="n">gdbinit</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="n">found</span><span class="o">.</span>
<span class="n">python</span><span class="o">-</span><span class="n">scripts</span><span class="p">:</span>
<span class="n">Loaded</span><span class="w"> </span><span class="n">Script</span>
<span class="n">Yes</span><span class="w"> </span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">share</span><span class="o">/</span><span class="n">gdb</span><span class="o">/</span><span class="n">auto</span><span class="o">-</span><span class="nb">load</span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python2</span><span class="o">.</span><span class="mi">7</span><span class="o">-</span><span class="n">gdb</span><span class="o">.</span><span class="n">py</span>
</code></pre></div>
<p>If for some reason this has not been done, you can always do it manually:</p>
<div class="codehilite"><pre><span></span><code><span class="p">(</span><span class="n">gdb</span><span class="p">)</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">share</span><span class="o">/</span><span class="n">gdb</span><span class="o">/</span><span class="n">auto</span><span class="o">-</span><span class="nb">load</span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">bin</span><span class="o">/</span><span class="n">python2</span><span class="o">.</span><span class="mi">7</span><span class="o">-</span><span class="n">gdb</span><span class="o">.</span><span class="n">py</span>
</code></pre></div>
<p>This is also handy if you want to test a new version of the <code>gdb</code> extension shipped with CPython.</p>
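<p>To see why the script is not picked up for an interpreter inside a virtual environment, it helps to spell out where <code>gdb</code> looks. The sketch below is a simplified model, not the actual <code>gdb</code> logic: the helper name <code>auto_load_script_candidates</code> is made up, the scripts directory is assumed to be <code>/usr/share/gdb/auto-load</code> (as in the <code>info auto-load</code> output above), and the object file path is assumed to be already resolved.</p>

```python
import posixpath


def auto_load_script_candidates(objfile_path, scripts_dir="/usr/share/gdb/auto-load"):
    """List the places where an objfile-gdb.py script would be searched for."""
    script_name = objfile_path + "-gdb.py"
    return [
        # Right next to the object file itself.
        script_name,
        # Under the auto-load directory, mirroring the absolute path.
        posixpath.join(scripts_dir, script_name.lstrip("/")),
    ]


for exe in ("/usr/bin/python2.7", "/home/rpodolyaka/sandbox/testvenv/bin/python2.7"):
    print(exe)
    for candidate in auto_load_script_candidates(exe):
        print("   ", candidate)
```

<p>Only the system interpreter has a matching script installed, so for the copy in the virtual environment the extension has to be sourced manually, as shown above.</p>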
<h2>PyPy, Jython, etc</h2>
<p>The described debugging technique only works as is for the CPython
interpreter, because the <code>gdb</code> extension is written specifically to introspect the state
of CPython internals (e.g. <code>PyEval_EvalFrameEx</code> calls).</p>
<p>For <a href="http://pypy.org/">PyPy</a> there is an open <a href="https://bitbucket.org/pypy/pypy/issues/1204/gdb-hooks-for-debugging-pypy">issue</a> on Bitbucket, where it was proposed to
provide integration with <code>gdb</code>, but it looks like the attached patches have not
been merged yet, and the person who wrote them has lost interest.</p>
<p>For <a href="http://www.jython.org/">Jython</a> you could probably use standard tools for debugging <code>JVM</code>
applications, e.g. <a href="http://visualvm.java.net/">VisualVM</a>.</p>
<h2>Conclusion</h2>
<p><code>gdb</code> is a powerful tool that allows one to debug complex problems with
crashing or hanging CPython processes, as well as Python code that calls
into native libraries. On modern Linux distros, debugging CPython processes
with <code>gdb</code> should be as simple as installing the debugging symbols for the
specific interpreter build, although there are a few known gotchas, especially
when virtual environments are used.</p>
The story of one subtle optimization | http://podoliaka.org//2013/02/24/stringconcat-en/ | 2013-02-24T00:00:00+00:00 | 2023-11-22T20:59:12.632036+00:00 | Roman Podoliaka
<p>At my previous job I worked on an RPC-like web service that passed JSON blobs between
workers via the <a class="reference external" href="http://gearman.org/">Gearman</a> task queue. One day I was digging through our source
code, trying to find out why it took so long (a few seconds) to transfer a 3-megabyte
JSON blob. I was sure that the problem was in our code. To my surprise, the profiler showed that
the bottleneck was in this function of the <a class="reference external" href="https://github.com/Yelp/python-gearman/">python-gearman</a>
client library:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">read_data_from_socket</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">bytes_to_read</span><span class="o">=</span><span class="mi">4096</span><span class="p">):</span><span class="w">
</span>    <span class="sd">"""Reads data from socket --> buffer"""</span><span class="w">
</span>    <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">connected</span><span class="p">:</span><span class="w">
</span>        <span class="bp">self</span><span class="o">.</span><span class="n">throw_exception</span><span class="p">(</span><span class="n">message</span><span class="o">=</span><span class="s1">'disconnected'</span><span class="p">)</span><span class="w">
</span>    <span class="n">recv_buffer</span> <span class="o">=</span> <span class="s1">''</span><span class="w">
</span>    <span class="k">try</span><span class="p">:</span><span class="w">
</span>        <span class="n">recv_buffer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gearman_socket</span><span class="o">.</span><span class="n">recv</span><span class="p">(</span><span class="n">bytes_to_read</span><span class="p">)</span><span class="w">
</span>    <span class="k">except</span> <span class="n">socket</span><span class="o">.</span><span class="n">error</span><span class="p">,</span> <span class="n">socket_exception</span><span class="p">:</span><span class="w">
</span>        <span class="bp">self</span><span class="o">.</span><span class="n">throw_exception</span><span class="p">(</span><span class="n">exception</span><span class="o">=</span><span class="n">socket_exception</span><span class="p">)</span><span class="w">
</span>    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">recv_buffer</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span><span class="w">
</span>        <span class="bp">self</span><span class="o">.</span><span class="n">throw_exception</span><span class="p">(</span><span class="n">message</span><span class="o">=</span><span class="s1">'remote disconnected'</span><span class="p">)</span><span class="w">
</span>    <span class="bp">self</span><span class="o">.</span><span class="n">_incoming_buffer</span> <span class="o">+=</span> <span class="n">recv_buffer</span><span class="w">
</span>    <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_incoming_buffer</span><span class="p">)</span>
</pre>
<p>The problem is in the line <code>self._incoming_buffer += recv_buffer</code>, where the <code>_incoming_buffer</code>
instance variable has the type <code>str</code>. I was always taught that massive string concatenation
is a bad idea in a programming language where strings are immutable. So I
wrote a small snippet to find out how slow it really was:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">f</span><span class="p">():</span><span class="w">
</span>    <span class="n">spam</span> <span class="o">=</span> <span class="s1">''</span><span class="w">
</span>    <span class="n">eggs</span> <span class="o">=</span> <span class="s1">'a'</span> <span class="o">*</span> <span class="mi">4096</span><span class="w">
</span>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span><span class="w">
</span>        <span class="n">spam</span> <span class="o">+=</span> <span class="n">eggs</span>
</pre>
<p>However, the <code>timeit.timeit()</code> function showed that a call to <code>f()</code>, which performs
1000 string concatenations, took less than <strong>1 ms</strong>. That was really amazing. I started to think
that I was missing something. So I tried to run the following snippet:</p>
<pre class="code python literal-block">
<span class="k">def</span> <span class="nf">g</span><span class="p">():</span><span class="w">
</span>    <span class="k">class</span> <span class="nc">Foo</span><span class="p">:</span><span class="w">
</span>        <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span><span class="w">
</span>            <span class="bp">self</span><span class="o">.</span><span class="n">spam</span> <span class="o">=</span> <span class="s1">''</span><span class="w">
</span>    <span class="n">eggs</span> <span class="o">=</span> <span class="s1">'a'</span> <span class="o">*</span> <span class="mi">4096</span><span class="w">
</span>    <span class="n">f</span> <span class="o">=</span> <span class="n">Foo</span><span class="p">()</span><span class="w">
</span>    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">xrange</span><span class="p">(</span><span class="mi">1000</span><span class="p">):</span><span class="w">
</span>        <span class="n">f</span><span class="o">.</span><span class="n">spam</span> <span class="o">+=</span> <span class="n">eggs</span>
</pre>
<p>It took about 450 ms - that's much "better" (or probably closer to my expectations)!
So what the heck? These two functions look very similar except that the latter uses an instance
attribute instead of a simple variable.</p>
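<p>If you want to repeat the measurement yourself, a sketch along the following lines should do.
It is a Python 3 port of the two snippets above (so <code>range</code> replaces <code>xrange</code>), and the names
<code>Foo</code>, <code>concat_local</code> and <code>concat_attribute</code> are made up for illustration. The absolute
numbers depend on the interpreter, its version and the machine, so don't expect them to match the ones quoted here.</p>
<pre class="code python literal-block">
import timeit


class Foo:
    """Hypothetical holder used only to get an instance attribute."""

    def __init__(self):
        self.spam = ''


def concat_local(n=1000, chunk='a' * 4096):
    """Concatenate into a local variable, like f() above."""
    spam = ''
    for _ in range(n):
        spam += chunk
    return spam


def concat_attribute(n=1000, chunk='a' * 4096):
    """Concatenate into an instance attribute, like g() above."""
    foo = Foo()
    for _ in range(n):
        foo.spam += chunk
    return foo.spam


if __name__ == '__main__':
    # Average over a few calls; timings vary between interpreters and machines.
    for func in (concat_local, concat_attribute):
        seconds = timeit.timeit(func, number=3) / 3
        print('%s: %.3f ms per call' % (func.__name__, seconds * 1000))
</pre>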
<p>Comparing the bytecode of these two functions didn't show anything interesting: the for-loop
is exactly the same, except that the function <code>g()</code> uses <code>LOAD_ATTR/STORE_ATTR</code> bytecodes
instead of <code>LOAD_FAST/STORE_FAST</code> ones. But how can attribute access be so slow? Surely it
can't be the bottleneck when concatenating strings in a loop.</p>
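<p>The bytecode of two such functions can be compared with the standard <code>dis</code> module. The sketch below
assumes Python 3, where the exact opcode names differ between versions (newer releases fuse some of the
<code>*_FAST</code> instructions), so it only prints the opcodes that are unique to each function. The helper
<code>opnames()</code> and both function names are hypothetical.</p>
<pre class="code python literal-block">
import dis


def concat_local(chunk):
    """Accumulate into a local variable."""
    spam = ''
    for _ in range(3):
        spam += chunk
    return spam


def concat_attribute(holder, chunk):
    """Accumulate into an attribute of the given object."""
    for _ in range(3):
        holder.spam += chunk
    return holder.spam


def opnames(func):
    """Return the set of opcode names used in the body of func."""
    return {instruction.opname for instruction in dis.get_instructions(func)}


if __name__ == '__main__':
    local_ops = opnames(concat_local)
    attribute_ops = opnames(concat_attribute)
    # Opcodes that appear in only one of the two functions.
    print(sorted(local_ops - attribute_ops))
    print(sorted(attribute_ops - local_ops))
</pre>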
<p>Another interesting thing was that it took the same amount of time (about 1.6 s) to execute both
<code>f()</code> and <code>g()</code> functions using the <a class="reference external" href="http://pypy.org/">PyPy</a> interpreter.</p>
<p>Now I was almost sure that <a class="reference external" href="http://python.org/">CPython</a> had some kind of optimization that
allowed it to execute the <code>f()</code> function so fast. But the second question, why it didn't work
for the <code>g()</code> function, remained. Running <code>python</code> under <code>strace</code> showed that executing the
<code>g()</code> function made almost 100 times more memory allocations than executing the <code>f()</code>
function. The time had come to debug the interpreter using <code>gdb</code>. Debugging led me to
the <code>CPython</code> function that does the string concatenation.</p>
<p>The source code of the <code>string_concatenate()</code> function has a few interesting lines:</p>
<pre class="code c literal-block">
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="o">-></span><span class="n">ob_refcnt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">!</span><span class="n">PyString_CHECK_INTERNED</span><span class="p">(</span><span class="n">v</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span>    <span class="cm">/* Now we own the last reference to 'v', so we can resize it
     * in-place.
     */</span><span class="w">
</span>    <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">_PyString_Resize</span><span class="p">(</span><span class="o">&</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">new_len</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span>        <span class="cm">/* XXX if _PyString_Resize() fails, 'v' has been
         * deallocated so it cannot be put back into
         * 'variable'. The MemoryError is raised when there
         * is no value in 'variable', which might (very
         * remotely) be a cause of incompatibilities.
         */</span><span class="w">
</span>        <span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="cm">/* copy 'w' into the newly allocated area of 'v' */</span><span class="w">
</span>    <span class="n">memcpy</span><span class="p">(</span><span class="n">PyString_AS_STRING</span><span class="p">(</span><span class="n">v</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">v_len</span><span class="p">,</span><span class="w">
</span>           <span class="n">PyString_AS_STRING</span><span class="p">(</span><span class="n">w</span><span class="p">),</span><span class="w"> </span><span class="n">w_len</span><span class="p">);</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">v</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
</span>    <span class="cm">/* When in-place resizing is not an option. */</span><span class="w">
</span>    <span class="n">PyString_Concat</span><span class="p">(</span><span class="o">&</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">w</span><span class="p">);</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">v</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
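<p>So strings actually <strong>can</strong> be mutated! CPython can resize a string in place, which makes string
concatenation very fast. But there is a strict constraint: there must be <strong>at most</strong> 1 reference
to the string being resized (otherwise the language contract that strings are immutable would be
broken). That also explains the <code>g()</code> case: while <code>f.spam += eggs</code> is executed, the instance still
holds its own reference to the string, so the check fails and CPython falls back to
<code>PyString_Concat()</code>, which copies the data into a new string on every iteration.</p>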
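<p>That an attribute counts as a reference of its own can be observed from Python with <code>sys.getrefcount()</code>.
This is only a sketch for CPython 3: the absolute values returned by <code>getrefcount()</code> differ between versions,
so the code looks at the difference before and after the assignment. <code>Holder</code> and
<code>extra_references_from_attribute()</code> are hypothetical names.</p>
<pre class="code python literal-block">
import sys


class Holder:
    """Hypothetical object whose attribute will reference the string."""


def extra_references_from_attribute():
    """Return how many references storing a string in an attribute adds."""
    # Build the string at run time so that it is not a shared constant.
    text = ''.join(['a'] * 4096)
    before = sys.getrefcount(text)
    holder = Holder()
    holder.spam = text  # the instance now holds its own reference
    after = sys.getrefcount(text)
    return after - before


if __name__ == '__main__':
    print(extra_references_from_attribute())
</pre>
<p>As long as that extra reference exists, the string cannot be the sole property of the concatenation code,
so resizing it in place is not an option.</p>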
<p>It's important to understand that this is a <code>CPython</code>-specific optimization, and you'd better
not rely on it in your code! Everyone knows that string concatenation in a loop is
a bad idea in <code>Python</code>, so why change our minds? If you really need to construct a string this way, you should
first read the docs for the <a class="reference external" href="http://docs.python.org/2/library/stdtypes.html#str.join">str.join()</a>
method and the <a class="reference external" href="http://docs.python.org/2/library/stringio.html">StringIO</a> and
<a class="reference external" href="http://docs.python.org/2/library/array.html">array</a> modules: this is the way to write
code that works efficiently in all <code>Python</code> interpreters.</p>
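<p>For illustration, here is a minimal sketch of two of these alternatives. It assumes Python 3, where the
<code>StringIO</code> class lives in the <code>io</code> module rather than in a module of its own; the function names
are made up for this example.</p>
<pre class="code python literal-block">
import io


def build_with_join(chunks):
    """Collect the pieces first and join them once at the end."""
    return ''.join(chunks)


def build_with_stringio(chunks):
    """Append the pieces to an in-memory text buffer."""
    buffer = io.StringIO()
    for chunk in chunks:
        buffer.write(chunk)
    return buffer.getvalue()


if __name__ == '__main__':
    pieces = ['a' * 4096] * 1000
    print(len(build_with_join(pieces)))
    print(build_with_join(pieces) == build_with_stringio(pieces))
</pre>
<p>Neither version depends on the reference count of an intermediate string, so both should behave the same way
in any interpreter.</p>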
<p><strong>P. S.</strong> I wrote a small patch for <a class="reference external" href="https://github.com/Yelp/python-gearman/">python-gearman</a>
that uses an <code>array</code> type to handle incoming data, which greatly improves performance.
The patch was accepted into the <strong>master</strong> branch. Anyone interested in using the <code>Gearman</code>
task queue in <code>Python</code> programs might want to check it out.</p>
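<p>The snippet below is <strong>not</strong> the actual patch, just a sketch of the general idea under the assumption of
Python 3: received chunks are appended to an <code>array</code> of unsigned bytes, which grows in place instead of
creating a new string on every read. The class <code>IncomingBuffer</code> and its methods are hypothetical.</p>
<pre class="code python literal-block">
from array import array


class IncomingBuffer:
    """Hypothetical receive buffer backed by an array of unsigned bytes."""

    def __init__(self):
        self._data = array('B')

    def feed(self, chunk):
        """Append received bytes in place and return the buffered size."""
        self._data.frombytes(chunk)
        return len(self._data)

    def getvalue(self):
        """Return everything received so far as a bytes object."""
        return self._data.tobytes()


if __name__ == '__main__':
    buffer = IncomingBuffer()
    for _ in range(1000):
        buffer.feed(b'a' * 4096)
    print(len(buffer.getvalue()))
</pre>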
Iterators for Beginners. The Way of C++ | http://podoliaka.org//2013/01/20/iterators-en/ | 2013-01-20T00:00:00+00:00 | 2023-11-22T20:59:12.632036+00:00 | Roman Podoliaka
<h2>Foreword</h2>
<p>This post is mainly for beginners and is not intended to teach you everything
about iterators in <code>C++</code>. So if you already know about <code>iterator_traits<></code> or something
like that, you might get bored reading it. Anyway, I hope this information will be
useful for people who haven't heard about iterators before.</p>
<h2>The problem</h2>
<p>Let's consider the following problem: we need a function that returns the index of
the maximum element in a given array of integers. We could implement it like this:</p>
<pre class="code cpp literal-block">
<span class="kt">int</span><span class="w"> </span><span class="nf">max_element</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">arr</span><span class="p">[],</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">len</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="kt">int</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span>    <span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">i</span><span class="p">)</span><span class="w">
</span>    <span class="p">{</span><span class="w">
</span>        <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">arr</span><span class="p">[</span><span class="n">max</span><span class="p">])</span><span class="w">
</span>            <span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>This code works perfectly, but it has a huge limitation: it works only for arrays of <code>int</code> elements.
To find the maximum element in an array of floats we would have to write exactly the same code,
substituting <code>float</code> for <code>int</code> in the declaration of the array type. We all know that <code>C++</code> has
a solution for this problem: <em>templates</em>.</p>
<p>Here is a generalized version of the function, which returns the index of the maximum element in an array of any type
(more precisely, of any type for which the <em>greater than</em> relation is defined):</p>
<pre class="code cpp literal-block">
<span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="kt">int</span><span class="w"> </span><span class="n">max_element</span><span class="p">(</span><span class="n">T</span><span class="w"> </span><span class="n">arr</span><span class="p">[],</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">len</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="kt">int</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span>    <span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">i</span><span class="p">)</span><span class="w">
</span>    <span class="p">{</span><span class="w">
</span>        <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">arr</span><span class="p">[</span><span class="n">max</span><span class="p">])</span><span class="w">
</span>            <span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>Now we should be satisfied. But what if we wanted to search for the maximum element only in a small part of
the whole array?</p>
<p>Arrays in <code>C++</code> are linearly allocated memory segments filled with elements of some type with no gaps
between them. Moreover, in most expressions an array name is implicitly converted to a pointer to the first element of the array. Arrays and pointers
have many things in common and, as you may already know, it is perfectly OK to use pointer arithmetic to access
array elements.</p>
<p>The range of elements for the function to process is specified by two pointers: the first one points to the <em>first</em>
element to process (<code>begin</code>), the second one points to the element <em>next after the last</em> one to process (<code>end</code>). So why have
we chosen a half-open range <code>[begin; end)</code>? This allows us to write the loop end condition in a very clear
and obvious way.</p>
<p>The implementation is given below:</p>
<pre class="code cpp literal-block">
<span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">max_element</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">begin</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="o">++</span><span class="n">begin</span><span class="p">;</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">end</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">begin</span><span class="p">)</span><span class="w">
</span>    <span class="p">{</span><span class="w">
</span>        <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">begin</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="o">*</span><span class="n">max</span><span class="p">)</span><span class="w">
</span>            <span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>Using this implementation one can find the maximum element both in the whole array and in any part of it:</p>
<pre class="code cpp literal-block">
<span class="kt">int</span><span class="w"> </span><span class="n">array</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">};</span><span class="w">
</span><span class="c1">// max_element() returns a pointer to the maximum element, so dereference it to get the value</span><span class="w">
</span><span class="kt">int</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">max_element</span><span class="p">(</span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="n">array</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">6</span><span class="p">);</span><span class="w">
</span><span class="kt">int</span><span class="w"> </span><span class="n">max_of_first3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="n">max_element</span><span class="p">(</span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="n">array</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span>
</pre>
<h2>Can we do even better?</h2>
<p>Our generalized function works great for arrays of any type, but arrays have a few drawbacks:
one can't resize an existing array, one can't push a new element to the front of an array, etc.
It would be great to use the same code to search for the maximum element in different kinds of
data structures, e.g. in a <a class="reference external" href="http://en.wikipedia.org/wiki/Linked_list">linked list</a>.</p>
<p>But elements of a linked list aren't allocated linearly in memory - they are separate chunks of
memory connected with each other using pointers (<em>links</em>). That means we can't use
our function anymore, because the way we access data structure elements has changed, though
the algorithm itself remains completely the same: visit all the elements one after another and
compare each one with the current maximum element.</p>
<p>What really has changed is the way we access data structure elements. For arrays we could use
pointer arithmetic to calculate the address of the element we want (<code>base address + index * sizeof(T)</code>).
But to access the <code>i-th</code> element of a linked list one has to walk from the <code>head</code> of the list
through the <code>(i-1)-th</code> element, one node at a time, following the pointers stored in the list nodes.</p>
<p>So here is the problem with our function: <strong>the way we access the elements of the data structure
is tightly coupled with the algorithm implemented by our function</strong>. That means
we have to write a separate version of our function for every data structure we want to use
it with. And we all know that code duplication is a really bad thing, which leads to errors and
difficulties during refactoring.</p>
<p>To solve this problem we have to decouple the algorithm from the data structure it processes.</p>
<p>Let's have a look at our function:</p>
<pre class="code cpp literal-block">
<span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">max_element</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">begin</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">*</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="o">++</span><span class="n">begin</span><span class="p">;</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">end</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">begin</span><span class="p">)</span><span class="w">
</span>    <span class="p">{</span><span class="w">
</span>        <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">begin</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="o">*</span><span class="n">max</span><span class="p">)</span><span class="w">
</span>            <span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>How do we access the array elements?</p>
<ol class="arabic simple">
<li><code>*</code> – <em>dereference a pointer</em> – get the value of an element the pointer points to.</li>
<li><code>!=</code> – <em>not equal</em> – compare two pointers (to detect the end of a range).</li>
<li><code>++</code> – <em>increment a pointer</em> – move the pointer to the next element of an array.</li>
</ol>
<p>If we pass some objects to the <code>max_element()</code> function instead of pointers,
we can define the operations given above for these objects so that they implement the
logic of accessing elements of different data structures (e.g. a linked list).</p>
<p>This is easy to do using the template and operator overloading facilities of <code>C++</code>.</p>
<p>So the final version of our function looks like this:</p>
<pre class="code cpp literal-block">
<span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">Iterator</span><span class="o">></span><span class="w">
</span><span class="n">Iterator</span><span class="w"> </span><span class="n">max_element</span><span class="p">(</span><span class="n">Iterator</span><span class="w"> </span><span class="n">begin</span><span class="p">,</span><span class="w"> </span><span class="n">Iterator</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="n">Iterator</span><span class="w"> </span><span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="o">++</span><span class="n">begin</span><span class="p">;</span><span class="w"> </span><span class="n">begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">end</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">begin</span><span class="p">)</span><span class="w">
</span>    <span class="p">{</span><span class="w">
</span>        <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">begin</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="o">*</span><span class="n">max</span><span class="p">)</span><span class="w">
</span>            <span class="n">max</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w">
</span>    <span class="p">}</span><span class="w">
</span>    <span class="k">return</span><span class="w"> </span><span class="n">max</span><span class="p">;</span><span class="w">
</span><span class="p">}</span>
</pre>
<p>Here we use objects of the template parameter type <code>Iterator</code> instead of pointers. So what's an
iterator?</p>
<p>An <strong>iterator</strong> is a special object which allows one to access the elements of a data structure without
exposing its internal implementation. One works with a data structure by means of the well-defined
abstract interface of iterators.</p>
<p><code>C++</code> iterators use pointer semantics as their interface, but that's just an
implementation detail. What is important is that all containers provide iterators with the same interface.
Users work with containers through iterators and don't need to know anything about how the containers are
actually implemented. This is how algorithms and data structures are decoupled: one can use
the same algorithm for different data structures without changing its code.</p>
<h2>Your first iterator</h2>
<p>Consider the simplest implementation of a singly linked list. The implementation of a list node:</p>
<pre class="code cpp literal-block">
<span class="cp">#ifndef __LIST_NODE_H__
#define __LIST_NODE_H__
</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">struct</span><span class="w"> </span><span class="nc">Node</span><span class="w">
</span><span class="p">{</span><span class="w">
</span>    <span class="n">T</span><span class="w"> </span><span class="n">data</span><span class="p">;</span><span class="w">
</span>    <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">next</span><span class="p">;</span><span class="w">
</span><span class="p">};</span><span class="w">
</span><span class="cp">#endif </span><span class="cm">/* __LIST_NODE_H__ */</span>
</pre>
<p>The implementation of a list container:</p>
<pre class="code cpp literal-block">
<span class="cp">#ifndef __LINKED_LIST_H__
#define __LINKED_LIST_H__
</span><span class="w">
</span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"list_node.h"</span><span class="cp">
#include</span><span class="w"> </span><span class="cpf">"list_iterator.h"</span><span class="cp">
</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">class</span><span class="w"> </span><span class="nc">LinkedList</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">public</span><span class="o">:</span><span class="w">
</span><span class="n">LinkedList</span><span class="p">();</span><span class="w">
</span><span class="o">~</span><span class="n">LinkedList</span><span class="p">();</span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="w"> </span><span class="n">begin</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="w"> </span><span class="n">end</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w">
</span><span class="kt">void</span><span class="w"> </span><span class="nf">push_front</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="n">elem</span><span class="p">);</span><span class="w">
</span><span class="kt">void</span><span class="w"> </span><span class="nf">push_back</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="n">elem</span><span class="p">);</span><span class="w">
</span><span class="k">private</span><span class="o">:</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">_head</span><span class="p">;</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">_tail</span><span class="p">;</span><span class="w">
</span><span class="p">};</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">LinkedList</span><span class="p">()</span><span class="w">
</span><span class="o">:</span><span class="w"> </span><span class="n">_head</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="n">_tail</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::~</span><span class="n">LinkedList</span><span class="p">()</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">_head</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_head</span><span class="o">-></span><span class="n">next</span><span class="p">;</span><span class="w">
</span><span class="k">delete</span><span class="w"> </span><span class="n">_head</span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="kt">void</span><span class="w"> </span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">push_front</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="n">elem</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">_head</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">_head</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="o">-></span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">elem</span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="o">-></span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_head</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">else</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">oldfirst</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_head</span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="o">-></span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">elem</span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="o">-></span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">oldfirst</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="kt">void</span><span class="w"> </span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">push_back</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="n">elem</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">_tail</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">_tail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="o">-></span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">elem</span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="o">-></span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span><span class="n">_head</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_tail</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">else</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">oldlast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_tail</span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="o">-></span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">elem</span><span class="p">;</span><span class="w">
</span><span class="n">_tail</span><span class="o">-></span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w">
</span><span class="n">oldlast</span><span class="o">-></span><span class="n">next</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_tail</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="w"> </span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">begin</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">_head</span><span class="p">);</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="w"> </span><span class="n">LinkedList</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">end</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="mi">0</span><span class="p">);</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="cp">#endif </span><span class="cm">/* __LINKED_LIST_H__ */</span>
</pre>
<p>We have implemented a minimal subset of list methods:</p>
<ul class="simple">
<li>data structure allocation/initialization and destruction/deallocation - <code>LinkedList()</code>, <code>~LinkedList()</code>;</li>
<li>adding elements to the front and to the back of a list - <code>push_front()</code>, <code>push_back()</code>;</li>
<li>access to the list elements - methods that return iterators pointing to the beginning of the list and
one past its end - <code>begin()</code>, <code>end()</code>.</li>
</ul>
<p>Using the pair of iterators returned by <code>begin()/end()</code>, one can access all the list elements.
The implementation of an iterator for a linked list data structure is given below:</p>
<pre class="code cpp literal-block">
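<span class="cp">#ifndef __LIST_ITERATOR_H__
#define __LIST_ITERATOR_H__
</span><span class="w">
</span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"list_node.h"</span><span class="cp">
</span><span class="w">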
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">class</span><span class="w"> </span><span class="nc">ListIterator</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">public</span><span class="o">:</span><span class="w">
</span><span class="n">ListIterator</span><span class="p">(</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">node</span><span class="p">);</span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">node</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>&</span><span class="w"> </span><span class="k">operator</span><span class="o">++</span><span class="p">();</span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="k">operator</span><span class="o">*</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w">
</span><span class="kt">bool</span><span class="w"> </span><span class="k">operator</span><span class="o">!=</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>&</span><span class="w"> </span><span class="n">it</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w">
</span><span class="k">private</span><span class="o">:</span><span class="w">
</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">_currentNode</span><span class="p">;</span><span class="w">
</span><span class="p">};</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">ListIterator</span><span class="p">(</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">node</span><span class="p">)</span><span class="w">
</span><span class="o">:</span><span class="w"> </span><span class="n">_currentNode</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="w">
</span><span class="p">{</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>*</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="n">node</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">_currentNode</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>&</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="k">operator</span><span class="o">++</span><span class="p">()</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="n">_currentNode</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_currentNode</span><span class="o">-></span><span class="n">next</span><span class="p">;</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="o">*</span><span class="k">this</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="k">const</span><span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="k">operator</span><span class="o">*</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">_currentNode</span><span class="o">-></span><span class="n">data</span><span class="p">;</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">template</span><span class="o"><</span><span class="k">class</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w">
</span><span class="kt">bool</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>::</span><span class="k">operator</span><span class="o">!=</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">ListIterator</span><span class="o"><</span><span class="n">T</span><span class="o">>&</span><span class="w"> </span><span class="n">it</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="k">return</span><span class="w"> </span><span class="n">_currentNode</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">it</span><span class="p">.</span><span class="n">node</span><span class="p">();</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="cp">#endif </span><span class="cm">/* __LIST_ITERATOR_H__ */</span>
</pre>
<p>An iterator instance is initialized with a pointer to a linked list node. Overloaded operators implement
the logic of moving between list nodes and accessing the values they contain.</p>
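<p>The iterator above is enough for hand-written loops and simple function templates. Standard library
algorithms additionally look up a few member typedefs through <code>std::iterator_traits</code>. The following is
a condensed sketch of the same idea with those typedefs, <code>operator==</code>, <code>operator-&gt;</code> and the postfix
<code>++</code> added. These additions, as well as the helper functions <code>sum_of_chain</code> and
<code>chain_contains</code>, are assumptions made for this illustration and are not part of the original code:</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

template<class T>
struct Node
{
    T data;
    Node<T>* next;
};

// Condensed variant of ListIterator. The member typedefs, operator==,
// operator-> and postfix ++ are additions made for this sketch.
template<class T>
class ListIterator
{
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef T                         value_type;
    typedef std::ptrdiff_t            difference_type;
    typedef const T*                  pointer;
    typedef const T&                  reference;

    // A null node plays the role of the past-the-end position.
    explicit ListIterator(const Node<T>* node = 0) : _currentNode(node) { }

    ListIterator& operator++()      // prefix: step to the next node
    {
        _currentNode = _currentNode->next;
        return *this;
    }
    ListIterator operator++(int)    // postfix: step, return the old position
    {
        ListIterator old(*this);
        ++*this;
        return old;
    }
    const T& operator*() const { return _currentNode->data; }
    const T* operator->() const { return &_currentNode->data; }
    bool operator==(const ListIterator& it) const { return _currentNode == it._currentNode; }
    bool operator!=(const ListIterator& it) const { return _currentNode != it._currentNode; }

private:
    const Node<T>* _currentNode;
};

// Walks a three-node chain with a hand-written loop and sums the values.
inline int sum_of_chain()
{
    Node<int> third  = {3, 0};
    Node<int> second = {2, &third};
    Node<int> first  = {1, &second};

    int sum = 0;
    for (ListIterator<int> it(&first); it != ListIterator<int>(); ++it)
    {
        sum += *it;
    }
    return sum;
}

// Hands the same iterators to std::find, which relies on the typedefs.
inline bool chain_contains(int value)
{
    Node<int> third  = {3, 0};
    Node<int> second = {2, &third};
    Node<int> first  = {1, &second};

    const ListIterator<int> begin(&first);
    const ListIterator<int> end;
    return std::find(begin, end, value) != end;
}
```

<p>Note that incrementing or dereferencing the past-the-end iterator dereferences a null pointer, so, just as
with raw pointers, the caller is responsible for staying inside the range.</p>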
<p>Let's see how our function that returns an iterator pointing to the maximum element of
a given data structure works for both arrays and linked lists:</p>
<pre class="code cpp literal-block">
<span class="n">LinkedList</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="w"> </span><span class="n">l</span><span class="p">;</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_front</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="mi">4</span><span class="p">);</span><span class="w">
</span><span class="n">l</span><span class="p">.</span><span class="n">push_front</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span><span class="w">
</span><span class="kt">int</span><span class="w"> </span><span class="n">arr</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span><span class="p">};</span><span class="w">
</span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"Max in list: "</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="o">*</span><span class="n">max_element</span><span class="p">(</span><span class="n">l</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span><span class="w"> </span><span class="n">l</span><span class="p">.</span><span class="n">end</span><span class="p">())</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">" "</span><span class="p">;</span><span class="w">
</span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"Max in array: "</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="o">*</span><span class="n">max_element</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span><span class="w"> </span><span class="n">arr</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">6</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">" "</span><span class="p">;</span>
</pre>
<h2>Conclusion</h2>
<p>This is only a small part of what you need to know about iterators. We've covered only one kind of iterator
- <strong>Forward Iterators</strong>, which allow one to access elements moving forwards (using the <code>++</code> operator). There
are other kinds of iterators, e.g. <strong>Bidirectional Iterators</strong>, which allow one to access elements moving
backwards too (the <code>++</code>, <code>!=</code> and <code>*</code> operators are supplemented with the <code>--</code> operator), etc.</p>
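<p>As a rough sketch of what the extra <code>--</code> operator involves, consider a doubly linked node. The names
<code>DNode</code>, <code>BidirectionalListIterator</code>, <code>step_forward_then_back</code> and <code>visit_backwards</code>
are hypothetical and exist only for this illustration:</p>

```cpp
// Hypothetical doubly linked node: `prev` is the extra link that makes
// moving backwards possible.
template<class T>
struct DNode
{
    T data;
    DNode<T>* prev;
    DNode<T>* next;
};

template<class T>
class BidirectionalListIterator
{
public:
    explicit BidirectionalListIterator(const DNode<T>* node)
        : _currentNode(node)
    { }

    BidirectionalListIterator& operator++()   // move forwards
    {
        _currentNode = _currentNode->next;
        return *this;
    }
    BidirectionalListIterator& operator--()   // move backwards
    {
        _currentNode = _currentNode->prev;
        return *this;
    }
    const T& operator*() const { return _currentNode->data; }
    bool operator!=(const BidirectionalListIterator& it) const
    {
        return _currentNode != it._currentNode;
    }

private:
    const DNode<T>* _currentNode;
};

// Moves two steps forwards and one step back over the chain 1 <-> 2 <-> 3.
inline int step_forward_then_back()
{
    DNode<int> a = {1, 0, 0};
    DNode<int> b = {2, &a, 0};
    DNode<int> c = {3, &b, 0};
    a.next = &b;
    b.next = &c;

    BidirectionalListIterator<int> it(&a);
    ++it;   // at 2
    ++it;   // at 3
    --it;   // back at 2
    return *it;
}

// Walks the same chain from the last node back to the first one and packs
// the visited values into a decimal number (3, 2, 1 gives 321).
inline int visit_backwards()
{
    DNode<int> a = {1, 0, 0};
    DNode<int> b = {2, &a, 0};
    DNode<int> c = {3, &b, 0};
    a.next = &b;
    b.next = &c;

    int visited = 0;
    const BidirectionalListIterator<int> stop(0);
    for (BidirectionalListIterator<int> it(&c); it != stop; --it)
    {
        visited = visited * 10 + *it;
    }
    return visited;
}
```

<p>This sketch uses a null pointer as the stop position in both directions, so the past-the-end iterator cannot
be decremented. A full bidirectional iterator must support that, so a real container needs extra bookkeeping,
for example a sentinel node.</p>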
<p><code>C++</code> iterators use pointer semantics as their interface, but it's just an
implementation detail - any other interface could have been chosen. The actual
interface, however, makes it possible to use raw pointers as iterators.</p>
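<p>A short sketch of raw pointers acting as iterators for standard algorithms follows. The helper names
<code>max_of_values</code> and <code>index_of</code> are made up for this example; <code>std::max_element</code> and
<code>std::find</code> are the standard algorithms from <code>&lt;algorithm&gt;</code>:</p>

```cpp
#include <algorithm>
#include <cstddef>

// Values used by both helpers below (the same numbers as in the example above).
static const int values[] = {1, 2, 3, 10, 4, 5};
static const std::size_t values_size = sizeof(values) / sizeof(values[0]);

// std::max_element accepts raw pointers: `values` is the begin iterator and
// `values + values_size` is the one-past-the-end iterator.
inline int max_of_values()
{
    return *std::max_element(values, values + values_size);
}

// std::find returns an iterator, which here is a plain `const int*`.
// Returns the index of `value`, or -1 if it is absent.
inline std::ptrdiff_t index_of(int value)
{
    const int* last = values + values_size;
    const int* pos = std::find(values, last, value);
    return pos != last ? pos - values : -1;
}
```

<p>No adapter is needed here: a pointer already provides <code>++</code>, <code>*</code> and <code>!=</code> with exactly the
meaning the algorithms expect.</p>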
<p>Iterators are a very important part of the <code>STL</code> because they decouple algorithms from the data structures
these algorithms work on. This way algorithms can be generalized to work with any data structure
as long as it provides the required kind of iterators.</p>
<p>The source code of the snippets is available on <a class="reference external" href="https://github.com/malor/iterators-source">GitHub</a>.</p>