[LLVMdev] Expressiveness of column numbers in dwarf using clang 3.0?

Tue May 31 12:11:39 PDT 2011

Update: I've found that the location information is possibly wrong 
when it points into standard C/C++ headers, as shown in the following listing:
/usr/include/c++/4.4/bits/basic_ios.h: 48
       if (!__f)
   58:  80489e5:    e8 2e fd ff ff           call   8048718 <_ZSt16__throw_bad_castv@plt>
   59:  80489ea:    66 0f 1f 44 00 00        nopw   0x0(%eax,%eax,1)
   60:  80489f0:    55                       push   %ebp
   61:  80489f1:    89 e5                    mov    %esp,%ebp
   62:  80489f3:    83 ec 18                 sub    $0x18,%esp
   63:  80489f6:    c7 04 24 94 a1 04 08     movl   $0x804a194,(%esp)
   64:  80489fd:    e8 56 fd ff ff           call   8048758 <_ZNSt8ios_base4InitC1Ev@plt>
   65:  8048a02:    c7 44 24 08 44 a0 04     movl   $0x804a044,0x8(%esp)
   66:  8048a09:    08
   67:  8048a0a:    c7 44 24 04 94 a1 04     movl   $0x804a194,0x4(%esp)
   68:  8048a11:    08
   69:  8048a12:    c7 04 24 78 87 04 08     movl   $0x8048778,(%esp)
   70:  8048a19:    e8 ea fc ff ff           call   8048708 <__cxa_atexit@plt>
   71:  8048a1e:    83 c4 18                 add    $0x18,%esp
   72:  8048a21:    5d                       pop    %ebp
   73:  8048a22:    c3                       ret
/usr/include/c++/4.4/bits/basic_ios.h: 439
       widen(char __c) const
   74:  8048958:    8b 5c 30 7c              mov    0x7c(%eax,%esi,1),%ebx
   75:  804895c:    85 db                    test   %ebx,%ebx
   76:  804895e:    0f 84 81 00 00 00        je     80489e5 <main+0x135>
/usr/include/c++/4.4/bits/locale_facets.h: 866
   77:  8048964:    80 7b 1c 00              cmpb   $0x0,0x1c(%ebx)
   78:  8048968:    74 86                    je     80488f0 <main+0x40>
/usr/include/c++/4.4/bits/locale_facets.h: 867
     if (_M_widen_ok)
   79:  804896a:    8a 43 27                 mov    0x27(%ebx),%al
   80:  804896d:    eb 99                    jmp    8048908 <main+0x58>
/usr/include/c++/4.4/bits/locale_facets.h: 868
       return _M_widen[static_cast<unsigned char>(__c)];
   81:  80488f0:    89 1c 24                 mov    %ebx,(%esp)
   82:  80488f3:    e8 50 fe ff ff           call   8048748 <_ZNKSt5ctypeIcE13_M_widen_initEv@plt>
/usr/include/c++/4.4/bits/locale_facets.h: 869
   83:  80488f8:    8b 03                    mov    (%ebx),%eax
   84:  80488fa:    89 1c 24                 mov    %ebx,(%esp)
   85:  80488fd:    c7 44 24 04 0a 00 00     movl   $0xa,0x4(%esp)
   86:  8048904:    00
   87:  8048905:    ff 50 18                 call   *0x18(%eax)
/usr/include/c++/4.4/ostream: 538
     endl(basic_ostream<_CharT, _Traits>& __os)
   98:  8048908:    0f be c0                 movsbl %al,%eax
   99:  804890b:    89 44 24 04              mov    %eax,0x4(%esp)
  100:  804890f:    89 34 24                 mov    %esi,(%esp)
  101:  8048912:    e8 c1 fe ff ff           call   80487d8 <_ZNSo3putEc@plt>
  102:  8048953:    8b 06                    mov    (%esi),%eax
  103:  8048955:    8b 40 f4                 mov    -0xc(%eax),%eax
/usr/include/c++/4.4/ostream: 559
     flush(basic_ostream<_CharT, _Traits>& __os)
  104:  8048917:    89 04 24                 mov    %eax,(%esp)
  105:  804891a:    e8 79 fe ff ff           call   8048798 <_ZNSo5flushEv@plt>
  106:  804891f:    8b 75 ec                 mov    -0x14(%ebp),%esi
  107:  8048922:    47                       inc    %edi
(The "^" marks the column position within the line.)

I am not completely sure, but the mapping of line 868 in file 
"locale_facets.h" might be wrong: there is a call instruction that 
calls "_M_widen_init", but this function is actually called on the 
next line (869).

Here is the extract from locale_facets.h:
       widen(char __c) const
     if (_M_widen_ok)
       return _M_widen[static_cast<unsigned char>(__c)];
     return this->do_widen(__c);

In addition, line 48 of "basic_ios.h" contains a ret instruction, which 
should map to a return or throw statement. The column numbers are 

Are these interpretations correct?

Best regards

On 31.05.2011 20:17, trash-stuff at gmx.de wrote:
> On 31.05.2011 19:45, Devang Patel wrote:
>> On May 31, 2011, at 10:36 AM, trash-stuff at gmx.de 
>> <mailto:trash-stuff at gmx.de> wrote:
>>> On 31.05.2011 19:22, Devang Patel wrote:
>>>> On May 30, 2011, at 11:11 AM,trash-stuff at gmx.de 
>>>> <mailto:trash-stuff at gmx.de>wrote:
>>>>> Hi all,
>>>>> I am processing DWARF line and column information in (x86 and ARM) 
>>>>> executables in order to produce a mapping from the machine 
>>>>> instructions back to the original source code (C/C++). Using the 
>>>>> line numbers is quite straightforward ("libdwarf" [1] is doing the 
>>>>> work for me.) But when comparing the column numbers (extracted from 
>>>>> the DWARF line table) with the corresponding source code 
>>>>> locations, it becomes clear that they are not very "useful".
>>>>> Consider the following small example (C++):
>>>>>      1: #include <iostream>
>>>>>      2: #include <ctime>
>>>>>      3: #include <cstdlib>
>>>>>      4: using namespace std;
>>>>>      5: int main() {
>>>>>      6:    int j = 0; cin >> j; long sum = (j < 0 ? -5 : 4) + rand();
>>>>>      7:    for(int i = 0; i < j; i++) { sum += j*j-2; cout << (sum
>>>>>     / 2) << endl; }
>>>>>      8:    srand(time(NULL));
>>>>>      9:    double d = rand() / 10.341; int t = (int)d+j*sum;
>>>>>     10:    cout << sum << d << t << j;
>>>>>     11:    return (0);
>>>>>     12: }
>>>>> Compiling this with "clang++ Main.cpp -g -O3 -o column" results in 
>>>>> the following location information within the generated executable:
>>>>>     $ dwarfdump -l column
>>>>>     .debug_line: line number info for a single cu
>>>>>     Source lines (from CU-DIE at .debug_info offset 11):
>>>>>     <source file>     [line,column] <pc>    //<new stmt or basic block
>>>>>     .../locale_facets.h:  [868, 2]    0x80488f0  // new statement
>>>>>                    [...]
>>>>>     .../Main.cpp: [  8, 2]    0x804896f  // new statement
>>>>>     .../Main.cpp: [  9,28]    0x8048983  // new statement
>>>>>     .../ostream:   [165, 9]    0x8048990  // new statement
>>>>>     .../Main.cpp: [  9,28]    0x80489a0  // new statement
>>>>>     .../ostream: [209, 9]    0x80489ac  // new statement
>>>>>     .../Main.cpp: [  9,28]    0x80489b5  // new statement
>>>>>     .../ostream: [209, 9]    0x80489bb  // new statement
>>>>>                    [...]
>>>>>     .../basic_ios.h:      [ 48, 2]    0x8048a23  // new statement
>>>>>     // end of text sequence
>>>>> Now, have a look at source code line 9. The extracted debug info 
>>>>> above says that we have 3 "instruction sets" (beginning at 
>>>>> 0x8048983, 0x80489a0 and 0x80489b5 respectively) which correspond to 
>>>>> line 9. But all of them are labeled with column number 28! 
>>>>> According to my understanding, this does not contribute any 
>>>>> further information to support my task (= mapping assembler code 
>>>>> back to the source lines or even to statements within a line). Did 
>>>>> I miss anything?
>>>> You are looking at the line table produced at -O3, i.e. after the 
>>>> aggressive optimizer has had opportunities to optimize the code. Try 
>>>> -O0 and see if it helps.
>>> First of all, thanks for your reply!
>>> I've already checked that at -O0 but it results in the same information.
>> You mean, the instructions with given line and column number do not 
>> match the source code construct at that location ?
> No, they do.
>>> (The documentation about "Source Level Debugging with LLVM" says 
>>> "*LLVM debug information always provides information to accurately 
>>> read the source-level state of the program, regardless of which LLVM 
>>> optimizations have been run*, and without any modification to the 
>>> optimizations themselves." [1])
>> It means the instructions with a given line and column number match 
>> the source code construct at that line/col number. It does not mean 
>> that the optimizer/code generator will not reorder instructions. It also 
>> does not mean that the optimizer/code generator will not emit instructions 
>> without line number information. It means that, if there is line number 
>> information, it maps to the source construct as accurately as possible.
> Yes, that matches my understanding, too. But I thought that clang 
> would be able to emit *more* than one (different) column number per 
> line. As in my example, for line number 9 (in Main.cpp), there are 
> *three* entries in the DWARF line table. But all of them contain the 
> *same* information. As a consequence, the associated assembler 
> instructions were all mapped to the same source line and thus, the 
> column information is useless...? I mean, what additional 
> information is included in the column numbers?
> I extracted the assembler instructions for the 9th line (x86):
> .../Main.cpp: 9
>     double d = rand() / 10.341; int t = (int)d+j*sum;
>                               ^
> 8048983:    e8 40 fe ff ff           call   80487c8 <rand@plt>
> 8048988:    89 c7                    mov    %eax,%edi
> 804898a:    8b 5d f0                 mov    -0x10(%ebp),%ebx
> 804898d:    0f af de                 imul   %esi,%ebx
> 80489a0:    f2 0f 2a c7              cvtsi2sd %edi,%xmm0
> 80489a4:    f2 0f 5e 05 f0 8a 04     divsd  0x8048af0,%xmm0
> 80489ab:    08
> 80489b5:    f2 0f 2c f0              cvttsd2si %xmm0,%esi
> 80489b9:    01 de                    add    %ebx,%esi
> I hope that makes it clearer... ;-)
> BTW, any hints on my cross-compilation-related question?
> Best regards
>   Adrian
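
The observation about line 9 can be checked mechanically. The following is a minimal Python sketch, not part of the original exchange. It parses a few rows copied from the dwarfdump listing quoted above, groups them by source line to count the distinct column numbers, and looks up which row covers a given instruction address. The helper names (parse_rows, columns_per_line, row_for_address) are hypothetical, and the regular expression assumes the "file: [line,column] pc" layout shown in the listing. A real tool would more likely read the table through libdwarf than scrape text.

```python
import bisect
import re
from collections import defaultdict

# Rows copied from the dwarfdump listing quoted in this thread
# (only the contiguous part between the two "[...]" elisions).
ROWS = """\
.../Main.cpp: [  8, 2]    0x804896f  // new statement
.../Main.cpp: [  9,28]    0x8048983  // new statement
.../ostream:   [165, 9]    0x8048990  // new statement
.../Main.cpp: [  9,28]    0x80489a0  // new statement
.../ostream: [209, 9]    0x80489ac  // new statement
.../Main.cpp: [  9,28]    0x80489b5  // new statement
.../ostream: [209, 9]    0x80489bb  // new statement
"""

# Assumed layout: "<file>: [<line>,<column>] <pc>"
ROW_RE = re.compile(
    r"^\s*(\S+):\s*\[\s*(\d+)\s*,\s*(\d+)\s*\]\s+(0x[0-9a-fA-F]+)"
)


def parse_rows(text):
    """Return a list of (path, line, column, address) tuples."""
    rows = []
    for raw in text.splitlines():
        match = ROW_RE.match(raw)
        if match:
            path, line, column, address = match.groups()
            rows.append((path, int(line), int(column), int(address, 16)))
    return rows


def columns_per_line(rows):
    """Group rows by (path, line); collect distinct columns and addresses."""
    table = defaultdict(lambda: {"columns": set(), "addresses": []})
    for path, line, column, address in rows:
        entry = table[(path, line)]
        entry["columns"].add(column)
        entry["addresses"].append(address)
    return dict(table)


def row_for_address(rows, address):
    """Return the row with the greatest start address <= address, or None.

    This ignores end-of-sequence markers, so an address past the last
    row is still attributed to that last row.
    """
    ordered = sorted(rows, key=lambda row: row[3])
    starts = [row[3] for row in ordered]
    index = bisect.bisect_right(starts, address) - 1
    return ordered[index] if index >= 0 else None


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    for (path, line), entry in sorted(columns_per_line(rows).items()):
        print(path, line, "rows:", len(entry["addresses"]),
              "columns:", sorted(entry["columns"]))
    # The imul at 0x804898d from the disassembly of line 9:
    print(row_for_address(rows, 0x804898D))
```

For Main.cpp line 9 this reports three rows but only one distinct column (28), which is the situation described in the thread: the column adds nothing that the line number does not already provide.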
