[LLVMdev] Expressiveness of column numbers in dwarf using clang 3.0?

trash-stuff at gmx.de trash-stuff at gmx.de
Tue May 31 11:17:32 PDT 2011


On 31.05.2011 19:45, Devang Patel wrote:
>
> On May 31, 2011, at 10:36 AM, trash-stuff at gmx.de 
> <mailto:trash-stuff at gmx.de> wrote:
>
>> On 31.05.2011 19:22, Devang Patel wrote:
>>>
>>> On May 30, 2011, at 11:11 AM, trash-stuff at gmx.de 
>>> <mailto:trash-stuff at gmx.de> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am processing DWARF line and column information in (x86 and ARM) 
>>>> executables in order to produce a mapping from the machine 
>>>> instructions back to the original source code (C/C++). Using the 
>>>> line numbers is quite straightforward ("libdwarf" [1] is doing the 
>>>> work for me.) But when comparing the column numbers (extracted from the 
>>>> DWARF line table) with the corresponding source code locations, it 
>>>> becomes clear that they are not very "useful".
>>>>
>>>> Consider the following small example (C++):
>>>>
>>>>      1: #include <iostream>
>>>>      2: #include <ctime>
>>>>      3: #include <cstdlib>
>>>>      4: using namespace std;
>>>>      5: int main() {
>>>>      6:    int j = 0; cin >> j; long sum = (j < 0 ? -5 : 4) + rand();
>>>>      7:    for(int i = 0; i < j; i++) { sum += j*j-2; cout << (sum / 2) << endl; }
>>>>      8:    srand(time(NULL));
>>>>      9:    double d = rand() / 10.341; int t = (int)d+j*sum;
>>>>     10:    cout << sum << d << t << j;
>>>>     11:    return (0);
>>>>     12: }
>>>>
>>>> Compiling this with "clang++ Main.cpp -g -O3 -o column" results in 
>>>> the following location information within the generated executable:
>>>>
>>>>     $ dwarfdump -l column
>>>>
>>>>     .debug_line: line number info for a single cu
>>>>     Source lines (from CU-DIE at .debug_info offset 11):
>>>>     <source file>     [line,column] <pc>    //<new stmt or basic block
>>>>     .../locale_facets.h:  [868, 2]    0x80488f0  // new statement
>>>>                    [...]
>>>>     .../Main.cpp: [  8, 2]    0x804896f  // new statement
>>>>     .../Main.cpp: [  9,28]    0x8048983  // new statement
>>>>     .../ostream:   [165, 9]    0x8048990  // new statement
>>>>     .../Main.cpp: [  9,28]    0x80489a0  // new statement
>>>>     .../ostream: [209, 9]    0x80489ac  // new statement
>>>>     .../Main.cpp: [  9,28]    0x80489b5  // new statement
>>>>     .../ostream: [209, 9]    0x80489bb  // new statement
>>>>                    [...]
>>>>     .../basic_ios.h:      [ 48, 2]    0x8048a23  // new statement
>>>>     // end of text sequence
>>>>
>>>> Now, have a look at source code line 9. The extracted debug info 
>>>> above says that we have three "instruction sets" (beginning at 
>>>> 0x8048983, 0x80489a0 and 0x80489b5, respectively) which correspond 
>>>> to line 9. But all of them are labeled with column number 28! 
>>>> According to my understanding, this does not contribute any further 
>>>> information to support my task (= mapping assembler code back to 
>>>> the source lines or even to statements within a line). Did I miss 
>>>> anything?
>>>
>>> You are looking at the line table produced at -O3, i.e. after the 
>>> aggressive optimizer has had opportunities to optimize the code. Try 
>>> -O0 and see if it helps.
>> First of all, thanks for your reply!
>>
>> I've already checked that at -O0 but it results in the same information.
>
> You mean, the instructions with a given line and column number do not 
> match the source code construct at that location?
No, they do.
>
>> (The documentation about "Source Level Debugging with LLVM" says 
>> "*LLVM debug information always provides information to accurately 
>> read the source-level state of the program, regardless of which LLVM 
>> optimizations have been run*, and without any modification to the 
>> optimizations themselves." [1])
>
> It means the instructions with a given line and column number match 
> the source code construct at that line/col number. It does not mean 
> that the optimizer/code generator will not reorder instructions. It 
> also does not mean that the optimizer/code generator will not emit 
> instructions without line number information. It means that, if there 
> is line number information, it maps to the source construct as 
> accurately as possible.
Yes, that matches my understanding, too. But I thought that clang would 
be able to emit *more* than one (different) column number per line. As 
in my example, for line number 9 (in Main.cpp), there are *three* 
entries in the DWARF line table. But all of them contain the *same* 
information. As a consequence, the associated assembler instructions 
are all mapped to the same source line and column, and thus the column 
information is useless...? I mean, what additional information do the 
column numbers provide?
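
To make the point concrete, here is a minimal sketch of the check I have in mind: group the line-table rows by (file, line) and count how many distinct columns each line carries. The struct `LineRow` and the functions `excerptRows` and `columnsPerLine` are hypothetical names for illustration, not libdwarf API; the rows are copied by hand from the dwarfdump excerpt quoted above, with the elided "[...]" rows left out and the truncated ".../" paths kept as they appear.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One decoded line-table row (hypothetical struct, not a libdwarf type).
struct LineRow {
    std::string file;
    unsigned line;
    unsigned column;
    std::uint32_t address;
};

// Rows copied from the dwarfdump excerpt; elided rows are not included.
std::vector<LineRow> excerptRows() {
    return {
        {".../locale_facets.h", 868, 2, 0x80488f0},
        {".../Main.cpp", 8, 2, 0x804896f},
        {".../Main.cpp", 9, 28, 0x8048983},
        {".../ostream", 165, 9, 0x8048990},
        {".../Main.cpp", 9, 28, 0x80489a0},
        {".../ostream", 209, 9, 0x80489ac},
        {".../Main.cpp", 9, 28, 0x80489b5},
        {".../ostream", 209, 9, 0x80489bb},
        {".../basic_ios.h", 48, 2, 0x8048a23},
    };
}

// For every (file, line) pair, collect the set of distinct columns seen.
std::map<std::pair<std::string, unsigned>, std::set<unsigned>>
columnsPerLine(const std::vector<LineRow>& rows) {
    std::map<std::pair<std::string, unsigned>, std::set<unsigned>> result;
    for (const LineRow& r : rows) {
        result[{r.file, r.line}].insert(r.column);
    }
    return result;
}
```

For the rows above, the set for line 9 of Main.cpp contains only column 28, even though three separate rows refer to that line.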

I extracted the assembler instructions for the 9th line (x86):
.../Main.cpp: 9
     double d = rand() / 10.341; int t = (int)d+j*sum;
                               ^
8048983:    e8 40 fe ff ff           call   80487c8 <rand@plt>
8048988:    89 c7                    mov    %eax,%edi
804898a:    8b 5d f0                 mov    -0x10(%ebp),%ebx
804898d:    0f af de                 imul   %esi,%ebx
80489a0:    f2 0f 2a c7              cvtsi2sd %edi,%xmm0
80489a4:    f2 0f 5e 05 f0 8a 04     divsd  0x8048af0,%xmm0
80489ab:    08
80489b5:    f2 0f 2c f0              cvttsd2si %xmm0,%esi
80489b9:    01 de                    add    %ebx,%esi
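
The mapping from an instruction address to its line-table row can be sketched as follows. This is only an illustration under two assumptions: the rows are sorted by address, and each row covers every address up to the start of the next row. `LineRow`, `contiguousRows` and `rowFor` are hypothetical names, not libdwarf functions, and the rows are the contiguous part of the dwarfdump excerpt quoted earlier (0x804896f through 0x80489bb).

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One decoded line-table row (hypothetical struct, not a libdwarf type).
struct LineRow {
    std::string file;
    unsigned line;
    unsigned column;
    std::uint32_t address;
};

// Contiguous rows from the dwarfdump excerpt, already sorted by address.
std::vector<LineRow> contiguousRows() {
    return {
        {".../Main.cpp", 8, 2, 0x804896f},
        {".../Main.cpp", 9, 28, 0x8048983},
        {".../ostream", 165, 9, 0x8048990},
        {".../Main.cpp", 9, 28, 0x80489a0},
        {".../ostream", 209, 9, 0x80489ac},
        {".../Main.cpp", 9, 28, 0x80489b5},
        {".../ostream", 209, 9, 0x80489bb},
    };
}

// Returns the row whose address range contains pc, or nullptr if pc lies
// before the first row. Addresses past the last row map to the last row
// here; a real line table would end with an end-of-sequence marker instead.
const LineRow* rowFor(const std::vector<LineRow>& rows, std::uint32_t pc) {
    auto it = std::upper_bound(
        rows.begin(), rows.end(), pc,
        [](std::uint32_t addr, const LineRow& r) { return addr < r.address; });
    if (it == rows.begin()) {
        return nullptr;
    }
    return &*(it - 1);
}
```

Looking up any address from the listing above (for example 0x804898d, 0x80489a4 or 0x80489b9) yields a row with line 9 and column 28, so the three groups cannot be told apart by column.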

I hope that makes it clearer... ;-)

BTW, any hints regarding my cross-compilation-related question?

Best regards
   Adrian