[LLVMdev] Expressiveness of column numbers in dwarf using clang 3.0?
trash-stuff at gmx.de
Tue May 31 11:17:32 PDT 2011
On 31.05.2011 19:45, Devang Patel wrote:
>
> On May 31, 2011, at 10:36 AM, trash-stuff at gmx.de wrote:
>
>> On 31.05.2011 19:22, Devang Patel wrote:
>>>
>>> On May 30, 2011, at 11:11 AM, trash-stuff at gmx.de wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am processing DWARF line and column information in (x86 and ARM)
>>>> executables in order to produce a mapping from the machine
>>>> instructions back to the original source code (C/C++). Using the
>>>> line numbers is quite straightforward ("libdwarf" [1] is doing the
>>>> work for me). But when comparing the column numbers (extracted from the
>>>> DWARF line table) with the corresponding source code locations, it
>>>> becomes clear that they are not very "useful".
>>>>
>>>> Consider the following small example (C++):
>>>>
>>>> 1: #include <iostream>
>>>> 2: #include <ctime>
>>>> 3: #include <cstdlib>
>>>> 4: using namespace std;
>>>> 5: int main() {
>>>> 6: int j = 0; cin >> j; long sum = (j < 0 ? -5 : 4) + rand();
>>>> 7: for(int i = 0; i < j; i++) { sum += j*j-2; cout << (sum
>>>> / 2) << endl; }
>>>> 8: srand(time(NULL));
>>>> 9: double d = rand() / 10.341; int t = (int)d+j*sum;
>>>> 10: cout << sum << d << t << j;
>>>> 11: return (0);
>>>> 12: }
>>>>
>>>> Compiling this with "clang++ Main.cpp -g -O3 -o column" results in
>>>> the following location information within the generated executable:
>>>>
>>>> $ dwarfdump -l column
>>>>
>>>> .debug_line: line number info for a single cu
>>>> Source lines (from CU-DIE at .debug_info offset 11):
>>>> <source file> [line,column] <pc> //<new stmt or basic block
>>>> .../locale_facets.h: [868, 2] 0x80488f0 // new statement
>>>> [...]
>>>> .../Main.cpp: [ 8, 2] 0x804896f // new statement
>>>> .../Main.cpp: [ 9,28] 0x8048983 // new statement
>>>> .../ostream: [165, 9] 0x8048990 // new statement
>>>> .../Main.cpp: [ 9,28] 0x80489a0 // new statement
>>>> .../ostream: [209, 9] 0x80489ac // new statement
>>>> .../Main.cpp: [ 9,28] 0x80489b5 // new statement
>>>> .../ostream: [209, 9] 0x80489bb // new statement
>>>> [...]
>>>> .../basic_ios.h: [ 48, 2] 0x8048a23 // new statement
>>>> // end of text sequence
>>>>
>>>> Now, have a look at source code line 9. The extracted debug info
>>>> above says that we have 3 "instruction sets" (beginning at
>>>> 0x8048983, 0x80489a0 and 0x80489b5 respectively) which correspond to
>>>> line 9. But all of them are labeled with column number 28!
>>>> According to my understanding, this does not contribute any further
>>>> information to support my task (= mapping assembler code back to
>>>> the source lines or even to statements within a line). Did I miss
>>>> anything?
>>>
>>> You are looking at the line table produced at -O3, i.e. after the
>>> aggressive optimizer has had opportunities to optimize the code. Try
>>> -O0 and see if it helps.
>> First of all, thanks for your reply!
>>
>> I've already checked that at -O0 but it results in the same information.
>
> You mean, the instructions with given line and column number do not
> match the source code construct at that location?
No, they do.
>
>> (The documentation about "Source Level Debugging with LLVM" says
>> "*LLVM debug information always provides information to accurately
>> read the source-level state of the program, regardless of which LLVM
>> optimizations have been run*, and without any modification to the
>> optimizations themselves." [1])
>
> It means the instructions with a given line and column number match
> the source code construct at that line/col number. It does not mean
> that the optimizer/code generator will not reorder instructions. It also
> does not mean that the optimizer/code generator will not emit
> instructions without line number information. It means, if there is
> line number information, it maps to the source construct as accurately
> as possible.
Yes, that matches my understanding, too. But I thought that clang would
be able to emit *more* than one (different) column number per line. In
my example, for line number 9 (in Main.cpp), there are *three*
entries in the DWARF line table. But all of them contain the *same*
information. As a consequence, the associated assembler instructions
are all mapped to the same source line and thus, the column information
is useless...? I mean, what additional information is included in
the column numbers?
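
To make the observation concrete, here is a minimal Python sketch that groups line-table rows by (file, line) and collects the distinct column numbers seen for each. The regular expression and the name `columns_per_line` are assumptions for illustration only; the sample rows are copied from the `dwarfdump -l` output quoted above, and other dwarfdump versions may print rows in a different layout.

```python
import re
from collections import defaultdict

# Matches rows such as ".../Main.cpp: [ 9,28] 0x8048983 // new statement".
# The layout is taken from the dwarfdump output quoted in this thread.
ROW = re.compile(
    r"^(?P<file>\S+):\s*\[\s*(?P<line>\d+),\s*(?P<col>\d+)\]\s+(?P<pc>0x[0-9a-fA-F]+)"
)

def columns_per_line(dump_text):
    """Return {(file, line): set of column numbers} for every parsed row."""
    seen = defaultdict(set)
    for raw in dump_text.splitlines():
        m = ROW.match(raw.strip())
        if m:  # silently skip headers, "[...]" and other non-row lines
            seen[(m["file"], int(m["line"]))].add(int(m["col"]))
    return dict(seen)

SAMPLE = """\
.../Main.cpp: [ 8, 2] 0x804896f // new statement
.../Main.cpp: [ 9,28] 0x8048983 // new statement
.../ostream: [165, 9] 0x8048990 // new statement
.../Main.cpp: [ 9,28] 0x80489a0 // new statement
.../ostream: [209, 9] 0x80489ac // new statement
.../Main.cpp: [ 9,28] 0x80489b5 // new statement
.../ostream: [209, 9] 0x80489bb // new statement
"""

if __name__ == "__main__":
    for (path, line), cols in sorted(columns_per_line(SAMPLE).items()):
        print(path, line, sorted(cols))
```

For the rows above, line 9 of Main.cpp ends up with a single column value even though it has three entries, which is exactly the situation I am asking about.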
I extracted the assembler instructions for the 9th line (x86):
.../Main.cpp: 9
double d = rand() / 10.341; int t = (int)d+j*sum;
^
 8048983:  e8 40 fe ff ff          call   80487c8 <rand@plt>
 8048988:  89 c7                   mov    %eax,%edi
 804898a:  8b 5d f0                mov    -0x10(%ebp),%ebx
 804898d:  0f af de                imul   %esi,%ebx
 80489a0:  f2 0f 2a c7             cvtsi2sd %edi,%xmm0
 80489a4:  f2 0f 5e 05 f0 8a 04    divsd  0x8048af0,%xmm0
 80489ab:  08
 80489b5:  f2 0f 2c f0             cvttsd2si %xmm0,%esi
 80489b9:  01 de                   add    %ebx,%esi
I hope that makes it clearer... ;-)
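
For reference, this is roughly how an instruction address can be mapped back to a line-table row: pick the row with the greatest pc that is not above the address. It is only a sketch in Python under stated assumptions. The names `ROWS` and `lookup` are hypothetical, the rows are the excerpt from the dwarfdump output quoted earlier (not the full table), and a real implementation would also have to honour end-of-sequence markers, which this one ignores.

```python
import bisect

# Excerpt of the line table quoted earlier: (pc, file, line, column).
# Rows must be sorted by pc for the binary search below to be valid.
ROWS = [
    (0x8048983, ".../Main.cpp", 9, 28),
    (0x8048990, ".../ostream", 165, 9),
    (0x80489a0, ".../Main.cpp", 9, 28),
    (0x80489ac, ".../ostream", 209, 9),
    (0x80489b5, ".../Main.cpp", 9, 28),
    (0x80489bb, ".../ostream", 209, 9),
]

def lookup(rows, address):
    """Return the row with the greatest pc <= address, or None if none exists."""
    pcs = [row[0] for row in rows]
    i = bisect.bisect_right(pcs, address) - 1
    return rows[i] if i >= 0 else None

if __name__ == "__main__":
    # Addresses taken from the instruction listing for line 9.
    for addr in (0x804898d, 0x80489a4, 0x80489b9):
        pc, path, line, col = lookup(ROWS, addr)
        print(hex(addr), "->", path, line, col)
```

All three addresses resolve to line 9, column 28, so the lookup cannot tell the two statements on that line apart.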
BTW, any hints on my cross-compilation-related question?
Best regards
Adrian