[LLVMdev] Expressiveness of column numbers in dwarf using clang 3.0?
trash-stuff at gmx.de
Tue May 31 12:11:39 PDT 2011
Update: I've found that the location information may be incorrect
when it points into standard C/C++ headers, as shown in the following
listing:
--------------------------------------------------------------------------------
[...]
/usr/include/c++/4.4/bits/basic_ios.h: 48
if (!__f)
^
58: 80489e5: e8 2e fd ff ff call 8048718 <_ZSt16__throw_bad_castv@plt>
59: 80489ea: 66 0f 1f 44 00 00 nopw 0x0(%eax,%eax,1)
60: 80489f0: 55 push %ebp
61: 80489f1: 89 e5 mov %esp,%ebp
62: 80489f3: 83 ec 18 sub $0x18,%esp
63: 80489f6: c7 04 24 94 a1 04 08 movl $0x804a194,(%esp)
64: 80489fd: e8 56 fd ff ff call 8048758 <_ZNSt8ios_base4InitC1Ev@plt>
65: 8048a02: c7 44 24 08 44 a0 04 movl $0x804a044,0x8(%esp)
66: 8048a09: 08
67: 8048a0a: c7 44 24 04 94 a1 04 movl $0x804a194,0x4(%esp)
68: 8048a11: 08
69: 8048a12: c7 04 24 78 87 04 08 movl $0x8048778,(%esp)
70: 8048a19: e8 ea fc ff ff call 8048708 <__cxa_atexit@plt>
71: 8048a1e: 83 c4 18 add $0x18,%esp
72: 8048a21: 5d pop %ebp
73: 8048a22: c3 ret
--------------------------------------------------------------------------------
/usr/include/c++/4.4/bits/basic_ios.h: 439
widen(char __c) const
^
74: 8048958: 8b 5c 30 7c mov 0x7c(%eax,%esi,1),%ebx
75: 804895c: 85 db test %ebx,%ebx
76: 804895e: 0f 84 81 00 00 00 je 80489e5 <main+0x135>
--------------------------------------------------------------------------------
/usr/include/c++/4.4/bits/locale_facets.h: 866
{
^
77: 8048964: 80 7b 1c 00 cmpb $0x0,0x1c(%ebx)
78: 8048968: 74 86 je 80488f0 <main+0x40>
--------------------------------------------------------------------------------
/usr/include/c++/4.4/bits/locale_facets.h: 867
if (_M_widen_ok)
^
79: 804896a: 8a 43 27 mov 0x27(%ebx),%al
80: 804896d: eb 99 jmp 8048908 <main+0x58>
--------------------------------------------------------------------------------
/usr/include/c++/4.4/bits/locale_facets.h: 868
return _M_widen[static_cast<unsigned char>(__c)];
^
81: 80488f0: 89 1c 24 mov %ebx,(%esp)
82: 80488f3: e8 50 fe ff ff call 8048748 <_ZNKSt5ctypeIcE13_M_widen_initEv@plt>
--------------------------------------------------------------------------------
/usr/include/c++/4.4/bits/locale_facets.h: 869
this->_M_widen_init();
^
83: 80488f8: 8b 03 mov (%ebx),%eax
84: 80488fa: 89 1c 24 mov %ebx,(%esp)
85: 80488fd: c7 44 24 04 0a 00 00 movl $0xa,0x4(%esp)
86: 8048904: 00
87: 8048905: ff 50 18 call *0x18(%eax)
--------------------------------------------------------------------------------
[...]
--------------------------------------------------------------------------------
/usr/include/c++/4.4/ostream: 538
endl(basic_ostream<_CharT, _Traits>& __os)
^
98: 8048908: 0f be c0 movsbl %al,%eax
99: 804890b: 89 44 24 04 mov %eax,0x4(%esp)
100: 804890f: 89 34 24 mov %esi,(%esp)
101: 8048912: e8 c1 fe ff ff call 80487d8 <_ZNSo3putEc@plt>
102: 8048953: 8b 06 mov (%esi),%eax
103: 8048955: 8b 40 f4 mov -0xc(%eax),%eax
--------------------------------------------------------------------------------
/usr/include/c++/4.4/ostream: 559
flush(basic_ostream<_CharT, _Traits>& __os)
^
104: 8048917: 89 04 24 mov %eax,(%esp)
105: 804891a: e8 79 fe ff ff call 8048798 <_ZNSo5flushEv@plt>
106: 804891f: 8b 75 ec mov -0x14(%ebp),%esi
107: 8048922: 47 inc %edi
--------------------------------------------------------------------------------
(The "^" marks the column position within the line.)
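As a rough illustration of the caret notation, the following Python sketch
places a "^" under a given column. The helper name mark_column is
hypothetical. It assumes columns are 1-based character counts in which a tab
counts as one column, and it treats column 0 as "no column information".

```python
def mark_column(source_line, column):
    """Return the source line followed by a caret line.

    Assumption: `column` is 1-based and counts characters, with a tab
    counting as a single column. Column 0 is treated as "no column".
    """
    text = source_line.rstrip("\n")
    if column <= 0:
        # No usable column: put the caret at the left edge.
        return text + "\n^"
    # Copy tabs from the source line so the caret stays aligned
    # regardless of the tab width used by the terminal.
    pad = "".join(ch if ch == "\t" else " " for ch in text[:column - 1])
    return text + "\n" + pad + "^"


if __name__ == "__main__":
    print(mark_column("double d = rand() / 10.341; int t = (int)d+j*sum;", 28))
```

Where the caret lands for a real line depends on the original indentation,
which this archived message does not preserve.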
I am not completely sure, but the mapping of line 868 in file
"locale_facets.h" might be wrong: there is a call instruction which
calls "_M_widen_init", but in the source this function is actually
called on the next line (869).
Here is the extract from locale_facets.h:
char_type
widen(char __c) const
{
if (_M_widen_ok)
return _M_widen[static_cast<unsigned char>(__c)];
this->_M_widen_init();
return this->do_widen(__c);
}
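For reference, the reading above relies on the usual way of consulting a line
table: an address is attributed to the last row whose start address is not
greater than it. A minimal Python sketch follows. The addresses and line
numbers are copied from the listing above; treating each of them as the start
of a line-table row is an assumption, and lookup is a hypothetical helper, not
a libdwarf call.

```python
from bisect import bisect_right

# (start address, file, line), sorted by address.
# Values come from the listing in this message; that each one starts a
# line-table row is an assumption made for illustration.
ROWS = [
    (0x80488F0, "locale_facets.h", 868),
    (0x80488F8, "locale_facets.h", 869),
    (0x8048908, "ostream", 538),
]
STARTS = [row[0] for row in ROWS]


def lookup(pc):
    """Return (file, line) of the row covering pc, or None if pc is too low."""
    i = bisect_right(STARTS, pc) - 1
    if i < 0:
        return None
    return ROWS[i][1:]


if __name__ == "__main__":
    for pc in (0x80488F3, 0x80488FD):
        print(hex(pc), lookup(pc))
```

Under that assumption the call at 0x80488f3 is attributed to line 868, which
is the attribution questioned above.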
In addition, line 48 of "basic_ios.h" contains a ret instruction, which
should map to a return or throw statement. The column numbers are
obviously wrong.
Are these interpretations correct?
Best regards
Adrian
On 31.05.2011 20:17, trash-stuff at gmx.de wrote:
> On 31.05.2011 19:45, Devang Patel wrote:
>>
>> On May 31, 2011, at 10:36 AM, trash-stuff at gmx.de
>> <mailto:trash-stuff at gmx.de> wrote:
>>
>>> On 31.05.2011 19:22, Devang Patel wrote:
>>>>
>>>> On May 30, 2011, at 11:11 AM, trash-stuff at gmx.de
>>>> <mailto:trash-stuff at gmx.de> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am processing DWARF line and column information in (x86 and ARM)
>>>>> executables in order to produce a mapping from the machine
>>>>> instructions back to the original source code (C/C++). Using the
>>>>> line numbers is quite straightforward ("libdwarf" [1] is doing the
>>>>> work for me.) But when comparing the column numbers (extracted from
>>>>> the DWARF line table) with the corresponding source code
>>>>> locations, it becomes clear that they are not very "useful".
>>>>>
>>>>> Consider the following small example (C++):
>>>>>
>>>>> 1: #include <iostream>
>>>>> 2: #include <ctime>
>>>>> 3: #include <cstdlib>
>>>>> 4: using namespace std;
>>>>> 5: int main() {
>>>>> 6: int j = 0; cin >> j; long sum = (j < 0 ? -5 : 4) + rand();
>>>>> 7: for(int i = 0; i < j; i++) { sum += j*j-2; cout << (sum
>>>>> / 2) << endl; }
>>>>> 8: srand(time(NULL));
>>>>> 9: double d = rand() / 10.341; int t = (int)d+j*sum;
>>>>> 10: cout << sum << d << t << j;
>>>>> 11: return (0);
>>>>> 12: }
>>>>>
>>>>> Compiling this with "clang++ Main.cpp -g -O3 -o column" results in
>>>>> the following location information within the generated executable:
>>>>>
>>>>> $ dwarfdump -l column
>>>>>
>>>>> .debug_line: line number info for a single cu
>>>>> Source lines (from CU-DIE at .debug_info offset 11):
>>>>> <source file> [line,column] <pc> //<new stmt or basic block
>>>>> .../locale_facets.h: [868, 2] 0x80488f0 // new statement
>>>>> [...]
>>>>> .../Main.cpp: [ 8, 2] 0x804896f // new statement
>>>>> .../Main.cpp: [ 9,28] 0x8048983 // new statement
>>>>> .../ostream: [165, 9] 0x8048990 // new statement
>>>>> .../Main.cpp: [ 9,28] 0x80489a0 // new statement
>>>>> .../ostream: [209, 9] 0x80489ac // new statement
>>>>> .../Main.cpp: [ 9,28] 0x80489b5 // new statement
>>>>> .../ostream: [209, 9] 0x80489bb // new statement
>>>>> [...]
>>>>> .../basic_ios.h: [ 48, 2] 0x8048a23 // new statement
>>>>> // end of text sequence
>>>>>
>>>>> Now, have a look at source code line 9. The extracted debug info
>>>>> above says that we have 3 "instruction sets" (beginning
>>>>> at 0x8048983, 0x80489a0 and 0x80489b5 respectively) which correspond to
>>>>> line 9. But all of them are labeled with column number 28!
>>>>> According to my understanding, this does not contribute any
>>>>> further information to support my task (= mapping assembler code
>>>>> back to the source lines or even to statements within a line). Did
>>>>> I miss anything?
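
The observation above can be checked mechanically. The following Python sketch
groups line-table rows by file and line and lists the distinct columns seen
for each. The rows are copied from the dwarfdump excerpt above as it appears
in this message. The regular expression is an assumption about that layout and
may need adjusting for other dwarfdump versions, and columns_per_line is a
hypothetical helper name.

```python
import re
from collections import defaultdict

# Rows copied from the dwarfdump excerpt quoted in this thread.
SAMPLE = """\
.../Main.cpp: [ 8, 2] 0x804896f // new statement
.../Main.cpp: [ 9,28] 0x8048983 // new statement
.../ostream: [165, 9] 0x8048990 // new statement
.../Main.cpp: [ 9,28] 0x80489a0 // new statement
.../ostream: [209, 9] 0x80489ac // new statement
.../Main.cpp: [ 9,28] 0x80489b5 // new statement
.../ostream: [209, 9] 0x80489bb // new statement
"""

# Assumed row layout: "<file>: [line,column] <pc> ..."
ROW = re.compile(
    r"^(?P<file>\S+):\s*\[\s*(?P<line>\d+),\s*(?P<col>\d+)\]\s+"
    r"(?P<pc>0x[0-9a-fA-F]+)"
)


def columns_per_line(text):
    """Map (file, line) to a list of (pc, column) pairs found in text."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        m = ROW.match(raw.strip())
        if m:
            key = (m["file"], int(m["line"]))
            seen[key].append((int(m["pc"], 16), int(m["col"])))
    return seen


table = columns_per_line(SAMPLE)

if __name__ == "__main__":
    for (path, line), rows in sorted(table.items()):
        cols = sorted({col for _, col in rows})
        print(f"{path}:{line}: {len(rows)} row(s), distinct columns {cols}")
```

A source line whose rows all share one column, as line 9 of Main.cpp does
here, gains nothing from the column field beyond the line number.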
>>>>
>>>> You are looking at the line table produced at -O3, i.e. after the
>>>> aggressive optimizer has had opportunities to optimize the code. Try
>>>> -O0 and see if it helps.
>>> First of all, thanks for your reply!
>>>
>>> I've already checked that at -O0 but it results in the same information.
>>
>> You mean, the instructions with a given line and column number do not
>> match the source code construct at that location?
> No, they do.
>>
>>> (The documentation about "Source Level Debugging with LLVM" says
>>> "*LLVM debug information always provides information to accurately
>>> read the source-level state of the program, regardless of which LLVM
>>> optimizations have been run*, and without any modification to the
>>> optimizations themselves." [1])
>>
>> It means the instructions with a given line and column number match
>> the source code construct at that line/col number. It does not mean
>> that the optimizer/code generator will not reorder instructions. It also
>> does not mean that the optimizer/code generator will not emit instructions
>> without line number information. It means that if there is line number
>> information, it maps to the source construct as accurately as possible.
> Yes, that matches my understanding, too. But I thought that clang
> would be able to emit *more* than one (different) column number per
> line. As in my example, for line number 9 (in Main.cpp), there are
> *three* entries in the DWARF line table. But all of them contain the
> *same* information. As a consequence, the associated assembler
> instructions are all mapped to the same source line and thus the
> column information is useless...? I mean, what additional
> information is included in the column numbers?
>
> I extracted the assembler instructions for the 9th line (x86):
> .../Main.cpp: 9
> double d = rand() / 10.341; int t = (int)d+j*sum;
> ^
> 8048983: e8 40 fe ff ff call 80487c8 <rand@plt>
> 8048988: 89 c7 mov %eax,%edi
> 804898a: 8b 5d f0 mov -0x10(%ebp),%ebx
> 804898d: 0f af de imul %esi,%ebx
> 80489a0: f2 0f 2a c7 cvtsi2sd %edi,%xmm0
> 80489a4: f2 0f 5e 05 f0 8a 04 divsd 0x8048af0,%xmm0
> 80489ab: 08
> 80489b5: f2 0f 2c f0 cvttsd2si %xmm0,%esi
> 80489b9: 01 de add %ebx,%esi
>
> I hope that makes it clearer... ;-)
>
> BTW, any hints to my cross-compilation-related question?
>
> Best regards
> Adrian
>
>