[LLVMdev] More DWARF problems

Talin viridia at gmail.com
Thu Apr 7 12:14:49 PDT 2011


On Sat, Apr 2, 2011 at 11:03 PM, Talin <viridia at gmail.com> wrote:

>
>
> On Wed, Mar 30, 2011 at 11:17 AM, Devang Patel <dpatel at apple.com> wrote:
>
>>
>> On Mar 29, 2011, at 7:29 PM, Talin wrote:
>>
>> I've been trying to track down the problem with the DWARF info that is
>> being emitted by my front end, which has been broken for about a month now.
>> Here's what happens when I attempt to use gdb to debug one of my programs on
>> OS X:
>>
>> gdb stack crawl at point of internal error:
>> [ 0 ] /usr/libexec/gdb/gdb-i386-apple-darwin (align_down+0x0) [0x122300]
>> [ 1 ] /usr/libexec/gdb/gdb-i386-apple-darwin
>> (find_partial_die_in_comp_unit+0x65) [0xc0e19]
>> [ 2 ] /usr/libexec/gdb/gdb-i386-apple-darwin (find_partial_die+0x2d4)
>> [0xcf07f]
>> [ 3 ] /usr/libexec/gdb/gdb-i386-apple-darwin (fixup_partial_die+0x29)
>> [0xcf0b3]
>> [ 4 ] /usr/libexec/gdb/gdb-i386-apple-darwin (scan_partial_symbols+0x26)
>> [0xcf9e7]
>> [ 5 ] /usr/libexec/gdb/gdb-i386-apple-darwin (dwarf2_build_psymtabs+0xc54)
>> [0xd093c]
>> [ 6 ] /usr/libexec/gdb/gdb-i386-apple-darwin (macho_symfile_read+0x145)
>> [0x163b15]
>> [ 7 ] /usr/libexec/gdb/gdb-i386-apple-darwin (syms_from_objfile+0x62d)
>> [0x52259]
>> [ 8 ] /usr/libexec/gdb/gdb-i386-apple-darwin
>> (symbol_file_add_with_addrs_or_offsets_using_objfile+0x338) [0x561e7]
>>  [ 9 ] /usr/libexec/gdb/gdb-i386-apple-darwin
>> (symbol_file_add_with_addrs_or_offsets_using_objfile+0x2da) [0x56189]
>> [ 10 ] /usr/libexec/gdb/gdb-i386-apple-darwin
>> (symbol_file_add_name_with_addrs_or_offsets+0x7a) [0x563c9]
>> [ 11 ] /usr/libexec/gdb/gdb-i386-apple-darwin
>> (symbol_file_add_main_1+0xf2) [0x56e36]
>> [ 12 ] /usr/libexec/gdb/gdb-i386-apple-darwin (catch_command_errors+0x4d)
>> [0x7ac88]
>> /SourceCache/gdb/gdb-966/src/gdb/dwarf2read.c:7593: internal-error: could
>> not find partial DIE in cache
>>
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>> Quit this debugging session? (y or n)
>>
>>
>> Now, all of this was working earlier, and I don't know whether it was
>> something I did or a change in LLVM, but that's not important. The real
>> question is how to track down the problem.
>>
>>
>> I have seen gdb crash with this backtrace when it encounters a subprogram
>> specification DIE at the top level but cannot find the actual subprogram
>> definition. The definition DIE may not be found either because it is hidden
>> deep in a nested subclass or because it is missing altogether from the
>> compiler output. One easy way to rule this out is to check the indentation
>> level of every specification DIE in the dwarfdump output against the level
>> of the definition DIE it refers to.
>>
>>
>> In the past, the way that I have dealt with DWARF-related problems is to
>> try a number of strategies:
>>
>> 1) Reduce the problem to the smallest reproducible case. In the past I
>> have had some success with this, but not in this case. You see, one of the
>> problems with object-oriented languages is that even simple operations -
>> such as appending an element to an array - can end up pulling in a very
>> large number of classes (For example, the array class might throw an
>> exception if your index is invalid, which pulls in the exception hierarchy
>> and so on...)
>>
>> I have a special script which attempts to compile a "minimal" test case,
>> without the standard library and with garbage collection disabled.
>> Unfortunately, none of the "small" test cases that I have been able to come
>> up with exhibit the problem, and any time I use certain language features I
>> am forced to link in the standard library which makes the test program huge.
>> I have plenty of example cases which exhibit the problem, but they are all
>> bitcode files on the order of 100K or more in size. And I'm not going to
>> have much luck tracking down a needle in such a large haystack.
>>
>> 2) Use dwarfdump to try and verify the validity of the debug symbols.
>>
>> Unfortunately, the information from dwarfdump is not too useful in this
>> case. Here's what I get:
>>
>>    - On OS X, with the "small" test cases I created, I get no errors at
>>    all.
>>    - On OS X, with my normal unit tests (with the standard library) I get
>>    hundreds of error messages of the following form:
>>
>>     0x00000882: DIE attribute 0x00000883:  AT_type/FORM_ref4 has a value
>> 0x00000592 that is not in the current compile unit in the .debug_info
>> section.
>>
>>
>> This indicates that while DwarfDebug.cpp was preparing the DWARF info, it
>> created a DIE 0x00000592 that was referred to by another DIE, 0x00000883,
>> but DIE 0x00000592 was somehow never emitted. This could be a bug in
>> DwarfDebug.cpp or in how the front end generates debug info.
>>
>> In DwarfDebug.cpp, you'll see code like
>>
>>       addDIEEntry(VariableSpecDIE, dwarf::DW_AT_specification,
>>                   dwarf::DW_FORM_ref4, VariableDIE);
>>
>> Here VariableSpecDIE refers to VariableDIE, but VariableDIE is missing
>> from the output. There are other uses of DW_FORM_ref4 as well. So check in
>> your dwarfdump output what 0x00000883 is, set an appropriate breakpoint in
>> the debugger, and see why the referenced DIE never reaches
>> DwarfDebug::emitDIE().
>>
>
> OK I've been checking this out some more, and the DIEs don't look valid to
> me. Take a look at this output from dwarfdump -v:
>
> 0x000000c7:     TAG_subprogram [3]
> 0x000000c8:      AT_name( .debug_str[0x000001bd] = "construct" )
> 0x000000cc:      AT_MIPS_linkage_name( .debug_str[0x000001c7] =
> "tart.reflect.Parameter.construct(tart.core.String)" )
> 0x000000d0:      AT_decl_file( 0x3d (
> "/Users/talin/Projects/tart/trunk/lib/std/tart/reflect/Parameter.tart" ) )
> 0x000000d1:      AT_decl_line( 0x0d ( 13 ) )
> 0x000000d2:      AT_type( cu + 0x00000066 => {0x00000103} (  ) )
> 0x000000d6:      AT_external( 0x01 )
> 0x000000d7:      AT_low_pc( 0x0000f780 )
> 0x000000db:      AT_high_pc( 0x0000f7b1 )
> 0x000000df:      AT_frame_base( <0x1> 55  ( reg5 ) )
>
> 0x000000e1:     NULL
>
> 0x000000e2: Compile Unit: length = 0x00000071  version = 0x0002
>  abbr_offset = 0x00000000  addr_size = 0x04  (next CU at 0x00000157)
>
> 0x000000ed: TAG_compile_unit [1] *
> 0x000000ee:  AT_producer( .debug_str[0x00000001] = "0.1 tartc" )
> 0x000000f2:  AT_language( 0x0002 ( DW_LANG_C ) )
> 0x000000f4:  AT_name( .debug_str[0x000001fa] = "range.tart" )
> 0x000000f8:  AT_entry_pc( 0x00004360 )
> 0x000000fc:  AT_stmt_list( 0x00000000 ( 0x00000000 ) )
> 0x00000100:  AT_comp_dir( .debug_str[0x00000205] =
> "/Users/talin/Projects/tart/trunk/lib/std/tart/core" )
> 0x00000104:  AT_APPLE_major_runtime_vers( 0x01 )
>
> In particular note that the DIE starting at 0x0c7, which is a
> TAG_subprogram, has a return type (AT_type) which points to 0x103. However
> if you look further down, you'll see that there is no DIE at offset 0x103.
> Instead it looks like it's pointing into the middle of another DIE.
>

Not to be a pest, but I'm still stuck on this one.
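
For anyone who wants to repeat the check described above without reading the
dump by eye, here is a minimal sketch of a script that scans dwarfdump -v
output for attribute references that do not land on the start of a DIE. It
assumes the Apple dwarfdump text layout shown in the excerpt above
("0x...: TAG_..." for the start of a DIE, "=> {0x...}" for a resolved
reference). The name find_dangling_refs and both regular expressions are
illustrative only and are not part of dwarfdump or LLVM; real output may
contain forms this does not handle.

```python
import re

# Assumed line shapes, taken from the dwarfdump -v excerpt quoted above:
#   0x000000c7:     TAG_subprogram [3]
#   0x000000d2:      AT_type( cu + 0x00000066 => {0x00000103} (  ) )
DIE_START = re.compile(r"^\s*(0x[0-9a-fA-F]+):\s+TAG_\w+")
ATTR_REF = re.compile(
    r"^\s*(0x[0-9a-fA-F]+):\s+(AT_\w+)\(.*=>\s*\{(0x[0-9a-fA-F]+)\}"
)


def find_dangling_refs(dump_text):
    """Return (attr_offset, attr_name, target) for refs that hit no DIE start."""
    die_offsets = set()  # offsets at which a DIE begins
    refs = []            # every resolved "=> {0x...}" reference seen
    for line in dump_text.splitlines():
        m = DIE_START.match(line)
        if m:
            die_offsets.add(int(m.group(1), 16))
            continue
        m = ATTR_REF.match(line)
        if m:
            refs.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
    # Filter only after the whole dump is read, so forward references work.
    return [ref for ref in refs if ref[2] not in die_offsets]


# A few lines copied from the excerpt above (hypothetical stand-in for a file).
SAMPLE = """\
0x000000c7:     TAG_subprogram [3]
0x000000d2:      AT_type( cu + 0x00000066 => {0x00000103} (  ) )
0x000000d6:      AT_external( 0x01 )
0x000000e1:     NULL
0x000000e2: Compile Unit: length = 0x00000071  version = 0x0002
0x000000ed: TAG_compile_unit [1] *
0x00000100:  AT_comp_dir( .debug_str[0x00000205] = "/Users/talin/Projects/tart/trunk/lib/std/tart/core" )
0x00000104:  AT_APPLE_major_runtime_vers( 0x01 )
"""

if __name__ == "__main__":
    for attr_off, name, target in find_dangling_refs(SAMPLE):
        print("0x%08x: %s -> 0x%08x does not start a DIE"
              % (attr_off, name, target))
```

On the sample lines this flags the AT_type at 0x000000d2, whose target
0x00000103 falls between the attributes at 0x00000100 and 0x00000104 rather
than on a TAG_ line. Feed it the complete dump, not a fragment; otherwise
references to DIEs outside the fragment are reported as well.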

>
> At least, this is true if I'm interpreting this right.
>
>
>> -
>> Devang
>>
>>     0x000009a9: DIE attribute 0x000009ae:  AT_type/FORM_ref4 has a value
>> 0x000001c2 that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000b85: DIE attribute 0x00000b8a:  AT_type/FORM_ref4 has a value
>> 0x0000055c that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000c88: DIE attribute 0x00000c89:  AT_type/FORM_ref4 has a value
>> 0x0000055c that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000d2f: DIE attribute 0x00000d34:  AT_type/FORM_ref4 has a value
>> 0x0000055c that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000d9a: DIE attribute 0x00000d9f:  AT_type/FORM_ref4 has a value
>> 0x00000584 that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000e43: DIE attribute 0x00000e48:  AT_type/FORM_ref4 has a value
>> 0x000011ac that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000ea3: DIE attribute 0x00000ea8:  AT_type/FORM_ref4 has a value
>> 0x00001225 that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000ebe: DIE attribute 0x00000ebf:  AT_type/FORM_ref4 has a value
>> 0x00001248 that is not in the current compile unit in the .debug_info
>> section.
>>     0x00000ee3: DIE attribute 0x00000ee4:  AT_type/FORM_ref4 has a value
>> 0x00001285 that is not in the current compile unit in the .debug_info
>> section.
>>
>>
>>    - On Linux - well the problem here is that even when my DWARF info was
>>    working, dwarfdump would spit out a ton of error messages about bad file
>>    DIEs and other spam - in other words, I've never been able to use LLVM to
>>    produce a binary on Linux that was dwarfdump-error free. So any "new" errors
>>    are mixed in with all of the "old" errors I was seeing before.
>>
>> 3) Use llbrowse to manually inspect the DIEs and see if they make sense.
>> (Which is part of the reason why I wrote llbrowse.) Again, the problem is
>> that I don't know where to look, and the files are simply too large to
>> inspect manually.
>>
>> --
>> -- Talin
>>
>>
>>
>
>
> --
> -- Talin
>



-- 
-- Talin