[LLVMdev] More DWARF problems

Talin viridia at gmail.com
Tue Mar 29 19:29:01 PDT 2011


I've been trying to track down the problem with the DWARF info that is being
emitted by my front end, which has been broken for about a month now. Here's
what happens when I attempt to use gdb to debug one of my programs on OS X:

gdb stack crawl at point of internal error:
[ 0 ] /usr/libexec/gdb/gdb-i386-apple-darwin (align_down+0x0) [0x122300]
[ 1 ] /usr/libexec/gdb/gdb-i386-apple-darwin (find_partial_die_in_comp_unit+0x65) [0xc0e19]
[ 2 ] /usr/libexec/gdb/gdb-i386-apple-darwin (find_partial_die+0x2d4) [0xcf07f]
[ 3 ] /usr/libexec/gdb/gdb-i386-apple-darwin (fixup_partial_die+0x29) [0xcf0b3]
[ 4 ] /usr/libexec/gdb/gdb-i386-apple-darwin (scan_partial_symbols+0x26) [0xcf9e7]
[ 5 ] /usr/libexec/gdb/gdb-i386-apple-darwin (dwarf2_build_psymtabs+0xc54) [0xd093c]
[ 6 ] /usr/libexec/gdb/gdb-i386-apple-darwin (macho_symfile_read+0x145) [0x163b15]
[ 7 ] /usr/libexec/gdb/gdb-i386-apple-darwin (syms_from_objfile+0x62d) [0x52259]
[ 8 ] /usr/libexec/gdb/gdb-i386-apple-darwin (symbol_file_add_with_addrs_or_offsets_using_objfile+0x338) [0x561e7]
[ 9 ] /usr/libexec/gdb/gdb-i386-apple-darwin (symbol_file_add_with_addrs_or_offsets_using_objfile+0x2da) [0x56189]
[ 10 ] /usr/libexec/gdb/gdb-i386-apple-darwin (symbol_file_add_name_with_addrs_or_offsets+0x7a) [0x563c9]
[ 11 ] /usr/libexec/gdb/gdb-i386-apple-darwin (symbol_file_add_main_1+0xf2) [0x56e36]
[ 12 ] /usr/libexec/gdb/gdb-i386-apple-darwin (catch_command_errors+0x4d) [0x7ac88]
/SourceCache/gdb/gdb-966/src/gdb/dwarf2read.c:7593: internal-error: could not find partial DIE in cache

A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)


Now, all of this was working earlier, and I don't know whether it was
something I did or a change in LLVM, but that's not important. The real
question is how to track down the problem.

In the past, I have dealt with DWARF-related problems by trying a number of
strategies:

1) Reduce the problem to the smallest reproducible case. This has worked for
me before, but not in this case. You see, one of the problems with
object-oriented languages is that even simple operations, such as appending
an element to an array, can end up pulling in a very large number of classes.
(For example, the array class might throw an exception if your index is
invalid, which pulls in the exception hierarchy, and so on.)

I have a special script which attempts to compile a "minimal" test case,
without the standard library and with garbage collection disabled.
Unfortunately, none of the "small" test cases that I have been able to come
up with exhibits the problem, and any time I use certain language features I
am forced to link in the standard library, which makes the test program huge.
I have plenty of example cases which exhibit the problem, but they are all
bitcode files on the order of 100K or more in size. And I'm not going to
have much luck tracking down a needle in such a large haystack.
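
One way the reduction could be mechanized is to start from a large failing case and shrink it, rather than building small cases by hand. The following is a minimal sketch of a delta-debugging-style loop, not something my front end or LLVM provides. It assumes the failing program can be split into a list of independent items (for example, top-level declarations) and that a hypothetical `still_fails` callback rebuilds the subset and reports whether gdb or dwarfdump still complains.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_items(items: Sequence[T],
                 still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop chunks of `items` for as long as `still_fails` stays True.

    `still_fails` is a hypothetical callback: it should rebuild the program
    from the given subset and return True if the DWARF problem is still there.
    """
    current = list(items)
    chunk = max(1, len(current) // 2)
    while True:
        i = 0
        removed_any = False
        while i < len(current):
            # Try the input with current[i:i + chunk] removed.
            candidate = current[:i] + current[i + chunk:]
            if candidate and still_fails(candidate):
                current = candidate   # keep the smaller input, retry same index
                removed_any = True
            else:
                i += chunk            # this chunk is needed, move past it
        if chunk == 1 and not removed_any:
            return current            # no single item can be dropped any more
        chunk = max(1, chunk // 2)


if __name__ == "__main__":
    # Toy stand-in for a real build-and-check step: the "bug" needs items 3 and 17.
    print(reduce_items(list(range(20)), lambda xs: 3 in xs and 17 in xs))
```

The result is only 1-minimal (no single remaining item can be removed), and each call to `still_fails` costs a full build, so this is only practical if the build-and-check step is fast.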

2) Use dwarfdump to try to verify the validity of the debug symbols.

Unfortunately, the information from dwarfdump is not too useful in this
case. Here's what I get:

   - On OS X, with the "small" test cases I created, I get no errors at all.
   - On OS X, with my normal unit tests (with the standard library) I get
   hundreds of error messages of the following form:

    0x00000882: DIE attribute 0x00000883:  AT_type/FORM_ref4 has a value 0x00000592 that is not in the current compile unit in the .debug_info section.
    0x000009a9: DIE attribute 0x000009ae:  AT_type/FORM_ref4 has a value 0x000001c2 that is not in the current compile unit in the .debug_info section.
    0x00000b85: DIE attribute 0x00000b8a:  AT_type/FORM_ref4 has a value 0x0000055c that is not in the current compile unit in the .debug_info section.
    0x00000c88: DIE attribute 0x00000c89:  AT_type/FORM_ref4 has a value 0x0000055c that is not in the current compile unit in the .debug_info section.
    0x00000d2f: DIE attribute 0x00000d34:  AT_type/FORM_ref4 has a value 0x0000055c that is not in the current compile unit in the .debug_info section.
    0x00000d9a: DIE attribute 0x00000d9f:  AT_type/FORM_ref4 has a value 0x00000584 that is not in the current compile unit in the .debug_info section.
    0x00000e43: DIE attribute 0x00000e48:  AT_type/FORM_ref4 has a value 0x000011ac that is not in the current compile unit in the .debug_info section.
    0x00000ea3: DIE attribute 0x00000ea8:  AT_type/FORM_ref4 has a value 0x00001225 that is not in the current compile unit in the .debug_info section.
    0x00000ebe: DIE attribute 0x00000ebf:  AT_type/FORM_ref4 has a value 0x00001248 that is not in the current compile unit in the .debug_info section.
    0x00000ee3: DIE attribute 0x00000ee4:  AT_type/FORM_ref4 has a value 0x00001285 that is not in the current compile unit in the .debug_info section.


   - On Linux - well, the problem here is that even when my DWARF info was
   working, dwarfdump would spit out a ton of error messages about bad file
   DIEs and other spam - in other words, I've never been able to use LLVM to
   produce a binary on Linux that was free of dwarfdump errors. So any "new" errors
   are mixed in with all of the "old" errors I was seeing before.
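
With hundreds of messages, it may help to summarize them by script before reading them. The sketch below is an illustration under stated assumptions, not an existing tool. `group_by_target` assumes the OS X message format quoted above and groups the referring DIE offsets by the value they point at, since several of the messages share a value such as 0x0000055c. `new_message_shapes` assumes one message per line and compares two logs after replacing every hex number with a placeholder, which is one rough way to separate "new" message kinds from long-standing noise. The baseline strings in the demo are invented placeholders, not real dwarfdump output.

```python
import re
from collections import Counter, defaultdict

# Matches the OS X dwarfdump message quoted above, after whitespace is collapsed.
ERROR_RE = re.compile(
    r"(0x[0-9a-fA-F]+): DIE attribute (0x[0-9a-fA-F]+): "
    r"(AT_\w+)/(FORM_\w+) has a value (0x[0-9a-fA-F]+) "
    r"that is not in the current compile unit"
)


def group_by_target(log_text):
    """Map each complained-about reference value to the DIE offsets that use it."""
    flat = " ".join(log_text.split())  # undo line wrapping and double spaces
    targets = defaultdict(list)
    for die, _attr, _at, _form, value in ERROR_RE.findall(flat):
        targets[int(value, 16)].append(int(die, 16))
    return dict(targets)


def new_message_shapes(baseline_log, current_log):
    """Count message shapes seen more often in current_log than in baseline_log."""
    def shapes(text):
        return Counter(
            re.sub(r"0x[0-9a-fA-F]+", "0xN", " ".join(line.split()))
            for line in text.splitlines()
            if line.strip()
        )
    return shapes(current_log) - shapes(baseline_log)


if __name__ == "__main__":
    sample = """
    0x00000882: DIE attribute 0x00000883:  AT_type/FORM_ref4 has a value
0x00000592 that is not in the current compile unit in the .debug_info
section.
    0x00000b85: DIE attribute 0x00000b8a:  AT_type/FORM_ref4 has a value
0x0000055c that is not in the current compile unit in the .debug_info
section.
    0x00000c88: DIE attribute 0x00000c89:  AT_type/FORM_ref4 has a value
0x0000055c that is not in the current compile unit in the .debug_info
section.
"""
    # Most-referenced values first: a shared target is a good first suspect.
    grouped = group_by_target(sample)
    for value, dies in sorted(grouped.items(), key=lambda kv: -len(kv[1])):
        print(hex(value), [hex(d) for d in dies])

    # Invented placeholder messages, only to show the baseline comparison.
    print(new_message_shapes("old-noise 0x10\nold-noise 0x20\n",
                             "old-noise 0x30\nnew-problem 0x50\n"))
```

Comparing shapes rather than raw lines matters because offsets shift between builds, so a plain diff of two logs would flag nearly every line.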

3) Use llbrowse to manually inspect the DIEs and see if they make sense.
(Which is part of the reason why I wrote llbrowse.) Again, the problem is
that I don't know where to look, and the files are simply too large to
inspect manually.
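
One place to start looking is suggested by the dwarfdump messages themselves: they all concern FORM_ref4 values. In the DWARF specification, a DW_FORM_ref4 value is an offset from the first byte of the header of the compile unit that contains the reference, so it only makes sense if it lands inside that same unit. The sketch below shows that arithmetic for 32-bit DWARF. The compile unit offsets and lengths in it are made up for illustration; the real ones would have to be read from the compile unit headers in the dump, and whether dwarfdump prints the raw or the resolved value in its message is something I have not verified.

```python
def ref4_target(cu_offset, unit_length, ref_value):
    """Resolve a CU-relative DW_FORM_ref4 value.

    Returns (absolute .debug_info offset, True if it lies inside the CU).
    Assumes 32-bit DWARF, where the 4-byte unit_length field does not
    count itself, so the whole unit occupies 4 + unit_length bytes.
    """
    cu_size = 4 + unit_length
    return cu_offset + ref_value, 0 <= ref_value < cu_size


if __name__ == "__main__":
    # Hypothetical CU at offset 0x800 with unit_length 0x3fc (0x400 bytes in total).
    absolute, inside = ref4_target(0x800, 0x3fc, 0x592)
    print(hex(absolute), inside)

    # Hypothetical single large CU at offset 0 with unit_length 0x1000.
    absolute, inside = ref4_target(0x0, 0x1000, 0x592)
    print(hex(absolute), inside)
```

If the referring DIEs turn out to sit in a different compile unit from the types they name, that would narrow the search to whatever emits cross-unit type references.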

-- 
-- Talin