[llvm-dev] [lldb-dev] Adding DWARF5 accelerator table support to llvm

Thu Jun 14 07:01:24 PDT 2018

Thank you all. I am going to try to reply to all comments in a single email.

Regarding the .apple_objc idea, I am afraid the situation is not as
simple as just flipping a switch. (If it were, I don't think I would
have embarked on this adventure in the first place -- I would just
emit .apple_*** everywhere and call it done :)). The issue is that the
apple tables have assumptions about the macOS debug info distribution
model hardcoded in them -- they assume they will either stay in the .o
file or be linked by a smart debug-info-aware linker (dsymutil). In
particular, this means they are not self-delimiting (no length field
as is typical for other dwarf artifacts), so if a linker which is not
aware of them simply concatenated the individual .o tables (which elf
linkers are really good at), the debugger would have no way to pry
them apart. And even if it somehow managed that, it still wouldn't
know if the indexes covered all of the compile units in the linked
file or only some of them (in case some of the object files were
compiled with the tables and some without).
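
To illustrate why the length field matters, here is a minimal Python
sketch of how a consumer can pry apart blindly concatenated
contributions when each one starts with a DWARF-style unit_length. The
payloads are placeholder bytes, not real index data, the helper names
(make_unit, split_units) are made up for this example, and
little-endian byte order is assumed.

```python
import struct

def make_unit(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte little-endian length (32-bit DWARF style)."""
    return struct.pack("<I", len(payload)) + payload

def split_units(section: bytes) -> list:
    """Split a section consisting of concatenated, length-prefixed units."""
    units = []
    pos = 0
    while pos < len(section):
        (length,) = struct.unpack_from("<I", section, pos)
        header = 4
        if length == 0xFFFFFFFF:
            # 64-bit DWARF escape: the real length follows as 8 bytes.
            (length,) = struct.unpack_from("<Q", section, pos + 4)
            header = 12
        end = pos + header + length
        if end > len(section):
            raise ValueError("truncated unit at offset %d" % pos)
        units.append(section[pos:end])
        pos = end
    return units

# A linker that knows nothing about the format just concatenates the inputs.
linked = make_unit(b"index-from-a.o") + make_unit(b"index-from-b.o!")
units = split_units(linked)
print(len(units), [u[4:] for u in units])
```

Without the length prefix, the loop above has nothing to tell it where
one contribution ends and the next begins, which is the problem with
the apple tables.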

In light of that, I don't think it's worth trying to combine
.apple_objc with .debug_names in some way, and it would be much
simpler to just extend .debug_names with the necessary information. I
think the simplest way of achieving this (one which would require the
least amount of standard-bending) is to take the index entry for the
objc class and add a special attribute to it (DW_IDX_method_list?)
with form DW_FORM_blockXXX and just have the references to the method
DIEs in the block data. This should make the implementation an almost
drop-in for the current .apple_objc functionality (we would still need
to figure out what to do with category methods, but it's not clear to
me whether lldb actually uses those anywhere).
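
As a rough sketch of what the block data could look like, the Python
below packs a list of method DIE offsets into a ULEB128-length-prefixed
block and reads it back. Everything specific here is an assumption for
illustration: DW_IDX_method_list does not exist, I picked the
DW_FORM_block layout (ULEB128 length followed by the data) out of the
block forms, and the 4-byte little-endian offsets and the function
names are hypothetical.

```python
import struct

def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_uleb128(buf: bytes, pos: int = 0):
    """Decode a ULEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def encode_method_list(die_offsets) -> bytes:
    """Hypothetical block payload: 4-byte DIE offsets, length-prefixed."""
    data = b"".join(struct.pack("<I", off) for off in die_offsets)
    return encode_uleb128(len(data)) + data

def decode_method_list(block: bytes) -> list:
    """Read back the DIE offsets stored by encode_method_list."""
    length, pos = decode_uleb128(block)
    data = block[pos:pos + length]
    return [off for (off,) in struct.iter_unpack("<I", data)]

offsets = [0x2A, 0x1F40, 0x12345]
block = encode_method_list(offsets)
print(decode_method_list(block) == offsets)
```

The consumer only needs the index entry for the class to reach all of
its methods, which is what would make this close to a drop-in for the
current lookup.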

But, other options may be possible as well. What's not clear to me is
whether these tables couldn't be replaced by extra information in the
.debug_info section. It seems to me that these tables are trying to
work around the issue that there is no straight way to go from a
DW_TAG_structure_type DIE describing an ObjC class to its methods. If
these methods (their forward declarations) were present as children
of the type DIE (as they are for c++ classes), then these tables may
not be necessary. But maybe (probably) that has already been
considered and deemed infeasible for some reason. In any case this
seemed like a thing best left for people who actually work on ObjC
support to figure out.

As far as the .debug_names size goes, I should also point out that the
binary in question was built with -fno-limit-debug-info, which isn't a
default setup on linux. I have tried measuring the sizes without that
flag and with fission enabled (-gsplit-dwarf) and the results are:
without compression:
- clang binary: 960 MB
- .debug_names: 130 MB (13%)
- .debug_pubnames: 175 MB (18%)
- .debug_pubtypes: 204 MB (21%)
- median time for setting a breakpoint on non-existent function
(variance +/- 2%):
real 0m3.526s
user 0m3.156s
sys 0m0.364s

with -Wl,--compress-debug-sections=zlib:
- clang binary: 440 MB
- .debug_names: 80 MB (18%)
- .debug_pubnames: 31 MB (7.2%)
- .debug_pubtypes: 42 MB (9.5%)
- median time for setting a breakpoint on non-existent function:
real 0m4.369s
user 0m3.948s
sys 0m0.416s

So, .debug_names indeed compresses worse than .debug_pubnames/types,
but that is not surprising as it has a more condensed encoding to
begin with (no inline strings). However, even in its compressed form
its size is only slightly larger than the two other sections combined
(while being infinitely more useful). As for the compression, my
takeaway from this is that compression definitely has a measurable
impact on startup time, but, on the grand scale of things, the impact
is not actually that big. And if a user deliberately adds the
compression flag to his command line, I would assume he really cares
about binary size, and is willing to sacrifice some debug performance
in return. So, I would honor his request and compress .debug_names as
well.
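
For reference, the comparisons above follow directly from the measured
numbers. A small Python sketch of the arithmetic (the sizes in MB and
the "real" times are the ones listed above, the variable names are
mine):

```python
# Section sizes in MB, taken from the two measurements above.
uncompressed = {".debug_names": 130, ".debug_pubnames": 175, ".debug_pubtypes": 204}
compressed = {".debug_names": 80, ".debug_pubnames": 31, ".debug_pubtypes": 42}

# Fraction of the original size that remains after zlib compression.
ratios = {name: compressed[name] / uncompressed[name] for name in uncompressed}

# Compressed .debug_names versus the two pub* sections combined.
pub_combined = compressed[".debug_pubnames"] + compressed[".debug_pubtypes"]

# Relative cost of compression on the breakpoint-setting benchmark ("real" time).
slowdown = 4.369 / 3.526

for name, ratio in ratios.items():
    print("%-16s compresses to %.0f%% of its original size" % (name, ratio * 100))
print(".debug_names: %d MB, pubnames+pubtypes: %d MB"
      % (compressed[".debug_names"], pub_combined))
print("breakpoint time with compression: %.2fx" % slowdown)
```

So .debug_names keeps roughly 60% of its size under compression while
the pub* sections shrink to about a fifth, and the compressed binary
takes roughly a quarter longer on this benchmark.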

I have tried David Anderson's dwarfdump (after Paul pointed it out to
me), but as far as I can tell, it has no support for printing out the
.debug_names section (the print_debug_names function is stubbed out).
**I think** I got the correct source repository
(git://git.code.sf.net/p/libdwarf/code) as the last commit there is
dated yesterday.

For testing on the lldb side I have been deliberately trying to avoid
adding another dimension to the ever-growing test matrix. I don't
think this functionality is worth it, especially not if you view the
test suite as a regression test suite. The entire functionality of
this in lldb is encompassed in a single .cpp file which is about 250
LOC. The class has about a dozen entry points and most of them are
accessible through the lldb-test tool, which I've used to write
targeted regression tests for this (it could probably use more of
those). I did use the "dotest" suite as an integration test suite, but
I did that by simply passing --env CFLAGS_EXTRAS="-mllvm
-accel-tables=Dwarf" to dotest (I also tried hacking clang to always
emit the new tables to make sure I'm not missing anything).
Ironically, if you try that now, you will see one test failing, but
that's because I have already added one test passing that flag
explicitly (I couldn't find a way to test this functionality through
lldb-test) and clang then complains about a duplicate argument. This
should go away once we have a better -g flag to control this behavior. I
haven't yet figured out whether I want to set up a bot to run the
tests in this configuration, but I know I don't want to inflict that
extra overhead on developers running tests during day-to-day
development.

regards,
pavel