[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).

Eric Christopher via llvm-dev llvm-dev at lists.llvm.org
Mon Dec 4 14:59:22 PST 2017


This isn't a particularly productive email - especially as a number of
people on this list are current contributors to the standard. Mostly DWARF5
support is lined up behind one of us having the spare cycles to implement
it rather than anything else FWIW :)

That said, if you have specific feedback about confusing items I'm
definitely happy to help figure out:

a) some better way to say it,
b) some other implementation to avoid it being confusing

Having partially implemented a couple of readers and writers at this point
I agree that it's not the friendliest of documents, but sometimes being
inside of it makes it harder to see where it's causing issues.

Thanks!

-eric

On Mon, Dec 4, 2017 at 1:23 PM UE US via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

> An old co-worker told me that writing a DWARF support library was the most
> painful experience of his life due to the confusing standards documents, so
> it's not surprising DWARF5 is going slow.
>
> GNOMETOYS
>
> On Mon, Dec 4, 2017 at 12:49 PM, Robinson, Paul via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Thanks for providing the experimental data!  It clearly shows the value
>> of type sections in DWARF.
>>
>> Regarding why type sections are off by default, aside from the issue of
>> consumers needing to understand them, there is a size penalty to type
>> sections that becomes more evident in smaller projects (meaning, fewer
>> compilation units).  The size penalty can be balanced against the amount of
>> deduplication for a net win, if you have enough duplicates that you can
>> eliminate.  But it is a tradeoff.
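The tradeoff described above can be sketched with a toy size model. This is only an illustration under stated assumptions: the function names and the 100-byte type description are hypothetical, and the model counts nothing but one type's description, a fixed per-type-unit overhead (23 bytes, the size of a 32-bit DWARF4 type unit header), and one 8-byte signature reference per compilation unit. Real producers add other costs that are not modelled here.

```python
def inline_bytes(type_size: int, n_units: int) -> int:
    """Every compilation unit carries its own full copy of the type."""
    return type_size * n_units


def type_section_bytes(type_size: int, n_units: int,
                       unit_overhead: int = 23, ref_size: int = 8) -> int:
    """One deduplicated type unit plus one signature reference per CU.

    unit_overhead and ref_size are assumptions for illustration only.
    """
    return type_size + unit_overhead + ref_size * n_units


if __name__ == "__main__":
    # Compare both layouts for 1, 2 and 10 compilation units.
    for n in (1, 2, 10):
        print(n, inline_bytes(100, n), type_section_bytes(100, n))
```

In this model a single compilation unit (the unity-build case) is larger with a type section, while ten units sharing the same type are much smaller.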
>>
>> In Sony's case, it is not uncommon for studios to do what are called
>> "unity" builds, where you have basically one master .cpp file that does
>> #include of each other .cpp file, giving you an LTO-like build.  In this
>> case the debug-info production will automatically produce only one copy of
>> each type, and so using type sections would probably make the net debug
>> info bigger.  And of course an LTO build will deduplicate type info at the
>> metadata level, with a similar effect.
>>
>> So, I think whether type sections help or hurt will depend on how a
>> particular project's build procedure is set up.  Clang/LLVM are set up to
>> do lots of smaller compilations and link them all together, in a fairly
>> traditional model, and that is where type sections will provide the most
>> benefit.  Your data, then, is essentially for a best-case scenario.  Other
>> kinds of projects will not benefit as much.
>>
>>
>>
>> Regarding DWARF 5 and emitting type sections into the .debug_info section
>> rather than the .debug_types section:  The work to support DWARF 5 in LLVM
>> has not gotten very far yet.  Conforming to the standard in this respect is
>> certainly on my list; however, there are other features that Sony considers
>> higher priority.  If you or someone else wants to contribute that feature
>> sooner, that would be excellent!  Otherwise, we will get to it in due time.
>>
>> Thanks,
>>
>> --paulr
>>
>>
>>
>> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *George
>> Rimar via llvm-dev
>> *Sent:* Monday, December 04, 2017 7:11 AM
>> *To:* llvm-dev at lists.llvm.org
>> *Subject:* [llvm-dev] [RFC] - Deduplication of debug information in
>> linkers (LLD).
>>
>>
>>
>> Hi all !
>>
>>
>>
>> We have an issue with LLD: "relocation R_X86_64_32 out of range"
>> (PR31109), which occurs while resolving relocations in debug sections.
>> It appears to happen because the .debug_info section can sometimes be
>> too large for a 32-bit relocation to represent the value. One possible
>> solution is to deduplicate information to reduce the .debug_info size.
>>
>> The rest of this mail describes the experiments I did, the results
>> obtained, and some questions and suggestions.
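As a rough illustration of the out-of-range error mentioned above: a 32-bit relocation field can only hold values that fit in 32 bits. The sketch below assumes the x86-64 psABI rule that R_X86_64_32 computes the symbol value plus the addend and requires the result to fit an unsigned 32-bit field. The function name is hypothetical and this is not LLD's code.

```python
UINT32_MAX = 0xFFFFFFFF


def fits_r_x86_64_32(symbol_value: int, addend: int) -> bool:
    """Return True if S + A is representable in an unsigned 32-bit field."""
    value = symbol_value + addend
    return 0 <= value <= UINT32_MAX


if __name__ == "__main__":
    print(fits_r_x86_64_32(0x1000, 0x20))      # small offset, fits
    print(fits_r_x86_64_32(0xFFFFFFFF, 1))     # one past the limit
```

A linker that finds the second case has no way to encode the value in the 4-byte field, which is when an "out of range" diagnostic would be reported.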
>>
>>
>>
>> I was investigating the idea of deduplicating debug type information.
>> The idea is described on p276 of the DWARF4 specification
>> (http://www.dwarfstd.org/doc/DWARF4.pdf). It suggests splitting type
>> information out of .debug_info and emitting multiple .debug_types
>> sections using COMDATs. Both clang and the gcc I tested implement the
>> -fdebug-types-section flag for that:
>>
>>
>>
>> -fdebug-types-section, -fno-debug-types-section
>>
>> Place debug types in their own section (ELF Only)
>>
>> gcc's description is here:
>> https://gcc.gnu.org/onlinedocs/gcc-6.4.0/gcc/Debugging-Options.html#Debugging-Options
>>
>>
>>
>> This flag is disabled by default. I compared clang binaries to see the
>> difference with and without the linker-side optimisation:
>>
>> 1) Clang built with -g has a size of 1.7 GB; the .debug_info section
>>    size is 894.5 MB.
>> 2) Clang built with -g -fdebug-types-section has a size of 1.0 GB;
>>    the .debug_types size is 26.267 MB and the .debug_info size is 227.7 MB.
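The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted above, assumes all section sizes are in the same unit (MB), and compares only .debug_info and .debug_types; the variable names are illustrative.

```python
# Section sizes quoted in the mail, in MB.
debug_info_plain = 894.5      # -g
debug_types_split = 26.267    # -g -fdebug-types-section
debug_info_split = 227.7      # -g -fdebug-types-section

combined_split = debug_types_split + debug_info_split
saved = debug_info_plain - combined_split
reduction_pct = 100.0 * saved / debug_info_plain

if __name__ == "__main__":
    print(f"combined: {combined_split:.3f} MB")
    print(f"saved:    {saved:.3f} MB")
    print(f"reduction: {reduction_pct:.1f}%")
```

The two type-related sections together come to roughly 254 MB against 894.5 MB, a reduction of about 72% for those sections.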
>>
>>
>>
>> The difference is huge and, I believe, shows (though for most readers
>> here it was probably already obvious) that the optimization can be
>> useful. Yet -fdebug-types-section is disabled by default. It looks like
>> it was initially disabled because not all DWARF consumers were aware of
>> the .debug_types section.
>>
>>
>>
>> Now, in 2017, the situation is different. I think most DWARF consumers
>> know about .debug_types, but:
>>
>> 1) The DWARF5 specification explicitly eliminates the .debug_types
>>    section introduced in DWARF4:
>>    p8, "1.4 Changes from Version 4 to Version 5"
>>    http://dwarfstd.org/doc/DWARF5.pdf
>> 2) Instead of emitting multiple .debug_types sections, it suggests
>>    emitting multiple .debug_info COMDAT sections (p375, p376).
>>
>>
>>
>> And it seems there is currently no way to make clang emit multiple
>> .debug_info sections with type information as DWARF5 suggests. I tried
>> the command line below:
>>
>> -g -fdebug-types-section -gdwarf-5
>>
>> It still emits .debug_types, and there does not appear to be a flag for
>> emitting multiple .debug_info sections. Looking at the LLVM code as a
>> whole (lib/MC, lib/CodeGen), it seems to be always assumed that
>> .debug_info is a unique section in an object.
>>
>> (I am also not sure why clang emits .debug_types when the -gdwarf-5 flag
>> is set, as this section is incompatible with v5; it is probably a bug.)
>>
>>
>>
>> So my questions are the following:
>>
>> 1) Do we want to try to implement the multiple .debug_info approach? It
>>    seems it can be very useful sometimes.
>> 2) For now, in LLD, might we want to extend our error message from
>>    "relocation X out of range" to something suggesting the use of
>>    -fdebug-types-section (only for relocations in debug sections)?
>> 3) Why is -fdebug-types-section disabled by default?
>>
>>
>> Best regards,
>> George | Developer | Access Softek, Inc
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>