[llvm-dev] Fragmented DWARF

Mon Oct 19 01:50:20 PDT 2020

Great, thanks Alexey! I'll try to take a look at this in the near future,
and will report my results back here. I imagine our clang results will
differ, purely because we probably used different toolchains to build the
input in the first place.

On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com> wrote:

>
> On 13.10.2020 10:20, James Henderson wrote:
>
> The script included in the patch can be used to convert an object
> containing normal DWARF into an object using fragmented DWARF. It does this
> by using llvm-dwarfdump to dump the various sections, parses the output to
> identify where it should split (using the offsets of the various entries),
> and then writes new section headers accordingly - you can see roughly what
> it's doing if you get a chance to watch the talk recording. The additional
> section headers are appended to the end of the ELF section header table,
> whilst the original DWARF is left in the same place it was before (making
> use of the fact that section headers don't have to appear in offset order).
> The script also parses and fragments the relocation sections targeting the
> DWARF sections so that they match up with the fragmented DWARF sections.
> This is clearly all suboptimal - in practice the compiler should be
> modified to do the fragmenting upfront, to save having to parse a tool's
> stdout, but that was just the simplest thing I could come up with to
> quickly write the script. Full details of the script usage are included in
> the patch description, if you want to play around with it.
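
As a rough illustration of the splitting step described above (not the script from D89229 itself): the sketch below assumes the entry offsets have already been extracted from the llvm-dwarfdump output, and the function name `fragments_from_offsets` is hypothetical. Each fragment runs from one offset to the next, and the last one ends at the section size.

```python
def fragments_from_offsets(offsets, section_size):
    """Return (start, size) pairs that together cover [0, section_size).

    `offsets` are the byte offsets at which a new fragment should begin
    (hypothetical input; the real script derives them from llvm-dwarfdump).
    """
    # Always start a fragment at 0 so that no bytes are left uncovered.
    starts = sorted(set(offsets) | {0})
    if starts[-1] >= section_size:
        raise ValueError("offset lies outside the section")
    # Each fragment ends where the next one starts; the last ends at the
    # end of the section.
    ends = starts[1:] + [section_size]
    return [(start, end - start) for start, end in zip(starts, ends)]


# Three entries at 0x0, 0x2a and 0x90 in a 0x100-byte section.
print(fragments_from_offsets([0x0, 0x2A, 0x90], 0x100))
# prints [(0, 42), (42, 102), (144, 112)]
```

Each (start, size) pair would then become one appended section header pointing into the unmodified DWARF bytes.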
>
> If Alexey could point me at the latest version of his patch, I'd be happy
> to run that through either or both of the packages I used to see what
> happens. Equally, I'd be happy if Alexey is able to run my script to
> fragment and measure the performance of a couple of projects he's been
> working with. Based purely on the two packages I've tried this with, I can
> tell already that the results can vary wildly. My expectation is that
> Alexey's approach will be slower (at least in its current form, but
> probably more generally) but produce smaller output, though by how much I
> have no idea.
>
> James, I updated the patch - https://reviews.llvm.org/D74169.
>
> To make it work, it is necessary to build the example with
> -ffunction-sections and pass the following options to the linker:
>
> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
> For the clang binary I got the following results:
>
> 1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>
> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
> decrease, Debug Info size 542M
>
> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size 1.3G,
> 16x performance decrease, Debug Info size 1G
>
> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
> I added the option --gc-debuginfo-no-odr so that the size reduction can be
> compared correctly. Without that option, D74169 deduplicates types, and it
> is then not correct to compare the resulting size with the "Fragmented
> DWARF" solution, which does not deduplicate types.
>
> Also, I am looking at your D89229 <https://reviews.llvm.org/D89229> and will
> share results some time later.
>
> Thank you, Alexey.
>
>
> I think linkers parse .eh_frame partly because they have no other choice.
> That being said, I think its format is not too complex, so similarly the
> parser isn't too complex. You can see LLD's ELF implementation in
> ELF/EhFrame.cpp, how it is used in ELF/InputSection.cpp (see the bits to do
> with EhInputSection) and EhFrameSection in ELF/SyntheticSections.h (plus
> various usages of these two throughout the LLD code). I think the key to
> any structural changes in the DWARF format to make them more amenable to
> link-time parsing is being able to read a minimal amount without needing to
> parse the payload (e.g. a length field, some sort of type, and then using
> the relocations to associate it accordingly).
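
To make the "read a minimal amount without parsing the payload" idea concrete, here is a minimal sketch, assuming an .eh_frame-style layout (little-endian, a 4-byte length or 0xffffffff followed by an 8-byte length, then a 4-byte ID field where 0 marks a CIE). It is not LLD's implementation, and the name `split_records` is hypothetical.

```python
import struct


def split_records(data):
    """Split length-prefixed records without looking inside their payloads.

    Returns (offset, total_size, id_field) tuples. The length field does not
    count itself, and a zero length terminates the list.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        header = 4
        if length == 0:
            break  # terminator entry
        if length == 0xFFFFFFFF:
            # Extended form: the real length is in the following 8 bytes.
            if pos + 12 > len(data):
                raise ValueError(f"truncated length at offset {pos:#x}")
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        end = pos + header + length
        if length < 4 or end > len(data):
            raise ValueError(f"malformed record at offset {pos:#x}")
        # Only the first 4 payload bytes are read: 0 means CIE, else FDE.
        (id_field,) = struct.unpack_from("<I", data, pos + header)
        records.append((pos, header + length, id_field))
        pos = end
    return records


cie = struct.pack("<II", 8, 0) + b"\x00" * 4    # length 8, ID 0 (CIE)
fde = struct.pack("<II", 12, 16) + b"\x00" * 8  # length 12, CIE pointer 16
print(split_records(cie + fde + struct.pack("<I", 0)))
# prints [(0, 12, 0), (12, 16, 16)]
```

A linker working this way would use the record boundaries plus the relocations that fall inside each record to decide which records to keep.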
>
> James
>
> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote:
>
>> Awesome! Sorry I missed the lightning talk, but really interested to see
>> this sort of thing (though it's not directly/immediately applicable to the
>> use case I work with - Split DWARF, something similar could be used there
>> with further work)
>>
>> Though it looks like the patch has mostly linker changes - where/how do
>> you generate the fragmented DWARF to begin with? Via the Python script? Run
>> over assembly? I'd be surprised if it was achievable that way - curious to
>> know more.
>>
>> Got a rough sense/are you able to run apples-to-apples comparisons with
>> Alexey's linker-based patches to compare linker time/memory overhead versus
>> resulting output size gains?
>>
>> (& yeah, I'm a bit curious about how the linkers do eh_frame rewriting,
>> if the format is especially amenable to a lightweight parsing/rewriting and
>> how we could make the DWARF more amenable to that too)
>>
>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>>> Hi all,
>>>
>>> At the recent LLVM developers' meeting, I presented a lightning talk on
>>> an approach to reduce the amount of dead debug data left in an executable
>>> following operations such as --gc-sections and duplicate COMDAT removal. In
>>> that presentation, I presented some figures based on linking a game that
>>> had been built by our downstream clang port and fragmented using the
>>> described approach. Since recording the presentation, I ran the same
>>> experiment on a clang package (this time built with a GCC version). The
>>> comparable figures are below:
>>>
>>> Link-time speed (s):
>>>
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>> | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>
>>> Output size - Game package (MB):
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>> | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>> | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>> | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>> | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>> | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>>
>>> Output size - Clang (MB):
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>> | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>> | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges +
>>> .debug_aranges + .debug_loc
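
The DWARF rows of the two size tables can be turned into relative savings with a few lines of arithmetic. This is only a convenience sketch over the figures quoted above; the names `PLAIN_DWARF`, `FRAGMENTED_DWARF` and `dwarf_reduction_percent` are made up for the example.

```python
# DWARF sizes in MB, copied from the tables above.
# Plain DWARF is the same at every GC level; the fragmented lists are
# ordered No GC, GC 1, GC 2, GC 3, GC 4, GC 5, GC 6.
PLAIN_DWARF = {"game": 845, "clang": 1979}
FRAGMENTED_DWARF = {
    "game": [740, 664, 384, 253, 194, 178, 173],
    "clang": [1780, 1780, 1738, 1716, 1703, 1696, 1691],
}


def dwarf_reduction_percent(package):
    """Percentage of the plain DWARF size removed at each GC level."""
    plain = PLAIN_DWARF[package]
    return [round(100 * (plain - frag) / plain, 1)
            for frag in FRAGMENTED_DWARF[package]]


for package in FRAGMENTED_DWARF:
    print(package, dwarf_reduction_percent(package))
```

For the game this goes from roughly 21% at GC 1 to roughly 80% at GC 6, whereas clang stays between about 10% and 15% throughout.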
>>>
>>> Additionally, I have posted https://reviews.llvm.org/D89229 which
>>> provides the python script and linker patches used to reproduce the above
>>> results on my machine. The GC 1/2/3/4/5/6 correspond to the linker option
>>> added in that patch --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0
>>> respectively.
>>>
>>> During the conference, the question was asked what the memory usage and
>>> input size impact was. I've summarised these below:
>>>
>>> Input file size total (GB):
>>> +--------------------+------------+
>>> | Package variant    | Total Size |
>>> +--------------------+------------+
>>> | Game (plain)       |     2.9    |
>>> | Game (fragmented)  |     4.2    |
>>> | Clang (plain)      |    10.9    |
>>> | Clang (fragmented) |    12.3    |
>>> +--------------------+------------+
>>>
>>> Peak Working Set Memory usage (GB):
>>> +--------------------+-------+------+
>>> | Package variant    | No GC | GC 1 |
>>> +--------------------+-------+------+
>>> | Game (plain)       |  4.3  |  4.7 |
>>> | Game (fragmented)  |  8.9  |  8.6 |
>>> | Clang (plain)      | 15.7  | 15.6 |
>>> | Clang (fragmented) | 19.4  | 19.2 |
>>> +--------------------+-------+------+
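
The cost side can be summarised the same way, as the ratio of fragmented to plain for link time, input size and peak memory. Again this is just arithmetic over the figures quoted in this thread (using the GC 1 column where a table has several); `MEASUREMENTS` and `overhead_ratios` are hypothetical names.

```python
# (plain, fragmented) pairs copied from the tables in this thread.
# Link time and peak memory use the GC 1 column.
MEASUREMENTS = {
    "game": {
        "link_time_s": (4.9, 11.8),
        "input_gb": (2.9, 4.2),
        "peak_mem_gb": (4.7, 8.6),
    },
    "clang": {
        "link_time_s": (17.9, 22.8),
        "input_gb": (10.9, 12.3),
        "peak_mem_gb": (15.6, 19.2),
    },
}


def overhead_ratios(package):
    """Fragmented/plain ratio for each metric, rounded to two decimals."""
    return {metric: round(fragmented / plain, 2)
            for metric, (plain, fragmented) in MEASUREMENTS[package].items()}


for package in MEASUREMENTS:
    print(package, overhead_ratios(package))
```

On these numbers the game link is about 2.4x slower when fragmented, against about 1.3x for clang, which matches the observation that the cost depends heavily on the input package.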
>>>
>>> I'm keen to hear what people's feedback is, and also interested to see
>>> what results others might see by running this experiment on other input
>>> packages. Also, if anybody has any alternative ideas that meet the goals
>>> listed below, I'd love to hear them!
>>>
>>> To reiterate some key goals of fragmented DWARF, similar to what I said
>>> in the presentation:
>>> 1) Devise a scheme that gives significant size savings without being too
>>> costly. It's clear from just the two packages I've tried this on that there
>>> is a fairly hefty link time performance cost, although the exact cost
>>> depends on the nature of the input package. On the other hand, depending on
>>> the nature of the input package, there can also be some big gains.
>>> 2) Devise a scheme that doesn't require any linker knowledge of DWARF.
>>> The current approach doesn't quite achieve this properly due to the slight
>>> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
>>> group sections should solve this problem.
>>> 3) Provide some kind of halfway house between simply writing tombstone
>>> values into dead DWARF and fully parsing the DWARF to reoptimise
>>> it/discard the dead bits.
>>>
>>> I'm hopeful that changes could be made to the linker to improve the
>>> link-time cost. There seems to be a significant amount of the link time
>>> spent creating the input sections. An alternative would be to devise a
>>> scheme that would avoid the literal splitting into section headers, in
>>> favour of some sort of list of split-points that the linker uses to split
>>> things up (a bit like it already does for .eh_frame or mergeable sections).
>>>
>>