[llvm-dev] Fragmented DWARF
James Henderson via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 19 01:50:20 PDT 2020
Great, thanks Alexey! I'll try to take a look at this in the near future,
and will report my results back here. I imagine our clang results will
differ, purely because we probably used different toolchains to build the
input in the first place.
On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>
> On 13.10.2020 10:20, James Henderson wrote:
>
> The script included in the patch can be used to convert an object
> containing normal DWARF into an object using fragmented DWARF. It does this
> by using llvm-dwarfdump to dump the various sections, parses the output to
> identify where it should split (using the offsets of the various entries),
> and then writes new section headers accordingly - you can see roughly what
> it's doing if you get a chance to watch the talk recording. The additional
> section headers are appended to the end of the ELF section header table,
> whilst the original DWARF is left in the same place it was before (making
> use of the fact that section headers don't have to appear in offset order).
> The script also parses and fragments the relocation sections targeting the
> DWARF sections so that they match up with the fragmented DWARF sections.
> This is clearly all suboptimal - in practice the compiler should be
> modified to do the fragmenting upfront, to save having to parse a tool's
> stdout, but that was just the simplest thing I could come up with to
> quickly write the script. Full details of the script usage are included in
> the patch description, if you want to play around with it.
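The offset-based splitting described above can be sketched roughly as follows. This is not the script from the patch: it assumes the entry offsets have already been extracted (the real script gets them by parsing llvm-dwarfdump output), and the names fragments_from_offsets and assign_relocations are made up for illustration. It turns a list of offsets into (offset, size) fragments and rebases each relocation offset onto the fragment that contains it.

```python
from bisect import bisect_right


def fragments_from_offsets(entry_offsets, section_size):
    """Return (offset, size) pairs covering [0, section_size), split at each entry offset."""
    starts = sorted(set(entry_offsets))
    if any(off < 0 or off >= section_size for off in starts):
        raise ValueError("entry offset outside the section")
    if not starts or starts[0] != 0:
        # Keep any leading bytes (for example a unit header) in a first fragment.
        starts.insert(0, 0)
    ends = starts[1:] + [section_size]
    return [(start, end - start) for start, end in zip(starts, ends)]


def assign_relocations(fragments, reloc_offsets):
    """Group relocation offsets by fragment index, rebased to fragment-relative offsets."""
    starts = [offset for offset, _ in fragments]
    grouped = {index: [] for index in range(len(fragments))}
    for reloc in reloc_offsets:
        # The containing fragment is the last one whose start is <= reloc.
        index = bisect_right(starts, reloc) - 1
        grouped[index].append(reloc - starts[index])
    return grouped


# Hypothetical offsets, purely for illustration.
frags = fragments_from_offsets([0x0B, 0x2A, 0x4F], 0x60)
relocs = assign_relocations(frags, [0x06, 0x0C, 0x2E, 0x50])
print(frags)
print(relocs)
```

The fragments always cover the whole section, so the original bytes can stay where they are and only the section headers and relocation tables need rewriting.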
>
> If Alexey could point me at the latest version of his patch, I'd be happy
> to run that through either or both of the packages I used to see what
> happens. Equally, I'd be happy if Alexey is able to run my script to
> fragment and measure the performance of a couple of projects he's been
> working with. Based purely on the two packages I've tried this with, I can
> tell already that the results can vary wildly. My expectation is that
> Alexey's approach will be slower (at least in its current form, but
> probably more generally) but produce smaller output, though to what scale
> I have no idea.
>
> James, I updated the patch - https://reviews.llvm.org/D74169.
>
> To make it work, the example must be built with -ffunction-sections,
> and the following options must be passed to the linker:
>
> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
> For the clang binary I got the following results:
>
> 1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>
> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
> decrease, Debug Info size 542M
>
> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size 1.3G,
> 16x performance decrease, Debug Info size 1G
>
> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
> I added the --gc-debuginfo-no-odr option so that the size reduction can be
> compared fairly. Without that option, D74169 deduplicates types, so it
> would not be correct to compare the resulting size with the "Fragmented
> DWARF" solution, which does not deduplicate types.
>
> Also, I will look at your D89229 <https://reviews.llvm.org/D89229> and
> share results some time later.
>
> Thank you, Alexey.
>
>
> I think linkers parse .eh_frame partly because they have no other choice.
> That being said, I think its format is not too complex, so similarly the
> parser isn't too complex. You can see LLD's ELF implementation in
> ELF/EhFrame.cpp, how it is used in ELF/InputSection.cpp (see the bits to do
> with EhInputSection) and EhFrameSection in ELF/SyntheticSections.h (plus
> various usages of these two throughout the LLD code). I think the key to
> any structural changes in the DWARF format to make them more amenable to
> link-time parsing is being able to read a minimal amount without needing to
> parse the payload (e.g. a length field, some sort of type, and then using
> the relocations to associate it accordingly).
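As a rough illustration of that kind of minimal parsing, here is a sketch in Python that walks .eh_frame-style records using only the length field and the 4-byte ID that follows it, never decoding the payload. It is not LLD's code; iter_records is a hypothetical name, and the sample bytes are hand-built for the example.

```python
import struct


def iter_records(data, little_endian=True):
    """Yield (offset, kind, total_size) for each record without decoding payloads."""
    endian = "<" if little_endian else ">"
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(endian + "I", data, pos)
        if length == 0:
            break  # a zero length terminates the section
        header = 4
        if length == 0xFFFFFFFF:
            # Extended form: the real length is in the following 8 bytes.
            if pos + 12 > len(data):
                raise ValueError("truncated extended length")
            (length,) = struct.unpack_from(endian + "Q", data, pos + 4)
            header = 12
        if length < 4 or pos + header + length > len(data):
            raise ValueError("record does not fit in the section")
        # In .eh_frame an ID of 0 marks a CIE; anything else is an FDE's CIE pointer.
        (cie_id,) = struct.unpack_from(endian + "I", data, pos + header)
        kind = "CIE" if cie_id == 0 else "FDE"
        total = header + length
        yield pos, kind, total
        pos += total


# Hand-built sample: one CIE, one FDE pointing back at it, then a terminator.
sample = (
    struct.pack("<II4x", 8, 0)      # length=8, ID=0 (CIE), 4 payload bytes
    + struct.pack("<II8x", 12, 16)  # length=12, CIE pointer=16, 8 payload bytes
    + struct.pack("<I", 0)          # terminator
)
records = list(iter_records(sample))
print(records)
```

Everything after the ID is skipped using the length alone, which is the property a link-time-friendly DWARF layout would need; associating each record with the code it describes would then be left to the relocations.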
>
> James
>
> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote:
>
>> Awesome! Sorry I missed the lightning talk, but really interested to see
>> this sort of thing (though it's not directly/immediately applicable to the
>> use case I work with - Split DWARF, something similar could be used there
>> with further work)
>>
>> Though it looks like the patch has mostly linker changes - where/how do
>> you generate the fragmented DWARF to begin with? Via the Python script? Run
>> over assembly? I'd be surprised if it was achievable that way - curious to
>> know more.
>>
>> Got a rough sense/are you able to run apples-to-apples comparisons with
>> Alexey's linker-based patches to compare linker time/memory overhead versus
>> resulting output size gains?
>>
>> (& yeah, I'm a bit curious about how the linkers do eh_frame rewriting,
>> if the format is especially amenable to a lightweight parsing/rewriting and
>> how we could make the DWARF more amenable to that too)
>>
>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>>> Hi all,
>>>
>>> At the recent LLVM developers' meeting, I presented a lightning talk on
>>> an approach to reduce the amount of dead debug data left in an executable
>>> following operations such as --gc-sections and duplicate COMDAT removal. In
>>> that presentation, I presented some figures based on linking a game that
>>> had been built by our downstream clang port and fragmented using the
>>> described approach. Since recording the presentation, I ran the same
>>> experiment on a clang package (this time built with a GCC version). The
>>> comparable figures are below:
>>>
>>> Link-time speed (s):
>>>
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>> | Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>> | Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>> | Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>
>>> Output size - Game package (MB):
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>> | Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
>>> | Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>> | Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
>>> | Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
>>> | Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>>
>>> Output size - Clang (MB):
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>> | Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>> | Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>> | Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>> | Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>> | Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges +
>>> .debug_aranges + .debug_loc
>>>
>>> Additionally, I have posted https://reviews.llvm.org/D89229 which
>>> provides the Python script and linker patches used to reproduce the above
>>> results on my machine. The GC 1/2/3/4/5/6 columns correspond to the
>>> --mark-live-pc linker option added in that patch, with values
>>> 1/0.8/0.6/0.4/0.2/0 respectively.
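For a quick sense of the trade-off, the arithmetic below takes figures copied from the tables above (DWARF sizes in MB, link times in seconds for the GC 1 column) and computes the DWARF size saving and link-time slowdown of the fragmented build relative to the plain one. The dictionary layout and helper names are just for illustration.

```python
# DWARF sizes in MB, copied from the output-size tables in this thread.
# The plain DWARF size is the same in every GC column, so one value suffices.
DWARF_MB = {
    "Game": {"plain": 845, "GC 1": 664, "GC 6": 173},
    "Clang": {"plain": 1979, "GC 1": 1780, "GC 6": 1691},
}

# (plain, fragmented) link times in seconds from the "GC 1 (normal)" column.
LINK_SECONDS_GC1 = {"Game": (4.9, 11.8), "Clang": (17.9, 22.8)}


def dwarf_saving_percent(package, column):
    """Percentage of plain DWARF removed by the fragmented build in a given column."""
    plain = DWARF_MB[package]["plain"]
    return 100.0 * (plain - DWARF_MB[package][column]) / plain


def link_slowdown(package):
    """Fragmented link time divided by plain link time at GC 1."""
    plain, fragmented = LINK_SECONDS_GC1[package]
    return fragmented / plain


for package in DWARF_MB:
    print(
        f"{package}: GC 1 saves {dwarf_saving_percent(package, 'GC 1'):.1f}% of DWARF, "
        f"GC 6 saves {dwarf_saving_percent(package, 'GC 6'):.1f}%, "
        f"link is {link_slowdown(package):.2f}x slower at GC 1"
    )
```

On these figures the Game package sheds roughly 80% of its DWARF at GC 6, whereas Clang sheds roughly 15%, which shows how strongly the result depends on the input package.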
>>>
>>> During the conference, the question was asked what the memory usage and
>>> input size impact was. I've summarised these below:
>>>
>>> Input file size total (GB):
>>> +--------------------+------------+
>>> | Package variant    | Total Size |
>>> +--------------------+------------+
>>> | Game (plain)       |        2.9 |
>>> | Game (fragmented)  |        4.2 |
>>> | Clang (plain)      |       10.9 |
>>> | Clang (fragmented) |       12.3 |
>>> +--------------------+------------+
>>>
>>> Peak Working Set Memory usage (GB):
>>> +--------------------+-------+------+
>>> | Package variant    | No GC | GC 1 |
>>> +--------------------+-------+------+
>>> | Game (plain)       |   4.3 |  4.7 |
>>> | Game (fragmented)  |   8.9 |  8.6 |
>>> | Clang (plain)      |  15.7 | 15.6 |
>>> | Clang (fragmented) |  19.4 | 19.2 |
>>> +--------------------+-------+------+
>>>
>>> I'm keen to hear what people's feedback is, and also interested to see
>>> what results others might see by running this experiment on other input
>>> packages. Also, if anybody has any alternative ideas that meet the goals
>>> listed below, I'd love to hear them!
>>>
>>> To reiterate some key goals of fragmented DWARF, similar to what I said
>>> in the presentation:
>>> 1) Devise a scheme that gives significant size savings without being too
>>> costly. It's clear from just the two packages I've tried this on that there
>>> is a fairly hefty link time performance cost, although the exact cost
>>> depends on the nature of the input package. On the other hand, depending on
>>> the nature of the input package, there can also be some big gains.
>>> 2) Devise a scheme that doesn't require any linker knowledge of DWARF.
>>> The current approach doesn't quite achieve this properly due to the slight
>>> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
>>> group sections should solve this problem.
>>> 3) Provide some kind of halfway house between simply writing tombstone
>>> values into dead DWARF and fully parsing the DWARF to reoptimise
>>> it/discard the dead bits.
>>>
>>> I'm hopeful that changes could be made to the linker to improve the
>>> link-time cost. There seems to be a significant amount of the link time
>>> spent creating the input sections. An alternative would be to devise a
>>> scheme that would avoid the literal splitting into section headers, in
>>> favour of some sort of list of split-points that the linker uses to split
>>> things up (a bit like it already does for .eh_frame or mergeable sections).
>>>
>>