[llvm-dev] Fragmented DWARF
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 15 02:08:46 PDT 2020
On 13.10.2020 10:20, James Henderson wrote:
> The script included in the patch can be used to convert an object
> containing normal DWARF into an object using fragmented DWARF. It does
> this by using llvm-dwarfdump to dump the various sections, parses the
> output to identify where it should split (using the offsets of the
> various entries), and then writes new section headers accordingly -
> you can see roughly what it's doing if you get a chance to watch the
> talk recording. The additional section headers are appended to the end
> of the ELF section header table, whilst the original DWARF is left in
> the same place it was before (making use of the fact that section
> headers don't have to appear in offset order). The script also parses
> and fragments the relocation sections targeting the DWARF sections so
> that they match up with the fragmented DWARF sections. This is clearly
> all suboptimal - in practice the compiler should be modified to do the
> fragmenting upfront, to save having to parse a tool's stdout, but that
> was just the simplest thing I could come up with to quickly write the
> script. Full details of the script usage are included in the patch
> description, if you want to play around with it.
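As a rough illustration of the splitting step described above, the following Python sketch turns a list of entry offsets into (offset, size) pairs, one per would-be fragment section. The function name and the sample offsets are hypothetical. The actual script in D89229 parses llvm-dwarfdump output and writes ELF section headers, which this sketch does not attempt.

```python
def fragment_ranges(split_offsets, section_size):
    """Return (offset, size) pairs that together cover [0, section_size)."""
    # Always start at 0; drop duplicates and offsets outside the section.
    points = sorted({0, *[o for o in split_offsets if 0 < o < section_size]})
    # Each fragment ends where the next one starts; the last ends at the
    # end of the section.
    ends = points[1:] + [section_size]
    return [(start, end - start) for start, end in zip(points, ends)]


# Hypothetical entry offsets inside a 0x80-byte section.
ranges = fragment_ranges([0x0B, 0x2A, 0x57], section_size=0x80)
for offset, size in ranges:
    print(f"fragment at {offset:#06x}, size {size:#x}")
```

Because the fragments only describe ranges of the existing data, the original bytes can stay where they are, which matches the point about section headers not needing to appear in offset order.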
>
> If Alexey could point me at the latest version of his patch, I'd be
> happy to run that through either or both of the packages I used to see
> what happens. Equally, I'd be happy if Alexey is able to run my script
> to fragment and measure the performance of a couple of projects he's
> been working with. Based purely on the two packages I've tried this
> with, I can tell already that the results can vary wildly. My
> expectation is that Alexey's approach will be slower (at least in its
> current form, but probably more generally), but produce smaller
> output, but to what scale I have no idea.
James, I updated the patch - https://reviews.llvm.org/D74169.
To make it work, it is necessary to build the example with
-ffunction-sections and pass the following options to the linker:
--gc-sections --gc-debuginfo --gc-debuginfo-no-odr
For the clang binary I got the following results:
1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
decrease, Debug Info size 542M
3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size
1.3G, 16x performance decrease, Debug Info size 1G
(*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
I added the option --gc-debuginfo-no-odr so that the size reduction can
be compared fairly. Without that option, D74169 deduplicates types, so
it would not be correct to compare the resulting size with the
"Fragmented DWARF" solution, which does not deduplicate types.
Also, I am looking at your D89229 <https://reviews.llvm.org/D89229> and
will share results some time later.
Thank you, Alexey.
>
> I think linkers parse .eh_frame partly because they have no other
> choice. That being said, I think its format is not too complex, so
> similarly the parser isn't too complex. You can see LLD's ELF
> implementation in ELF/EhFrame.cpp, how it is used in
> ELF/InputSection.cpp (see the bits to do with EhInputSection) and
> EhFrameSection in ELF/SyntheticSections.h (plus various usages of
> these two throughout the LLD code). I think the key to any structural
> changes in the DWARF format to make them more amenable to link-time
> parsing is being able to read a minimal amount without needing to
> parse the payload (e.g. a length field, some sort of type, and then
> using the relocations to associate it accordingly).
>
> James
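To make the "length field plus some sort of type" idea concrete, here is a minimal Python sketch that walks .eh_frame-style records by reading only each record's header. It assumes little-endian data and well-formed input, and the function name is hypothetical. It is not LLD's implementation, which lives in the files named above.

```python
import struct


def split_eh_frame(data):
    """Return (offset, total_size, kind) per record, reading headers only."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:
            break  # a zero length marks the terminator
        header = 4
        if length == 0xFFFFFFFF:
            # Extended form: the real length follows as a 64-bit value.
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        # The first 4 bytes after the length are 0 for a CIE and a
        # non-zero CIE pointer for an FDE.
        (cie_id,) = struct.unpack_from("<I", data, pos + header)
        kind = "CIE" if cie_id == 0 else "FDE"
        records.append((pos, header + length, kind))
        pos += header + length  # skip the payload without parsing it
    return records


# Synthetic input: one CIE, one FDE pointing back at it, then a terminator.
sample = (
    struct.pack("<II4x", 8, 0)
    + struct.pack("<II8x", 12, 16)
    + struct.pack("<I", 0)
)
print(split_eh_frame(sample))
```

The payload bytes are never interpreted. The record boundaries come from the length field alone, and relocations would then associate each record with the code it describes.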
>
> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote:
>
> Awesome! Sorry I missed the lightning talk, but really interested
> to see this sort of thing (though it's not directly/immediately
> applicable to the use case I work with - Split DWARF, something
> similar could be used there with further work)
>
> Though it looks like the patch has mostly linker changes -
> where/how do you generate the fragmented DWARF to begin with? Via
> the Python script? Run over assembly? I'd be surprised if it was
> achievable that way - curious to know more.
>
> Got a rough sense/are you able to run apples-to-apples comparisons
> with Alexey's linker-based patches to compare linker time/memory
> overhead versus resulting output size gains?
>
> (& yeah, I'm a bit curious about how the linkers do eh_frame
> rewriting, if the format is especially amenable to a lightweight
> parsing/rewriting and how we could make the DWARF more amenable to
> that too)
>
> On Mon, Oct 12, 2020 at 6:41 AM James Henderson
> <jh7370.2008 at my.bristol.ac.uk> wrote:
>
> Hi all,
>
> At the recent LLVM developers' meeting, I presented a
> lightning talk on an approach to reduce the amount of dead
> debug data left in an executable following operations such as
> --gc-sections and duplicate COMDAT removal. In that
> presentation, I presented some figures based on linking a game
> that had been built by our downstream clang port and
> fragmented using the described approach. Since recording the
> presentation, I ran the same experiment on a clang package
> (this time built with a GCC version). The comparable figures
> are below:
>
> Link-time speed (s):
> +--------------------+-------+---------------+------+------+------+------+------+
> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +--------------------+-------+---------------+------+------+------+------+------+
> | Game (plain)       | 4.5   | 4.9           | 4.2  | 3.6  | 3.4  | 3.3  | 3.2  |
> | Game (fragmented)  | 11.1  | 11.8          | 9.7  | 8.6  | 7.9  | 7.7  | 7.5  |
> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
> +--------------------+-------+---------------+------+------+------+------+------+
>
> Output size - Game package (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total)       | 1149  | 1121 | 1017 | 965  | 938  | 930  | 928  |
> | Plain (DWARF*)      | 845   | 845  | 845  | 845  | 845  | 845  | 845  |
> | Plain (other)       | 304   | 276  | 172  | 120  | 93   | 85   | 82   |
> | Fragmented (total)  | 1044  | 940  | 556  | 373  | 287  | 263  | 255  |
> | Fragmented (DWARF*) | 740   | 664  | 384  | 253  | 194  | 178  | 173  |
> | Fragmented (other)  | 304   | 276  | 172  | 120  | 93   | 85   | 82   |
> +---------------------+-------+------+------+------+------+------+------+
>
>
> Output size - Clang (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
> | Plain (other)       | 616   | 567  | 426  | 353  | 314  | 294  | 272  |
> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
> | Fragmented (other)  | 616   | 567  | 426  | 353  | 314  | 294  | 272  |
> +---------------------+-------+------+------+------+------+------+------+
>
> *DWARF size == total size of .debug_info + .debug_line +
> .debug_ranges + .debug_aranges + .debug_loc
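The DWARF rows in the two output-size tables can be turned into relative savings with a few lines of Python. The sketch below copies the figures from the tables; the variable and function names are only illustrative.

```python
# DWARF sizes in MB, copied from the two output-size tables above.
COLUMNS = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]
PLAIN_DWARF_MB = {"Game": 845, "Clang": 1979}
FRAGMENTED_DWARF_MB = {
    "Game": [740, 664, 384, 253, 194, 178, 173],
    "Clang": [1780, 1780, 1738, 1716, 1703, 1696, 1691],
}


def dwarf_reduction(package):
    """Fraction of plain DWARF size saved by fragmenting, per column."""
    plain = PLAIN_DWARF_MB[package]
    return [(plain - frag) / plain for frag in FRAGMENTED_DWARF_MB[package]]


for package in PLAIN_DWARF_MB:
    for column, saved in zip(COLUMNS, dwarf_reduction(package)):
        print(f"{package:5} {column:5} {saved:6.1%}")
```

For the game package, for example, the GC 1 column works out to (845 - 664) / 845, or roughly 21%.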
>
> Additionally, I have posted https://reviews.llvm.org/D89229
> which provides the python script and linker patches used to
> reproduce the above results on my machine. GC 1/2/3/4/5/6
> correspond to the linker option --mark-live-pc, added in that
> patch, with values 1/0.8/0.6/0.4/0.2/0 respectively.
>
> During the conference, the question was asked what the memory
> usage and input size impact was. I've summarised these below:
>
> Input file size total (GB):
> +--------------------+------------+
> | Package variant | Total Size |
> +--------------------+------------+
> | Game (plain) | 2.9 |
> | Game (fragmented) | 4.2 |
> | Clang (plain) | 10.9 |
> | Clang (fragmented) | 12.3 |
> +--------------------+------------+
>
> Peak Working Set Memory usage (GB):
> +--------------------+-------+------+
> | Package variant | No GC | GC 1 |
> +--------------------+-------+------+
> | Game (plain) | 4.3 | 4.7 |
> | Game (fragmented) | 8.9 | 8.6 |
> | Clang (plain) | 15.7 | 15.6 |
> | Clang (fragmented) | 19.4 | 19.2 |
> +--------------------+-------+------+
>
> I'm keen to hear what people's feedback is, and also
> interested to see what results others might see by running
> this experiment on other input packages. Also, if anybody has
> any alternative ideas that meet the goals listed below, I'd
> love to hear them!
>
> To reiterate some key goals of fragmented DWARF, similar to
> what I said in the presentation:
> 1) Devise a scheme that gives significant size savings without
> being too costly. It's clear from just the two packages I've
> tried this on that there is a fairly hefty link time
> performance cost, although the exact cost depends on the
> nature of the input package. On the other hand, depending on
> the nature of the input package, there can also be some big gains.
> 2) Devise a scheme that doesn't require any linker knowledge
> of DWARF. The current approach doesn't quite achieve this
> properly due to the slight misuse of SHF_LINK_ORDER, but I
> expect that a pivot to using non-COMDAT group sections should
> solve this problem.
> 3) Provide some kind of halfway house between simply writing
> tombstone values into dead DWARF and fully parsing the DWARF
> to reoptimise it/discard the dead bits.
>
> I'm hopeful that changes could be made to the linker to
> improve the link-time cost. There seems to be a significant
> amount of the link time spent creating the input sections. An
> alternative would be to devise a scheme that would avoid the
> literal splitting into section headers, in favour of some sort
> of list of split-points that the linker uses to split things
> up (a bit like it already does for .eh_frame or mergeable
> sections).
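The split-point alternative mentioned above could look something like the following Python sketch: one section's bytes, a list of split offsets, and a liveness predicate supplied by the caller. All names here are hypothetical, and how the linker would decide liveness (for example via relocations) is deliberately left out. It is only meant to show that no per-fragment section header is needed.

```python
def split_and_gc(section, split_points, is_live):
    """Split `section` at `split_points` and keep only the live pieces.

    Assumes every split point lies strictly inside the section.
    """
    bounds = [0, *sorted(split_points), len(section)]
    # Consecutive bounds delimit one piece each.
    pieces = [section[a:b] for a, b in zip(bounds, bounds[1:])]
    return b"".join(p for i, p in enumerate(pieces) if is_live(i))


# Three pieces; pretend the middle one refers only to discarded code.
section = b"AAAABBBBBBCC"
live = {0, 2}
output = split_and_gc(section, [4, 10], lambda i: i in live)
print(output)
```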
>