[llvm-dev] Fragmented DWARF
David Blaikie via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 12 12:48:09 PDT 2020
Awesome! Sorry I missed the lightning talk, but I'm really interested to see
this sort of thing (though it's not directly/immediately applicable to the
use case I work with, Split DWARF; something similar could be used there
with further work).
Though it looks like the patch has mostly linker changes - where/how do you
generate the fragmented DWARF to begin with? Via the Python script? Run
over assembly? I'd be surprised if it was achievable that way - curious to
know more.
Do you have a rough sense of, or are you able to run, apples-to-apples
comparisons with Alexey's linker-based patches, to compare linker time/memory
overhead against the resulting output size gains?
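As a rough baseline for that kind of comparison, the relative overheads and savings can be read off the figures in the quoted message below. This is only a minimal sketch: the numbers are copied from the GC 1 columns of the quoted tables (link time, peak working set, DWARF size), and the names `figures`, `relative_change` and `summarise` are just illustrative.

```python
# Figures copied from the GC 1 columns of the tables in the quoted message.
# Each tuple is (link time in s, peak working set in GB, DWARF size in MB).
figures = {
    "Game": {"plain": (4.9, 4.7, 845), "fragmented": (11.8, 8.6, 664)},
    "Clang": {"plain": (17.9, 15.6, 1979), "fragmented": (22.8, 19.2, 1780)},
}


def relative_change(plain, fragmented):
    """Return the change from plain to fragmented as a percentage of plain."""
    return (fragmented - plain) / plain * 100.0


def summarise(figures):
    """Map each package to (time %, memory %, DWARF size %) changes."""
    rows = {}
    for package, variants in figures.items():
        rows[package] = tuple(
            round(relative_change(p, f), 1)
            for p, f in zip(variants["plain"], variants["fragmented"])
        )
    return rows


summary = summarise(figures)
for package, (time_pct, mem_pct, dwarf_pct) in summary.items():
    print(
        f"{package}: link time {time_pct:+.1f}%, "
        f"peak memory {mem_pct:+.1f}%, DWARF size {dwarf_pct:+.1f}%"
    )
```

A positive value is a cost relative to the plain build and a negative value is a saving. The same calculation could be repeated for the other GC columns or for figures from a different approach.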
(& yeah, I'm a bit curious about how the linkers do eh_frame rewriting, whether
the format is especially amenable to lightweight parsing/rewriting, and
how we could make the DWARF more amenable to that too)
On Mon, Oct 12, 2020 at 6:41 AM James Henderson <jh7370.2008 at my.bristol.ac.uk> wrote:
> Hi all,
>
> At the recent LLVM developers' meeting, I presented a lightning talk on an
> approach to reduce the amount of dead debug data left in an executable
> following operations such as --gc-sections and duplicate COMDAT removal. In
> that presentation, I presented some figures based on linking a game that
> had been built by our downstream clang port and fragmented using the
> described approach. Since recording the presentation, I ran the same
> experiment on a clang package (this time built with a GCC version). The
> comparable figures are below:
>
> Link-time speed (s):
>
> +--------------------+-------+---------------+------+------+------+------+------+
> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +--------------------+-------+---------------+------+------+------+------+------+
> | Game (plain)       | 4.5   | 4.9           | 4.2  | 3.6  | 3.4  | 3.3  | 3.2  |
> | Game (fragmented)  | 11.1  | 11.8          | 9.7  | 8.6  | 7.9  | 7.7  | 7.5  |
> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
> +--------------------+-------+---------------+------+------+------+------+------+
>
> Output size - Game package (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total) | 1149 | 1121 | 1017 | 965 | 938 | 930 | 928 |
> | Plain (DWARF*) | 845 | 845 | 845 | 845 | 845 | 845 | 845 |
> | Plain (other) | 304 | 276 | 172 | 120 | 93 | 85 | 82 |
> | Fragmented (total) | 1044 | 940 | 556 | 373 | 287 | 263 | 255 |
> | Fragmented (DWARF*) | 740 | 664 | 384 | 253 | 194 | 178 | 173 |
> | Fragmented (other) | 304 | 276 | 172 | 120 | 93 | 85 | 82 |
> +---------------------+-------+------+------+------+------+------+------+
>
> Output size - Clang (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total) | 2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
> | Plain (DWARF*) | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
> | Plain (other) | 616 | 567 | 426 | 353 | 314 | 294 | 272 |
> | Fragmented (total) | 2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
> | Fragmented (DWARF*) | 1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
> | Fragmented (other) | 616 | 567 | 426 | 353 | 314 | 294 | 272 |
> +---------------------+-------+------+------+------+------+------+------+
>
> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges +
> .debug_aranges + .debug_loc
>
> Additionally, I have posted https://reviews.llvm.org/D89229, which
> provides the Python script and linker patches used to reproduce the above
> results on my machine. GC 1 to GC 6 correspond to the --mark-live-pc linker
> option added in that patch, with values 1/0.8/0.6/0.4/0.2/0
> respectively.
>
> During the conference, the question was asked what the memory usage and
> input size impact was. I've summarised these below:
>
> Input file size total (GB):
> +--------------------+------------+
> | Package variant | Total Size |
> +--------------------+------------+
> | Game (plain) | 2.9 |
> | Game (fragmented) | 4.2 |
> | Clang (plain) | 10.9 |
> | Clang (fragmented) | 12.3 |
> +--------------------+------------+
>
> Peak Working Set Memory usage (GB):
> +--------------------+-------+------+
> | Package variant | No GC | GC 1 |
> +--------------------+-------+------+
> | Game (plain) | 4.3 | 4.7 |
> | Game (fragmented) | 8.9 | 8.6 |
> | Clang (plain) | 15.7 | 15.6 |
> | Clang (fragmented) | 19.4 | 19.2 |
> +--------------------+-------+------+
>
> I'm keen to hear what people's feedback is, and also interested to see
> what results others might see by running this experiment on other input
> packages. Also, if anybody has any alternative ideas that meet the goals
> listed below, I'd love to hear them!
>
> To reiterate some key goals of fragmented DWARF, similar to what I said in
> the presentation:
> 1) Devise a scheme that gives significant size savings without being too
> costly. It's clear from just the two packages I've tried this on that there
> is a fairly hefty link time performance cost, although the exact cost
> depends on the nature of the input package. On the other hand, depending on
> the nature of the input package, there can also be some big gains.
> 2) Devise a scheme that doesn't require any linker knowledge of DWARF. The
> current approach doesn't quite achieve this properly due to the slight
> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
> group sections should solve this problem.
> 3) Provide some kind of halfway house between simply writing tombstone
> values into dead DWARF and fully parsing the DWARF to reoptimise
> it/discard the dead bits.
>
> I'm hopeful that changes could be made to the linker to improve the
> link-time cost. There seems to be a significant amount of the link time
> spent creating the input sections. An alternative would be to devise a
> scheme that would avoid the literal splitting into section headers, in
> favour of some sort of list of split-points that the linker uses to split
> things up (a bit like it already does for .eh_frame or mergeable sections).
>
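To make the split-point idea from the last quoted paragraph more concrete, here is a minimal sketch of cutting one contiguous section into pieces and keeping only the live ones. It is not what D89229 does, and the thread does not define a split-point format. As a stand-in, it borrows the length-prefixed record layout of .eh_frame (a 4-byte little-endian length that excludes the length field itself, with a zero length acting as terminator). The function names and the `is_live` callback are hypothetical, and relocation and offset fix-ups are deliberately left out.

```python
import struct


def split_records(data):
    """Return (start, end) offsets of each length-prefixed record in data.

    Assumes an .eh_frame-style layout: a 4-byte little-endian length that
    excludes the length field itself. A zero length terminates the list.
    The 64-bit extended length form (0xffffffff escape) is not handled.
    """
    pieces = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:  # terminator record
            break
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit lengths are not handled in this sketch")
        end = pos + 4 + length
        if end > len(data):
            raise ValueError(f"record at offset {pos} overruns the section")
        pieces.append((pos, end))
        pos = end
    return pieces


def keep_live(data, is_live):
    """Concatenate only the records whose start offset is_live() accepts."""
    return b"".join(
        data[start:end] for start, end in split_records(data) if is_live(start)
    )


def record(payload):
    """Build one length-prefixed record (helper for the example only)."""
    return struct.pack("<I", len(payload)) + payload


section = record(b"AAAA") + record(b"BB") + record(b"CCCCCC")
pieces = split_records(section)
print(pieces)  # [(0, 8), (8, 14), (14, 24)]

# Pretend the record starting at offset 8 refers to discarded code.
live = keep_live(section, lambda offset: offset != 8)
```

The point of the sketch is that the linker only needs the piece boundaries and a liveness decision per piece, with no knowledge of what the bytes inside each piece mean.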