[llvm-dev] Fragmented DWARF

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 12 12:48:09 PDT 2020


Awesome! Sorry I missed the lightning talk, but I'm really interested to see
this sort of thing. (It's not directly/immediately applicable to the use case
I work with, Split DWARF, though something similar could be used there with
further work.)

Though it looks like the patch has mostly linker changes - where/how do you
generate the fragmented DWARF to begin with? Via the Python script? Run
over assembly? I'd be surprised if it was achievable that way - curious to
know more.

Do you have a rough sense of, or are you able to run, apples-to-apples
comparisons with Alexey's linker-based patches, to compare linker time/memory
overhead versus resulting output size gains?

(& yeah, I'm a bit curious about how the linkers do eh_frame rewriting, if
the format is especially amenable to a lightweight parsing/rewriting and
how we could make the DWARF more amenable to that too)

On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
jh7370.2008 at my.bristol.ac.uk> wrote:

> Hi all,
>
> At the recent LLVM developers' meeting, I presented a lightning talk on an
> approach to reduce the amount of dead debug data left in an executable
> following operations such as --gc-sections and duplicate COMDAT removal. In
> that presentation, I presented some figures based on linking a game that
> had been built by our downstream clang port and fragmented using the
> described approach. Since recording the presentation, I ran the same
> experiment on a clang package (this time built with a GCC version). The
> comparable figures are below:
>
> Link-time speed (s):
>
> +--------------------+-------+---------------+------+------+------+------+------+
> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +--------------------+-------+---------------+------+------+------+------+------+
> | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
> | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
> +--------------------+-------+---------------+------+------+------+------+------+
>
> Output size - Game package (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
> | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
> | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
> | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
> | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
> | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
> +---------------------+-------+------+------+------+------+------+------+
>
> Output size - Clang (MB):
> +---------------------+-------+------+------+------+------+------+------+
> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
> +---------------------+-------+------+------+------+------+------+------+
> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
> | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
> | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
> +---------------------+-------+------+------+------+------+------+------+
>
> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges +
> .debug_aranges + .debug_loc
>
> Additionally, I have posted https://reviews.llvm.org/D89229, which
> provides the Python script and linker patches used to produce the above
> results on my machine. GC 1/2/3/4/5/6 correspond to values of
> 1/0.8/0.6/0.4/0.2/0 respectively for the --mark-live-pc linker option
> added in that patch.
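
To put the cost and the gain side by side, here is a rough Python sketch that
only re-uses the figures quoted in the tables above; nothing in it is newly
measured. For each GC level it derives the percentage of DWARF saved by
fragmenting and the fragmented/plain link-time ratio. The names
(MARK_LIVE_PC, summarise, and so on) are made up for illustration and are not
taken from D89229.

```python
# All numbers are copied from the quoted tables; the names are illustrative only.
LEVELS = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]

# --mark-live-pc value per GC level, as stated in the quoted mail.
MARK_LIVE_PC = {"GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6,
                "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0}

# DWARF size in MB, one entry per level in LEVELS.
DWARF_MB = {
    "Game": {"plain": [845] * 7,
             "fragmented": [740, 664, 384, 253, 194, 178, 173]},
    "Clang": {"plain": [1979] * 7,
              "fragmented": [1780, 1780, 1738, 1716, 1703, 1696, 1691]},
}

# Link time in seconds, one entry per level in LEVELS.
LINK_S = {
    "Game": {"plain": [4.5, 4.9, 4.2, 3.6, 3.4, 3.3, 3.2],
             "fragmented": [11.1, 11.8, 9.7, 8.6, 7.9, 7.7, 7.5]},
    "Clang": {"plain": [13.9, 17.9, 17.0, 16.7, 16.3, 16.2, 16.1],
              "fragmented": [18.6, 22.8, 21.6, 21.1, 20.8, 20.5, 20.2]},
}


def summarise(package):
    """Return (level, mark_live_pc, dwarf_saving_pct, link_time_ratio) rows."""
    rows = []
    for i, level in enumerate(LEVELS):
        plain_mb = DWARF_MB[package]["plain"][i]
        frag_mb = DWARF_MB[package]["fragmented"][i]
        saving_pct = 100.0 * (plain_mb - frag_mb) / plain_mb
        ratio = LINK_S[package]["fragmented"][i] / LINK_S[package]["plain"][i]
        # "No GC" has no --mark-live-pc value, so .get() yields None for it.
        rows.append((level, MARK_LIVE_PC.get(level), saving_pct, ratio))
    return rows


if __name__ == "__main__":
    for package in DWARF_MB:
        for level, pc, saving, ratio in summarise(package):
            print(f"{package:5} {level:5} mark-live-pc={pc} "
                  f"DWARF saving {saving:5.1f}%  link time x{ratio:.2f}")
```

This only compares fragmented against plain at the same GC level; it says
nothing about how either compares with other approaches.
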
>
> During the conference, the question was asked what the memory usage and
> input size impact was. I've summarised these below:
>
> Input file size total (GB):
> +--------------------+------------+
> | Package variant    | Total Size |
> +--------------------+------------+
> | Game (plain)       |     2.9    |
> | Game (fragmented)  |     4.2    |
> | Clang (plain)      |    10.9    |
> | Clang (fragmented) |    12.3    |
> +--------------------+------------+
>
> Peak Working Set Memory usage (GB):
> +--------------------+-------+------+
> | Package variant    | No GC | GC 1 |
> +--------------------+-------+------+
> | Game (plain)       |  4.3  |  4.7 |
> | Game (fragmented)  |  8.9  |  8.6 |
> | Clang (plain)      | 15.7  | 15.6 |
> | Clang (fragmented) | 19.4  | 19.2 |
> +--------------------+-------+------+
>
> I'm keen to hear what people's feedback is, and also interested to see
> what results others might see by running this experiment on other input
> packages. Also, if anybody has any alternative ideas that meet the goals
> listed below, I'd love to hear them!
>
> To reiterate some key goals of fragmented DWARF, similar to what I said in
> the presentation:
> 1) Devise a scheme that gives significant size savings without being too
> costly. It's clear from just the two packages I've tried this on that there
> is a fairly hefty link time performance cost, although the exact cost
> depends on the nature of the input package. On the other hand, depending on
> the nature of the input package, there can also be some big gains.
> 2) Devise a scheme that doesn't require any linker knowledge of DWARF. The
> current approach doesn't quite achieve this properly due to the slight
> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
> group sections should solve this problem.
> 3) Provide some kind of halfway house between simply writing tombstone
> values into dead DWARF and fully parsing the DWARF to reoptimise
> it/discard the dead bits.
>
> I'm hopeful that changes could be made to the linker to improve the
> link-time cost. A significant amount of the link time seems to be spent
> creating the input sections. An alternative would be to devise a scheme
> that avoids literally splitting the data into separate sections (each with
> its own section header), in favour of some sort of list of split-points
> that the linker uses to split things up (a bit like it already does for
> .eh_frame or mergeable sections).
>
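
The split-point idea in the last paragraph can be sketched roughly as follows.
This is a toy model in Python, not how any linker implements it: it assumes
one section's contents as bytes, a list of byte offsets at which the section
may be cut, and a caller-supplied liveness predicate. The names split_at and
keep_live are hypothetical. It also ignores everything that makes real DWARF
hard, such as fixing up cross-references and unit lengths after pieces are
dropped.

```python
def split_at(data, split_points):
    """Cut `data` at each offset; return (start_offset, piece) pairs in order."""
    # Ignore duplicates and offsets outside the section's interior.
    inner = sorted(p for p in set(split_points) if 0 < p < len(data))
    bounds = [0] + inner + [len(data)]
    return [(start, data[start:end]) for start, end in zip(bounds, bounds[1:])]


def keep_live(pieces, is_live):
    """Concatenate live pieces; also map each kept old offset to its new offset."""
    out = bytearray()
    remap = {}
    for start, piece in pieces:
        if is_live(start):
            remap[start] = len(out)
            out += piece
    return bytes(out), remap


if __name__ == "__main__":
    section = b"AAAABBBCCCCC"           # stand-in for one debug section
    pieces = split_at(section, [4, 7])  # three pieces: AAAA, BBB, CCCCC
    # Pretend the piece starting at offset 4 described a discarded function.
    output, remap = keep_live(pieces, lambda start: start != 4)
    print(output, remap)
```

The point of the sketch is that the input stays a single section plus a side
table of offsets, so the per-piece cost is a slice rather than a full input
section object.
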