[llvm-dev] Fragmented DWARF

Thu Oct 29 08:04:33 PDT 2020

Hi James,

Thank you very much for the information.
Regarding the first problem: could you please send me the clang build 
configuration that you used, so that I can reproduce it?

Regarding the second problem: yes, I built the experiment with 
-ffunction-sections -fdata-sections.
Judging by the error message, it seems that the address ranges were read 
incorrectly.
As a quick guess: could it be that invalid address ranges are marked 
with a -1/-2 value? If so, they might be handled incorrectly, since this 
patch does not support (and was not tested with) the LowPC > HighPC case. 
The simplest solution would be not to use -1/-2 values with this patch.
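
A minimal sketch of how a -1 marker can lead to the LowPC > HighPC case, assuming 64-bit addresses and a DW_AT_high_pc that is encoded as an offset from DW_AT_low_pc. The helper names are hypothetical and are not taken from the patch.

```python
MASK64 = (1 << 64) - 1
# -1 and -2 reinterpreted as unsigned 64-bit addresses (assumed marker values).
MARKERS = {MASK64, MASK64 - 1}


def resolve_range(low_pc, high_pc_offset):
    """Return (low, high), wrapping the sum like 64-bit unsigned arithmetic."""
    return low_pc, (low_pc + high_pc_offset) & MASK64


def is_marked_dead(low_pc):
    """True if the start address is one of the -1/-2 marker values."""
    return low_pc in MARKERS


# A 16-byte function whose start address was overwritten with -1.
low, high = resolve_range(MASK64, 0x10)
print(hex(low), hex(high), low > high, is_marked_dead(low))
```

A consumer that checks the start address with something like is_marked_dead() before building a range would skip such entries instead of seeing an inverted range.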

Thank you, Alexey.

On 29.10.2020 13:52, James Henderson wrote:
> Hi Alexey,
>
> I've just started looking at running your patch on the clang and game 
> packages I used for the Fragmented DWARF experiment, and on both 
> occasions, I got "warning: Generated debug info is broken" near the 
> end of the link. Digging further, the actual error this represented 
> (for the clang case) was "invalid e_shentsize in ELF header: 16912" 
> (aside: there are several Expected instances around where the former 
> warning was reported which are being thrown away and will cause 
> assertions under the right configuration). I don't really follow the 
> code enough to understand whether this is a bug in the code or 
> possibly some weird interaction with our downstream patches (I don't 
> expect the latter, for the clang build, as our patches are supposed to 
> be a no-op when not using our target). I'll check what happens with 
> the clang package if I try using a completely vanilla LLVM with your 
> patch applied.
>
> I also got a large number of "no mapping for range" warnings when 
> linking the game package. I tried debugging the code in the area, but 
> the data types are all difficult to debug, and I don't really 
> understand the relevant area of code enough to be able to theorise 
> what actually is causing this. llvm-dwarfdump --verify doesn't flag up 
> any issues, and there's nothing obviously broken looking at the dump 
> of the debug data either. Any pointers as to what might be going wrong 
> would be appreciated. I assume with your experiments that you build 
> with -ffunction-sections/-fdata-sections for maximum GC opportunities?
>
> Thanks,
>
> James
>
> On Mon, 19 Oct 2020 at 09:50, James Henderson 
> <jh7370.2008 at my.bristol.ac.uk> wrote:
>
>     Great, thanks Alexey! I'll try to take a look at this in the near
>     future, and will report my results back here. I imagine our clang
>     results will differ, purely because we probably used different
>     toolchains to build the input in the first place.
>
>     On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
>     <avl.lapshin at gmail.com> wrote:
>
>
>         On 13.10.2020 10:20, James Henderson wrote:
>>         The script included in the patch can be used to convert an
>>         object containing normal DWARF into an object using
>>         fragmented DWARF. It does this by using llvm-dwarfdump to
>>         dump the various sections, parses the output to identify
>>         where it should split (using the offsets of the various
>>         entries), and then writes new section headers accordingly -
>>         you can see roughly what it's doing if you get a chance to
>>         watch the talk recording. The additional section headers are
>>         appended to the end of the ELF section header table, whilst
>>         the original DWARF is left in the same place it was before
>>         (making use of the fact that section headers don't have to
>>         appear in offset order). The script also parses and fragments
>>         the relocation sections targeting the DWARF sections so that
>>         they match up with the fragmented DWARF sections. This is
>>         clearly all suboptimal - in practice the compiler should be
>>         modified to do the fragmenting upfront, to save having to
>>         parse a tool's stdout, but that was just the simplest thing I
>>         could come up with to quickly write the script. Full details
>>         of the script usage are included in the patch description, if
>>         you want to play around with it.
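
A minimal sketch of the "parse the dump to find split offsets" step described above. This is not the script from the patch. The SAMPLE text is an abbreviated, hand-written imitation of llvm-dwarfdump --debug-info output, and the rule of two spaces of indentation per nesting level is an assumption about that output format.

```python
import re

# Hand-written, abbreviated imitation of a DIE listing (assumed format).
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_formal_parameter
0x00000051:     NULL
0x00000052:   DW_TAG_variable
0x00000067:   NULL
"""

# Offset, the run of spaces after the colon, then the tag (or NULL terminator).
DIE_LINE = re.compile(r"^(0x[0-9a-fA-F]+):( +)(DW_TAG_\w+|NULL)", re.MULTILINE)


def split_points(dump_text):
    """Return (offset, tag) for each direct child of a unit DIE."""
    points = []
    for offset, indent, tag in DIE_LINE.findall(dump_text):
        depth = (len(indent) - 1) // 2  # one space, plus two per level
        if depth == 1 and tag != "NULL":
            points.append((int(offset, 16), tag))
    return points


for offset, tag in split_points(SAMPLE):
    print(hex(offset), tag)
```

Each returned offset is a candidate start of a fragment; writing the extra section headers and splitting the matching relocations is left out of this sketch.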
>>
>>         If Alexey could point me at the latest version of his patch,
>>         I'd be happy to run that through either or both of the
>>         packages I used to see what happens. Equally, I'd be happy if
>>         Alexey is able to run my script to fragment and measure the
>>         performance of a couple of projects he's been working with.
>>         Based purely on the two packages I've tried this with, I can
>>         tell already that the results can vary wildly. My expectation
>>         is that Alexey's approach will be slower (at least in its
>>         current form, but probably more generally), but produce
>>         smaller output, but to what scale I have no idea.
>
>         James, I updated the patch - https://reviews.llvm.org/D74169.
>
>         To make it work, it is necessary to build the example with
>         -ffunction-sections and pass the following options to the linker:
>
>         --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
>         For the clang binary I got the following results:
>
>         1. --gc-sections = binary size 1.5G, Debug Info size (*) 1.2G
>
>         2. --gc-sections --gc-debuginfo = binary size 840M, 8x
>         performance decrease, Debug Info size 542M
>
>         3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary
>         size 1.3G, 16x performance decrease, Debug Info size 1G
>
>         (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
>         I added the --gc-debuginfo-no-odr option so that the size
>         reduction can be compared fairly. Without that option, D74169
>         performs type deduplication, so it would not be correct to
>         compare the resulting size with the "Fragmented DWARF" solution,
>         which does not deduplicate types.
>
>         I am also looking at your D89229 <https://reviews.llvm.org/D89229>
>         and will share the results some time later.
>
>         Thank you, Alexey.
>
>>
>>         I think linkers parse .eh_frame partly because they have no
>>         other choice. That being said, I think its format is not too
>>         complex, so similarly the parser isn't too complex. You can
>>         see LLD's ELF implementation in ELF/EhFrame.cpp, how it is
>>         used in ELF/InputSection.cpp (see the bits to do with
>>         EhInputSection) and EhFrameSection in ELF/SyntheticSections.h
>>         (plus various usages of these two throughout the LLD code). I
>>         think the key to any structural changes in the DWARF format
>>         to make them more amenable to link-time parsing is being able
>>         to read a minimal amount without needing to parse the payload
>>         (e.g. a length field, some sort of type, and then using the
>>         relocations to associate it accordingly).
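
A minimal sketch of the "read a length and a type without parsing the payload" idea, modelled loosely on .eh_frame records. It is not LLD's implementation. It assumes little-endian data and 32-bit length fields, and it treats an ID field of zero as a CIE and anything else as an FDE.

```python
import struct


def split_records(data):
    """Return (offset, kind, size) for each length-prefixed record."""
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:  # a zero length terminates the section
            break
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit extended length is not handled in this sketch")
        (ident,) = struct.unpack_from("<I", data, pos + 4)
        kind = "CIE" if ident == 0 else "FDE"
        records.append((pos, kind, 4 + length))
        pos += 4 + length  # skip the payload without looking inside it
    return records


# Synthetic input: one CIE, one FDE, then a terminator.
cie = struct.pack("<II", 8, 0) + b"\x00" * 4
fde = struct.pack("<II", 12, 16) + b"\x00" * 8
section = cie + fde + struct.pack("<I", 0)
print(split_records(section))
```

Only eight bytes per record are inspected here; associating each record with the code it describes would come from the relocations, as described above.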
>>
>>         James
>>
>>         On Mon, 12 Oct 2020 at 20:48, David Blaikie
>>         <dblaikie at gmail.com> wrote:
>>
>>             Awesome! Sorry I missed the lightning talk, but really
>>             interested to see this sort of thing (though it's not
>>             directly/immediately applicable to the use case I work
>>             with - Split DWARF, something similar could be used there
>>             with further work)
>>
>>             Though it looks like the patch has mostly linker changes
>>             - where/how do you generate the fragmented DWARF to begin
>>             with? Via the Python script? Run over assembly? I'd be
>>             surprised if it was achievable that way - curious to know
>>             more.
>>
>>             Got a rough sense/are you able to run apples-to-apples
>>             comparisons with Alexey's linker-based patches to compare
>>             linker time/memory overhead versus resulting output size
>>             gains?
>>
>>             (& yeah, I'm a bit curious about how the linkers do
>>             eh_frame rewriting, if the format is especially amenable
>>             to a lightweight parsing/rewriting and how we could make
>>             the DWARF more amenable to that too)
>>
>>             On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>             <jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>>                 Hi all,
>>
>>                 At the recent LLVM developers' meeting, I presented a
>>                 lightning talk on an approach to reduce the amount of
>>                 dead debug data left in an executable following
>>                 operations such as --gc-sections and duplicate COMDAT
>>                 removal. In that presentation, I presented some
>>                 figures based on linking a game that had been built
>>                 by our downstream clang port and fragmented using the
>>                 described approach. Since recording the presentation,
>>                 I ran the same experiment on a clang package (this
>>                 time built with a GCC version). The comparable
>>                 figures are below:
>>
>>                 Link-time speed (s):
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>                 | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>                 | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>                 | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>                 | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>                 | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>
>>                 Output size - Game package (MB):
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>                 | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>                 | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>                 | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>                 | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>                 | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>
>>
>>                 Output size - Clang (MB):
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>                 | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>                 | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>                 | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>                 | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>                 | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>
>>                 *DWARF size == total size of .debug_info +
>>                 .debug_line + .debug_ranges + .debug_aranges + .debug_loc
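
As a worked reading of the size tables above, the following sketch computes the relative DWARF size reduction of the fragmented output against the plain output. The figures are copied from the tables; the function and variable names are only illustrative.

```python
# DWARF sizes in MB, in column order: No GC, GC 1, GC 2, GC 3, GC 4, GC 5, GC 6.
LABELS = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]
PLAIN_DWARF = {"Game": 845, "Clang": 1979}  # constant across all columns
FRAGMENTED_DWARF = {
    "Game": [740, 664, 384, 253, 194, 178, 173],
    "Clang": [1780, 1780, 1738, 1716, 1703, 1696, 1691],
}


def reduction_percent(plain, fragmented):
    """Percentage by which the fragmented DWARF is smaller than the plain DWARF."""
    return round(100 * (plain - fragmented) / plain, 1)


for package, sizes in FRAGMENTED_DWARF.items():
    row = [reduction_percent(PLAIN_DWARF[package], size) for size in sizes]
    print(package, dict(zip(LABELS, row)))
```

With the normal GC setting (GC 1) this gives a reduction of roughly 21% for the game package and roughly 10% for clang, which matches the observation that results vary considerably between inputs.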
>>
>>                 Additionally, I have posted
>>                 https://reviews.llvm.org/D89229 which provides the
>>                 python script and linker patches used to reproduce
>>                 the above results on my machine. The GC 1/2/3/4/5/6
>>                 correspond to the linker option added in that patch
>>                 --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0
>>                 respectively.
>>
>>                 During the conference, the question was asked what
>>                 the memory usage and input size impact was. I've
>>                 summarised these below:
>>
>>                 Input file size total (GB):
>>                 +--------------------+------------+
>>                 | Package variant    | Total Size |
>>                 +--------------------+------------+
>>                 | Game (plain)       |     2.9    |
>>                 | Game (fragmented)  |     4.2    |
>>                 | Clang (plain)      |    10.9    |
>>                 | Clang (fragmented) |    12.3    |
>>                 +--------------------+------------+
>>
>>                 Peak Working Set Memory usage (GB):
>>                 +--------------------+-------+------+
>>                 | Package variant    | No GC | GC 1 |
>>                 +--------------------+-------+------+
>>                 | Game (plain)       |  4.3  |  4.7 |
>>                 | Game (fragmented)  |  8.9  |  8.6 |
>>                 | Clang (plain)      | 15.7  | 15.6 |
>>                 | Clang (fragmented) | 19.4  | 19.2 |
>>                 +--------------------+-------+------+
>>
>>                 I'm keen to hear what people's feedback is, and also
>>                 interested to see what results others might see by
>>                 running this experiment on other input packages.
>>                 Also, if anybody has any alternative ideas that meet
>>                 the goals listed below, I'd love to hear them!
>>
>>                 To reiterate some key goals of fragmented DWARF,
>>                 similar to what I said in the presentation:
>>                 1) Devise a scheme that gives significant size
>>                 savings without being too costly. It's clear from
>>                 just the two packages I've tried this on that there
>>                 is a fairly hefty link time performance cost,
>>                 although the exact cost depends on the nature of the
>>                 input package. On the other hand, depending on the
>>                 nature of the input package, there can also be some
>>                 big gains.
>>                 2) Devise a scheme that doesn't require any linker
>>                 knowledge of DWARF. The current approach doesn't
>>                 quite achieve this properly due to the slight misuse
>>                 of SHF_LINK_ORDER, but I expect that a pivot to using
>>                 non-COMDAT group sections should solve this problem.
>>                 3) Provide some kind of halfway house between simply
>>                 writing tombstone values into dead DWARF and fully
>>                 parsing the DWARF to reoptimise it/discard the dead
>>                 bits.
>>
>>                 I'm hopeful that changes could be made to the linker
>>                 to improve the link-time cost. There seems to be a
>>                 significant amount of the link time spent creating
>>                 the input sections. An alternative would be to devise
>>                 a scheme that would avoid the literal splitting into
>>                 section headers, in favour of some sort of list of
>>                 split-points that the linker uses to split things up
>>                 (a bit like it already does for .eh_frame or
>>                 mergeable sections).
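
A minimal sketch of the split-point alternative mentioned above: the section stays as a single blob and a separate list of offsets tells the consumer where it may be cut. The data layout and function names are hypothetical and do not correspond to any existing linker interface.

```python
def split_section(data, split_points):
    """Cut `data` at each offset in `split_points`; return (offset, bytes) pieces."""
    inner = sorted({p for p in split_points if 0 < p < len(data)})
    bounds = [0] + inner + [len(data)]
    return [(start, data[start:end]) for start, end in zip(bounds, bounds[1:])]


def keep_live(pieces, live_offsets):
    """Concatenate only the pieces whose start offset is marked live."""
    return b"".join(chunk for offset, chunk in pieces if offset in live_offsets)


section = b"AAAABBBBBBCC"
pieces = split_section(section, [4, 10])
print(pieces)
print(keep_live(pieces, {0, 10}))  # drop the middle piece
```

The point of the design is that no extra section headers are created up front; the pieces only come into existence when the consumer decides which of them to keep.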
>>