[llvm-dev] Fragmented DWARF
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 29 08:04:33 PDT 2020
Hi James,
Thank you very much for the information.
Regarding the first problem: could you please send me the clang build
configuration that you used, so that I can reproduce the problem?
For the second problem: yes, I built the experiment with
-ffunction-sections -fdata-sections.
Judging by the error message, it seems that the address ranges were read
incorrectly.
As a quick guess: could it be that invalid address ranges are marked
with the -1/-2 values? If so, they might be handled incorrectly, since this
patch does not support (and was not tested with) the LowPC>HighPC case. The
simplest solution would be not to use the -1/-2 values with this patch.
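To illustrate the guess above, here is a minimal sketch (not code from the patch) of how a consumer might recognise such ranges. It assumes 64-bit addresses by default, that the -1/-2 markers are stored as the all-ones address and the value one below it, and that a range whose start lies above its end should be treated as dead. The helper names are hypothetical.

```python
ADDRESS_SIZE = 8  # bytes; assumption: 64-bit target


def tombstones(address_size=ADDRESS_SIZE):
    """Return the unsigned encodings of -1 and -2 for the given address size."""
    max_addr = (1 << (8 * address_size)) - 1
    return {max_addr, max_addr - 1}


def is_dead_range(low_pc, high_pc, address_size=ADDRESS_SIZE):
    """Treat a range as dead if it starts at a tombstone or is inverted."""
    return low_pc in tombstones(address_size) or low_pc > high_pc
```

A range handler that only expects LowPC<=HighPC could call such a check first and skip the range instead of trying to map it.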
Thank you, Alexey.
On 29.10.2020 13:52, James Henderson wrote:
> Hi Alexey,
>
> I've just started looking at running your patch on the clang and game
> packages I used for the Fragmented DWARF experiment, and on both
> occasions, I got "warning: Generated debug info is broken" near the
> end of the link. Digging further, the actual error this represented
> (for the clang case) was "invalid e_shentsize in ELF header: 16912"
> (aside: there are several Expected instances around where the former
> warning was reported which are being thrown away and will cause
> assertions under the right configuration). I don't really follow the
> code enough to understand whether this is a bug in the code or
> possibly some weird interaction with our downstream patches (I don't
> expect the latter, for the clang build, as our patches are supposed to
> be a no-op when not using our target). I'll check what happens with
> the clang package if I try using a completely vanilla LLVM with your
> patch applied.
>
> I also got a large number of "no mapping for range" warnings when
> linking the game package. I tried debugging the code in the area, but
> the data types are all difficult to debug, and I don't really
> understand the relevant area of code enough to be able to theorise
> what actually is causing this. llvm-dwarfdump --verify doesn't flag up
> any issues, and there's nothing obviously broken looking at the dump
> of the debug data either. Any pointers as to what might be going wrong
> would be appreciated. I assume with your experiments that you build
> with -ffunction-sections/-fdata-sections for maximum GC opportunities?
>
> Thanks,
>
> James
>
> On Mon, 19 Oct 2020 at 09:50, James Henderson
> <jh7370.2008 at my.bristol.ac.uk> wrote:
>
> Great, thanks Alexey! I'll try to take a look at this in the near
> future, and will report my results back here. I imagine our clang
> results will differ, purely because we probably used different
> toolchains to build the input in the first place.
>
> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
> <avl.lapshin at gmail.com> wrote:
>
>
> On 13.10.2020 10:20, James Henderson wrote:
>> The script included in the patch can be used to convert an
>> object containing normal DWARF into an object using
>> fragmented DWARF. It does this by using llvm-dwarfdump to
>> dump the various sections, parses the output to identify
>> where it should split (using the offsets of the various
>> entries), and then writes new section headers accordingly -
>> you can see roughly what it's doing if you get a chance to
>> watch the talk recording. The additional section headers are
>> appended to the end of the ELF section header table, whilst
>> the original DWARF is left in the same place it was before
>> (making use of the fact that section headers don't have to
>> appear in offset order). The script also parses and fragments
>> the relocation sections targeting the DWARF sections so that
>> they match up with the fragmented DWARF sections. This is
>> clearly all suboptimal - in practice the compiler should be
>> modified to do the fragmenting upfront, to save having to
>> parse a tool's stdout, but that was just the simplest thing I
>> could come up with to quickly write the script. Full details
>> of the script usage are included in the patch description, if
>> you want to play around with it.
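The split-point idea described above can be sketched roughly as follows. This is not the script from the patch: it assumes dump lines of the form `0x<offset>: <indent>DW_TAG_...`, with direct children of the unit DIE indented by three spaces after the colon, and the sample text and function names are illustrative only.

```python
import re

# Matches lines such as "0x0000002a:   DW_TAG_subprogram"; captures the
# offset, the whitespace after the colon, and the tag (or NULL).
DIE_LINE = re.compile(r"^(0x[0-9a-fA-F]+):(\s+)(DW_TAG_\w+|NULL)")

# Illustrative sample only; real dumps also carry attribute lines.
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_formal_parameter
0x00000051:     NULL
0x00000052:   DW_TAG_base_type
0x00000059:   NULL
"""


def split_points(dump_text, child_indent=3):
    """Return offsets of entries that are direct children of a unit DIE.

    Assumption: direct children have exactly `child_indent` spaces
    between the colon and the tag, as in SAMPLE.
    """
    points = []
    for line in dump_text.splitlines():
        m = DIE_LINE.match(line)
        if m and m.group(3) != "NULL" and len(m.group(2)) == child_indent:
            points.append(int(m.group(1), 16))
    return points


def fragments(points, section_end):
    """Pair each split point with the next one to form [start, end) ranges."""
    ends = points[1:] + [section_end]
    return list(zip(points, ends))
```

Each resulting range would then correspond to one additional section header covering that slice of the original section.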
>>
>> If Alexey could point me at the latest version of his patch,
>> I'd be happy to run that through either or both of the
>> packages I used to see what happens. Equally, I'd be happy if
>> Alexey is able to run my script to fragment and measure the
>> performance of a couple of projects he's been working with.
>> Based purely on the two packages I've tried this with, I can
>> tell already that the results can vary wildly. My expectation
>> is that Alexey's approach will be slower (at least in its
>> current form, but probably more generally), but produce
>> smaller output, but to what scale I have no idea.
>
> James, I updated the patch - https://reviews.llvm.org/D74169.
>
> To make it work, it is necessary to build the example with
> -ffunction-sections and pass the following options to the linker:
>
> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
> For the clang binary I got the following results:
>
> 1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>
> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x
> performance decrease, Debug Info size 542M
>
> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary
> size 1.3G, 16x performance decrease, Debug Info size 1G
>
> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
> I added the option --gc-debuginfo-no-odr so that the size reduction
> can be compared correctly. Without that option, D74169 performs
> type deduplication, and it would then not be correct to compare
> the resulting size with the "Fragmented DWARF" solution, which does
> not do type deduplication.
>
> Also, I am looking at your D89229 <https://reviews.llvm.org/D89229>
> and will share results some time later.
>
> Thank you, Alexey.
>
>>
>> I think linkers parse .eh_frame partly because they have no
>> other choice. That being said, I think its format is not too
>> complex, so similarly the parser isn't too complex. You can
>> see LLD's ELF implementation in ELF/EhFrame.cpp, how it is
>> used in ELF/InputSection.cpp (see the bits to do with
>> EhInputSection) and EhFrameSection in ELF/SyntheticSections.h
>> (plus various usages of these two throughout the LLD code). I
>> think the key to any structural changes in the DWARF format
>> to make them more amenable to link-time parsing is being able
>> to read a minimal amount without needing to parse the payload
>> (e.g. a length field, some sort of type, and then using the
>> relocations to associate it accordingly).
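As a rough illustration of "reading a minimal amount without parsing the payload", the sketch below walks .eh_frame-style records using only the length field and the 4-byte ID that follows it. It is not LLD's implementation; it assumes little-endian data, and the function name is hypothetical.

```python
import struct


def walk_records(data):
    """Yield (offset, kind, total_size) for each record in the blob.

    Only the length and ID fields are read; the payload is skipped.
    Assumption: little-endian, .eh_frame-style layout.
    """
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        if length == 0:              # a zero length marks the terminator
            break
        header = 4
        if length == 0xFFFFFFFF:     # 64-bit extended length follows
            (length,) = struct.unpack_from("<Q", data, offset + 4)
            header = 12
        (record_id,) = struct.unpack_from("<I", data, offset + header)
        kind = "CIE" if record_id == 0 else "FDE"   # ID 0 identifies a CIE
        yield offset, kind, header + length
        offset += header + length
```

A linker can use the record boundaries found this way, together with the relocations that fall inside each record, to decide which records to keep.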
>>
>> James
>>
>> On Mon, 12 Oct 2020 at 20:48, David Blaikie
>> <dblaikie at gmail.com> wrote:
>>
>> Awesome! Sorry I missed the lightning talk, but really
>> interested to see this sort of thing (though it's not
>> directly/immediately applicable to the use case I work
>> with - Split DWARF, something similar could be used there
>> with further work)
>>
>> Though it looks like the patch has mostly linker changes
>> - where/how do you generate the fragmented DWARF to begin
>> with? Via the Python script? Run over assembly? I'd be
>> surprised if it was achievable that way - curious to know
>> more.
>>
>> Got a rough sense/are you able to run apples-to-apples
>> comparisons with Alexey's linker-based patches to compare
>> linker time/memory overhead versus resulting output size
>> gains?
>>
>> (& yeah, I'm a bit curious about how the linkers do
>> eh_frame rewriting, if the format is especially amenable
>> to a lightweight parsing/rewriting and how we could make
>> the DWARF more amenable to that too)
>>
>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>> <jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>> Hi all,
>>
>> At the recent LLVM developers' meeting, I presented a
>> lightning talk on an approach to reduce the amount of
>> dead debug data left in an executable following
>> operations such as --gc-sections and duplicate COMDAT
>> removal. In that presentation, I presented some
>> figures based on linking a game that had been built
>> by our downstream clang port and fragmented using the
>> described approach. Since recording the presentation,
>> I ran the same experiment on a clang package (this
>> time built with a GCC version). The comparable
>> figures are below:
>>
>> Link-time speed (s):
>> +--------------------+-------+---------------+------+------+------+------+------+
>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>> +--------------------+-------+---------------+------+------+------+------+------+
>> | Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>> | Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>> | Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>> | Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>> +--------------------+-------+---------------+------+------+------+------+------+
>>
>> Output size - Game package (MB):
>> +---------------------+-------+------+------+------+------+------+------+
>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>> +---------------------+-------+------+------+------+------+------+------+
>> | Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
>> | Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
>> | Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>> | Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
>> | Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
>> | Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>> +---------------------+-------+------+------+------+------+------+------+
>>
>>
>> Output size - Clang (MB):
>> +---------------------+-------+------+------+------+------+------+------+
>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>> +---------------------+-------+------+------+------+------+------+------+
>> | Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>> | Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>> | Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>> | Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>> | Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>> | Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>> +---------------------+-------+------+------+------+------+------+------+
>>
>> *DWARF size == total size of .debug_info +
>> .debug_line + .debug_ranges + .debug_aranges + .debug_loc
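To put the two packages side by side, the DWARF figures from the tables above can be turned into relative reductions. The short sketch below only does that arithmetic on the "Plain" and "Fragmented, GC 6" numbers; the dictionary layout and function name are made up for illustration.

```python
# DWARF sizes in MB, copied from the output-size tables above.
DWARF_MB = {
    "Game": {"plain": 845, "fragmented_gc6": 173},
    "Clang": {"plain": 1979, "fragmented_gc6": 1691},
}


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


for name, sizes in DWARF_MB.items():
    pct = reduction_percent(sizes["plain"], sizes["fragmented_gc6"])
    print(f"{name}: {pct:.1f}% less DWARF at GC 6 than plain")
```

On these figures the game package loses roughly 80% of its DWARF at GC 6, while the clang package loses roughly 15%.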
>>
>> Additionally, I have posted
>> https://reviews.llvm.org/D89229 which provides the
>> python script and linker patches used to reproduce
>> the above results on my machine. The GC 1/2/3/4/5/6
>> correspond to the linker option added in that patch
>> --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0
>> respectively.
>>
>> During the conference, the question was asked what
>> the memory usage and input size impact was. I've
>> summarised these below:
>>
>> Input file size total (GB):
>> +--------------------+------------+
>> | Package variant    | Total Size |
>> +--------------------+------------+
>> | Game (plain)       |        2.9 |
>> | Game (fragmented)  |        4.2 |
>> | Clang (plain)      |       10.9 |
>> | Clang (fragmented) |       12.3 |
>> +--------------------+------------+
>>
>> Peak Working Set Memory usage (GB):
>> +--------------------+-------+------+
>> | Package variant    | No GC | GC 1 |
>> +--------------------+-------+------+
>> | Game (plain)       |   4.3 |  4.7 |
>> | Game (fragmented)  |   8.9 |  8.6 |
>> | Clang (plain)      |  15.7 | 15.6 |
>> | Clang (fragmented) |  19.4 | 19.2 |
>> +--------------------+-------+------+
>>
>> I'm keen to hear what people's feedback is, and also
>> interested to see what results others might see by
>> running this experiment on other input packages.
>> Also, if anybody has any alternative ideas that meet
>> the goals listed below, I'd love to hear them!
>>
>> To reiterate some key goals of fragmented DWARF,
>> similar to what I said in the presentation:
>> 1) Devise a scheme that gives significant size
>> savings without being too costly. It's clear from
>> just the two packages I've tried this on that there
>> is a fairly hefty link time performance cost,
>> although the exact cost depends on the nature of the
>> input package. On the other hand, depending on the
>> nature of the input package, there can also be some
>> big gains.
>> 2) Devise a scheme that doesn't require any linker
>> knowledge of DWARF. The current approach doesn't
>> quite achieve this properly due to the slight misuse
>> of SHF_LINK_ORDER, but I expect that a pivot to using
>> non-COMDAT group sections should solve this problem.
>> 3) Provide some kind of halfway house between simply
>> writing tombstone values into dead DWARF and fully
>> parsing the DWARF to reoptimise it/discard the dead
>> bits.
>>
>> I'm hopeful that changes could be made to the linker
>> to improve the link-time cost. There seems to be a
>> significant amount of the link time spent creating
>> the input sections. An alternative would be to devise
>> a scheme that would avoid the literal splitting into
>> section headers, in favour of some sort of list of
>> split-points that the linker uses to split things up
>> (a bit like it already does for .eh_frame or
>> mergeable sections).
>>