[llvm-dev] Fragmented DWARF

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 5 11:58:29 PST 2020


On 04.11.2020 16:57, James Henderson wrote:
> Great, thanks! Those results are roughly what I was expecting. I
> assume "compilation time" is actually just the link time?

Yep, that is link time.


>
> I find it particularly interesting that the DWARFLinker rewriting 
> solution produces the same size improvement in .debug_line as the 
> fragmented DWARF approach. That suggests that in that case, fragmented 
> DWARF output is probably about as optimal as it can get. I'm not 
> surprised that the same can't be said for other sections, but I'm also 
> pleased to see that the full rewrite option isn't so much better in 
> size improvements.
>
> Regarding the problems I was having with the patch, if you want to try 
> reproducing the problems with clang, I built commit 05d02e5a of clang 
> using gcc 7.5.0 on Ubuntu 18.04, to generate an ELF package. I then 
> used LLD to relink it to create a reproducible package. As I'm 
> primarily a Windows developer, I transferred this package to my 
> Windows machine so that I could use my existing Windows checkout of 
> LLVM, applied your patch, rebuilt LLD, and used that to try linking 
> the package, getting the stated message. I'm going to have another try 
> at the latter now to see if I can figure out what the issue is myself.
>
> James
>
> On Wed, 4 Nov 2020 at 13:35, Alexey Lapshin <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 04.11.2020 15:28, James Henderson wrote:
>>     Hi Alexey,
>>
>>     Thanks for taking a look at these. I noticed you set the
>>     --mark-live-pc value to a value other than 1 for the fragmented
>>     DWARF version. This will mean additional GC-ing will be done
>>     beyond the amount that --gc-sections will do, so unless you use
>>     the same value for the option for other versions, the result will
>>     not be comparable. (The option is purely there to experiment with
>>     the effects were different amounts of the input codebase to be
>>     considered dead). Would you be okay to run those figures again
>>     without the option specified?
>
>     Oh, I misinterpreted that option. The updated results follow:
>
>     1. llvm-strings:
>
>        source object files size: 381M.
>        fragmented source object files size: 451M(18% increase).
>
>        a. upstream version,
>           command line options: --gc-sections
>           binary size: 6,5M
>           compilation time: 0:00.13 sec
>           run-time memory: 111kb
>
>        b. "fragmented DWARF" version,
>           command line options: --gc-sections
>           binary size: 5,3M
>           compilation time: 0:00.11 sec
>           run-time memory: 125kb
>
>        c. DWARFLinker version,
>           command line options: --gc-sections --gc-debuginfo
>           binary size: 3,8M
>           compilation time: 0:00.33 sec
>           run-time memory: 141kb
>
>        d. DWARFLinker no-odr version,
>           command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>           binary size: 4,3M
>           compilation time: 0:00.38 sec
>           run-time memory: 142kb
>
>
>     2. clang:
>
>        source object files size: 6,5G.
>        fragmented source object files size: 7,3G(13% increase).
>
>        a. upstream version,
>           command line options: --gc-sections
>           binary size: 1,5G
>           compilation time: 6 sec
>           run-time memory: 9.7G
>
>        b. "fragmented DWARF" version,
>           command line options: --gc-sections
>           binary size: 1,4G
>           compilation time: 8 sec
>           run-time memory: 12G
>
>        c. DWARFLinker version,
>           command line options: --gc-sections --gc-debuginfo
>           binary size: 836M
>           compilation time: 62 sec
>           run-time memory: 15G
>
>        d. DWARFLinker no-odr version,
>           command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>           binary size: 1,3G
>           compilation time: 128 sec
>           run-time memory: 17G
>
>     Detailed size results:
>
>     1. a)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       41.1%  2.64Mi   0.0%       0    .debug_info
>       24.9%  1.60Mi   0.0%       0    .debug_str
>       12.6%   827Ki   0.0%       0    .debug_line
>        6.5%   428Ki  63.8%   428Ki    .text
>        4.8%   317Ki   0.0%       0    .strtab
>        3.4%   223Ki   0.0%       0    .debug_ranges
>        2.0%   133Ki  19.8%   133Ki    .eh_frame
>        1.7%   110Ki   0.0%       0    .symtab
>        1.2%  77.6Ki   0.0%       0    .debug_abbrev
>
>        b)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       40.2%  2.10Mi   0.0%       0    .debug_info
>       30.7%  1.60Mi   0.0%       0    .debug_str
>        8.0%   428Ki  63.8%   428Ki    .text
>        5.9%   317Ki   0.0%       0    .strtab
>        5.9%   313Ki   0.0%       0    .debug_line
>        2.5%   133Ki  19.8%   133Ki    .eh_frame
>        2.1%   110Ki   0.0%       0    .symtab
>        1.5%  77.6Ki   0.0%       0    .debug_abbrev
>        1.3%  69.2Ki   0.0%       0    .debug_ranges
>
>        c)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       33.0%  1.25Mi   0.0%       0    .debug_info
>       29.2%  1.11Mi   0.0%       0    .debug_str
>       11.0%   428Ki  63.8%   428Ki    .text
>        8.2%   317Ki   0.0%       0    .strtab
>        7.8%   304Ki   0.0%       0    .debug_line
>        3.4%   133Ki  19.8%   133Ki    .eh_frame
>        2.8%   110Ki   0.0%       0    .symtab
>        1.7%  65.9Ki   0.0%       0    .debug_ranges
>        1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>
>        d)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       39.7%  1.68Mi   0.0%       0    .debug_info
>       26.3%  1.11Mi   0.0%       0    .debug_str
>        9.9%   428Ki  63.8%   428Ki    .text
>        7.3%   317Ki   0.0%       0    .strtab
>        7.0%   304Ki   0.0%       0    .debug_line
>        3.1%   133Ki  19.8%   133Ki    .eh_frame
>        2.6%   110Ki   0.0%       0    .symtab
>        1.5%  65.9Ki   0.0%       0    .debug_ranges
>
>
>     2. a)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       58.3%   878Mi   0.0%       0    .debug_info
>       11.8%   177Mi   0.0%       0    .debug_str
>        7.7%   115Mi  62.2%   115Mi    .text
>        7.7%   115Mi   0.0%       0    .debug_line
>        6.0%  90.7Mi   0.0%       0    .strtab
>        2.4%  35.4Mi   0.0%       0    .debug_ranges
>        1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>        1.2%  17.9Mi   0.0%       0    .symtab
>
>        b)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       59.6%   807Mi   0.0%       0    .debug_info
>       13.1%   177Mi   0.0%       0    .debug_str
>        8.5%   115Mi  62.2%   115Mi    .text
>        6.7%  90.7Mi   0.0%       0    .strtab
>        4.2%  57.4Mi   0.0%       0    .debug_line
>        1.7%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        1.7%  23.0Mi  12.4%  23.0Mi    .rodata
>        1.3%  17.9Mi   0.0%       0    .symtab
>        1.0%  13.0Mi   0.0%       0    .debug_ranges
>        0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>        c)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       35.1%   293Mi   0.0%       0    .debug_info
>       21.2%   177Mi   0.0%       0    .debug_str
>       13.9%   115Mi  62.2%   115Mi    .text
>       10.9%  90.7Mi   0.0%       0    .strtab
>        6.9%  57.4Mi   0.0%       0    .debug_line
>        2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>        2.1%  17.9Mi   0.0%       0    .symtab
>        1.5%  12.4Mi   0.0%       0    .debug_ranges
>        1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>        d)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       58.3%   758Mi   0.0%       0    .debug_info
>       13.6%   177Mi   0.0%       0    .debug_str
>        8.9%   115Mi  62.2%   115Mi    .text
>        7.0%  90.7Mi   0.0%       0    .strtab
>        4.4%  57.4Mi   0.0%       0    .debug_line
>        1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>        1.4%  17.9Mi   0.0%       0    .symtab
>        1.0%  12.4Mi   0.0%       0    .debug_ranges
>        0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>
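
To make the comparison between the variants easier to read, the clang
.debug_info and .debug_line figures above can be turned into percentage
reductions relative to the upstream link. This is only a rough sketch:
the dictionary layout and the function names are made up for
illustration, and the numbers are copied by hand from the tables above.

```python
# Sizes in MiB for the clang link, copied from the detailed size results
# above (variants a-d). The layout of this table is illustrative only.
CLANG_SIZES_MIB = {
    "upstream": {".debug_info": 878.0, ".debug_line": 115.0},
    "fragmented DWARF": {".debug_info": 807.0, ".debug_line": 57.4},
    "DWARFLinker": {".debug_info": 293.0, ".debug_line": 57.4},
    "DWARFLinker no-odr": {".debug_info": 758.0, ".debug_line": 57.4},
}


def reduction_percent(baseline, value):
    """Return the percentage by which `value` is smaller than `baseline`."""
    return 100.0 * (baseline - value) / baseline


def reductions(sizes, baseline_name="upstream"):
    """Per-section reduction of every variant relative to the baseline."""
    baseline = sizes[baseline_name]
    return {
        name: {
            section: round(reduction_percent(baseline[section], size), 1)
            for section, size in sections.items()
        }
        for name, sections in sizes.items()
        if name != baseline_name
    }


if __name__ == "__main__":
    for name, sections in reductions(CLANG_SIZES_MIB).items():
        print(name, sections)
```

All three non-upstream variants come out with the same .debug_line
reduction, since they share the 57.4Mi figure, while the .debug_info
reduction differs considerably between them.
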
>>
>>     I'm still trying to figure out the problems on my end to try
>>     running your experiment on the game package I used in my
>>     presentation, but have been interrupted by other unrelated
>>     issues. I'll try to get back to this in the coming days.
>>
>>     James
>>
>>     On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin
>>     <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>> wrote:
>>
>>         Hi James,
>>
>>         I did experiments with the clang code base and will do
>>         experiments with our local codebase later.
>>         Overall, both solutions ("Fragmented DWARF" and "DWARFLinker
>>         without ODR type deduplication") appear to give similar size
>>         savings for the final binary. "DWARFLinker with ODR type
>>         deduplication" gives a bigger size saving.
>>         "Fragmented DWARF" increases the size of the original object
>>         files by up to 15%.
>>         LLD with "fragmented DWARF" works significantly faster than
>>         with "DWARFLinker".
>>
>>         Following are the results for "llvm-strings" and "clang"
>>         binaries:
>>
>>         1. llvm-strings:
>>
>>            source object files size: 381M.
>>            fragmented source object files size: 451M(18% increase).
>>
>>            a. upstream version,
>>               command line options: --gc-sections
>>               binary size: 6,5M
>>               compilation time: 0:00.13 sec
>>               run-time memory: 111kb
>>
>>            b. "fragmented DWARF" version,
>>               command line options: --gc-sections --mark-live-pc=0.45
>>               binary size: 3,7M
>>               compilation time: 0:00.10 sec
>>               run-time memory: 122kb
>>
>>            c. DWARFLinker version,
>>               command line options: --gc-sections --gc-debuginfo
>>               binary size: 3,8M
>>               compilation time: 0:00.33 sec
>>               run-time memory: 141kb
>>
>>            d. DWARFLinker no-odr version,
>>               command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>               binary size: 4,3M
>>               compilation time: 0:00.38 sec
>>               run-time memory: 142kb
>>
>>
>>         2. clang:
>>
>>            source object files size: 6,5G.
>>            fragmented source object files size: 7,3G(13% increase).
>>
>>            a. upstream version,
>>               command line options: --gc-sections
>>               binary size: 1,5G
>>               compilation time: 6 sec
>>               run-time memory: 9.7G
>>
>>            b. "fragmented DWARF" version,
>>               command line options: --gc-sections --mark-live-pc=0.43
>>               binary size: 1,1G
>>               compilation time: 9 sec
>>               run-time memory: 11G
>>
>>            c. DWARFLinker version,
>>               command line options: --gc-sections --gc-debuginfo
>>               binary size: 836M
>>               compilation time: 62 sec
>>               run-time memory: 15G
>>
>>            d. DWARFLinker no-odr version,
>>               command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>               binary size: 1,3G
>>               compilation time: 128 sec
>>               run-time memory: 17G
>>
>>         Detailed size results:
>>
>>         1. llvm-strings
>>
>>            a)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           41.1%  2.64Mi   0.0%       0    .debug_info
>>           24.9%  1.60Mi   0.0%       0    .debug_str
>>           12.6%   827Ki   0.0%       0    .debug_line
>>            6.5%   428Ki  63.8%   428Ki    .text
>>            4.8%   317Ki   0.0%       0    .strtab
>>            3.4%   223Ki   0.0%       0    .debug_ranges
>>            2.0%   133Ki  19.8%   133Ki    .eh_frame
>>            1.7%   110Ki   0.0%       0    .symtab
>>            1.2%  77.6Ki   0.0%       0    .debug_abbrev
>>
>>            b)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           50.3%  1.85Mi   0.0%       0    .debug_info
>>           43.6%  1.60Mi   0.0%       0    .debug_str
>>            2.6%  98.2Ki   0.0%       0    .debug_line
>>            2.1%  77.6Ki   0.0%       0    .debug_abbrev
>>            0.5%  17.5Ki  54.9%  17.4Ki    .text
>>            0.3%  9.94Ki   0.0%       0    .strtab
>>            0.2%  6.27Ki   0.0%       0    .symtab
>>            0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
>>            0.1%  3.28Ki   0.0%       0    .debug_ranges
>>
>>            c)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           33.0%  1.25Mi   0.0%       0    .debug_info
>>           29.2%  1.11Mi   0.0%       0    .debug_str
>>           11.0%   428Ki  63.8%   428Ki    .text
>>            8.2%   317Ki   0.0%       0    .strtab
>>            7.8%   304Ki   0.0%       0    .debug_line
>>            3.4%   133Ki  19.8%   133Ki    .eh_frame
>>            2.8%   110Ki   0.0%       0    .symtab
>>            1.7%  65.9Ki   0.0%       0    .debug_ranges
>>            1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>>
>>            d)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           39.7%  1.68Mi   0.0%       0    .debug_info
>>           26.3%  1.11Mi   0.0%       0    .debug_str
>>            9.9%   428Ki  63.8%   428Ki    .text
>>            7.3%   317Ki   0.0%       0    .strtab
>>            7.0%   304Ki   0.0%       0    .debug_line
>>            3.1%   133Ki  19.8%   133Ki    .eh_frame
>>            2.6%   110Ki   0.0%       0    .symtab
>>            1.5%  65.9Ki   0.0%       0    .debug_ranges
>>
>>
>>         2. clang
>>
>>            a)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           58.3%   878Mi   0.0%       0    .debug_info
>>           11.8%   177Mi   0.0%       0    .debug_str
>>            7.7%   115Mi  62.2%   115Mi    .text
>>            7.7%   115Mi   0.0%       0    .debug_line
>>            6.0%  90.7Mi   0.0%       0    .strtab
>>            2.4%  35.4Mi   0.0%       0    .debug_ranges
>>            1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>            1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>>            1.2%  17.9Mi   0.0%       0    .symtab
>>
>>            b)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           71.5%   772Mi   0.0%       0    .debug_info
>>           16.5%   177Mi   0.0%       0    .debug_str
>>            3.7%  40.2Mi  59.2%  40.2Mi    .text
>>            2.4%  25.8Mi   0.0%       0    .debug_line
>>            2.1%  23.0Mi   0.0%       0    .strtab
>>            1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
>>            0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
>>            0.5%  5.60Mi   0.0%       0    .symtab
>>            0.4%  4.28Mi   0.0%       0    .debug_ranges
>>            0.4%  4.04Mi   0.0%       0    .debug_abbrev
>>
>>
>>            c)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           35.1%   293Mi   0.0%       0    .debug_info
>>           21.2%   177Mi   0.0%       0    .debug_str
>>           13.9%   115Mi  62.2%   115Mi    .text
>>           10.9%  90.7Mi   0.0%       0    .strtab
>>            6.9%  57.4Mi   0.0%       0    .debug_line
>>            2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>            2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>            2.1%  17.9Mi   0.0%       0    .symtab
>>            1.5%  12.4Mi   0.0%       0    .debug_ranges
>>            1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>>            d)
>>
>>             FILE SIZE        VM SIZE
>>          --------------  --------------
>>           58.3%   758Mi   0.0%       0    .debug_info
>>           13.6%   177Mi   0.0%       0    .debug_str
>>            8.9%   115Mi  62.2%   115Mi    .text
>>            7.0%  90.7Mi   0.0%       0    .strtab
>>            4.4%  57.4Mi   0.0%       0    .debug_line
>>            1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>            1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>            1.4%  17.9Mi   0.0%       0    .symtab
>>            1.0%  12.4Mi   0.0%       0    .debug_ranges
>>            0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>>         Thank you, Alexey.
>>
>>         On 19.10.2020 11:50, James Henderson wrote:
>>>         Great, thanks Alexey! I'll try to take a look at this in the
>>>         near future, and will report my results back here. I imagine
>>>         our clang results will differ, purely because we probably
>>>         used different toolchains to build the input in the first place.
>>>
>>>         On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
>>>         <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>> wrote:
>>>
>>>
>>>             On 13.10.2020 10:20, James Henderson wrote:
>>>>             The script included in the patch can be used to convert
>>>>             an object containing normal DWARF into an object using
>>>>             fragmented DWARF. It does this by using llvm-dwarfdump
>>>>             to dump the various sections, parses the output to
>>>>             identify where it should split (using the offsets of
>>>>             the various entries), and then writes new section
>>>>             headers accordingly - you can see roughly what it's
>>>>             doing if you get a chance to watch the talk recording.
>>>>             The additional section headers are appended to the end
>>>>             of the ELF section header table, whilst the original
>>>>             DWARF is left in the same place it was before (making
>>>>             use of the fact that section headers don't have to
>>>>             appear in offset order). The script also parses and
>>>>             fragments the relocation sections targeting the DWARF
>>>>             sections so that they match up with the fragmented
>>>>             DWARF sections. This is clearly all suboptimal - in
>>>>             practice the compiler should be modified to do the
>>>>             fragmenting upfront, to save having to parse a tool's
>>>>             stdout, but that was just the simplest thing I could
>>>>             come up with to quickly write the script. Full details
>>>>             of the script usage are included in the patch
>>>>             description, if you want to play around with it.
>>>>
>>>>             If Alexey could point me at the latest version of his
>>>>             patch, I'd be happy to run that through either or both
>>>>             of the packages I used to see what happens. Equally,
>>>>             I'd be happy if Alexey is able to run my script to
>>>>             fragment and measure the performance of a couple of
>>>>             projects he's been working with. Based purely on the
>>>>             two packages I've tried this with, I can tell already
>>>>             that the results can vary wildly. My expectation is
>>>>             that Alexey's approach will be slower (at least in its
>>>>             current form, but probably more generally), but produce
>>>>             smaller output, but to what scale I have no idea.
>>>
>>>             James, I updated the patch -
>>>             https://reviews.llvm.org/D74169.
>>>
>>>             To make it work, the example must be built with
>>>             -ffunction-sections, and the following options must be
>>>             passed to the linker:
>>>
>>>             --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>>
>>>             For the clang binary I got the following results:
>>>
>>>             1. --gc-sections = binary size 1,5G, Debug Info size(*) 1.2G
>>>
>>>             2. --gc-sections --gc-debuginfo = binary size 840M, 8x
>>>             performance decrease, Debug Info size 542M
>>>
>>>             3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr =
>>>             binary size 1,3G, 16x performance decrease, Debug Info
>>>             size 1G
>>>
>>>             (*)
>>>             .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>>
>>>
>>>             I added the option --gc-debuginfo-no-odr so that the size
>>>             reduction could be compared fairly. Without that
>>>             option, D74169 deduplicates types, so it would not be
>>>             correct to compare the resulting size with the "Fragmented
>>>             DWARF" solution, which does not deduplicate types.
>>>
>>>             Also, I will look at your D89229
>>>             <https://reviews.llvm.org/D89229> and will share
>>>             results some time later.
>>>
>>>             Thank you, Alexey.
>>>
>>>>
>>>>             I think linkers parse .eh_frame partly because they
>>>>             have no other choice. That being said, I think its
>>>>             format is not too complex, so similarly the parser
>>>>             isn't too complex. You can see LLD's ELF implementation
>>>>             in ELF/EhFrame.cpp, how it is used in
>>>>             ELF/InputSection.cpp (see the bits to do with
>>>>             EhInputSection) and EhFrameSection in
>>>>             ELF/SyntheticSections.h (plus various usages of these
>>>>             two throughout the LLD code). I think the key to any
>>>>             structural changes in the DWARF format to make them
>>>>             more amenable to link-time parsing is being able to
>>>>             read a minimal amount without needing to parse the
>>>>             payload (e.g. a length field, some sort of type, and
>>>>             then using the relocations to associate it accordingly).
>>>>
>>>>             James
>>>>
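
As an illustration of "read a minimal amount without needing to parse
the payload", here is a minimal sketch that splits a section's bytes
into pieces using only a leading length field. It assumes little-endian
data and the DWARF-style initial length (a 4-byte length, with
0xffffffff escaping to an 8-byte length). The function name is
hypothetical, and this is not how LLD's .eh_frame code is actually
written.

```python
import struct
from typing import List, Tuple


def split_by_initial_length(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, total_size) for each length-prefixed entry.

    Only the length field of each entry is read; the payload is skipped.
    """
    pieces = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated length field at offset %d" % offset)
        (length,) = struct.unpack_from("<I", data, offset)
        header_size = 4
        if length == 0xFFFFFFFF:
            # 64-bit format: the real length follows as an 8-byte value.
            if offset + 12 > len(data):
                raise ValueError("truncated 64-bit length at offset %d" % offset)
            (length,) = struct.unpack_from("<Q", data, offset + 4)
            header_size = 12
        total_size = header_size + length
        if offset + total_size > len(data):
            raise ValueError("entry at offset %d overruns the section" % offset)
        # A zero length yields a piece holding just the length field; in
        # .eh_frame such an entry acts as a terminator.
        pieces.append((offset, total_size))
        offset += total_size
    return pieces


if __name__ == "__main__":
    section = struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"xy"
    print(split_by_initial_length(section))
```

Associating each piece with the code it describes would then be done via
the relocations that fall inside the piece's offset range, which the
sketch leaves out.
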
>>>>             On Mon, 12 Oct 2020 at 20:48, David Blaikie
>>>>             <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>>>
>>>>                 Awesome! Sorry I missed the lightning talk, but
>>>>                 really interested to see this sort of thing (though
>>>>                 it's not directly/immediately applicable to the use
>>>>                 case I work with - Split DWARF, something similar
>>>>                 could be used there with further work)
>>>>
>>>>                 Though it looks like the patch has mostly linker
>>>>                 changes - where/how do you generate the fragmented
>>>>                 DWARF to begin with? Via the Python script? Run
>>>>                 over assembly? I'd be surprised if it was
>>>>                 achievable that way - curious to know more.
>>>>
>>>>                 Got a rough sense/are you able to run
>>>>                 apples-to-apples comparisons with Alexey's
>>>>                 linker-based patches to compare linker time/memory
>>>>                 overhead versus resulting output size gains?
>>>>
>>>>                 (& yeah, I'm a bit curious about how the linkers do
>>>>                 eh_frame rewriting, if the format is especially
>>>>                 amenable to a lightweight parsing/rewriting and how
>>>>                 we could make the DWARF more amenable to that too)
>>>>
>>>>                 On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>>>                 <jh7370.2008 at my.bristol.ac.uk
>>>>                 <mailto:jh7370.2008 at my.bristol.ac.uk>> wrote:
>>>>
>>>>                     Hi all,
>>>>
>>>>                     At the recent LLVM developers' meeting, I
>>>>                     presented a lightning talk on an approach to
>>>>                     reduce the amount of dead debug data left in an
>>>>                     executable following operations such as
>>>>                     --gc-sections and duplicate COMDAT removal. In
>>>>                     that presentation, I presented some figures
>>>>                     based on linking a game that had been built by
>>>>                     our downstream clang port and fragmented using
>>>>                     the described approach. Since recording the
>>>>                     presentation, I ran the same experiment on a
>>>>                     clang package (this time built with a GCC
>>>>                     version). The comparable figures are below:
>>>>
>>>>                     Link-time speed (s):
>>>>                     +--------------------+-------+---------------+------+------+------+------+------+
>>>>                     | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>                     +--------------------+-------+---------------+------+------+------+------+------+
>>>>                     | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>>>                     | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>>>                     | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>>                     | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>>                     +--------------------+-------+---------------+------+------+------+------+------+
>>>>
>>>>                     Output size - Game package (MB):
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>                     | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>                     | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>>>                     | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>>>                     | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>>                     | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>>>                     | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>>>                     | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>>
>>>>                     Output size - Clang (MB):
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>                     | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>                     | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>>                     | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>>                     | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>>                     | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>>                     | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>>                     | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>>                     +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>>                     *DWARF size == total size of .debug_info +
>>>>                     .debug_line + .debug_ranges + .debug_aranges +
>>>>                     .debug_loc
>>>>
>>>>                     Additionally, I have posted
>>>>                     https://reviews.llvm.org/D89229 which provides
>>>>                     the python script and linker patches used to
>>>>                     reproduce the above results on my machine. The
>>>>                     GC 1/2/3/4/5/6 correspond to the linker option
>>>>                     added in that patch --mark-live-pc with values
>>>>                     1/0.8/0.6/0.4/0.2/0 respectively.
>>>>
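
For reference, the link-time cost of fragmenting can be read off the
link-time table above as a fragmented/plain ratio per GC column. The
sketch below does that arithmetic; the data layout and the function name
are made up for illustration, and the times are copied by hand from the
table.

```python
# --mark-live-pc value behind each GC column, as described above.
MARK_LIVE_PC = {
    "GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6,
    "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0,
}

# Column order of the link-time table: "No GC" first, then GC 1..6.
COLUMNS = ["No GC"] + list(MARK_LIVE_PC)

# Link times in seconds, copied from the table above.
LINK_TIME_S = {
    "Game": {
        "plain": [4.5, 4.9, 4.2, 3.6, 3.4, 3.3, 3.2],
        "fragmented": [11.1, 11.8, 9.7, 8.6, 7.9, 7.7, 7.5],
    },
    "Clang": {
        "plain": [13.9, 17.9, 17.0, 16.7, 16.3, 16.2, 16.1],
        "fragmented": [18.6, 22.8, 21.6, 21.1, 20.8, 20.5, 20.2],
    },
}


def slowdown(package):
    """Return the fragmented/plain link-time ratio for each column."""
    times = LINK_TIME_S[package]
    return {
        column: round(fragmented / plain, 2)
        for column, plain, fragmented in zip(
            COLUMNS, times["plain"], times["fragmented"]
        )
    }


if __name__ == "__main__":
    for package in LINK_TIME_S:
        print(package, slowdown(package))
```

The ratios differ a lot between the two packages, which matches the
point that the cost depends on the nature of the input.
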
>>>>                     During the conference, the question was asked
>>>>                     what the memory usage and input size impact
>>>>                     was. I've summarised these below:
>>>>
>>>>                     Input file size total (GB):
>>>>                     +--------------------+------------+
>>>>                     | Package variant    | Total Size |
>>>>                     +--------------------+------------+
>>>>                     | Game (plain)       |  2.9       |
>>>>                     | Game (fragmented)  |  4.2       |
>>>>                     | Clang (plain)      | 10.9       |
>>>>                     | Clang (fragmented) | 12.3       |
>>>>                     +--------------------+------------+
>>>>
>>>>                     Peak Working Set Memory usage (GB):
>>>>                     +--------------------+-------+------+
>>>>                     | Package variant    | No GC | GC 1 |
>>>>                     +--------------------+-------+------+
>>>>                     | Game (plain)       |  4.3  |  4.7 |
>>>>                     | Game (fragmented)  |  8.9  |  8.6 |
>>>>                     | Clang (plain)      | 15.7  | 15.6 |
>>>>                     | Clang (fragmented) | 19.4  | 19.2 |
>>>>                     +--------------------+-------+------+
>>>>
>>>>                     I'm keen to hear what people's feedback is, and
>>>>                     also interested to see what results others
>>>>                     might see by running this experiment on other
>>>>                     input packages. Also, if anybody has any
>>>>                     alternative ideas that meet the goals listed
>>>>                     below, I'd love to hear them!
>>>>
>>>>                     To reiterate some key goals of fragmented
>>>>                     DWARF, similar to what I said in the presentation:
>>>>                     1) Devise a scheme that gives significant size
>>>>                     savings without being too costly. It's clear
>>>>                     from just the two packages I've tried this on
>>>>                     that there is a fairly hefty link time
>>>>                     performance cost, although the exact cost
>>>>                     depends on the nature of the input package. On
>>>>                     the other hand, depending on the nature of the
>>>>                     input package, there can also be some big gains.
>>>>                     2) Devise a scheme that doesn't require any
>>>>                     linker knowledge of DWARF. The current approach
>>>>                     doesn't quite achieve this properly due to the
>>>>                     slight misuse of SHF_LINK_ORDER, but I expect
>>>>                     that a pivot to using non-COMDAT group sections
>>>>                     should solve this problem.
>>>>                     3) Provide some kind of halfway house between
>>>>                     simply writing tombstone values into dead DWARF
>>>>                     and fully parsing the DWARF to reoptimise
>>>>                     it/discard the dead bits.
>>>>
>>>>                     I'm hopeful that changes could be made to the
>>>>                     linker to improve the link-time cost. There
>>>>                     seems to be a significant amount of the link
>>>>                     time spent creating the input sections. An
>>>>                     alternative would be to devise a scheme that
>>>>                     would avoid the literal splitting into section
>>>>                     headers, in favour of some sort of list of
>>>>                     split-points that the linker uses to split
>>>>                     things up (a bit like it already does for
>>>>                     .eh_frame or mergeable sections).
>>>>