[llvm-dev] Fragmented DWARF

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 4 05:35:46 PST 2020


On 04.11.2020 15:28, James Henderson wrote:
> Hi Alexey,
>
> Thanks for taking a look at these. I noticed you set the 
> --mark-live-pc value to a value other than 1 for the fragmented DWARF 
> version. This will mean additional GC-ing will be done beyond the 
> amount that --gc-sections will do, so unless you use the same value 
> for the option for other versions, the result will not be comparable. 
> (The option is purely there to experiment with the effects were 
> different amounts of the input codebase to be considered dead). Would 
> you be okay to run those figures again without the option specified?

Oh, I misinterpreted that option. The updated results follow:

1. llvm-strings:

    source object files size: 381M.
    fragmented source object files size: 451M(18% increase).

    a. upstream version,
       command line options: --gc-sections
       binary size: 6,5M
       compilation time: 0:00.13 sec
       run-time memory: 111kb

    b. "fragmented DWARF" version,
       command line options: --gc-sections
       binary size: 5,3M
       compilation time: 0:00.11 sec
       run-time memory: 125kb

    c. DWARFLinker version,
       command line options: --gc-sections --gc-debuginfo
       binary size: 3,8M
       compilation time: 0:00.33 sec
       run-time memory: 141kb

    d. DWARFLinker no-odr version,
       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
       binary size: 4,3M
       compilation time: 0:00.38 sec
       run-time memory: 142kb


2. clang:

    source object files size: 6,5G.
    fragmented source object files size: 7,3G(13% increase).

    a. upstream version,
       command line options: --gc-sections
       binary size: 1,5G
       compilation time: 6 sec
       run-time memory: 9.7G

    b. "fragmented DWARF" version,
       command line options: --gc-sections
       binary size: 1,4G
       compilation time: 8 sec
       run-time memory: 12G

    c. DWARFLinker version,
       command line options: --gc-sections --gc-debuginfo
       binary size: 836M
       compilation time: 62 sec
       run-time memory: 15G

    d. DWARFLinker no-odr version,
       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
       binary size: 1,3G
       compilation time: 128 sec
       run-time memory: 17G
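
To make the trade-off in the summary above easier to read, here is a minimal Python sketch that expresses each variant relative to the upstream linker. The numbers are copied from the llvm-strings results above (with "6,5M" read as 6.5); the names RESULTS and compare_to_baseline are purely illustrative and not part of any patch.

```python
# Figures copied from the llvm-strings results above:
# variant -> (binary size in M, time in seconds).
RESULTS = {
    "upstream": (6.5, 0.13),
    "fragmented DWARF": (5.3, 0.11),
    "DWARFLinker": (3.8, 0.33),
    "DWARFLinker no-odr": (4.3, 0.38),
}


def compare_to_baseline(results, baseline="upstream"):
    """Return {variant: (size reduction in %, time ratio)} relative to the baseline."""
    base_size, base_time = results[baseline]
    summary = {}
    for name, (size, time) in results.items():
        reduction = 100.0 * (base_size - size) / base_size
        ratio = time / base_time
        summary[name] = (round(reduction, 1), round(ratio, 2))
    return summary


for name, (reduction, ratio) in compare_to_baseline(RESULTS).items():
    print(f"{name}: {reduction}% smaller, {ratio}x the upstream time")
```

The same function can be fed the clang figures, provided the sizes are first converted to a single unit.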

Detailed size results:

1. a)

     FILE SIZE        VM SIZE
  --------------  --------------
   41.1%  2.64Mi   0.0%       0    .debug_info
   24.9%  1.60Mi   0.0%       0    .debug_str
   12.6%   827Ki   0.0%       0    .debug_line
    6.5%   428Ki  63.8%   428Ki    .text
    4.8%   317Ki   0.0%       0    .strtab
    3.4%   223Ki   0.0%       0    .debug_ranges
    2.0%   133Ki  19.8%   133Ki    .eh_frame
    1.7%   110Ki   0.0%       0    .symtab
    1.2%  77.6Ki   0.0%       0    .debug_abbrev

    b)

     FILE SIZE        VM SIZE
  --------------  --------------
   40.2%  2.10Mi   0.0%       0    .debug_info
   30.7%  1.60Mi   0.0%       0    .debug_str
    8.0%   428Ki  63.8%   428Ki    .text
    5.9%   317Ki   0.0%       0    .strtab
    5.9%   313Ki   0.0%       0    .debug_line
    2.5%   133Ki  19.8%   133Ki    .eh_frame
    2.1%   110Ki   0.0%       0    .symtab
    1.5%  77.6Ki   0.0%       0    .debug_abbrev
    1.3%  69.2Ki   0.0%       0    .debug_ranges

    c)

     FILE SIZE        VM SIZE
  --------------  --------------
   33.0%  1.25Mi   0.0%       0    .debug_info
   29.2%  1.11Mi   0.0%       0    .debug_str
   11.0%   428Ki  63.8%   428Ki    .text
    8.2%   317Ki   0.0%       0    .strtab
    7.8%   304Ki   0.0%       0    .debug_line
    3.4%   133Ki  19.8%   133Ki    .eh_frame
    2.8%   110Ki   0.0%       0    .symtab
    1.7%  65.9Ki   0.0%       0    .debug_ranges
    1.0%  38.4Ki   5.7%  38.4Ki    .rodata

    d)

     FILE SIZE        VM SIZE
  --------------  --------------
   39.7%  1.68Mi   0.0%       0    .debug_info
   26.3%  1.11Mi   0.0%       0    .debug_str
    9.9%   428Ki  63.8%   428Ki    .text
    7.3%   317Ki   0.0%       0    .strtab
    7.0%   304Ki   0.0%       0    .debug_line
    3.1%   133Ki  19.8%   133Ki    .eh_frame
    2.6%   110Ki   0.0%       0    .symtab
    1.5%  65.9Ki   0.0%       0    .debug_ranges


2. a)

     FILE SIZE        VM SIZE
  --------------  --------------
   58.3%   878Mi   0.0%       0    .debug_info
   11.8%   177Mi   0.0%       0    .debug_str
    7.7%   115Mi  62.2%   115Mi    .text
    7.7%   115Mi   0.0%       0    .debug_line
    6.0%  90.7Mi   0.0%       0    .strtab
    2.4%  35.4Mi   0.0%       0    .debug_ranges
    1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
    1.5%  23.0Mi  12.4%  23.0Mi    .rodata
    1.2%  17.9Mi   0.0%       0    .symtab

    b)

     FILE SIZE        VM SIZE
  --------------  --------------
   59.6%   807Mi   0.0%       0    .debug_info
   13.1%   177Mi   0.0%       0    .debug_str
    8.5%   115Mi  62.2%   115Mi    .text
    6.7%  90.7Mi   0.0%       0    .strtab
    4.2%  57.4Mi   0.0%       0    .debug_line
    1.7%  23.3Mi  12.5%  23.3Mi    .eh_frame
    1.7%  23.0Mi  12.4%  23.0Mi    .rodata
    1.3%  17.9Mi   0.0%       0    .symtab
    1.0%  13.0Mi   0.0%       0    .debug_ranges
    0.8%  10.6Mi   5.7%  10.6Mi    .dynstr

    c)

     FILE SIZE        VM SIZE
  --------------  --------------
   35.1%   293Mi   0.0%       0    .debug_info
   21.2%   177Mi   0.0%       0    .debug_str
   13.9%   115Mi  62.2%   115Mi    .text
   10.9%  90.7Mi   0.0%       0    .strtab
    6.9%  57.4Mi   0.0%       0    .debug_line
    2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
    2.8%  23.0Mi  12.4%  23.0Mi    .rodata
    2.1%  17.9Mi   0.0%       0    .symtab
    1.5%  12.4Mi   0.0%       0    .debug_ranges
    1.3%  10.6Mi   5.7%  10.6Mi    .dynstr

    d)

     FILE SIZE        VM SIZE
  --------------  --------------
   58.3%   758Mi   0.0%       0    .debug_info
   13.6%   177Mi   0.0%       0    .debug_str
    8.9%   115Mi  62.2%   115Mi    .text
    7.0%  90.7Mi   0.0%       0    .strtab
    4.4%  57.4Mi   0.0%       0    .debug_line
    1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
    1.8%  23.0Mi  12.4%  23.0Mi    .rodata
    1.4%  17.9Mi   0.0%       0    .symtab
    1.0%  12.4Mi   0.0%       0    .debug_ranges
    0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
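
The per-section tables above can be totalled mechanically. The following is a rough sketch that assumes the rows have the Bloaty-style layout shown (percentage, file size, percentage, VM size, section name); the thread does not name the tool, so the parsing rules are an assumption, and parse_sizes and debug_total are hypothetical helper names. The sample rows are the .debug_* lines from tables 2.a and 2.c.

```python
import re

# One table row: "<pct>%  <file size>  <pct>%  <vm size>  <section>".
ROW = re.compile(r"^\s*[\d.]+%\s+([\d.]+)(Ki|Mi|Gi)?\s+[\d.]+%\s+\S+\s+(\S+)\s*$")
UNIT = {None: 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

CLANG_UPSTREAM = """
   58.3%   878Mi   0.0%       0    .debug_info
   11.8%   177Mi   0.0%       0    .debug_str
    7.7%   115Mi  62.2%   115Mi    .text
    7.7%   115Mi   0.0%       0    .debug_line
    2.4%  35.4Mi   0.0%       0    .debug_ranges
"""

CLANG_DWARFLINKER = """
   35.1%   293Mi   0.0%       0    .debug_info
   21.2%   177Mi   0.0%       0    .debug_str
   13.9%   115Mi  62.2%   115Mi    .text
    6.9%  57.4Mi   0.0%       0    .debug_line
    1.5%  12.4Mi   0.0%       0    .debug_ranges
"""


def parse_sizes(table):
    """Map section name -> file size in bytes; lines that are not rows are skipped."""
    sizes = {}
    for line in table.splitlines():
        match = ROW.match(line)
        if match:
            value, unit, section = match.groups()
            sizes[section] = float(value) * UNIT[unit]
    return sizes


def debug_total(sizes):
    """Sum the file sizes of all .debug_* sections."""
    return sum(size for name, size in sizes.items() if name.startswith(".debug_"))


for label, table in (("upstream", CLANG_UPSTREAM), ("DWARFLinker", CLANG_DWARFLINKER)):
    print(f"{label}: {debug_total(parse_sizes(table)) / 1024**2:.1f} Mi of .debug_* data")
```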


>
> I'm still trying to figure out the problems on my end to try running 
> your experiment on the game package I used in my presentation, but 
> have been interrupted by other unrelated issues. I'll try to get back 
> to this in the coming days.
>
> James
>
> On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>
>     Hi James,
>
>     I did experiments with the clang code base and will do experiments
>     with our local codebase later.
>     Overall, both solutions ("Fragmented DWARF" and "DWARFLinker
>     without odr types deduplication") appear to give similar size
>     savings for the final binary. "DWARFLinker with odr types
>     deduplication" gives a bigger size saving. "Fragmented DWARF"
>     increases the size of the original object files by up to 15%.
>     LLD with "fragmented DWARF" works significantly faster than with
>     "DWARFLinker".
>
>     Following are the results for "llvm-strings" and "clang" binaries:
>
>     1. llvm-strings:
>
>        source object files size: 381M.
>        fragmented source object files size: 451M(18% increase).
>
>        a. upstream version,
>           command line options: --gc-sections
>           binary size: 6,5M
>           compilation time: 0:00.13 sec
>           run-time memory: 111kb
>
>        b. "fragmented DWARF" version,
>           command line options: --gc-sections --mark-live-pc=0.45
>           binary size: 3,7M
>           compilation time: 0:00.10 sec
>           run-time memory: 122kb
>
>        c. DWARFLinker version,
>           command line options: --gc-sections --gc-debuginfo
>           binary size: 3,8M
>           compilation time: 0:00.33 sec
>           run-time memory: 141kb
>
>        d. DWARFLinker no-odr version,
>           command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>           binary size: 4,3M
>           compilation time: 0:00.38 sec
>           run-time memory: 142kb
>
>
>     2. clang:
>
>        source object files size: 6,5G.
>        fragmented source object files size: 7,3G(13% increase).
>
>        a. upstream version,
>           command line options: --gc-sections
>           binary size: 1,5G
>           compilation time: 6 sec
>           run-time memory: 9.7G
>
>        b. "fragmented DWARF" version,
>           command line options: --gc-sections --mark-live-pc=0.43
>           binary size: 1,1G
>           compilation time: 9 sec
>           run-time memory: 11G
>
>        c. DWARFLinker version,
>           command line options: --gc-sections --gc-debuginfo
>           binary size: 836M
>           compilation time: 62 sec
>           run-time memory: 15G
>
>        d. DWARFLinker no-odr version,
>           command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>           binary size: 1,3G
>           compilation time: 128 sec
>           run-time memory: 17G
>
>     Detailed size results:
>
>     1. llvm-strings
>
>        a)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       41.1%  2.64Mi   0.0%       0    .debug_info
>       24.9%  1.60Mi   0.0%       0    .debug_str
>       12.6%   827Ki   0.0%       0    .debug_line
>        6.5%   428Ki  63.8%   428Ki    .text
>        4.8%   317Ki   0.0%       0    .strtab
>        3.4%   223Ki   0.0%       0    .debug_ranges
>        2.0%   133Ki  19.8%   133Ki    .eh_frame
>        1.7%   110Ki   0.0%       0    .symtab
>        1.2%  77.6Ki   0.0%       0    .debug_abbrev
>
>        b)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       50.3%  1.85Mi   0.0%       0    .debug_info
>       43.6%  1.60Mi   0.0%       0    .debug_str
>        2.6%  98.2Ki   0.0%       0    .debug_line
>        2.1%  77.6Ki   0.0%       0    .debug_abbrev
>        0.5%  17.5Ki  54.9%  17.4Ki    .text
>        0.3%  9.94Ki   0.0%       0    .strtab
>        0.2%  6.27Ki   0.0%       0    .symtab
>        0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
>        0.1%  3.28Ki   0.0%       0    .debug_ranges
>
>        c)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       33.0%  1.25Mi   0.0%       0    .debug_info
>       29.2%  1.11Mi   0.0%       0    .debug_str
>       11.0%   428Ki  63.8%   428Ki    .text
>        8.2%   317Ki   0.0%       0    .strtab
>        7.8%   304Ki   0.0%       0    .debug_line
>        3.4%   133Ki  19.8%   133Ki    .eh_frame
>        2.8%   110Ki   0.0%       0    .symtab
>        1.7%  65.9Ki   0.0%       0    .debug_ranges
>        1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>
>        d)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       39.7%  1.68Mi   0.0%       0    .debug_info
>       26.3%  1.11Mi   0.0%       0    .debug_str
>        9.9%   428Ki  63.8%   428Ki    .text
>        7.3%   317Ki   0.0%       0    .strtab
>        7.0%   304Ki   0.0%       0    .debug_line
>        3.1%   133Ki  19.8%   133Ki    .eh_frame
>        2.6%   110Ki   0.0%       0    .symtab
>        1.5%  65.9Ki   0.0%       0    .debug_ranges
>
>
>     2. clang
>
>        a)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       58.3%   878Mi   0.0%       0    .debug_info
>       11.8%   177Mi   0.0%       0    .debug_str
>        7.7%   115Mi  62.2%   115Mi    .text
>        7.7%   115Mi   0.0%       0    .debug_line
>        6.0%  90.7Mi   0.0%       0    .strtab
>        2.4%  35.4Mi   0.0%       0    .debug_ranges
>        1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>        1.2%  17.9Mi   0.0%       0    .symtab
>
>        b)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       71.5%   772Mi   0.0%       0    .debug_info
>       16.5%   177Mi   0.0%       0    .debug_str
>        3.7%  40.2Mi  59.2%  40.2Mi    .text
>        2.4%  25.8Mi   0.0%       0    .debug_line
>        2.1%  23.0Mi   0.0%       0    .strtab
>        1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
>        0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
>        0.5%  5.60Mi   0.0%       0    .symtab
>        0.4%  4.28Mi   0.0%       0    .debug_ranges
>        0.4%  4.04Mi   0.0%       0    .debug_abbrev
>
>
>        c)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       35.1%   293Mi   0.0%       0    .debug_info
>       21.2%   177Mi   0.0%       0    .debug_str
>       13.9%   115Mi  62.2%   115Mi    .text
>       10.9%  90.7Mi   0.0%       0    .strtab
>        6.9%  57.4Mi   0.0%       0    .debug_line
>        2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>        2.1%  17.9Mi   0.0%       0    .symtab
>        1.5%  12.4Mi   0.0%       0    .debug_ranges
>        1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>        d)
>
>         FILE SIZE        VM SIZE
>      --------------  --------------
>       58.3%   758Mi   0.0%       0    .debug_info
>       13.6%   177Mi   0.0%       0    .debug_str
>        8.9%   115Mi  62.2%   115Mi    .text
>        7.0%  90.7Mi   0.0%       0    .strtab
>        4.4%  57.4Mi   0.0%       0    .debug_line
>        1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>        1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>        1.4%  17.9Mi   0.0%       0    .symtab
>        1.0%  12.4Mi   0.0%       0    .debug_ranges
>        0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>     Thank you, Alexey.
>
>     On 19.10.2020 11:50, James Henderson wrote:
>>     Great, thanks Alexey! I'll try to take a look at this in the near
>>     future, and will report my results back here. I imagine our clang
>>     results will differ, purely because we probably used different
>>     toolchains to build the input in the first place.
>>
>>     On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
>>     <avl.lapshin at gmail.com> wrote:
>>
>>
>>         On 13.10.2020 10:20, James Henderson wrote:
>>>         The script included in the patch can be used to convert an
>>>         object containing normal DWARF into an object using
>>>         fragmented DWARF. It does this by using llvm-dwarfdump to
>>>         dump the various sections, parses the output to identify
>>>         where it should split (using the offsets of the various
>>>         entries), and then writes new section headers accordingly -
>>>         you can see roughly what it's doing if you get a chance to
>>>         watch the talk recording. The additional section headers are
>>>         appended to the end of the ELF section header table, whilst
>>>         the original DWARF is left in the same place it was before
>>>         (making use of the fact that section headers don't have to
>>>         appear in offset order). The script also parses and
>>>         fragments the relocation sections targeting the DWARF
>>>         sections so that they match up with the fragmented DWARF
>>>         sections. This is clearly all suboptimal - in practice the
>>>         compiler should be modified to do the fragmenting upfront,
>>>         to save having to parse a tool's stdout, but that was just
>>>         the simplest thing I could come up with to quickly write the
>>>         script. Full details of the script usage are included in the
>>>         patch description, if you want to play around with it.
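
The splitting step described above can be sketched as follows. This is only an illustration of turning a list of entry offsets into (offset, size) fragments that cover a section, under the assumption that the offsets have already been extracted somehow; it is not the script from the patch, and fragments_from_offsets is a hypothetical name.

```python
def fragments_from_offsets(entry_offsets, section_size):
    """Turn entry start offsets into (offset, size) fragments covering the section.

    Assumption: each entry runs up to the start of the next one, and the
    last entry runs to the end of the section.
    """
    if section_size <= 0:
        raise ValueError("section must not be empty")
    starts = sorted(set(entry_offsets))
    if any(offset < 0 or offset >= section_size for offset in starts):
        raise ValueError("entry offset outside the section")
    if not starts or starts[0] != 0:
        # Bytes before the first entry (e.g. a header) form their own fragment.
        starts.insert(0, 0)
    ends = starts[1:] + [section_size]
    return [(start, end - start) for start, end in zip(starts, ends)]


# Three entries at offsets 0x0, 0x2a and 0x80 in a 0x100-byte section.
print(fragments_from_offsets([0x0, 0x2A, 0x80], 0x100))
```

Each resulting pair would correspond to one extra section header pointing into the unchanged DWARF bytes, which is why the original data can stay where it is.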
>>>
>>>         If Alexey could point me at the latest version of his patch,
>>>         I'd be happy to run that through either or both of the
>>>         packages I used to see what happens. Equally, I'd be happy
>>>         if Alexey is able to run my script to fragment and measure
>>>         the performance of a couple of projects he's been working
>>>         with. Based purely on the two packages I've tried this with,
>>>         I can tell already that the results can vary wildly. My
>>>         expectation is that Alexey's approach will be slower (at
>>>         least in its current form, but probably more generally), but
>>>         produce smaller output, but to what scale I have no idea.
>>
>>         James, I updated the patch - https://reviews.llvm.org/D74169.
>>
>>         To make it work, it is necessary to build the example with
>>         -ffunction-sections and pass the following options to the linker:
>>
>>         --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>
>>         For the clang binary I got the following results:
>>
>>         1. --gc-sections = binary size 1,5G, Debug Info size(*)1.2G
>>
>>         2. --gc-sections --gc-debuginfo = binary size 840M, 8x
>>         performance decrease, Debug Info size 542M
>>
>>         3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr =
>>         binary size 1,3G, 16x performance decrease, Debug Info size 1G
>>
>>         (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>
>>
>>         I added option --gc-debuginfo-no-odr, so that size reduction
>>         could be compared correctly. Without that option D74169 does
>>         types deduplication and then it is not correct to compare
>>         resulting size with "Fragmented DWARF" solution which does
>>         not do types deduplication.
>>
>>         Also, I will look at your D89229 <https://reviews.llvm.org/D89229>
>>         and share the results some time later.
>>
>>         Thank you, Alexey.
>>
>>>
>>>         I think linkers parse .eh_frame partly because they have no
>>>         other choice. That being said, I think its format is not
>>>         too complex, so similarly the parser isn't too complex. You
>>>         can see LLD's ELF implementation in ELF/EhFrame.cpp, how it
>>>         is used in ELF/InputSection.cpp (see the bits to do with
>>>         EhInputSection) and EhFrameSection in
>>>         ELF/SyntheticSections.h (plus various usages of these two
>>>         throughout the LLD code). I think the key to any structural
>>>         changes in the DWARF format to make them more amenable to
>>>         link-time parsing is being able to read a minimal amount
>>>         without needing to parse the payload (e.g. a length field,
>>>         some sort of type, and then using the relocations to
>>>         associate it accordingly).
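
As an illustration of "reading a minimal amount without parsing the payload", here is a small Python sketch that walks an .eh_frame-style byte buffer using only the 4-byte length field and the 4-byte ID that follows it (0 for a CIE, non-zero for an FDE). It is a simplification and not LLD's implementation: the extended 64-bit length form is deliberately rejected, and split_eh_frame is a hypothetical name.

```python
import struct


def split_eh_frame(data, little_endian=True):
    """Return (offset, total_size, kind) for each record, without parsing payloads."""
    fmt = "<I" if little_endian else ">I"
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from(fmt, data, pos)
        if length == 0:
            # A zero length marks the terminator.
            records.append((pos, 4, "terminator"))
            break
        if length == 0xFFFFFFFF:
            raise NotImplementedError("extended length is not handled in this sketch")
        if length < 4 or pos + 4 + length > len(data):
            raise ValueError(f"malformed record at offset {pos}")
        # The ID field directly after the length distinguishes CIEs from FDEs.
        (record_id,) = struct.unpack_from(fmt, data, pos + 4)
        kind = "CIE" if record_id == 0 else "FDE"
        records.append((pos, 4 + length, kind))
        pos += 4 + length
    return records


# Synthetic buffer: one CIE, one FDE, then the terminator (payloads are padding).
sample = (
    struct.pack("<II4x", 8, 0)
    + struct.pack("<II4x", 8, 0x10)
    + struct.pack("<I", 0)
)
print(split_eh_frame(sample))
```

Associating each FDE with the code it describes would additionally need the relocations, which this sketch does not look at.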
>>>
>>>         James
>>>
>>>         On Mon, 12 Oct 2020 at 20:48, David Blaikie
>>>         <dblaikie at gmail.com> wrote:
>>>
>>>             Awesome! Sorry I missed the lightning talk, but really
>>>             interested to see this sort of thing (though it's not
>>>             directly/immediately applicable to the use case I work
>>>             with - Split DWARF, something similar could be used
>>>             there with further work)
>>>
>>>             Though it looks like the patch has mostly linker changes
>>>             - where/how do you generate the fragmented DWARF to
>>>             begin with? Via the Python script? Run over assembly?
>>>             I'd be surprised if it was achievable that way - curious
>>>             to know more.
>>>
>>>             Got a rough sense/are you able to run apples-to-apples
>>>             comparisons with Alexey's linker-based patches to
>>>             compare linker time/memory overhead versus resulting
>>>             output size gains?
>>>
>>>             (& yeah, I'm a bit curious about how the linkers do
>>>             eh_frame rewriting, if the format is especially amenable
>>>             to a lightweight parsing/rewriting and how we could make
>>>             the DWARF more amenable to that too)
>>>
>>>             On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>>             <jh7370.2008 at my.bristol.ac.uk> wrote:
>>>
>>>                 Hi all,
>>>
>>>                 At the recent LLVM developers' meeting, I presented
>>>                 a lightning talk on an approach to reduce the amount
>>>                 of dead debug data left in an executable following
>>>                 operations such as --gc-sections and duplicate
>>>                 COMDAT removal. In that presentation, I presented
>>>                 some figures based on linking a game that had been
>>>                 built by our downstream clang port and fragmented
>>>                 using the described approach. Since recording the
>>>                 presentation, I ran the same experiment on a clang
>>>                 package (this time built with a GCC version). The
>>>                 comparable figures are below:
>>>
>>>                 Link-time speed (s):
>>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>>                 | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>>                 | Game (plain)       |  4.5  |      4.9      |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>>                 | Game (fragmented)  | 11.1  |     11.8      |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>>                 | Clang (plain)      | 13.9  |     17.9      | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>                 | Clang (fragmented) | 18.6  |     22.8      | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>>
>>>                 Output size - Game package (MB):
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>                 | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>>                 | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>>                 | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>                 | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>>                 | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>>                 | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>
>>>
>>>                 Output size - Clang (MB):
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>                 | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>                 | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>                 | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>                 | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>                 | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>                 | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>                 +---------------------+-------+------+------+------+------+------+------+
>>>
>>>                 *DWARF size == total size of .debug_info +
>>>                 .debug_line + .debug_ranges + .debug_aranges +
>>>                 .debug_loc
>>>
>>>                 Additionally, I have posted
>>>                 https://reviews.llvm.org/D89229 which provides the
>>>                 python script and linker patches used to reproduce
>>>                 the above results on my machine. The GC 1/2/3/4/5/6
>>>                 correspond to the linker option added in that patch
>>>                 --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0
>>>                 respectively.
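
For reference, the GC-level mapping and the DWARF savings it produces can be worked through in a few lines of Python. The --mark-live-pc values and the Game package DWARF sizes are taken from the text and table above; the dictionary and function names are illustrative only.

```python
# GC level -> --mark-live-pc value, as listed above.
MARK_LIVE_PC = {"GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6, "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0}

# Game package DWARF sizes in MB, copied from the output size table above.
PLAIN_DWARF_MB = 845  # identical at every GC level
FRAGMENTED_DWARF_MB = {"GC 1": 664, "GC 2": 384, "GC 3": 253, "GC 4": 194, "GC 5": 178, "GC 6": 173}


def dwarf_saving_percent(plain_mb, fragmented_mb):
    """Percentage of DWARF removed relative to the plain (unfragmented) link."""
    return 100.0 * (plain_mb - fragmented_mb) / plain_mb


for level, value in MARK_LIVE_PC.items():
    saving = dwarf_saving_percent(PLAIN_DWARF_MB, FRAGMENTED_DWARF_MB[level])
    print(f"{level} (--mark-live-pc={value}): {saving:.1f}% less DWARF")
```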
>>>
>>>                 During the conference, the question was asked what
>>>                 the memory usage and input size impact was. I've
>>>                 summarised these below:
>>>
>>>                 Input file size total (GB):
>>>                 +--------------------+------------+
>>>                 | Package variant    | Total Size |
>>>                 +--------------------+------------+
>>>                 | Game (plain)       |     2.9    |
>>>                 | Game (fragmented)  |     4.2    |
>>>                 | Clang (plain)      |    10.9    |
>>>                 | Clang (fragmented) |    12.3    |
>>>                 +--------------------+------------+
>>>
>>>                 Peak Working Set Memory usage (GB):
>>>                 +--------------------+-------+------+
>>>                 | Package variant    | No GC | GC 1 |
>>>                 +--------------------+-------+------+
>>>                 | Game (plain)       |  4.3  |  4.7 |
>>>                 | Game (fragmented)  |  8.9  |  8.6 |
>>>                 | Clang (plain)      | 15.7  | 15.6 |
>>>                 | Clang (fragmented) | 19.4  | 19.2 |
>>>                 +--------------------+-------+------+
>>>
>>>                 I'm keen to hear what people's feedback is, and also
>>>                 interested to see what results others might see by
>>>                 running this experiment on other input packages.
>>>                 Also, if anybody has any alternative ideas that meet
>>>                 the goals listed below, I'd love to hear them!
>>>
>>>                 To reiterate some key goals of fragmented DWARF,
>>>                 similar to what I said in the presentation:
>>>                 1) Devise a scheme that gives significant size
>>>                 savings without being too costly. It's clear from
>>>                 just the two packages I've tried this on that there
>>>                 is a fairly hefty link time performance cost,
>>>                 although the exact cost depends on the nature of the
>>>                 input package. On the other hand, depending on the
>>>                 nature of the input package, there can also be some
>>>                 big gains.
>>>                 2) Devise a scheme that doesn't require any linker
>>>                 knowledge of DWARF. The current approach doesn't
>>>                 quite achieve this properly due to the slight misuse
>>>                 of SHF_LINK_ORDER, but I expect that a pivot to
>>>                 using non-COMDAT group sections should solve this
>>>                 problem.
>>>                 3) Provide some kind of halfway house between simply
>>>                 writing tombstone values into dead DWARF and fully
>>>                 parsing the DWARF to reoptimise it/discard the dead
>>>                 bits.
>>>
>>>                 I'm hopeful that changes could be made to the linker
>>>                 to improve the link-time cost. There seems to be a
>>>                 significant amount of the link time spent creating
>>>                 the input sections. An alternative would be to
>>>                 devise a scheme that would avoid the literal
>>>                 splitting into section headers, in favour of some
>>>                 sort of list of split-points that the linker uses to
>>>                 split things up (a bit like it already does for
>>>                 .eh_frame or mergeable sections).
>>>
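
The "list of split-points" alternative mentioned above could look roughly like this. It is a sketch under the assumption that the split points are sorted piece start offsets beginning at 0 and that something else has already decided which pieces are live; piece_index and keep_live are hypothetical names, and no existing linker interface is implied.

```python
from bisect import bisect_right


def piece_index(split_points, offset):
    """Return the index of the piece that contains the given section offset."""
    if not split_points or split_points[0] != 0:
        raise ValueError("split points must start at offset 0")
    if offset < 0:
        raise ValueError("offset must not be negative")
    # Binary search: the last split point that is <= offset starts the piece.
    return bisect_right(split_points, offset) - 1


def keep_live(data, split_points, live):
    """Concatenate only the pieces whose index is in the live set."""
    bounds = list(split_points) + [len(data)]
    return b"".join(
        data[bounds[i]:bounds[i + 1]]
        for i in range(len(split_points))
        if i in live
    )


section = b"AAAABBCCCC"
points = [0, 4, 6]                        # three pieces: AAAA, BB, CCCC
print(piece_index(points, 5))             # offset 5 falls inside the second piece
print(keep_live(section, points, {0, 2}))  # drop the middle piece
```

With this layout a relocation offset can be mapped to its piece by lookup, so no extra section headers need to be created for the fragments.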