<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi James, <br>
      <br>
      Thank you very much for the information.<br>
      Regarding the first problem: could you please send me the clang
      build configuration that you used, so that I can reproduce the
      problem? <br>
    </p>
    <p>For the second problem: yes, I built the experiment with
      -ffunction-sections -fdata-sections.<br>
      Judging by the error message, it seems that the address ranges
      were read incorrectly.<br>
      As a quick guess: could it be that invalid address ranges are
      marked with a -1/-2 value? If so, they might be handled
      incorrectly, since this patch does not support (and was not
      tested with) the LowPC&gt;HighPC case. The simplest solution would
      be not to use -1/-2 values with this patch. <br>
    </p>
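    <p>To make the guess concrete, here is a minimal Python sketch (it
      is not code from the patch). It assumes that the -1/-2 markers are
      stored as the two largest unsigned values for the address size;
      tombstone_values and classify_range are hypothetical names used
      only for illustration.<br>
    </p>
<pre>
```python
def tombstone_values(address_size=8):
    """Return -1 and -2 encoded as unsigned addresses of the given byte width."""
    modulus = 2 ** (8 * address_size)
    return {modulus - 1, modulus - 2}


def classify_range(low_pc, high_pc, address_size=8):
    """Classify an address range as 'tombstone', 'inverted', 'empty' or 'valid'."""
    # Assumption: a range is treated as dead when its LowPC is a marker value.
    if low_pc in tombstone_values(address_size):
        return "tombstone"
    # An inverted range is the case the patch is said not to support.
    if low_pc > high_pc:
        return "inverted"
    if low_pc == high_pc:
        return "empty"
    return "valid"


print(classify_range(0xFFFFFFFFFFFFFFFE, 0x10))
print(classify_range(0x1000, 0x2000))
```
</pre>
    <p>Note that a range whose LowPC is a marker value and whose HighPC
      is an ordinary address is also a LowPC&gt;HighPC range, which is
      why the marker check comes first in this sketch.<br>
    </p>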
    <p>Thank you, Alexey.<br>
    </p>
    <div class="moz-cite-prefix">On 29.10.2020 13:52, James Henderson
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CABqSp3n5gfmQFQZ2Wq36CKX0_GE+fJxKkUmc-oRbEk3i=xqQnw@mail.gmail.com">
      <div dir="ltr">
        <div>Hi Alexey,</div>
        <div><br>
        </div>
        <div>I've just started looking at running your patch on the
          clang and game packages I used for the Fragmented DWARF
          experiment, and on both occasions, I got "warning: Generated
          debug info is broken" near the end of the link. Digging
          further, the actual error this represented (for the clang
          case) was "invalid e_shentsize in ELF header: 16912" (aside:
          there are several Expected instances around where the former
          warning was reported which are being thrown away and will
          cause assertions under the right configuration). I don't
          really follow the code enough to understand whether this is a
          bug in the code or possibly some weird interaction with our
          downstream patches (I don't expect the latter, for the clang
          build, as our patches are supposed to be a no-op when not
          using our target). I'll check what happens with the clang
          package if I try using a completely vanilla LLVM with your
          patch applied.</div>
        <div><br>
        </div>
        <div>I also got a large number of "no mapping for range"
          warnings when linking the game package. I tried debugging the
          code in the area, but the data types are all difficult to
          debug, and I don't really understand the relevant area of code
          enough to be able to theorise what actually is causing this.
          llvm-dwarfdump --verify doesn't flag up any issues, and
          there's nothing obviously broken looking at the dump of the
          debug data either. Any pointers as to what might be going
          wrong would be appreciated. I assume with your experiments
          that you build with -ffunction-sections/-fdata-sections for
          maximum GC opportunities?</div>
        <div><br>
        </div>
        <div>Thanks,</div>
        <div><br>
        </div>
        <div>James<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, 19 Oct 2020 at 09:50,
          James Henderson &lt;<a
            href="mailto:jh7370.2008@my.bristol.ac.uk"
            moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">Great, thanks Alexey! I'll try to take a look
            at this in the near future, and will report my results back
            here. I imagine our clang results will differ, purely
            because we probably used different toolchains to build the
            input in the first place.<br>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Thu, 15 Oct 2020 at
              10:08, Alexey Lapshin &lt;<a
                href="mailto:avl.lapshin@gmail.com" target="_blank"
                moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>
                <p><br>
                </p>
                <div>On 13.10.2020 10:20, James Henderson wrote:<br>
                </div>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div>The script included in the patch can be used to
                      convert an object containing normal DWARF into an
                      object using fragmented DWARF. It does this by
                      using llvm-dwarfdump to dump the various sections,
                      parses the output to identify where it should
                      split (using the offsets of the various entries),
                      and then writes new section headers accordingly -
                      you can see roughly what it's doing if you get a
                      chance to watch the talk recording. The additional
                      section headers are appended to the end of the ELF
                      section header table, whilst the original DWARF is
                      left in the same place it was before (making use
                      of the fact that section headers don't have to
                      appear in offset order). The script also parses
                      and fragments the relocation sections targeting
                      the DWARF sections so that they match up with the
                      fragmented DWARF sections. This is clearly all
                      suboptimal - in practice the compiler should be
                      modified to do the fragmenting upfront, to save
                      having to parse a tool's stdout, but that was just
                      the simplest thing I could come up with to quickly
                      write the script. Full details of the script usage
                      are included in the patch description, if you want
                      to play around with it.</div>
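                    <div><br>
                    </div>
                    <div>As a rough illustration of the parsing step
                      described above, the Python sketch below pulls
                      candidate split offsets out of
                      llvm-dwarfdump-style text. It is not the script
                      from the patch: the sample dump, the split_points
                      name, and the assumption that nesting depth can be
                      read from the indentation (one space after the
                      colon, then two per level) are all illustrative.</div>
<pre>
```python
import re

# One DIE header line, e.g. "0x0000002a:   DW_TAG_subprogram".
DIE_LINE = re.compile(r"^(0x[0-9a-f]+):( +)(DW_TAG_\w+|NULL)\s*$")


def split_points(dump_text, depth=1):
    """Return offsets of DIEs at the given nesting depth (0 is the unit DIE)."""
    offsets = []
    for line in dump_text.splitlines():
        match = DIE_LINE.match(line)
        if match is None or match.group(3) == "NULL":
            continue
        # Assumed layout: one space after the colon, then two spaces per level.
        line_depth = (len(match.group(2)) - 1) // 2
        if line_depth == depth:
            offsets.append(int(match.group(1), 16))
    return offsets


# Hand-written sample in the style of a .debug_info dump (not real tool output).
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name ("a.cpp")

0x0000002a:   DW_TAG_subprogram
                DW_AT_name ("foo")

0x00000043:     DW_TAG_formal_parameter

0x0000004e:     NULL

0x0000004f:   DW_TAG_subprogram
                DW_AT_name ("bar")

0x00000068:   NULL
"""

print([hex(offset) for offset in split_points(SAMPLE)])
```
</pre>
                    <div>Here the two subprogram DIEs directly under the
                      compile unit are reported as split candidates; a
                      real script would also need the matching offsets
                      in the other DWARF sections and in the
                      relocations.</div>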
                    <div><br>
                    </div>
                    <div>If Alexey could point me at the latest version
                      of his patch, I'd be happy to run that through
                      either or both of the packages I used to see what
                      happens. Equally, I'd be happy if Alexey is able
                      to run my script to fragment and measure the
                      performance of a couple of projects he's been
                      working with. Based purely on the two packages
                      I've tried this with, I can tell already that the
                      results can vary wildly. My expectation is that
                      Alexey's approach will be slower (at least in its
                      current form, but probably more generally), but
                      produce smaller output, but to what scale I have
                      no idea.<br>
                    </div>
                  </div>
                </blockquote>
                <p>James, I updated the patch - <a
                    href="https://reviews.llvm.org/D74169"
                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D74169</a>.</p>
                <p>To make it work, it is necessary to build the example
                  with -ffunction-sections and pass the following
                  options to the linker:</p>
                <p>--gc-sections --gc-debuginfo --gc-debuginfo-no-odr</p>
                <p>For the clang binary I got the following results:</p>
                <p>1. --gc-sections = binary size 1.5G, Debug Info
                  size (*) 1.2G</p>
                <p>2. --gc-sections --gc-debuginfo = binary size 840M,
                  8x performance decrease, Debug Info size 542M<br>
                </p>
                <p>3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
                  = binary size 1.3G, 16x performance decrease, Debug
                  Info size 1G<br>
                </p>
                <p>(*)
                  .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc<br>
                </p>
                <p><br>
                </p>
                <p>I added the --gc-debuginfo-no-odr option so that the
                  size reduction can be compared fairly. Without that
                  option, D74169 deduplicates types, so it would not be
                  correct to compare the resulting size with the
                  "Fragmented DWARF" solution, which does not
                  deduplicate types.<br>
                </p>
                <p>Also, I am looking at your <font face="monospace"><font
                      face="arial,sans-serif"><a
                        href="https://reviews.llvm.org/D89229"
                        target="_blank" moz-do-not-send="true">D89229</a>
                      and will share the results some time later.<br>
                    </font></font></p>
                <p>Thank you, Alexey.<br>
                </p>
                <blockquote type="cite">
                  <div dir="ltr">
                    <div><br>
                    </div>
                    <div>I think linkers parse .eh_frame partly because
                      they have no other choice. That being said, I
                      think its format is not too complex, so similarly
                      the parser isn't too complex. You can see LLD's
                      ELF implementation in ELF/EhFrame.cpp, how it is
                      used in ELF/InputSection.cpp (see the bits to do
                      with EhInputSection) and EhFrameSection in
                      ELF/SyntheticSections.h (plus various usages of
                      these two throughout the LLD code). I think the
                      key to any structural changes in the DWARF format
                      to make them more amenable to link-time parsing is
                      being able to read a minimal amount without
                      needing to parse the payload (e.g. a length field,
                      some sort of type, and then using the relocations
                      to associate it accordingly).</div>
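                    <div><br>
                    </div>
                    <div>A minimal Python sketch of that "read a length
                      and a type, skip the payload" idea, applied to
                      .eh_frame-style records. It is not LLD's
                      implementation; it assumes little-endian data,
                      handles only the 32-bit length form, and the
                      function and sample names are made up for
                      illustration.</div>
<pre>
```python
def split_eh_frame(data):
    """Split raw .eh_frame bytes into (offset, kind, size) records.

    Only the length and ID fields are read; the payload is never parsed.
    """
    records = []
    offset = 0
    while len(data) - offset >= 4:
        length = int.from_bytes(data[offset:offset + 4], "little")
        if length == 0:
            break  # A zero length marks the terminator entry.
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit extended length is not handled in this sketch")
        record_id = int.from_bytes(data[offset + 4:offset + 8], "little")
        # In .eh_frame a CIE has ID 0; any other value is an FDE's CIE pointer.
        kind = "CIE" if record_id == 0 else "FDE"
        size = 4 + length  # The length field does not count itself.
        records.append((offset, kind, size))
        offset += size
    return records


def make_record(record_id, payload):
    """Build one synthetic record: length, ID, then an opaque payload."""
    body = record_id.to_bytes(4, "little") + payload
    return len(body).to_bytes(4, "little") + body


# One CIE, one FDE pointing back at it, then a terminator.
SAMPLE = make_record(0, bytes(8)) + make_record(20, bytes(4)) + bytes(4)
print(split_eh_frame(SAMPLE))
```
</pre>
                    <div>The loop never looks past the first eight bytes
                      of a record, which is the property that a
                      link-time-friendly DWARF layout would need as
                      well.</div>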
                    <div><br>
                    </div>
                    <div>James<br>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Mon, 12 Oct
                      2020 at 20:48, David Blaikie &lt;<a
                        href="mailto:dblaikie@gmail.com" target="_blank"
                        moz-do-not-send="true">dblaikie@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">Awesome! Sorry I missed the
                        lightning talk, but really interested to see
                        this sort of thing (though it's not
                        directly/immediately applicable to the use case
                        I work with - Split DWARF, something similar
                        could be used there with further work)<br>
                        <br>
                        Though it looks like the patch has mostly linker
                        changes - where/how do you generate the
                        fragmented DWARF to begin with? Via the Python
                        script? Run over assembly? I'd be surprised if
                        it was achievable that way - curious to know
                        more.<br>
                        <br>
                        Got a rough sense/are you able to run
                        apples-to-apples comparisons with Alexey's
                        linker-based patches to compare linker
                        time/memory overhead versus resulting output
                        size gains?<br>
                        <br>
                        (&amp; yeah, I'm a bit curious about how the
                        linkers do eh_frame rewriting, if the format is
                        especially amenable to a lightweight
                        parsing/rewriting and how we could make the
                        DWARF more amenable to that too)</div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Mon, Oct
                          12, 2020 at 6:41 AM James Henderson &lt;<a
                            href="mailto:jh7370.2008@my.bristol.ac.uk"
                            target="_blank" moz-do-not-send="true">jh7370.2008@my.bristol.ac.uk</a>&gt;
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
                          0.8ex;border-left:1px solid
                          rgb(204,204,204);padding-left:1ex">
                          <div dir="ltr">
                            <div>Hi all,</div>
                            <div><br>
                            </div>
                            <div>At the recent LLVM developers' meeting,
                              I presented a lightning talk on an
                              approach to reduce the amount of dead
                              debug data left in an executable following
                              operations such as --gc-sections and
                              duplicate COMDAT removal. In that
                              presentation, I presented some figures
                              based on linking a game that had been
                              built by our downstream clang port and
                              fragmented using the described approach.
                              Since recording the presentation, I ran
                              the same experiment on a clang package
                              (this time built with a GCC version). The
                              comparable figures are below:</div>
                            <div><br>
                            </div>
                            <div>Link-time speed (s):</div>
                            <div><span style="font-family:monospace">+--------------------+-------+---------------+------+------+------+------+------+</span><br>
                            </div>
                            <div><font face="monospace">| Package
                                variant    | No GC | GC 1 (normal) | GC
                                2 | GC 3 | GC 4 | GC 5 | GC 6 |</font></div>
                            <div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+<br>
                              </font></div>
                            <div><font face="monospace">| Game
                                (plain)       |  4.5  |  4.9          | 
                                4.2 |  3.6 |  3.4 |  3.3 |  3.2 |<br>
                              </font></div>
                            <div><font face="monospace">| Game
                                (fragmented)  | 11.1  | 11.8          | 
                                9.7 |  8.6 |  7.9 |  7.7 |  7.5 |<br>
                              </font></div>
                            <div><font face="monospace">| Clang
                                (plain)      | 13.9  | 17.9          |
                                17.0 | 16.7 | 16.3 | 16.2 | 16.1 |<br>
                              </font></div>
                            <div><font face="monospace">| Clang
                                (fragmented) | 18.6  | 22.8          |
                                21.6 | 21.1 | 20.8 | 20.5 | 20.2 |</font></div>
                            <div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+</font></div>
                            <div><font face="monospace"><br>
                              </font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif">Output size -
                                  Game package (MB):</font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif"><span
                                    style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
                                </font></font></div>
                            <div><span style="font-family:monospace">|
                                Category            | No GC | GC 1 | GC
                                2 | GC 3 | GC 4 | GC 5 | GC 6 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Plain (total)       | 1149  | 1121 |
                                1017 |  965 |  938 |  930 |  928 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Plain (DWARF*)      |  845  |  845 | 
                                845 |  845 |  845 |  845 |  845 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Plain (other)       |  304  |  276 | 
                                172 |  120 |   93 |   85 |   82 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Fragmented (total)  | 1044  |  940 | 
                                556 |  373 |  287 |  263 |  255 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Fragmented (DWARF*) |  740  |  664 | 
                                384 |  253 |  194 |  178 |  173 |<br>
                              </span></div>
                            <div><span style="font-family:monospace">|
                                Fragmented (other)  |  304  |  276 | 
                                172 |  120 |   93 |   85 |   82 |<br>
                              </span></div>
                            <div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+
                              </font>
                              <div><font face="monospace"><br>
                                </font></div>
                              <div><font face="monospace"><font
                                    face="arial,sans-serif">Output size
                                    - Clang (MB):</font></font></div>
                              <div><font face="monospace"><font
                                    face="arial,sans-serif"><span
                                      style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
                                  </font></font></div>
                              <div><span style="font-family:monospace">|
                                  Category            | No GC | GC 1 |
                                  GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Plain (total)       | 2596  | 2546 |
                                  2406 | 2332 | 2293 | 2273 | 2251 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Plain (DWARF*)      | 1979  | 1979 |
                                  1979 | 1979 | 1979 | 1979 | 1979 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Plain (other)       |  616  |  567 | 
                                  426 |  353 |  314 |  294 |  272 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Fragmented (total)  | 2397  | 2346 |
                                  2164 | 2069 | 2017 | 1990 | 1963 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Fragmented (DWARF*) | 1780  | 1780 |
                                  1738 | 1716 | 1703 | 1696 | 1691 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">|
                                  Fragmented (other)  |  616  |  567 | 
                                  426 |  353 |  314 |  294 |  272 |<br>
                                </span></div>
                              <div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+</font></div>
                            </div>
                            <div><font face="monospace"><br>
                              </font></div>
                            <div><font face="monospace">*DWARF size ==
                                total size of .debug_info + .debug_line
                                + .debug_ranges + .debug_aranges +
                                .debug_loc<br>
                              </font></div>
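                            <div><font face="monospace"><font
                                  face="arial,sans-serif">For a quick
                                  comparison between the two packages,
                                  the DWARF* rows of the tables above
                                  can be turned into percentage savings.
                                  A small Python sketch follows;
                                  saving_percent and the dictionary
                                  layout are illustrative names, and the
                                  numbers are copied from the Plain and
                                  Fragmented DWARF* rows.</font></font></div>
<pre>
```python
# DWARF sizes in MB, copied from the two output-size tables above.
DWARF_MB = {
    "Game": {"plain": 845, "fragmented_gc1": 664, "fragmented_gc6": 173},
    "Clang": {"plain": 1979, "fragmented_gc1": 1780, "fragmented_gc6": 1691},
}


def saving_percent(before_mb, after_mb):
    """Return the size reduction relative to the original size, in percent."""
    return 100.0 * (before_mb - after_mb) / before_mb


for package, sizes in DWARF_MB.items():
    gc1 = saving_percent(sizes["plain"], sizes["fragmented_gc1"])
    gc6 = saving_percent(sizes["plain"], sizes["fragmented_gc6"])
    print(f"{package}: GC 1 saves {gc1:.1f}%, GC 6 saves {gc6:.1f}% of DWARF")
```
</pre>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif">With these
                                  inputs, fragmentation with GC 6
                                  removes about 79.5% of the game's
                                  DWARF but only about 14.6% of
                                  clang's.</font></font></div>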
                            <div><font face="monospace"><br>
                              </font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif">Additionally,
                                  I have posted <a
                                    href="https://reviews.llvm.org/D89229"
                                    target="_blank"
                                    moz-do-not-send="true">https://reviews.llvm.org/D89229</a>
                                  which provides the python script and
                                  linker patches used to reproduce the
                                  above results on my machine. The GC
                                  1/2/3/4/5/6 correspond to the linker
                                  option added in that patch
                                  --mark-live-pc with values
                                  1/0.8/0.6/0.4/0.2/0 respectively.<br>
                                </font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif"><br>
                                </font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif">During the
                                  conference, the question was asked
                                  what the memory usage and input size
                                  impact was. I've summarised these
                                  below:</font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif"><br>
                                </font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif">Input file
                                  size total (GB):</font></font></div>
                            <div><font face="monospace"><font
                                  face="arial,sans-serif"> <span
                                    style="font-family:monospace">+--------------------+------------+
                                  </span></font></font></div>
                            <div><span style="font-family:monospace"> |
                                Package variant    | Total Size | <br>
                              </span></div>
                            <div><span style="font-family:monospace">
                                +--------------------+------------+<br>
                              </span></div>
                            <div><span style="font-family:monospace"> |
                                Game (plain)       |     2.9    |    <br>
                              </span></div>
                            <div><span style="font-family:monospace"> |
                                Game (fragmented)  |     4.2    |<br>
                              </span></div>
                            <div><span style="font-family:monospace"> |
                                Clang (plain)      |    10.9    |<br>
                              </span></div>
                            <div><span style="font-family:monospace"> |
                                Clang (fragmented) |    12.3    |<br>
                              </span></div>
                            <div><span style="font-family:monospace">
                                +--------------------+------------+</span></div>
                            <div><span style="font-family:monospace"><br>
                              </span></div>
                            <div><span style="font-family:monospace"><span
                                  style="font-family:arial,sans-serif">Peak
                                  Working Set Memory usage (GB):</span><br>
                              </span></div>
                            <div><span style="font-family:monospace"> </span>
                              <div><font face="monospace"><font
                                    face="arial,sans-serif"><span
                                      style="font-family:monospace">+--------------------+-------+------+
                                    </span></font></font></div>
                              <div><span style="font-family:monospace">
                                  | Package variant    | No GC | GC 1 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  +--------------------+-------+------+<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  | Game (plain)       |  4.3  |  4.7 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  | Game (fragmented)  |  8.9  |  8.6 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  | Clang (plain)      | 15.7  | 15.6 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  | Clang (fragmented) | 19.4  | 19.2 |<br>
                                </span></div>
                              <div><span style="font-family:monospace">
                                  +--------------------+-------+------+</span></div>
                              <div><span style="font-family:monospace"><br>
                                </span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">I'm keen to
                                    hear what people's feedback is, and
                                    also interested to see what results
                                    others might see by running this
                                    experiment on other input packages.
                                    Also, if anybody has any alternative
                                    ideas that meet the goals listed
                                    below, I'd love to hear them!<br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif"><br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">To reiterate
                                    some key goals of fragmented DWARF,
                                    similar to what I said in the
                                    presentation:</font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">1) Devise a
                                    scheme that gives significant size
                                    savings without being too costly.
                                    It's clear from just the two
                                    packages I've tried this on that
                                    there is a fairly hefty link time
                                    performance cost, although the exact
                                    cost depends on the nature of the
                                    input package. On the other hand,
                                    depending on the nature of the input
                                    package, there can also be some big
                                    gains.<br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">2) Devise a
                                    scheme that doesn't require any
                                    linker knowledge of DWARF. The
                                    current approach doesn't quite
                                    achieve this properly due to the
                                    slight misuse of SHF_LINK_ORDER, but
                                    I expect that a pivot to using
                                    non-COMDAT group sections should
                                    solve this problem.</font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">3) Provide
                                    some kind of halfway house between
                                    simply writing tombstone values into
                                    dead DWARF and fully parsing the
                                    DWARF to reoptimise it/discard the
                                    dead bits.<br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif"><br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">I'm hopeful
                                    that changes could be made to the
                                    linker to improve the link-time
                                    cost. There seems to be a
                                    significant amount of the link time
                                    spent creating the input sections.
                                    An alternative would be to devise a
                                    scheme that would avoid the literal
                                    splitting into section headers, in
                                    favour of some sort of list of
                                    split-points that the linker uses to
                                    split things up (a bit like it
                                    already does for .eh_frame or
                                    mergeable sections).</font><br>
                                </span></div>
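                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif"><br>
                                  </font></span></div>
                              <div><span style="font-family:monospace"><font
                                    face="arial,sans-serif">A sketch of
                                    the split-point idea in Python,
                                    under the assumption that the linker
                                    is handed one unsplit section plus a
                                    list of offsets. The function names
                                    and the notion of a set of live
                                    piece offsets are hypothetical and
                                    only illustrate the shape of such a
                                    scheme.</font></span></div>
<pre>
```python
def split_section(data, split_points):
    """Cut one section's bytes into (start, piece) pairs at each split point."""
    starts = sorted(set([0] + list(split_points)))
    ends = starts[1:] + [len(data)]
    return [(start, data[start:end]) for start, end in zip(starts, ends)]


def keep_live(pieces, live_offsets):
    """Concatenate only the pieces whose start offset is marked live."""
    return b"".join(piece for start, piece in pieces if start in live_offsets)


section = b"AAAABBBBBBCC"
pieces = split_section(section, [4, 10])
print(pieces)
print(keep_live(pieces, {0, 10}))
```
</pre>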
                              <span style="font-family:monospace"> </span></div>
                          </div>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>