<div dir="ltr"><div>Hi Alexey,</div><div><br></div><div>I've just started looking at running your patch on the clang and game packages I used for the Fragmented DWARF experiment, and on both occasions, I got "warning: Generated debug info is broken" near the end of the link. Digging further, the actual error this represented (for the clang case) was "invalid e_shentsize in ELF header: 16912" (aside: there are several Expected instances around where the former warning was reported which are being thrown away and will cause assertions under the right configuration). I don't really follow the code enough to understand whether this is a bug in the code or possibly some weird interaction with our downstream patches (I don't expect the latter, for the clang build, as our patches are supposed to be a no-op when not using our target). I'll check what happens with the clang package if I try using a completely vanilla LLVM with your patch applied.</div><div><br></div><div>I also got a large number of "no mapping for range" warnings when linking the game package. I tried debugging the code in the area, but the data types are all difficult to debug, and I don't really understand the relevant area of code enough to be able to theorise what actually is causing this. llvm-dwarfdump --verify doesn't flag up any issues, and there's nothing obviously broken looking at the dump of the debug data either. Any pointers as to what might be going wrong would be appreciated. 
I assume with your experiments that you build with -ffunction-sections/-fdata-sections for maximum GC opportunities?</div><div><br></div><div>Thanks,</div><div><br></div><div>James<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, 19 Oct 2020 at 09:50, James Henderson <<a href="mailto:jh7370.2008@my.bristol.ac.uk">jh7370.2008@my.bristol.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Great, thanks Alexey! I'll try to take a look at this in the near future, and will report my results back here. I imagine our clang results will differ, purely because we probably used different toolchains to build the input in the first place.<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 13.10.2020 10:20, James Henderson
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>The script included in the patch can be used to convert an
          object containing normal DWARF into an object using fragmented
          DWARF. It does this by using llvm-dwarfdump to dump the
          various sections, parses the output to identify where it
          should split (using the offsets of the various entries), and
          then writes new section headers accordingly - you can see
          roughly what it's doing if you get a chance to watch the talk
          recording. The additional section headers are appended to the
          end of the ELF section header table, whilst the original DWARF
          is left in the same place it was before (making use of the
          fact that section headers don't have to appear in offset
          order). The script also parses and fragments the relocation
          sections targeting the DWARF sections so that they match up
          with the fragmented DWARF sections. This is clearly all
          suboptimal - in practice the compiler should be modified to do
          the fragmenting upfront, to save having to parse a tool's
          stdout, but that was just the simplest thing I could come up
          with to quickly write the script. Full details of the script
          usage are included in the patch description, if you want to
          play around with it.</div>
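        <div><br>
        </div>
        <div>To illustrate the split-point step described above, here is a
          minimal Python sketch; it is not the script from the patch. It
          assumes the textual layout of llvm-dwarfdump --debug-info output
          (an offset, a colon, then indentation that grows by two spaces
          per nesting level), and it assumes that splitting at the direct
          children of each unit DIE is a reasonable granularity.
          SAMPLE_DUMP and find_split_points are hypothetical names, and
          the sample text is hand-written rather than real tool output.</div>
        <pre>
```python
import re

# Hand-written approximation of `llvm-dwarfdump --debug-info` output.
# This is an assumption for illustration; real output carries more
# attributes and its layout may differ between LLVM versions.
SAMPLE_DUMP = """\
0x00000000: Compile Unit: length = 0x00000050, format = DWARF32, version = 0x0004, abbr_offset = 0x0000, addr_size = 0x08 (next unit at 0x00000054)

0x0000000b: DW_TAG_compile_unit
              DW_AT_name  ("a.c")

0x0000002a:   DW_TAG_subprogram
                DW_AT_name  ("foo")

0x00000043:     DW_TAG_formal_parameter
                  DW_AT_name  ("x")

0x0000004f:     NULL

0x00000050:   DW_TAG_base_type
                DW_AT_name  ("int")

0x00000053:   NULL
"""

# A DIE line: hex offset, colon, indentation, then the tag name.
DIE_LINE = re.compile(r"^(0x[0-9a-f]+):( +)(DW_TAG_\w+)", re.MULTILINE)


def find_split_points(dump_text):
    """Return (offset, tag) for every DIE that is a direct child of a unit DIE."""
    points = []
    for match in DIE_LINE.finditer(dump_text):
        offset, indent, tag = match.groups()
        # Assumed layout: one space at depth 0, two more per nesting level.
        depth = (len(indent) - 1) // 2
        if depth == 1:
            points.append((int(offset, 16), tag))
    return points


for offset, tag in find_split_points(SAMPLE_DUMP):
    print(hex(offset), tag)
```
        </pre>
        <div>Each returned offset would be a candidate boundary for a new
          section header; the relocation sections would need to be split
          at the same boundaries.</div>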
        <div><br>
        </div>
        <div>If Alexey could point me at the latest version of his
          patch, I'd be happy to run that through either or both of the
          packages I used to see what happens. Equally, I'd be happy if
          Alexey is able to run my script to fragment and measure the
          performance of a couple of projects he's been working with.
          Based purely on the two packages I've tried this with, I can
          tell already that the results can vary wildly. My expectation
          is that Alexey's approach will be slower (at least in its
          current form, but probably more generally) but produce
          smaller output, though by how much I have no idea.<br>
        </div>
      </div>
    </blockquote>
    <p>James, I updated the patch - <a href="https://reviews.llvm.org/D74169" target="_blank">https://reviews.llvm.org/D74169</a>.</p>
    <p>To make it work, it is necessary to build the example with
      -ffunction-sections and pass the following options to the linker:</p>
    <p>--gc-sections --gc-debuginfo --gc-debuginfo-no-odr</p>
    <p>For the clang binary I got the following results:</p>
    <p>1. --gc-sections = binary size 1.5G, Debug Info size (*) 1.2G</p>
    <p>2. --gc-sections --gc-debuginfo = binary size 840M, 8x
      performance decrease, Debug Info size 542M<br>
    </p>
    <p>3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary
      size 1.3G, 16x performance decrease, Debug Info size 1G<br>
    </p>
    <p>(*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc<br>
    </p>
    <p><br>
    </p>
    <p>I added the --gc-debuginfo-no-odr option so that the size reduction
      can be compared fairly. Without that option, D74169 deduplicates
      types, so it would not be correct to compare the resulting size
      with the "Fragmented DWARF" solution, which does not deduplicate
      types.<br>
    </p>
    <p>I am also looking at your <font face="monospace"><font face="arial,sans-serif"><a href="https://reviews.llvm.org/D89229" target="_blank">D89229</a>
          and will share results later.<br>
        </font></font></p>
    <p>Thank you, Alexey.<br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>I think linkers parse .eh_frame partly because they have no
          other choice. That being said, I think its format is not too
          complex, so similarly the parser isn't too complex. You can
          see LLD's ELF implementation in ELF/EhFrame.cpp, how it is
          used in ELF/InputSection.cpp (see the bits to do with
          EhInputSection) and EhFrameSection in ELF/SyntheticSections.h
          (plus various usages of these two throughout the LLD code). I
          think the key to any structural changes in the DWARF format to
          make them more amenable to link-time parsing is being able to
          read a minimal amount without needing to parse the payload
          (e.g. a length field, some sort of type, and then using the
          relocations to associate it accordingly).</div>
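        <div><br>
        </div>
        <div>As a rough illustration of reading "a minimal amount without
          needing to parse the payload", the Python sketch below walks
          .eh_frame-style records using only the length field and the
          4-byte ID field that follows it. It is not LLD's code. It
          assumes little-endian data, and split_eh_frame_records is a
          hypothetical name. A length of 0xffffffff means an 8-byte
          extended length follows, a zero length terminates the section,
          and an ID of zero marks a CIE.</div>
        <pre>
```python
def split_eh_frame_records(data):
    """Return (offset, size, kind) per record, reading only the header fields."""
    records = []
    pos = 0
    while len(data) - pos >= 4:
        length = int.from_bytes(data[pos:pos + 4], "little")
        header = 4
        if length == 0xFFFFFFFF:
            # 64-bit extended length follows the 32-bit escape value.
            length = int.from_bytes(data[pos + 4:pos + 12], "little")
            header = 12
        if length == 0:
            # A zero length is the terminator record.
            break
        id_field = int.from_bytes(data[pos + header:pos + header + 4], "little")
        kind = "CIE" if id_field == 0 else "FDE"
        size = header + length
        records.append((pos, size, kind))
        pos += size  # Skip the payload without interpreting it.
    return records


# Hand-built example: one CIE, one FDE pointing back at it, then a terminator.
cie = (8).to_bytes(4, "little") + (0).to_bytes(4, "little") + bytes(4)
fde = (12).to_bytes(4, "little") + (16).to_bytes(4, "little") + bytes(8)
section = cie + fde + bytes(4)
print(split_eh_frame_records(section))
```
        </pre>
        <div>The payload bytes are never decoded here; a linker would
          additionally use the relocations against each record to decide
          which input section it describes.</div>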
        <div><br>
        </div>
        <div>James<br>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, 12 Oct 2020 at 20:48,
          David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">Awesome! Sorry I missed the lightning talk, but
            really interested to see this sort of thing (though it's not
            directly/immediately applicable to the use case I work with
            - Split DWARF, something similar could be used there with
            further work)<br>
            <br>
            Though it looks like the patch has mostly linker changes -
            where/how do you generate the fragmented DWARF to begin
            with? Via the Python script? Run over assembly? I'd be
            surprised if it was achievable that way - curious to know
            more.<br>
            <br>
            Got a rough sense/are you able to run apples-to-apples
            comparisons with Alexey's linker-based patches to compare
            linker time/memory overhead versus resulting output size
            gains?<br>
            <br>
            (& yeah, I'm a bit curious about how the linkers do
            eh_frame rewriting, if the format is especially amenable to
            a lightweight parsing/rewriting and how we could make the
            DWARF more amenable to that too)</div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Mon, Oct 12, 2020 at
              6:41 AM James Henderson <<a href="mailto:jh7370.2008@my.bristol.ac.uk" target="_blank">jh7370.2008@my.bristol.ac.uk</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">
                <div>Hi all,</div>
                <div><br>
                </div>
                <div>At the recent LLVM developers' meeting, I presented
                  a lightning talk on an approach to reduce the amount
                  of dead debug data left in an executable following
                  operations such as --gc-sections and duplicate COMDAT
                  removal. In that presentation, I presented some
                  figures based on linking a game that had been built by
                  our downstream clang port and fragmented using the
                  described approach. Since recording the presentation,
                  I ran the same experiment on a clang package (this
                  time built with a GCC version). The comparable figures
                  are below:</div>
                <div><br>
                </div>
                <div>Link-time speed (s):</div>
                <div><span style="font-family:monospace">+--------------------+-------+---------------+------+------+------+------+------+</span><br>
                </div>
                <div><font face="monospace">| Package variant    | No GC
                    | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |</font></div>
                <div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+<br>
                  </font></div>
                <div><font face="monospace">| Game (plain)       |  4.5 
                    |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |<br>
                  </font></div>
                <div><font face="monospace">| Game (fragmented)  | 11.1 
                    | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |<br>
                  </font></div>
                <div><font face="monospace">| Clang (plain)      | 13.9 
                    | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |<br>
                  </font></div>
                <div><font face="monospace">| Clang (fragmented) | 18.6 
                    | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |</font></div>
                <div><font face="monospace">+--------------------+-------+---------------+------+------+------+------+------+</font></div>
                <div><font face="monospace"><br>
                  </font></div>
                <div><font face="monospace"><font face="arial,sans-serif">Output size - Game package
                      (MB):</font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
                    </font></font></div>
                <div><span style="font-family:monospace">|
                    Category            | No GC | GC 1 | GC 2 | GC 3 |
                    GC 4 | GC 5 | GC 6 |<br>
                  </span></div>
                <div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
                  </span></div>
                <div><span style="font-family:monospace">| Plain
                    (total)       | 1149  | 1121 | 1017 |  965 |  938 | 
                    930 |  928 |<br>
                  </span></div>
                <div><span style="font-family:monospace">| Plain
                    (DWARF*)      |  845  |  845 |  845 |  845 |  845 | 
                    845 |  845 |<br>
                  </span></div>
                <div><span style="font-family:monospace">| Plain
                    (other)       |  304  |  276 |  172 |  120 |   93
                    |   85 |   82 |<br>
                  </span></div>
                <div><span style="font-family:monospace">| Fragmented
                    (total)  | 1044  |  940 |  556 |  373 |  287 |  263
                    |  255 |<br>
                  </span></div>
                <div><span style="font-family:monospace">| Fragmented
                    (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178
                    |  173 |<br>
                  </span></div>
                <div><span style="font-family:monospace">| Fragmented
                    (other)  |  304  |  276 |  172 |  120 |   93 |   85
                    |   82 |<br>
                  </span></div>
                <div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+
                  </font>
                  <div><font face="monospace"><br>
                    </font></div>
                  <div><font face="monospace"><font face="arial,sans-serif">Output size - Clang
                        (MB):</font></font></div>
                  <div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+</span><br>
                      </font></font></div>
                  <div><span style="font-family:monospace">|
                      Category            | No GC | GC 1 | GC 2 | GC 3 |
                      GC 4 | GC 5 | GC 6 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">+---------------------+-------+------+------+------+------+------+------+<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Plain
                      (total)       | 2596  | 2546 | 2406 | 2332 | 2293
                      | 2273 | 2251 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Plain
                      (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979
                      | 1979 | 1979 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Plain
                      (other)       |  616  |  567 |  426 |  353 |  314
                      |  294 |  272 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Fragmented
                      (total)  | 2397  | 2346 | 2164 | 2069 | 2017 |
                      1990 | 1963 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Fragmented
                      (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 |
                      1696 | 1691 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">| Fragmented
                      (other)  |  616  |  567 |  426 |  353 |  314 | 
                      294 |  272 |<br>
                    </span></div>
                  <div><font face="monospace">+---------------------+-------+------+------+------+------+------+------+</font></div>
                </div>
                <div><font face="monospace"><br>
                  </font></div>
                <div><font face="monospace">*DWARF size == total size of
                    .debug_info + .debug_line + .debug_ranges +
                    .debug_aranges + .debug_loc<br>
                  </font></div>
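                <div><font face="monospace"><br>
                  </font></div>
                <div>To make the DWARF rows easier to compare, the short
                  Python sketch below computes the percentage reduction
                  of the fragmented DWARF size relative to the plain one
                  for each column. The figures are copied from the two
                  output size tables above; DWARF_MB, COLUMNS and
                  dwarf_reduction_percent are hypothetical names used
                  only for this illustration.</div>
                <pre>
```python
# DWARF sizes in MB, copied from the output size tables above
# (columns: No GC, GC 1, GC 2, GC 3, GC 4, GC 5, GC 6).
COLUMNS = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]
DWARF_MB = {
    "Game": {
        "plain": [845] * 7,
        "fragmented": [740, 664, 384, 253, 194, 178, 173],
    },
    "Clang": {
        "plain": [1979] * 7,
        "fragmented": [1780, 1780, 1738, 1716, 1703, 1696, 1691],
    },
}


def dwarf_reduction_percent(package):
    """Percentage by which fragmented DWARF is smaller than plain, per column."""
    sizes = DWARF_MB[package]
    return [
        100.0 * (plain - frag) / plain
        for plain, frag in zip(sizes["plain"], sizes["fragmented"])
    ]


for package in DWARF_MB:
    cells = [
        f"{column}: {percent:.1f}%"
        for column, percent in zip(COLUMNS, dwarf_reduction_percent(package))
    ]
    print(package + " - " + ", ".join(cells))
```
                </pre>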
                <div><font face="monospace"><br>
                  </font></div>
                <div><font face="monospace"><font face="arial,sans-serif">Additionally, I have
                      posted <a href="https://reviews.llvm.org/D89229" target="_blank">https://reviews.llvm.org/D89229</a>
                      which provides the python script and linker
                      patches used to reproduce the above results on my
                      machine. The GC 1/2/3/4/5/6 columns correspond to the
                      --mark-live-pc linker option added in that patch,
                      with values 1/0.8/0.6/0.4/0.2/0 respectively.<br>
                    </font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif"><br>
                    </font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif">During the conference, the
                      question was asked what the memory usage and input
                      size impact was. I've summarised these below:</font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif"><br>
                    </font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif">Input file size total
                      (GB):</font></font></div>
                <div><font face="monospace"><font face="arial,sans-serif">
                      <span style="font-family:monospace">+--------------------+------------+
                      </span></font></font></div>
                <div><span style="font-family:monospace">
                    | Package variant    | Total Size | <br>
                  </span></div>
                <div><span style="font-family:monospace">
                    +--------------------+------------+<br>
                  </span></div>
                <div><span style="font-family:monospace">
                    | Game (plain)       |     2.9    |    <br>
                  </span></div>
                <div><span style="font-family:monospace">
                    | Game (fragmented)  |     4.2    |<br>
                  </span></div>
                <div><span style="font-family:monospace">
                    | Clang (plain)      |    10.9    |<br>
                  </span></div>
                <div><span style="font-family:monospace">
                    | Clang (fragmented) |    12.3    |<br>
                  </span></div>
                <div><span style="font-family:monospace">
                    +--------------------+------------+</span></div>
                <div><span style="font-family:monospace"><br>
                  </span></div>
                <div><span style="font-family:monospace"><span style="font-family:arial,sans-serif">Peak Working
                      Set Memory usage (GB):</span><br>
                  </span></div>
                <div><span style="font-family:monospace">
                  </span>
                  <div><font face="monospace"><font face="arial,sans-serif"><span style="font-family:monospace">+--------------------+-------+------+
                        </span></font></font></div>
                  <div><span style="font-family:monospace">
                      | Package variant    | No GC | GC 1 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      +--------------------+-------+------+<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      | Game (plain)       |  4.3  |  4.7 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      | Game (fragmented)  |  8.9  |  8.6 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      | Clang (plain)      | 15.7  | 15.6 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      | Clang (fragmented) | 19.4  | 19.2 |<br>
                    </span></div>
                  <div><span style="font-family:monospace">
                      +--------------------+-------+------+</span></div>
                  <div><span style="font-family:monospace"><br>
                    </span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">I'm keen to hear what
                        people's feedback is, and also interested to see
                        what results others might see by running this
                        experiment on other input packages. Also, if
                        anybody has any alternative ideas that meet the
                        goals listed below, I'd love to hear them!<br>
                      </font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif"><br>
                      </font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">To reiterate some key
                        goals of fragmented DWARF, similar to what I
                        said in the presentation:</font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">1) Devise a scheme that
                        gives significant size savings without being too
                        costly. It's clear from just the two packages
                        I've tried this on that there is a fairly hefty
                        link time performance cost, although the exact
                        cost depends on the nature of the input package.
                        On the other hand, depending on the nature of
                        the input package, there can also be some big
                        gains.<br>
                      </font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">2) Devise a scheme that
                        doesn't require any linker knowledge of DWARF.
                        The current approach doesn't quite achieve this
                        properly due to the slight misuse of
                        SHF_LINK_ORDER, but I expect that a pivot to
                        using non-COMDAT group sections should solve
                        this problem.</font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">3) Provide some kind of
                        halfway house between simply writing tombstone
                        values into dead DWARF and fully parsing the
                        DWARF to reoptimise it/discard the dead bits.<br>
                      </font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif"><br>
                      </font></span></div>
                  <div><span style="font-family:monospace"><font face="arial,sans-serif">I'm hopeful that changes
                        could be made to the linker to improve the
                        link-time cost. There seems to be a significant
                        amount of the link time spent creating the input
                        sections. An alternative would be to devise a
                        scheme that would avoid the literal splitting
                        into section headers, in favour of some sort of
                        list of split-points that the linker uses to
                        split things up (a bit like it already does for
                        .eh_frame or mergeable sections).</font><br>
                    </span></div>
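                  <div><br>
                  </div>
                  <div>A minimal Python sketch of the split-point idea
                    above, assuming the linker is handed one contiguous
                    section plus a list of byte offsets at which it may
                    be cut. How the list would be encoded in the object
                    file, and how liveness of each piece would be decided,
                    are left open; split_by_offsets and keep_live are
                    hypothetical names.</div>
                  <pre>
```python
def split_by_offsets(data, split_points):
    """Cut one section into (start_offset, bytes) pieces at the given offsets."""
    bounds = [0] + sorted(split_points) + [len(data)]
    return [
        (start, data[start:end])
        for start, end in zip(bounds, bounds[1:])
        if end > start  # Ignore empty pieces from offsets at 0, the end, or repeats.
    ]


def keep_live(pieces, live_offsets):
    """Concatenate only the pieces whose start offset was marked live."""
    return b"".join(chunk for start, chunk in pieces if start in live_offsets)


section = b"AAAABBBBBBCC"
pieces = split_by_offsets(section, [4, 10])
print(pieces)
print(keep_live(pieces, {0, 10}))
```
                  </pre>
                  <div>Compared with one section header per fragment, the
                    cost here is a sorted list of integers per section,
                    which may avoid much of the input section creation
                    time.</div>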
                  <span style="font-family:monospace">
                  </span></div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div>
</blockquote></div>