<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 28.10.2020 20:38, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01
            AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com"
              moz-do-not-send="true">avl.lapshin@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p><br>
              </p>
              <div>On 28.10.2020 01:49, David Blaikie wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                </div>
              </blockquote>
               
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div> <br>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> <br>
                                  </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote"
                                          style="margin:0px 0px 0px
                                          0.8ex;border-left:1px solid
                                          rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>Without loading all CU
                                              into the memory it would
                                              require two passes
                                              solution. First to analyze
                                              <br>
                                              which part of DWARF
                                              relates to live code and
                                              then second pass to
                                              generate the result. <br>
                                            </p>
                                          </div>
                                        </blockquote>
                                        <div>Not sure it'd require any
                                          more second pass than a
                                          "fixup" pass, which it sounds
                                          like you're saying it already
                                          has? <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>It looks like it would need an
                                    additional pass to process inter-CU
                                    references(existed in incoming file)
                                    if we do not want to load all CUs
                                    into memory.<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Usually inter-CU references aren't
                                used, except in LTO - and in LTO all the
                                DWARF deduplication and function
                                discarding is already done by the IR
                                linker anyway. (ThinLTO is a bit
                                different, but really we'd be better off
                                teaching it the extra tricks anyway
                                (some can't be fixed in ThinLTO - like
                                emitting a "Home" definition of an
                                inline function, only to find out other
                                ThinLTO backend/shards managed to
                                optimize away all uses of the
                                function... so some cleanup may be
                                useful there)). It might be possible to
                                do a more dynamic/rolling cache - keep
                                only the CUs with unresolved cross-CU
                                references alive and only keep them
                                alive until their cross-CU references
                                are found/marked alive. This should make
                                things no worse than the traditional
                                dsymutil case - since cross-CU
                                references are only effective/generally
                                used within a single object file (it's
                                possible to create relocations for them
                                into other files - but I know LLVM
                                doesn't currently do this and I don't
                                think GCC does it) with multiple CUs
                                anyway - so at most you'd keep all the
                                CUs from a single original input file
                                alive together.<br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        But, since it is a DWARF documented case the
                        tool should be ready for such case(when inter-CU
                        <br>
                        references are heavily used).</div>
                    </blockquote>
                    <div><br>
                      Sure - but by implementing a CU liveness window
                      like that (keeping CUs live only so long as they
                      need to be rather than an all-or-nothing approach)
                      only especially quirky inputs would hit the worst
                      case while the more normal inputs could perform
                      better.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>It is not clear what should be put in such CU liveness
                window. If CU100 references CU1 - how could we know that
                we need to put CU1 into CU liveness window before we
                processed CU100?<br>
              </p>
            </div>
          </blockquote>
          <div>Fair point, not just forward references to worry about
            but backward references too. I wonder how much savings there
            is in the liveness analysis compared to "keep one copy of
            everything, no matter whether it's live or not", then it can
            be a pure forward progress situation. (with the quirk that
            you might emit a declaration for an entity once, then a
            definition for it later - alternatively if a declaration is
            seen it could be skipped under the assumption that a
            definition will follow (& use a forward ref fixup) - and
            if none is found, splat some stub declarations into a
            trailing CU at the end) <br>
          </div>
        </div>
      </div>
    </blockquote>
    That should probably be measured, but I think we would loss most of
    size reduction<br>
    (since we would start keep unreferenced data which is currently
    removed). <br>
    Which would lead to slowdown performance and bigger disk space
    usage.<br>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p><br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div> Moreover, llvm-dwarfutil would be the tool
                        producing <br>
                        exactly such situation. The resulting
                        file(produced by llvm-dwarfutil) would contain a
                        lot of <br>
                        inter-CU references. Probably, there is no
                        practical reasons to apply llvm-dwarfutil to the
                        same <br>
                        file twice but it would be a good test for the
                        tool.<br>
                      </div>
                    </blockquote>
                    <div><br>
                      It'd be a good stress test, but not necessarily
                      something that would need to perform the best
                      because it wouldn't be a common use case.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>I agree that we should not slow down the DWARFLinker in
                common cases only because we need to support the worst
                cases.<br>
                But we also need to implement a solution which works in
                some acceptable manner for the worst case. </p>
            </div>
          </blockquote>
          <div>I think that depends on "acceptable" - correct, yes.
            Practical to run in reasonable time/memory? Not necessarily,
            in my opinion. <br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>The current solution - loading everything in memory -
                makes it hard to use in a non-dsymutil
                scenario(llvm-dwarfutil).<br>
              </p>
            </div>
          </blockquote>
          <div>I agree it's worth exploring the non-dsymutil scenario,
            as you are - I'm just saying we don't necessarily need to
            support high usability (fast/low memory usage/etc)
            llvm-dwarfutil on an already dwarfutil'd binary (but as
            you've pointed out, the "window" is unknowable because of
            backward references, so this whole subthread is perhaps
            irrelevant).<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>There could be several things which could be used to
                decide whether we need to go on a light or heavy path:<br>
                <br>
                1. If the input contains only a single CU we do not need
                to unload it from memory. Thus - we would not need to do
                an extra DWARF loading pass.<br>
                2. If abbreviations from the whole input file do not
                contain inter-CU references then while doing liveness
                analysis,  we do not need to wait until other CUs are
                processed.<br>
              </p>
            </div>
          </blockquote>
          <div>(2) Yeah, that /may/ be a good idea, cheap to test, etc.
            Though I'd still wonder if a more general implementation
            strategy could be found that would make it easier to get a
            sliding scale of efficiency depending on how much inter-CU
            references where were, not a "if there are none it's good,
            if there are any it's bad or otherwise very different to
            implement". <br>
          </div>
        </div>
      </div>
    </blockquote>
    At the current point, I do not see how that could be done. <br>
    One possibility is preliminary mark CU by IsReferenced flag.<br>
    Then we could delay cloning for such CU(either by putting into<br>
    CU liveness window/either by unloading). <br>
    Not referenced CU could be cloned immediately. Such a solution would
    be <br>
    more scalable and work well in cases when only a few inter-CU
    references<br>
    exist. Though it requires changes in DWARF format.<br>
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p> <br>
                Then that scheme would be used for worst cases:<br>
                <br>
                1. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     analyse CU.<br>
                     unload CU.<br>
                   }    <br>
                2. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     clone CU.<br>
                     unload CU.<br>
                   }    <br>
                3. fixup forward references.<br>
                <br>
                and that scheme for light cases:<br>
                <br>
                1. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     analyse CU.<br>
                     clone CU.<br>
                     unload CU.<br>
                   }<br>
                2. fixup forward references.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>Generally, I think we should not assume that
                        inter-CU references would be used in a limited
                        way.<br>
                        <br>
                        Anyway, if this scheme:  <br>
                        <br>
                        1. analyse all CUs.<br>
                        2. clone all CUs.<br>
                        <p>would work slow then we would need to
                          continue with one-pass solution and not
                          support complex closely coupled inputs.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>yeah, certainly seeing the data/experiments
                      will be interesting, if you end up implementing
                      some different strategies, etc.<br>
                      <br>
                      I guess one possibility for parallel generation
                      could be something more like Microsoft's approach
                      with a central debug info server that compilers
                      communicate with - not that exact model, I mean,
                      but if you've got parallel threads generating
                      reduced DWARF into separate object files - they
                      could communicate with a single thread responsible
                      for type emission - the type emitter would be
                      given types from the separate threads and compute
                      their size, queue them up to be streamed out to
                      the type CU (& keep the source CU alive until
                      that work was done) - such a central type emitter
                      could quickly determine the size of the type to be
                      emitted and compute future type offsets (eg: if 5
                      types were in the queue, it could've figured out
                      the offset of those types already) to answer type
                      offset queries quickly and unblock the parallel
                      threads to continue emitting their CUs containing
                      type references.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>yes. Thank you. Would think about it.<br>
              </p>
              <p>Alexey.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div><br>
                      - Dave </div>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>