<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 28.10.2020 20:38, David Blaikie

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr"><br>

        </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01

            AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com"

              moz-do-not-send="true">avl.lapshin@gmail.com</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p><br>

              </p>

              <div>On 28.10.2020 01:49, David Blaikie wrote:<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr">

                  <div dir="ltr"><br>

                  </div>

                </div>

              </blockquote>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <blockquote class="gmail_quote" style="margin:0px

                      0px 0px 0.8ex;border-left:1px solid

                      rgb(204,204,204);padding-left:1ex">

                      <div> <br>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div class="gmail_quote">

                              <blockquote class="gmail_quote"

                                style="margin:0px 0px 0px

                                0.8ex;border-left:1px solid

                                rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p> <br>

                                  </p>

                                  <blockquote type="cite">

                                    <div dir="ltr">

                                      <div class="gmail_quote">

                                        <blockquote class="gmail_quote"

                                          style="margin:0px 0px 0px

                                          0.8ex;border-left:1px solid

                                          rgb(204,204,204);padding-left:1ex">

                                          <div>

                                            <p>Without loading all CU

                                              into the memory it would

                                              require two passes

                                              solution. First to analyze

                                              <br>

                                              which part of DWARF

                                              relates to live code and

                                              then second pass to

                                              generate the result. <br>

                                            </p>

                                          </div>

                                        </blockquote>

                                        <div>Not sure it'd require any

                                          more second pass than a

                                          "fixup" pass, which it sounds

                                          like you're saying it already

                                          has? <br>

                                        </div>

                                      </div>

                                    </div>

                                  </blockquote>

                                  <p>It looks like it would need an

                                    additional pass to process inter-CU

                                    references(existed in incoming file)

                                    if we do not want to load all CUs

                                    into memory.<br>

                                  </p>

                                </div>

                              </blockquote>

                              <div>Usually inter-CU references aren't

                                used, except in LTO - and in LTO all the

                                DWARF deduplication and function

                                discarding is already done by the IR

                                linker anyway. (ThinLTO is a bit

                                different, but really we'd be better off

                                teaching it the extra tricks anyway

                                (some can't be fixed in ThinLTO - like

                                emitting a "Home" definition of an

                                inline function, only to find out other

                                ThinLTO backend/shards managed to

                                optimize away all uses of the

                                function... so some cleanup may be

                                useful there)). It might be possible to

                                do a more dynamic/rolling cache - keep

                                only the CUs with unresolved cross-CU

                                references alive and only keep them

                                alive until their cross-CU references

                                are found/marked alive. This should make

                                things no worse than the traditional

                                dsymutil case - since cross-CU

                                references are only effective/generally

                                used within a single object file (it's

                                possible to create relocations for them

                                into other files - but I know LLVM

                                doesn't currently do this and I don't

                                think GCC does it) with multiple CUs

                                anyway - so at most you'd keep all the

                                CUs from a single original input file

                                alive together.<br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        But, since it is a DWARF documented case the

                        tool should be ready for such case(when inter-CU

                        <br>

                        references are heavily used).</div>

                    </blockquote>

                    <div><br>

                      Sure - but by implementing a CU liveness window

                      like that (keeping CUs live only so long as they

                      need to be rather than an all-or-nothing approach)

                      only especially quirky inputs would hit the worst

                      case while the more normal inputs could perform

                      better.<br>

                    </div>

                  </div>

                </div>

              </blockquote>

              <p>It is not clear what should be put in such CU liveness

                window. If CU100 references CU1 - how could we know that

                we need to put CU1 into CU liveness window before we

                processed CU100?<br>

              </p>

            </div>

          </blockquote>

          <div>Fair point, not just forward references to worry about

            but backward references too. I wonder how much savings there

            is in the liveness analysis compared to "keep one copy of

            everything, no matter whether it's live or not", then it can

            be a pure forward progress situation. (with the quirk that

            you might emit a declaration for an entity once, then a

            definition for it later - alternatively if a declaration is

            seen it could be skipped under the assumption that a

            definition will follow (& use a forward ref fixup) - and

            if none is found, splat some stub declarations into a

            trailing CU at the end) <br>

          </div>

        </div>

      </div>

    </blockquote>

    That should probably be measured, but I think we would loss most of

    size reduction<br>

    (since we would start keep unreferenced data which is currently

    removed). <br>

    Which would lead to slowdown performance and bigger disk space

    usage.<br>

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p> </p>

              <p><br>

              </p>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <div> </div>

                    <blockquote class="gmail_quote" style="margin:0px

                      0px 0px 0.8ex;border-left:1px solid

                      rgb(204,204,204);padding-left:1ex">

                      <div> Moreover, llvm-dwarfutil would be the tool

                        producing <br>

                        exactly such situation. The resulting

                        file(produced by llvm-dwarfutil) would contain a

                        lot of <br>

                        inter-CU references. Probably, there is no

                        practical reasons to apply llvm-dwarfutil to the

                        same <br>

                        file twice but it would be a good test for the

                        tool.<br>

                      </div>

                    </blockquote>

                    <div><br>

                      It'd be a good stress test, but not necessarily

                      something that would need to perform the best

                      because it wouldn't be a common use case.<br>

                    </div>

                  </div>

                </div>

              </blockquote>

              <p>I agree that we should not slow down the DWARFLinker in

                common cases only because we need to support the worst

                cases.<br>

                But we also need to implement a solution which works in

                some acceptable manner for the worst case. </p>

            </div>

          </blockquote>

          <div>I think that depends on "acceptable" - correct, yes.

            Practical to run in reasonable time/memory? Not necessarily,

            in my opinion. <br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p>The current solution - loading everything in memory -

                makes it hard to use in a non-dsymutil

                scenario(llvm-dwarfutil).<br>

              </p>

            </div>

          </blockquote>

          <div>I agree it's worth exploring the non-dsymutil scenario,

            as you are - I'm just saying we don't necessarily need to

            support high usability (fast/low memory usage/etc)

            llvm-dwarfutil on an already dwarfutil'd binary (but as

            you've pointed out, the "window" is unknowable because of

            backward references, so this whole subthread is perhaps

            irrelevant).<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p> </p>

              <p>There could be several things which could be used to

                decide whether we need to go on a light or heavy path:<br>

                <br>

                1. If the input contains only a single CU we do not need

                to unload it from memory. Thus - we would not need to do

                an extra DWARF loading pass.<br>

                2. If abbreviations from the whole input file do not

                contain inter-CU references then while doing liveness

                analysis,  we do not need to wait until other CUs are

                processed.<br>

              </p>

            </div>

          </blockquote>

          <div>(2) Yeah, that /may/ be a good idea, cheap to test, etc.

            Though I'd still wonder if a more general implementation

            strategy could be found that would make it easier to get a

            sliding scale of efficiency depending on how much inter-CU

            references where were, not a "if there are none it's good,

            if there are any it's bad or otherwise very different to

            implement". <br>

          </div>

        </div>

      </div>

    </blockquote>

    At the current point, I do not see how that could be done. <br>

    One possibility is preliminary mark CU by IsReferenced flag.<br>

    Then we could delay cloning for such CU(either by putting into<br>

    CU liveness window/either by unloading). <br>

    Not referenced CU could be cloned immediately. Such a solution would

    be <br>

    more scalable and work well in cases when only a few inter-CU

    references<br>

    exist. Though it requires changes in DWARF format.<br>

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAENS6EuOz-rVnp5ry=PCf9RUO3Rj-ZarU-jwrqGmBqj53Kjy=g@mail.gmail.com">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <div>

              <p> <br>

                Then that scheme would be used for worst cases:<br>

                <br>

                1. for (CU : CU1...CU100) {<br>

                     load CU.<br>

                     analyse CU.<br>

                     unload CU.<br>

                   }    <br>

                2. for (CU : CU1...CU100) {<br>

                     load CU.<br>

                     clone CU.<br>

                     unload CU.<br>

                   }    <br>

                3. fixup forward references.<br>

                <br>

                and that scheme for light cases:<br>

                <br>

                1. for (CU : CU1...CU100) {<br>

                     load CU.<br>

                     analyse CU.<br>

                     clone CU.<br>

                     unload CU.<br>

                   }<br>

                2. fixup forward references.<br>

              </p>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <div> </div>

                    <blockquote class="gmail_quote" style="margin:0px

                      0px 0px 0.8ex;border-left:1px solid

                      rgb(204,204,204);padding-left:1ex">

                      <div>Generally, I think we should not assume that

                        inter-CU references would be used in a limited

                        way.<br>

                        <br>

                        Anyway, if this scheme:  <br>

                        <br>

                        1. analyse all CUs.<br>

                        2. clone all CUs.<br>

                        <p>would work slow then we would need to

                          continue with one-pass solution and not

                          support complex closely coupled inputs.<br>

                        </p>

                      </div>

                    </blockquote>

                    <div><br>

                    </div>

                    <div>yeah, certainly seeing the data/experiments

                      will be interesting, if you end up implementing

                      some different strategies, etc.<br>

                      <br>

                      I guess one possibility for parallel generation

                      could be something more like Microsoft's approach

                      with a central debug info server that compilers

                      communicate with - not that exact model, I mean,

                      but if you've got parallel threads generating

                      reduced DWARF into separate object files - they

                      could communicate with a single thread responsible

                      for type emission - the type emitter would be

                      given types from the separate threads and compute

                      their size, queue them up to be streamed out to

                      the type CU (& keep the source CU alive until

                      that work was done) - such a central type emitter

                      could quickly determine the size of the type to be

                      emitted and compute future type offsets (eg: if 5

                      types were in the queue, it could've figured out

                      the offset of those types already) to answer type

                      offset queries quickly and unblock the parallel

                      threads to continue emitting their CUs containing

                      type references.<br>

                    </div>

                  </div>

                </div>

              </blockquote>

              <p>yes. Thank you. Would think about it.<br>

              </p>

              <p>Alexey.<br>

              </p>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <div><br>

                      - Dave </div>

                  </div>

                </div>

              </blockquote>

            </div>

          </blockquote>

        </div>

      </div>

    </blockquote>

  </body>

</html>