<div dir="ltr">I think if we're in the realm of DWARF extensions, a whole bunch of other considerations come into it (and indeed, your suggested proposal may be a good one, but I think it's a very wide problem space once we're considering DWARF extensions). Mostly I was making arguments/suggestions/thoughts on the basis of being compatible with all existing DWARF producers.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Nov 1, 2020 at 2:05 PM Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 28.10.2020 20:38, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01
            AM Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p><br>
              </p>
              <div>On 28.10.2020 01:49, David Blaikie wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Tue, Oct 27,
                      2020 at 12:34 PM Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>&gt;
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p><br>
                        </p>
                        <div>On 27.10.2020 20:32, David Blaikie wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div dir="ltr"><br>
                            </div>
                            <br>
                            <div class="gmail_quote">
                              <div dir="ltr" class="gmail_attr">On Tue,
                                Oct 27, 2020 at 1:23 AM Alexey Lapshin
                                &lt;<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>&gt;
                                wrote:<br>
                              </div>
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p><br>
                                  </p>
                                  <div>On 26.10.2020 22:38, David
                                    Blaikie wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div dir="ltr"><br>
                                      </div>
                                      <br>
                                      <div class="gmail_quote">
                                        <div dir="ltr" class="gmail_attr">On Sun, Oct
                                          25, 2020 at 9:31 AM Alexey
                                          Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>&gt;
                                          wrote:<br>
                                        </div>
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p><br>
                                            </p>
                                            <div>On 23.10.2020 19:43,
                                              David Blaikie wrote:<br>
                                            </div>
                                            <blockquote type="cite">
                                              <div dir="ltr">
                                                <div class="gmail_quote">
                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                    <blockquote type="cite">
                                                      <div dir="ltr">
                                                        <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div><br>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <div><br>
                                                          </div>
                                                          <div>Ah, yeah
                                                          - that seems
                                                          like a missed
                                                          opportunity -
                                                          duplicating
                                                          the whole type
                                                          DIE. LTO does
                                                          this by making
                                                          monolithic
                                                          types -
                                                          merging all
                                                          the members
                                                          from different
                                                          definitions of
                                                          the same type
                                                          into one, but
                                                          that's maybe
                                                          too expensive
                                                          for dsymutil
                                                          (might still
                                                          be interesting
                                                          to know how
                                                          much more
                                                          expensive,
                                                          etc). But I
                                                          think the
                                                          other way to
                                                          go would be to
                                                          produce a
                                                          declaration of
                                                          the type, with
                                                          the relevant
                                                          members - and
                                                          let the DWARF
                                                          consumer
                                                          identify this
                                                          declaration as
                                                          matching up
                                                          with the
                                                          earlier
                                                          definition.
                                                          That's the
                                                          sort of DWARF
                                                          you get from
                                                          the non-MachO
                                                          default
                                                          -fno-standalone-debug
                                                          anyway, so
                                                          it's already
                                                          pretty well
                                                          tested/supported
                                                          (support in
                                                          lldb's a bit
                                                          younger/more
                                                          work-in-progress,
                                                          admittedly). I
                                                          wonder how
                                                          much dsym size
                                                          there is that
                                                          could be
                                                          reduced by
                                                          such an
                                                          implementation.</div>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                    <p>I see. Yes, that
                                                      could be done, and
                                                      I think it would
                                                      result in a
                                                      noticeable size
                                                      reduction (I do not
                                                      know the exact numbers
                                                      at the moment).</p>
                                                    <p>I am working on a
                                                      multi-threaded
                                                      DWARFLinker now,
                                                      and its first
                                                      version will do
                                                      exactly the same
                                                      type processing
                                                      as the current
                                                      dsymutil.</p>
                                                  </blockquote>
                                                  <div>Yeah, best to
                                                    keep the behavior
                                                    the same through
                                                    that</div>
                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                    <div>
                                                      <p>The above scheme
                                                        could be
                                                        implemented as a
                                                        next step, and it
                                                        would result in
                                                        a better size
                                                        reduction (better
                                                        than the current
                                                        state).</p>
                                                      <p>But I think an
                                                        even better scheme
                                                        could also be
                                                        implemented, and it
                                                        would result in
                                                        an even bigger size
                                                        reduction and in
                                                        faster
                                                        execution. This
                                                        scheme is
                                                        similar to what
                                                        you've described
                                                        above: "LTO does
                                                        this by making
                                                        monolithic types
                                                        - merging all
                                                        the members from
                                                        different
                                                        definitions of
                                                        the same type
                                                        into one".</p>
                                                    </div>
                                                  </blockquote>
                                                  <div>I believe the
                                                    reason that's
                                                    probably not been
                                                    done is that it
                                                    can't be streamed -
                                                    it'd lead to
                                                    buffering more of
                                                    the output </div>
                                                </div>
                                              </div>
                                            </blockquote>
                                            <p>Yes. The fact that DWARF
                                              has to be streamed into
                                              AsmPrinter complicates
                                              parallel DWARF generation.
                                              In my prototype, I
                                              generate <br>
                                              several output
                                              files (one for each source
                                              compilation unit) and then
                                              sequentially glue them
                                              into the final output
                                              file.<br>
                                            </p>
                                          </div>
                                        </blockquote>
                                        <div>How does that help? Do you
                                          use relocations in those
                                          intermediate object files so
                                          the DWARF in them can refer
                                          across files? <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>It does not help with referring
                                    across files. It helps to
                                    parallelize the generation of CU
                                    bodies. <br>
                                    It is not possible to write two CUs
                                    in parallel into one AsmPrinter. To make
                                    parallel generation possible, I
                                    stream them into different
                                    AsmPrinters (this comment is a reply to "I
                                    believe the reason that's probably
                                    not been done is that it can't be
                                    streamed", which initially was about
                                    referring across files, but it
                                    seems I added another direction).<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Oh, I see - thanks for explaining,
                                essentially buffering on-disk. <br>
                              </div>
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p> </p>
                                            <p><br>
                                            </p>
                                            <blockquote type="cite">
                                              <div dir="ltr">
                                                <div class="gmail_quote">
                                                  <div>(if two of these
                                                    expandable types
                                                    were in one CU - the
                                                    start of the second
                                                    type couldn't be
                                                    known until the end
                                                    because it might
                                                    keep getting pushed
                                                    later due to
                                                    expansion of the
                                                    first type) and/or
                                                    having to revisit
                                                    all the type
                                                    references (the
                                                    offset to the second
                                                    type wouldn't be
                                                    known until the end
                                                    - so writing the
                                                    offsets to refer to
                                                    the type would have
                                                    to be deferred until
                                                    then).<br>
                                                  </div>
                                                </div>
                                              </div>
                                            </blockquote>
                                            <p>That is the second
                                              problem: offsets are not
                                              known until the end of
                                              the file.<br>
                                              dsymutil already has that
                                              situation for inter-CU
                                              references, so it has
                                              an extra pass to<br>
                                              fix up offsets. </p>
                                          </div>
                                        </blockquote>
                                        <div>Oh, it does? I figured it
                                          was one-pass, and that it only
                                          ever refers back to types in
                                          previous CUs? So it doesn't
                                          have to go back and do a
                                          second pass. But I guess if
                                          it sees a declaration of T1 in
                                          CU1, then later on sees a
                                          definition of T1 in CU2, does
                                          it somehow go back to CU1 and
                                          remove the declaration/make
                                          references refer to the
                                          definition in CU2? I figured
                                          it'd just leave the
                                          declaration and references to
                                          it as-is, then add the
                                          definition and use that from
                                          CU2 onwards? <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>For the processing of the types, it
                                    does not go back. <br>
                                    This "I figured it was one-pass, and
                                    that it only ever refers back to
                                    types in previous CUs" <br>
                                    and this "I figured it'd just leave
                                    the declaration and references to it
                                    as-is, then add the definition and
                                    use that from CU2 onwards" are
                                    correct. <br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Great - thanks for
                                explaining/confirming! </div>
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> <br>
                                  </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>With a multi-threaded
                                              implementation, such a
                                              situation would arise more
                                              often <br>
                                              for type references, and so
                                              more offsets would have to be
                                              fixed during the additional
                                              pass.<br>
                                            </p>
                                            <blockquote type="cite">
                                              <div dir="ltr">
                                                <div class="gmail_quote">
                                                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                    <div>
                                                      <p>DWARFLinker
                                                        could create
                                                        an additional
                                                        artificial
                                                        compile unit and
                                                        put all merged
                                                        types there,
                                                        then later patch all
                                                        type references
                                                        to point into
                                                        this additional
                                                        compilation
                                                        unit. No
                                                        bits would be
                                                        duplicated in
                                                        that case. The
                                                        performance
                                                        improvement
                                                        would come
                                                        from copying
                                                        less
                                                        DWARF
                                                        and from the
                                                        fact that type
                                                        references could
                                                        be updated when
                                                        the DWARF is
                                                        cloned (no need
                                                        for an additional
                                                        pass for that).<br>
                                                      </p>
                                                    </div>
                                                  </blockquote>
                                                  <div>"later patch all
                                                    type references to
                                                    point into this
                                                    additional
                                                    compilation unit" -
                                                    that's the
                                                    additional pass that
                                                    people are probably
                                                    talking/concerned
                                                    about. Rewalking all
                                                    the DWARF. The
                                                    current dsymutil
                                                    approach, as far as
                                                    I know, is single
                                                    pass - it knows the
                                                    final, absolute
                                                    offset to the type
                                                    from the moment it
                                                    emits that
                                                    type/needs to refer
                                                    to it. <br>
                                                  </div>
                                                </div>
                                              </div>
                                            </blockquote>
                                            <p>Right. The current dsymutil
                                              approach is single-pass.
                                              And from that point of
                                              view, the solution <br>
                                              which you've described (to
                                              produce a declaration of
                                              the type, with the
                                              relevant members) <br>
                                              allows keeping that single-pass
                                              implementation.<br>
                                              <br>
                                              But there is a restriction
                                              in the current dsymutil
                                              approach: to process
                                              inter-CU references, <br>
                                              it needs to load all DWARF
                                              into memory (while it
                                              analyzes which part of
                                              the DWARF is live, <br>
                                              it needs to have all CUs
                                              loaded into memory).</p>
                                          </div>
                                        </blockquote>
                                        <div>All DWARF for a single file
                                          (which for dsymutil is mostly
                                          a single CU, except with LTO I
                                          guess?), not all DWARF for all
                                          inputs in memory at once,
                                          yeah? <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>Right. In the dsymutil case - all DWARF
                                    for a single file (not all DWARF for
                                    all inputs in memory at once).<br>
                                    But in the llvm-dwarfutil case, a single
                                    file contains the DWARF for all original
                                    input object files, and all of it
                                    gets<br>
                                    loaded into memory.<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Yeah, would be great to try to go
                                CU-by-CU. </div>
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>That leads to huge memory
                                              usage. <br>
                                              It is less important when
                                              the source is a set of object
                                              files (as in the dsymutil
                                              case), but it <br>
                                              becomes a real problem for the
                                              llvm-dwarfutil utility
                                              when the source is a single
                                              file (with the current <br>
                                              implementation it needs
                                              30G of memory for
                                              compiling clang binary).<br>
                                            </p>
                                          </div>
                                        </blockquote>
                                        <div>Yeah, that's where I think
                                          you'd need a fixup pass one
                                          way or another - because
                                          cross-CU references can mean
                                          that when you figure out a new
                                          layout for CU5 (because it has
                                          a duplicate type definition of
                                          something in CU1) then you
                                          might have to touch CU4 that
                                          had an absolute/cross-CU
                                          forward reference to CU5. Once
                                          you've got such a fixup pass
                                          (if dsymutil already has one?
                                          Which, like I said, I'm
                                          confused why it would have
                                          one/that doesn't match my very
                                          vague understanding) then I
                                          think you could make dsymutil
                                          work on a per-CU basis
                                          streaming things out, then
                                          fixing up a few offsets.<br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>When dsymutil deduplicates types, it
                                    changes a local CU reference into an
                                    inter-CU reference (so that CU2 (next)
                                    can reference a type definition from
                                    CU1 (prev)). It currently does not need
                                    any fixups to make this change, because
                                    CU1 has already been emitted and its
                                    offset is known.<br>
                                    <br>
                                    When dsymutil meets an inter-CU
                                    reference that already exists in the
                                    input object file and points into a CU
                                    which has not been processed yet (so its
                                    offset is unknown), it marks it as a
                                    "forward reference" and patches it later,
                                    during an additional "fixup forward
                                    references" pass, once the offsets
                                    are known. <br>
                                  </p>
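                                  <p>A minimal sketch of the "fixup forward
                                    references" idea described above, as a toy
                                    model rather than dsymutil's actual code.
                                    The function name, the tuple layout of a CU
                                    and the use of CU start offsets as reference
                                    targets are all assumptions made for
                                    illustration:</p>

```python
def emit_with_fixups(cus):
    """Emit CUs in order and patch forward references afterwards.

    cus is a list of (name, size, refs); refs lists the names of the CUs
    that this CU references. All of this is a hypothetical toy model.
    """
    offsets = {}   # CU name -> output offset, known once the CU is emitted
    output = []    # one slot per reference: resolved offset, or None
    pending = []   # (slot index, target CU name) for forward references
    cursor = 0
    for name, size, refs in cus:
        offsets[name] = cursor
        for target in refs:
            if target in offsets:
                # Backward reference: the target offset is already known.
                output.append(offsets[target])
            else:
                # Forward reference: leave a hole and remember to patch it.
                output.append(None)
                pending.append((len(output) - 1, target))
        cursor += size
    # Second pass: fix up forward references now that offsets are known.
    for slot, target in pending:
        output[slot] = offsets[target]
    return output, offsets
```

                                  <p>In this model only the references in
                                    <code>pending</code> are touched twice;
                                    backward references are resolved
                                    immediately.</p>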
                                </div>
                              </blockquote>
                              <div>OK, so limited 2 pass system. (does
                                it do that second pass once at the end
                                of the whole dsymutil run, or at the end
                                of each input file? (so if an input file
                                has two CUs and the first CU references
                                a type in the second CU - it could write
                                the first CU with a "forward reference",
                                then write the second CU, then fixup the
                                forward reference - and then go on to
                                the next file and its CUs - this could
                                improve performance by touching recently
                                used memory/disk pages only, rather than
                                going all the way back to the start
                                later on when those pages have become
                                cold))</div>
                            </div>
                          </div>
                        </blockquote>
                        <p>Yes, it does it at the end of each input
                          file.</p>
                        <p><br>
                        </p>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> <br>
                                    If CUs were processed in
                                    parallel, their offsets would not be
                                    known at the moment when a local type
                                    reference is changed into an
                                    inter-CU reference. So we would need
                                    to do the same fix-up processing for
                                    all type references that we
                                    already do for other inter-CU
                                    references.<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Yeah - though the existence of this
                                second "fixup forward references" system
                                - yeah, could just use it much more
                                generally as you say. Not an extra pass,
                                just the existing second pass but having
                                way more fixups to fixup in that pass.</div>
                            </div>
                          </div>
                        </blockquote>
                        If we were able to change the algorithm in
                        this way: <br>
                        <br>
                        1. analyse all CUs.<br>
                        2. clone all CUs.<br>
                        <br>
                        then we could create a merged type
                        table (an artificial CU containing types) during
                        step 1. <br>
                        If that type table were written first, then
                        all following CUs could use known offsets <br>
                        to the types, and we would not need additional
                        fix-up processing for type references. <br>
                        It would still be necessary to fix up other
                        inter-CU references, but not <br>
                        type references (which constitute the
                        vast majority).<br>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>To me, that sounds more expensive than the
                      fixup forward references pass.</div>
                  </div>
                </div>
              </blockquote>
              <p>In a direct comparison, yes:
                loading DWARF one more time looks more expensive than the
                fixup forward references pass. But looking at the
                general picture, it could probably be
                beneficial:<br>
                <br>
                1. Merging types would lead to a smaller
                resulting DWARF, which would speed up the process.<br>
                &nbsp;&nbsp; For example, if we switch "odr types deduplication" off
                in the current implementation, execution time
                doubles, because more DWARF
                has to be cloned and written to the result.
                Implementing "merging types" would probably have a
                similar effect: <br>
                &nbsp;&nbsp; it would speed up the overall process. So on one
                hand the additional step for loading DWARF would <br>
                &nbsp;&nbsp; decrease performance, but on the other hand the smaller amount of
                resulting data would increase it.<br>
                &nbsp;&nbsp; <br>
                2. If types are put in the first CU, then we
                have a simple strategy for our liveness analysis
                algorithm: just always keep the first CU in memory. This
                allows us to speed up the liveness analysis step.<br>
                <br>
                Anyway, all of the above is just an idea for future work.
                For now, I am going to implement multithreaded
                processing for CUs loaded into memory, keeping the
                same kind of processing as today (which means
                that the "fixup forward references" pass will do more
                work, since it will also fix type references).<br>
              </p>
              <p><br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div> <br>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> <br>
                                  </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>Without loading all CUs
                                              into memory it would
                                              require a two-pass
                                              solution: a first pass to analyze
                                              <br>
                                              which part of the DWARF
                                              relates to live code, and
                                              a second pass to
                                              generate the result. <br>
                                            </p>
                                          </div>
                                        </blockquote>
                                        <div>Not sure it'd require any
                                          more second pass than a
                                          "fixup" pass, which it sounds
                                          like you're saying it already
                                          has? <br>
                                        </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>It looks like it would need an
                                    additional pass to process inter-CU
                                    references (those already present in the incoming file)
                                    if we do not want to load all CUs
                                    into memory.<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Usually inter-CU references aren't
                                used, except in LTO - and in LTO all the
                                DWARF deduplication and function
                                discarding is already done by the IR
                                linker anyway. (ThinLTO is a bit
                                different, but really we'd be better off
                                teaching it the extra tricks anyway
                                (some can't be fixed in ThinLTO - like
                                emitting a "Home" definition of an
                                inline function, only to find out other
                                ThinLTO backend/shards managed to
                                optimize away all uses of the
                                function... so some cleanup may be
                                useful there)). It might be possible to
                                do a more dynamic/rolling cache - keep
                                only the CUs with unresolved cross-CU
                                references alive and only keep them
                                alive until their cross-CU references
                                are found/marked alive. This should make
                                things no worse than the traditional
                                dsymutil case - since cross-CU
                                references are only effective/generally
                                used within a single object file (it's
                                possible to create relocations for them
                                into other files - but I know LLVM
                                doesn't currently do this and I don't
                                think GCC does it) with multiple CUs
                                anyway - so at most you'd keep all the
                                CUs from a single original input file
                                alive together.<br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        But since this is a case documented by DWARF, the
                        tool should be ready for it (when inter-CU
                        <br>
                        references are heavily used).</div>
                    </blockquote>
                    <div><br>
                      Sure - but by implementing a CU liveness window
                      like that (keeping CUs live only so long as they
                      need to be rather than an all-or-nothing approach)
                      only especially quirky inputs would hit the worst
                      case while the more normal inputs could perform
                      better.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>It is not clear what should be put in such a CU liveness
                window. If CU100 references CU1, how could we know that
                we need to put CU1 into the CU liveness window before we
                have processed CU100?<br>
              </p>
            </div>
          </blockquote>
          <div>Fair point, not just forward references to worry about
            but backward references too. I wonder how much savings there
            is in the liveness analysis compared to "keep one copy of
            everything, no matter whether it's live or not", then it can
            be a pure forward progress situation. (with the quirk that
            you might emit a declaration for an entity once, then a
            definition for it later - alternatively if a declaration is
            seen it could be skipped under the assumption that a
            definition will follow (& use a forward ref fixup) - and
            if none is found, splat some stub declarations into a
            trailing CU at the end) <br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p><br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div> Moreover, llvm-dwarfutil would be the tool
                        producing <br>
                        exactly this situation. The resulting
                        file (produced by llvm-dwarfutil) would contain a
                        lot of <br>
                        inter-CU references. There is probably no
                        practical reason to apply llvm-dwarfutil to the
                        same <br>
                        file twice, but it would be a good test for the
                        tool.<br>
                      </div>
                    </blockquote>
                    <div><br>
                      It'd be a good stress test, but not necessarily
                      something that would need to perform the best
                      because it wouldn't be a common use case.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>I agree that we should not slow down the DWARFLinker in
                common cases only because we need to support the worst
                cases.<br>
                But we also need to implement a solution which works in
                some acceptable manner for the worst case. </p>
            </div>
          </blockquote>
          <div>I think that depends on "acceptable" - correct, yes.
            Practical to run in reasonable time/memory? Not necessarily,
            in my opinion. <br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>The current solution - loading everything in memory -
                makes it hard to use in a non-dsymutil
                scenario(llvm-dwarfutil).<br>
              </p>
            </div>
          </blockquote>
          <div>I agree it's worth exploring the non-dsymutil scenario,
            as you are - I'm just saying we don't necessarily need to
            support high usability (fast/low memory usage/etc)
            llvm-dwarfutil on an already dwarfutil'd binary (but as
            you've pointed out, the "window" is unknowable because of
            backward references, so this whole subthread is perhaps
            irrelevant).<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>There could be several things which could be used to
                decide whether we need to go on a light or heavy path:<br>
                <br>
                1. If the input contains only a single CU we do not need
                to unload it from memory. Thus - we would not need to do
                an extra DWARF loading pass.<br>
                2. If abbreviations from the whole input file do not
                contain inter-CU references then while doing liveness
                analysis,  we do not need to wait until other CUs are
                processed.<br>
              </p>
            </div>
          </blockquote>
          <div>(2) Yeah, that /may/ be a good idea, cheap to test, etc.
            Though I'd still wonder if a more general implementation
            strategy could be found that would make it easier to get a
            sliding scale of efficiency depending on how many inter-CU
            references there were, not a "if there are none it's good,
            if there are any it's bad or otherwise very different to
            implement". <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>I think there is a scenario which would make it possible to
      process each non-referenced CU only once and to handle inter-CU
      references in a scalable way (even for a dwarfutil'd binary):<br>
      <br>
      1. Implement a global type table and type merging. This allows
      us to keep all types in memory. <br>
      &nbsp;&nbsp; Then all inter-CU type references would point into that in-memory
      type table. <br>
      &nbsp;&nbsp; (We do not know which CUs should be put into the CU liveness window,
      and we cannot put all CUs into memory, but we can put
      all types into memory.)<br>
      <br>
      2. If there are no other inter-CU references, then all CUs would
      be handled in one pass.<br>
      <br>
      3. If there are other inter-CU references, then after all CUs have been
      processed by the first pass we would have a list of referenced
      CUs. We could then delete the already cloned data (for the referenced CUs)
      and start the process again: <br>
      &nbsp;&nbsp; load CU, mark liveness, clone data. This second pass would be
      done only for the referenced CUs. <br>
      &nbsp;&nbsp; For simple, loosely coupled cases it would work
      relatively fast.<br>
      <br>
      4. Put the in-memory type table into an artificial CU. Update all type
      references.<br>
    </p>
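    <p>A rough sketch of steps 1 to 3 above, as a toy model under stated
      assumptions and not an existing DWARFLinker interface. The function
      name, the tuple layout of a CU, and the representation of liveness
      as a set of DIE labels are all hypothetical:</p>

```python
def link(cus):
    """Toy model of a global type table plus a second pass for referenced CUs.

    cus is a list of (name, local_live, types, refs):
      local_live  set of DIE labels found live inside the CU itself
      types       names of the types the CU defines
      refs        non-type inter-CU references as (target_cu, die) pairs
    """
    type_table = {}   # step 1: merged types, all kept in memory
    extra_live = {}   # target CU -> DIEs kept alive by other CUs
    cloned = {}       # CU name -> set of DIEs written to the output

    # First pass: every CU is loaded, analysed and cloned once.
    for name, local_live, types, refs in cus:
        for type_name in types:
            type_table.setdefault(type_name, len(type_table))
        for target, die in refs:
            extra_live.setdefault(target, set()).add(die)
        cloned[name] = set(local_live)

    # Steps 2 and 3: only CUs referenced from elsewhere are redone.
    second_pass = [name for name in cloned if name in extra_live]
    for name in second_pass:
        # Stands in for "delete cloned data, load CU, mark liveness, clone".
        cloned[name] = cloned[name] | extra_live[name]

    return type_table, cloned, second_pass
```

    <p>If no CU has non-type inter-CU references, <code>second_pass</code>
      is empty and everything is handled in one pass, which corresponds to
      step 2.</p>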
    <p><br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> <br>
                Then that scheme would be used for worst cases:<br>
                <br>
                1. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     analyse CU.<br>
                     unload CU.<br>
                   }    <br>
                2. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     clone CU.<br>
                     unload CU.<br>
                   }    <br>
                3. fixup forward references.<br>
                <br>
                and that scheme for light cases:<br>
                <br>
                1. for (CU : CU1...CU100) {<br>
                     load CU.<br>
                     analyse CU.<br>
                     clone CU.<br>
                     unload CU.<br>
                   }<br>
                2. fixup forward references.<br>
              </p>
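              <p>The two schemes above can be sketched as follows. This
                only records the order of operations; the flag
                <code>has_inter_cu_refs</code> and both function names are
                assumptions for illustration, not part of any existing
                tool:</p>

```python
def process(cu_names, has_inter_cu_refs, log):
    """Append the operations of the heavy or the light path to log.

    has_inter_cu_refs is a hypothetical flag that selects the path;
    log is a plain list of (operation, CU name) tuples.
    """
    if has_inter_cu_refs:
        # Worst case: analyse everything first, so each CU is loaded twice.
        for name in cu_names:
            log.append(("load", name))
            log.append(("analyse", name))
            log.append(("unload", name))
        for name in cu_names:
            log.append(("load", name))
            log.append(("clone", name))
            log.append(("unload", name))
    else:
        # Light case: analyse and clone while the CU is loaded once.
        for name in cu_names:
            log.append(("load", name))
            log.append(("analyse", name))
            log.append(("clone", name))
            log.append(("unload", name))
    log.append(("fixup forward references", None))
    return log


def count_loads(log):
    """Count how many times a CU was loaded."""
    return sum(1 for op, _ in log if op == "load")
```

              <p>For N CUs the heavy path loads 2N times and the light path
                N times, which is the cost difference being discussed.</p>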
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div> </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>Generally, I think we should not assume that
                        inter-CU references would be used in a limited
                        way.<br>
                        <br>
                        Anyway, if this scheme:  <br>
                        <br>
                        1. analyse all CUs.<br>
                        2. clone all CUs.<br>
                        <p>turned out to be slow, then we would need to
                          continue with the one-pass solution and not
                          support complex, closely coupled inputs.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                    <div>yeah, certainly seeing the data/experiments
                      will be interesting, if you end up implementing
                      some different strategies, etc.<br>
                      <br>
                      I guess one possibility for parallel generation
                      could be something more like Microsoft's approach
                      with a central debug info server that compilers
                      communicate with - not that exact model, I mean,
                      but if you've got parallel threads generating
                      reduced DWARF into separate object files - they
                      could communicate with a single thread responsible
                      for type emission - the type emitter would be
                      given types from the separate threads and compute
                      their size, queue them up to be streamed out to
                      the type CU (& keep the source CU alive until
                      that work was done) - such a central type emitter
                      could quickly determine the size of the type to be
                      emitted and compute future type offsets (eg: if 5
                      types were in the queue, it could've figured out
                      the offset of those types already) to answer type
                      offset queries quickly and unblock the parallel
                      threads to continue emitting their CUs containing
                      type references.<br>
                    </div>
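                    <p>A minimal sketch of such a central type emitter,
                      assuming each type is identified by a name and has a
                      known size. The class and method names are
                      hypothetical. Offsets are handed out as soon as a type
                      is queued, before it is actually streamed out:</p>

```python
import threading


class TypeEmitter:
    """Hypothetical central emitter that assigns offsets in a type CU."""

    def __init__(self):
        self._lock = threading.Lock()
        self._offsets = {}   # type name -> offset inside the type CU
        self._next = 0       # offset the next new type will receive
        self.queue = []      # types waiting to be streamed, in offset order

    def offset_for(self, type_name, size):
        """Return the offset of a type, queuing it if it is new."""
        with self._lock:
            if type_name not in self._offsets:
                # The size is known up front, so the future offset can be
                # computed without waiting for the type to be written.
                self._offsets[type_name] = self._next
                self._next += size
                self.queue.append(type_name)
            return self._offsets[type_name]

    def total_size(self):
        """Total size of all types queued so far."""
        with self._lock:
            return self._next
```

                    <p>Worker threads cloning CUs would call
                      <code>offset_for</code> whenever they need a type
                      reference; a duplicate type definition gets the offset
                      of the first copy.</p>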
                  </div>
                </div>
              </blockquote>
              <p>Yes, thank you. I will think about it.<br>
              </p>
              <p>Alexey.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div><br>
                      - Dave </div>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </div>

</blockquote></div>