<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 2, 2020 at 2:26 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 02.11.2020 04:11, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">I think if we're in the realm of DWARF extensions a
        whole bunch of other considerations come into it (& indeed,
        your suggested proposal may be a good one - but I think it's a
        very wide problem space once we're considering DWARF
        extensions). Mostly I was making arguments/suggestions/thoughts
        on the basis of being compatible with all existing DWARF
        producers.</div>
    </blockquote>
    <p>The described scenario does not assume DWARF extensions. A global
      type table is not a new DWARF construct; it is an artificial CU
      that holds all types. That solution would be compatible with existing
      DWARF consumers/producers.<br></p></div></blockquote><div>Sorry, guess I'm not following. Maybe this conversation's getting a bit too abstract/theoretical/forward looking for me right now - no worries. Happy to chat more about it, but might be easier to focus on the immediate steps forward for now & tackle this when it's the thing you're planning to work on? (if I'm understanding correctly that this isn't a direction you're thinking to try right now)<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
    <p><br>
    </p>
    <blockquote type="cite"><br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Sun, Nov 1, 2020 at 2:05 PM
          Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p><br>
            </p>
            <div>On 28.10.2020 20:38, David Blaikie wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">
                <div dir="ltr"><br>
                </div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020
                    at 6:01 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p><br>
                      </p>
                      <div>On 28.10.2020 01:49, David Blaikie wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div dir="ltr"><br>
                          </div>
                          <br>
                          <div class="gmail_quote">
                            <div dir="ltr" class="gmail_attr">On Tue,
                              Oct 27, 2020 at 12:34 PM Alexey Lapshin
                              <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                              wrote:<br>
                            </div>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                              <div>
                                <p><br>
                                </p>
                                <div>On 27.10.2020 20:32, David Blaikie
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div dir="ltr"><br>
                                    </div>
                                    <br>
                                    <div class="gmail_quote">
                                      <div dir="ltr" class="gmail_attr">On
                                        Tue, Oct 27, 2020 at 1:23 AM
                                        Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                                        wrote:<br>
                                      </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <p><br>
                                          </p>
                                          <div>On 26.10.2020 22:38,
                                            David Blaikie wrote:<br>
                                          </div>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div dir="ltr"><br>
                                              </div>
                                              <br>
                                              <div class="gmail_quote">
                                                <div dir="ltr" class="gmail_attr">On
                                                  Sun, Oct 25, 2020 at
                                                  9:31 AM Alexey Lapshin
                                                  <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                                                  wrote:<br>
                                                </div>
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <p><br>
                                                    </p>
                                                    <div>On 23.10.2020
                                                      19:43, David
                                                      Blaikie wrote:<br>
                                                    </div>
                                                    <blockquote type="cite">
                                                      <div dir="ltr">
                                                        <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <blockquote type="cite">
                                                          <div dir="ltr">
                                                          <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div><br>
                                                          <br>
                                                          </div>
                                                          </blockquote>
                                                          <div><br>
                                                          </div>
                                                          <div>Ah, yeah
                                                          - that seems
                                                          like a missed
                                                          opportunity -
                                                          duplicating
                                                          the whole type
                                                          DIE. LTO does
                                                          this by making
                                                          monolithic
                                                          types -
                                                          merging all
                                                          the members
                                                          from different
                                                          definitions of
                                                          the same type
                                                          into one, but
                                                          that's maybe
                                                          too expensive
                                                          for dsymutil
                                                          (might still
                                                          be interesting
                                                          to know how
                                                          much more
                                                          expensive,
                                                          etc). But I
                                                          think the
                                                          other way to
                                                          go would be to
                                                          produce a
                                                          declaration of
                                                          the type, with
                                                          the relevant
                                                          members - and
                                                          let the DWARF
                                                          consumer
                                                          identify this
                                                          declaration as
                                                          matching up
                                                          with the
                                                          earlier
                                                          definition.
                                                          That's the
                                                          sort of DWARF
                                                          you get from
                                                          the non-MachO
                                                          default
                                                          -fno-standalone-debug
                                                          anyway, so
                                                          it's already
                                                          pretty well
                                                          tested/supported
                                                          (support in
                                                          lldb's a bit
                                                          younger/more
                                                          work-in-progress,
                                                          admittedly). I
                                                          wonder how
                                                          much dsym size
                                                          there is that
                                                          could be
                                                          reduced by
                                                          such an
                                                          implementation.</div>
                                                          </div>
                                                          </div>
                                                          </blockquote>
                                                          <p>I see. Yes,
                                                          that could be
                                                          done, and I
                                                          think it would
                                                          result in a
                                                          noticeable
                                                          size
                                                          reduction (I do
                                                          not know the exact
                                                          numbers at the
                                                          moment).</p>
                                                          <p>I am working on
                                                          a multi-threaded
                                                          DWARFLinker
                                                          now, and its
                                                          first version
                                                          will do
                                                          exactly the
                                                          same type
                                                          processing
                                                          as the current
                                                          dsymutil.</p>
                                                          </blockquote>
                                                          <div>Yeah,
                                                          best to keep
                                                          the behavior
                                                          the same
                                                          through that</div>
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div>
                                                          <p>The above
                                                          scheme could
                                                          be implemented
                                                          as a next step,
                                                          and it would
                                                          give a
                                                          better size
                                                          reduction
                                                          than the current
                                                          state.</p>
                                                          <p>But I think
                                                          an even better
                                                          scheme is
                                                          possible, one
                                                          that would give
                                                          an even bigger
                                                          size reduction
                                                          and faster
                                                          execution.
                                                          It is
                                                          similar to
                                                          what you've
                                                          described
                                                          above: "LTO
                                                          does this by making
                                                          monolithic
                                                          types -
                                                          merging all
                                                          the members
                                                          from different
                                                          definitions of
                                                          the same type
                                                          into one".</p>
                                                          </div>
                                                          </blockquote>
                                                          <div>I believe
                                                          the reason
                                                          that's
                                                          probably not
                                                          been done is
                                                          that it can't
                                                          be streamed -
                                                          it'd lead to
                                                          buffering more
                                                          of the output
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                    <p>Yes. The fact
                                                      that DWARF has to
                                                      be streamed into
                                                      AsmPrinter
                                                      complicates
                                                      parallel DWARF
                                                      generation. In my
                                                      prototype, I
                                                      generate
                                                      several output
                                                      files (one per
                                                      source compilation
                                                      unit) and then
                                                      sequentially glue
                                                      them into the
                                                      final output
                                                      file.<br>
                                                    </p>
                                                  </div>
                                                </blockquote>
                                                <div>How does that help?
                                                  Do you use relocations
                                                  in those intermediate
                                                  object files so the
                                                  DWARF in them can
                                                  refer across files? <br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>It does not help with
                                            references across files.
                                            It helps to parallelize the
                                            generation of CU bodies. <br>
                                            It is not possible to write
                                            two CUs into one AsmPrinter
                                            in parallel. To make parallel
                                            generation possible, I stream
                                            them into different
                                            AsmPrinters. (This comment is
                                            a reply to "I believe the reason
                                            that's probably not been
                                            done is that it can't be
                                            streamed", which was originally
                                            about references across
                                            files, but it seems I
                                            added another direction.)<br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <div>Oh, I see - thanks for
                                        explaining, essentially
                                        buffering on-disk. <br>
                                      </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <p> </p>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div class="gmail_quote">
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <p> </p>
                                                    <p><br>
                                                    </p>
                                                    <blockquote type="cite">
                                                      <div dir="ltr">
                                                        <div class="gmail_quote">
                                                          <div>(if two
                                                          of these
                                                          expandable
                                                          types were in
                                                          one CU - the
                                                          start of the
                                                          second type
                                                          couldn't be
                                                          known until
                                                          the end
                                                          because it
                                                          might keep
                                                          getting pushed
                                                          later due to
                                                          expansion of
                                                          the first
                                                          type) and/or
                                                          having to
                                                          revisit all
                                                          the type
                                                          references
                                                          (the offset to
                                                          the second
                                                          type wouldn't
                                                          be known until
                                                          the end - so
                                                          writing the
                                                          offsets to
                                                          refer to the
                                                          type would
                                                          have to be
                                                          deferred until
                                                          then).<br>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                    <p>That is the
                                                      second problem:
                                                      offsets are not
                                                      known until the
                                                      end of the file.<br>
                                                      dsymutil already
                                                      has that situation
                                                      for inter-CU
                                                      references, so it
                                                      has an extra pass to
                                                      fix up offsets. </p>
                                                  </div>
                                                </blockquote>
                                                <div>Oh, it does? I
                                                  figured it was
                                                  one-pass, and that it
                                                  only ever refers back
                                                  to types in previous
                                                  CUs? So it doesn't
                                                  have to go back and do
                                                  a second pass. But I
                                                  guess if it sees a
                                                  declaration of T1 in
                                                  CU1, then later on
                                                  sees a definition of
                                                  T1 in CU2, does it
                                                  somehow go back to CU1
                                                  and remove the
                                                  declaration/make
                                                  references refer to
                                                  the definition in CU2?
                                                  I figured it'd just
                                                  leave the declaration
                                                  and references to it
                                                  as-is, then add the
                                                  definition and use
                                                  that from CU2 onwards?
                                                  <br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>For the processing of
                                            types, it does not go back. <br>
                                            Both "I figured it was
                                            one-pass, and that it only
                                            ever refers back to types in
                                            previous CUs" <br>
                                            and "I figured it'd
                                            just leave the declaration
                                            and references to it as-is,
                                            then add the definition and
                                            use that from CU2 onwards"
                                            are correct. <br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <div>Great - thanks for
                                        explaining/confirming! </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <p> <br>
                                          </p>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div class="gmail_quote">
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <p>With a multi-threaded
                                                      implementation,
                                                      such a situation
                                                      would arise more
                                                      often
                                                      for type
                                                      references, so
                                                      more offsets
                                                      would have to be fixed
                                                      during the additional
                                                      pass.<br>
                                                    </p>
                                                    <blockquote type="cite">
                                                      <div dir="ltr">
                                                        <div class="gmail_quote">
                                                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                          <div>
                                                          <p>DWARFLinker
                                                          could create an
                                                          additional
                                                          artificial
                                                          compile unit
                                                          and put all
                                                          merged types
                                                          there. Later
                                                          patch all type
                                                          references to
                                                          point into
                                                          this
                                                          additional
                                                          compilation
                                                          unit. No bits
                                                          would be
                                                          duplicated in
                                                          that case. The
                                                          performance
                                                          improvement
                                                          would come from
                                                          copying less
                                                          DWARF and from
                                                          the fact that
                                                          type references
                                                          could be
                                                          updated while
                                                          the DWARF is
                                                          cloned (no
                                                          additional
                                                          pass needed for
                                                          that).<br>
                                                          </p>
                                                          </div>
                                                          </blockquote>
                                                          <div>"later
                                                          patch all type
                                                          references to
                                                          point into
                                                          this
                                                          additional
                                                          compilation
                                                          unit" - that's
                                                          the additional
                                                          pass that
                                                          people are
                                                          probably
                                                          talking/concerned
                                                          about.
                                                          Rewalking all
                                                          the DWARF. The
                                                          current
                                                          dsymutil
                                                          approach, as
                                                          far as I know,
                                                          is single pass
                                                          - it knows the
                                                          final,
                                                          absolute
                                                          offset to the
                                                          type from the
                                                          moment it
                                                          emits that
                                                          type/needs to
                                                          refer to it. <br>
                                                          </div>
                                                        </div>
                                                      </div>
                                                    </blockquote>
                                                    <p>Right. The current
                                                      dsymutil approach
                                                      is single pass.
                                                      And from that
                                                      point of view, the
                                                      solution <br>
                                                      which you've
                                                      described (to
                                                      produce a
                                                      declaration of the
                                                      type, with the
                                                      relevant members)
                                                      <br>
                                                      allows us to keep
                                                      that single-pass
                                                      implementation.<br>
                                                      <br>
                                                      But there is a
                                                      restriction in the
                                                      current dsymutil
                                                      approach: to
                                                      process inter-CU
                                                      references <br>
                                                      it needs to load
                                                      all DWARF into
                                                      memory (while it
                                                      analyzes which
                                                      part of the DWARF is
                                                      live, <br>
                                                      it needs to have
                                                      all CUs loaded
                                                      into memory).</p>
                                                  </div>
                                                </blockquote>
                                                <div>All DWARF for a
                                                  single file (which for
                                                  dsymutil is mostly a
                                                  single CU, except with
                                                  LTO I guess?), not all
                                                  DWARF for all inputs
                                                  in memory at once,
                                                  yeah? <br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>Right. In the dsymutil case -
                                            all DWARF for a single
                                            file (not all DWARF for all
                                            inputs in memory at once).<br>
                                            But in the llvm-dwarfutil case,
                                            a single file contains the DWARF
                                            for all original input
                                            object files, and all of it
                                            gets<br>
                                            loaded into memory.<br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <div>Yeah, would be great to try
                                        to go CU-by-CU. </div>
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div class="gmail_quote">
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <p>That leads to
                                                      huge memory usage.
                                                      <br>
                                                      It matters less
                                                      when the
                                                      source is a set of
                                                      object files (as
                                                      in the dsymutil case),
                                                      but it <br>
                                                      becomes a real
                                                      problem for the
                                                      llvm-dwarfutil
                                                      utility, where the
                                                      source is a single
                                                      file (with the current
                                                      <br>
                                                      implementation it
                                                      needs 30G of
                                                      memory for
                                                      the clang
                                                      binary).<br>
                                                    </p>
                                                  </div>
                                                </blockquote>
                                                <div>Yeah, that's where
                                                  I think you'd need a
                                                  fixup pass one way or
                                                  another - because
                                                  cross-CU references
                                                  can mean that when you
                                                  figure out a new
                                                  layout for CU5
                                                  (because it has a
                                                  duplicate type
                                                  definition of
                                                  something in CU1) then
                                                  you might have to
                                                  touch CU4 that had an
                                                  absolute/cross-CU
                                                  forward reference to
                                                  CU5. Once you've got
                                                  such a fixup pass (if
                                                  dsymutil already has
                                                  one? Which, like I
                                                  said, I'm confused why
                                                  it would have one/that
                                                  doesn't match my very
                                                  vague understanding)
                                                  then I think you could
                                                  make dsymutil work on
                                                  a per-CU basis
                                                  streaming things out,
                                                  then fixing up a few
                                                  offsets.<br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>When dsymutil deduplicates
                                            types, it changes a local CU
                                            reference into an inter-CU
                                            reference (so that CU2 (next)
                                            can reference the type
                                            definition from CU1 (prev)).
                                            It does not currently need
                                            any fixups to make this
                                            change.<br>
                                            <br>
                                            When dsymutil meets a
                                            pre-existing inter-CU
                                            reference (one present in the input
                                            object file) that points into a
                                            CU which has not been
                                            processed yet (so its
                                            offset is unknown), it marks
                                            it as a "forward reference"
                                            and patches it later, during the
                                            additional "fixup
                                            forward references" pass, at a
                                            time when the offsets are known.
                                            <br>
                                          </p>
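                                          <p>A minimal sketch of the
                                            scheme described above, not
                                            dsymutil's actual code: the
                                            function name, the tuple
                                            layout and the byte sizes are
                                            assumptions made for
                                            illustration. References to
                                            CUs that were already emitted
                                            are resolved immediately;
                                            references to CUs that come
                                            later are recorded and patched
                                            in a fixup pass once the whole
                                            input file has been emitted.</p>
<pre>
```python
def emit_file(units, start=0):
    """Emit the CUs of one input file (hypothetical model, not dsymutil's API).

    units: list of (cu_name, size_in_bytes, refs), where refs is a list of
    (target_cu_name, offset_inside_target) pairs.
    Returns (unit_offsets, resolved_refs, fixup_count).
    """
    unit_offsets = {}  # CU name mapped to its absolute output offset
    resolved = []      # one absolute offset per reference, None while pending
    pending = []       # (index into resolved, target CU, offset inside target)
    cursor = start
    for name, size, refs in units:
        unit_offsets[name] = cursor
        for target, local in refs:
            if target in unit_offsets:
                # Backward (or same-CU) reference: final offset already known.
                resolved.append(unit_offsets[target] + local)
            else:
                # Forward reference: remember it and leave a placeholder.
                pending.append((len(resolved), target, local))
                resolved.append(None)
        cursor += size
    # "Fixup forward references" pass, run at the end of this input file.
    for index, target, local in pending:
        resolved[index] = unit_offsets[target] + local
    return unit_offsets, resolved, len(pending)
```
</pre>
                                          <p>With two CUs that reference
                                            each other, only the reference
                                            from the first CU into the
                                            second one needs a fixup. If
                                            CUs were emitted in parallel,
                                            every deduplicated type
                                            reference would take the
                                            pending path as well.</p>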
                                        </div>
                                      </blockquote>
                                      <div>OK, so limited 2 pass system.
                                        (does it do that second pass
                                        once at the end of the whole
                                        dsymutil run, or at the end of
                                        each input file? (so if an input
                                        file has two CUs and the first
                                        CU references a type in the
                                        second CU - it could write the
                                        first CU with a "forward
                                        reference", then write the
                                        second CU, then fixup the
                                        forward reference - and then go
                                        on to the next file and its CUs
                                        - this could improve performance
                                        by touching recently used
                                        memory/disk pages only, rather
                                        than going all the way back to
                                        the start later on when those
                                        pages have become cold))</div>
                                    </div>
                                  </div>
                                </blockquote>
                                <p>Yes, it does it at the end of each
                                  input file.</p>
                                <p><br>
                                </p>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <p> <br>
                                            If CUs were processed in
                                            parallel, their offsets would
                                            not be known at the moment
                                            when a local type reference
                                            is changed into an
                                            inter-CU reference. So we
                                            would need to do the same
                                            fix-up processing for all
                                            references to types that
                                            we already do for other
                                            inter-CU references.<br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <div>Yeah - though the existence
                                        of this second "fixup forward
                                        references" system - yeah, could
                                        just use it much more generally
                                        as you say. Not an extra pass,
                                        just the existing second pass
                                        but having way more fixups to
                                        fixup in that pass.</div>
                                    </div>
                                  </div>
                                </blockquote>
                                If we were able to change the
                                algorithm in this way: <br>
                                <br>
                                1. analyse all CUs.<br>
                                2. clone all CUs.<br>
                                <br>
                                Then we could create a merged type
                                table (an artificial CU containing types)
                                during step 1. <br>
                                If that type table were written
                                first, then all following CUs could use
                                known offsets <br>
                                to the types, and we would not need
                                additional fix-up processing for type
                                references. <br>
                                It would still be necessary to fix up
                                other inter-CU references. But it would
                                not be necessary <br>
                                to fix up type references (which
                                constitute the vast majority).<br>
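                                <p>The two steps above could be sketched
                                  roughly as follows. This is an illustration
                                  only: the function names, merging types by
                                  name, and the 11-byte header size of the
                                  artificial type CU are assumptions, not
                                  part of DWARFLinker.</p>
<pre>
```python
def build_type_table(units, header_size=11):
    """Step 1 (analysis): give every distinct type one offset in an
    artificial type CU that is written first.

    units: list of (cu_name, types), where types is a list of
    (type_name, size_in_bytes). Merging by name is an assumption here.
    Returns (type_offsets, end_of_type_cu).
    """
    type_offsets = {}
    cursor = header_size  # hypothetical size of the type CU header
    for _cu_name, types in units:
        for type_name, size in types:
            if type_name not in type_offsets:
                type_offsets[type_name] = cursor
                cursor += size
    return type_offsets, cursor


def clone_units(units, type_offsets):
    """Step 2 (cloning): every type reference is resolved immediately,
    because the type CU layout is already fixed. No fixup list is needed
    for type references."""
    return {
        cu_name: [type_offsets[type_name] for type_name, _size in types]
        for cu_name, types in units
    }
```
</pre>
                                <p>Because the table layout is known before
                                  any CU is cloned, step 2 has no ordering
                                  constraint between CUs as far as type
                                  references are concerned.</p>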
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>To me, that sounds more expensive than
                              the fixup forward references pass.</div>
                          </div>
                        </div>
                      </blockquote>
                      <p>In a direct comparison, yes,
                        loading the DWARF one more time looks more
                        expensive than the fixup forward references pass.
                        But looking at the general picture,
                        it could probably be beneficial:<br>
                        <br>
                        1. Merging types would lead to a smaller
                        resulting DWARF. This would speed up the
                        process.<br>
                        For example, switching "odr types
                        deduplication" off in the current implementation
                        doubles the execution time.
                        That is because more DWARF has to be cloned and
                        written to the result. An implementation of
                        "merging types" would probably have a similar
                        effect <br>
                        - it would speed up the overall process. So
                        on the one hand, the additional step of loading the DWARF
                        would <br>
                        decrease performance, but on the other, the smaller amount of
                        resulting data would increase performance.<br>
                        <br>
                        2. If the types were put in the first CU, then
                        we would have a simple strategy for our liveness
                        analysis algorithm: just always keep the first
                        CU in memory. This would allow us to speed up our
                        liveness analysis step.<br>
                        <br>
                        Anyway, all the above is just an idea for future
                        work. Currently, I am going to implement
                        multi-threaded processing for CUs loaded into
                        memory, keeping the same kind of processing as
                        there is now (which means that the "fixup
                        forward references" pass will do more work,
                        fixing type references as well).<br>
                      </p>
                      <p><br>
                      </p>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                              <div> <br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_quote">
                                      <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                        <div>
                                          <p> <br>
                                          </p>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div class="gmail_quote">
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div>
                                                    <p>Without loading
                                                      all CUs into
                                                      memory, it would
                                                      require a two-pass
                                                      solution: a first pass to
                                                      analyze <br>
                                                      which part of the
                                                      DWARF relates to
                                                      live code, and then
                                                      a second pass to
                                                      generate the
                                                      result. <br>
                                                    </p>
                                                  </div>
                                                </blockquote>
                                                <div>Not sure it'd
                                                  require any more
                                                  second pass than a
                                                  "fixup" pass, which it
                                                  sounds like you're
                                                  saying it already has?
                                                  <br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>It looks like it would need
                                            an additional pass to
                                            process inter-CU
                                            references (those present in the
                                            incoming file) if we do not
                                            want to load all CUs into
                                            memory.<br>
                                          </p>
                                        </div>
                                      </blockquote>
                                      <div>Usually inter-CU references
                                        aren't used, except in LTO - and
                                        in LTO all the DWARF
                                        deduplication and function
                                        discarding is already done by
                                        the IR linker anyway. (ThinLTO
                                        is a bit different, but really
                                        we'd be better off teaching it
                                        the extra tricks anyway (some
                                        can't be fixed in ThinLTO - like
                                        emitting a "Home" definition of
                                        an inline function, only to find
                                        out other ThinLTO backend/shards
                                        managed to optimize away all
                                        uses of the function... so some
                                        cleanup may be useful there)).
                                        It might be possible to do a
                                        more dynamic/rolling cache -
                                        keep only the CUs with
                                        unresolved cross-CU references
                                        alive and only keep them alive
                                        until their cross-CU references
                                        are found/marked alive. This
                                        should make things no worse than
                                        the traditional dsymutil case -
                                        since cross-CU references are
                                        only effective/generally used
                                        within a single object file
                                        (it's possible to create
                                        relocations for them into other
                                        files - but I know LLVM doesn't
                                        currently do this and I don't
                                        think GCC does it) with multiple
                                        CUs anyway - so at most you'd
                                        keep all the CUs from a single
                                        original input file alive
                                        together.<br>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                But since it is a case documented by DWARF,
                                the tool should be ready for it
                                (when inter-CU <br>
                                references are heavily used).</div>
                            </blockquote>
                            <div><br>
                              Sure - but by implementing a CU liveness
                              window like that (keeping CUs live only so
                              long as they need to be rather than an
                              all-or-nothing approach) only especially
                              quirky inputs would hit the worst case
                              while the more normal inputs could perform
                              better.<br>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <p>It is not clear what should be put in such a CU
                        liveness window. If CU100 references CU1 - how
                        could we know that we need to keep CU1 in the CU
                        liveness window before we have processed CU100?<br>
                      </p>
                    </div>
                  </blockquote>
                  <div>Fair point, not just forward references to worry
                    about but backward references too. I wonder how much
                    savings there is in the liveness analysis compared
                    to "keep one copy of everything, no matter whether
                    it's live or not", then it can be a pure forward
                    progress situation. (with the quirk that you might
                    emit a declaration for an entity once, then a
                    definition for it later - alternatively if a
                    declaration is seen it could be skipped under the
                    assumption that a definition will follow (& use
                    a forward ref fixup) - and if none is found, splat
                    some stub declarations into a trailing CU at the
                    end) <br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p> </p>
                      <p><br>
                      </p>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                              <div> Moreover, llvm-dwarfutil would be
                                the tool producing <br>
                                exactly such a situation. The resulting
                                file (produced by llvm-dwarfutil) would
                                contain a lot of <br>
                                inter-CU references. There is probably
                                no practical reason to apply
                                llvm-dwarfutil to the same <br>
                                file twice, but it would be a good test
                                for the tool.<br>
                              </div>
                            </blockquote>
                            <div><br>
                              It'd be a good stress test, but not
                              necessarily something that would need to
                              perform the best because it wouldn't be a
                              common use case.<br>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <p>I agree that we should not slow down the
                        DWARFLinker in common cases only because we need
                        to support the worst cases.<br>
                        But we also need to implement a solution which
                        works in some acceptable manner for the worst
                        case. </p>
                    </div>
                  </blockquote>
                  <div>I think that depends on "acceptable" - correct,
                    yes. Practical to run in reasonable time/memory? Not
                    necessarily, in my opinion. <br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p>The current solution - loading everything in
                        memory - makes it hard to use in a non-dsymutil
                        scenario (llvm-dwarfutil).<br>
                      </p>
                    </div>
                  </blockquote>
                  <div>I agree it's worth exploring the non-dsymutil
                    scenario, as you are - I'm just saying we don't
                    necessarily need to support high usability (fast/low
                    memory usage/etc) llvm-dwarfutil on an already
                    dwarfutil'd binary (but as you've pointed out, the
                    "window" is unknowable because of backward
                    references, so this whole subthread is perhaps
                    irrelevant).<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p> </p>
                      <p>There could be several things which could be
                        used to decide whether we need to go on a light
                        or heavy path:<br>
                        <br>
                        1. If the input contains only a single CU we do
                        not need to unload it from memory. Thus - we
                        would not need to do an extra DWARF loading
                        pass.<br>
                        2. If abbreviations from the whole input file do
                        not contain inter-CU references then while doing
                        liveness analysis,  we do not need to wait until
                        other CUs are processed.<br>
                      </p>
                    </div>
                  </blockquote>
                  <div>(2) Yeah, that /may/ be a good idea, cheap to
                    test, etc. Though I'd still wonder if a more general
                    implementation strategy could be found that would
                    make it easier to get a sliding scale of efficiency
                    depending on how many inter-CU references there
                    were, not a "if there are none it's good, if there
                    are any it's bad or otherwise very different to
                    implement". <br>
                  </div>
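                  <p>For what it's worth, the two checks in the list
                    above could be sketched roughly like this. The
                    function name, the path names and the idea of a
                    pre-scan that collects form codes from the
                    abbreviation tables are assumptions made for
                    illustration, not existing DWARFLinker code. The
                    test is conservative: DW_FORM_ref_addr can also
                    point inside the same CU, so "heavy" only means
                    inter-CU references are possible.</p>

```python
DW_FORM_ref_addr = 0x10  # DWARF form code that may point outside the current CU


def choose_path(cu_count, abbrev_forms):
    """Pick a processing path from the two checks listed above.

    abbrev_forms: form codes collected from every abbreviation
    declaration in the input (a hypothetical pre-scan result).
    """
    if cu_count == 1:
        return "keep-loaded"   # check 1: nothing to unload or reload
    if DW_FORM_ref_addr not in set(abbrev_forms):
        return "light"         # check 2: no inter-CU references are possible
    return "heavy"             # inter-CU references may exist
```

                  <p>A sliding scale would need more than this yes/no
                    answer, for example a count of referenced CUs, which
                    the abbreviation tables alone cannot provide.</p>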
                </div>
              </div>
            </blockquote>
            <p>I think there is a scenario which would make it possible
              to process each CU only once when it is not referenced
              from other CUs, and to handle inter-CU references in a
              scalable way (even for a dwarfutil'd binary):<br>
              <br>
              1. Implement a global type table and type merging. This
              allows us to have all types in memory. <br>
                 Then, all inter-CU type references would point into
              that memory type table. <br>
                 (we do not know which CU should be put into CU liveness
              window, we also could not put all CUs into the memory, but
              we could put all types into the memory).<br>
              <br>
              2. If there are no other inter-CU references then all CUs
              would be handled in one pass.<br>
              <br>
              3. If there are other inter-CU references, then after all
              CUs are processed by the first pass we would have a list of
              referenced CUs. Then, we could delete the already cloned
              data (for the referenced CUs) and start the process again: <br>
                 load CU, mark liveness, clone data. This second pass
              would be done for only referenced CUs. <br>
                 For not-complex, not closely coupled cases it would
              work relatively fast.<br>
              <br>
              4. Put the in-memory type table into an artificial CU.
              Update all type references.<br>
            </p>
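            <p>A rough sketch of the four steps above, in Python rather
              than in DWARFLinker terms. Everything in it is a toy
              assumption: ToyCU and link are hypothetical names, types
              are merged simply by name, and references found in a
              second-pass CU are not followed any further.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ToyCU:
    """Hypothetical stand-in for a compile unit (not an LLVM class)."""
    name: str
    types: list          # names of the types this CU defines or uses
    live_dies: set       # DIEs that are live on their own
    refs: list = field(default_factory=list)  # (target CU, DIE) non-type references


def link(cus):
    type_table = {}   # step 1: global table, type name to index in the type CU
    cloned = {}       # CU name to (sorted cloned DIEs, indices into type_table)
    wanted = {}       # CU name to DIEs that other CUs reference
    loads = 0         # how many times a CU had to be loaded

    def process(cu, extra_live):
        nonlocal loads
        loads += 1                                   # load CU
        for t in cu.types:                           # merge types by name
            type_table.setdefault(t, len(type_table))
        dies = sorted(cu.live_dies | extra_live)     # mark liveness
        cloned[cu.name] = (dies, [type_table[t] for t in cu.types])  # clone

    # First pass: each CU is loaded, analysed, cloned and unloaded once.
    for cu in cus:
        process(cu, set())
        for target, die in cu.refs:
            wanted.setdefault(target, set()).add(die)

    # Steps 2 and 3: only CUs that turned out to be referenced are redone.
    for cu in cus:
        if cu.name in wanted:
            del cloned[cu.name]                      # drop already cloned data
            process(cu, wanted[cu.name])

    # Step 4: the in-memory type table is emitted as an artificial CU.
    type_cu = sorted(type_table, key=type_table.get)
    return cloned, type_cu, loads
```

            <p>With three CUs where only the last one refers to a DIE in
              the first, this loads four CUs in total instead of six,
              which is the point of restricting the second pass to
              referenced CUs.</p>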
            <p><br>
            </p>
            <blockquote type="cite">
              <div dir="ltr">
                <div class="gmail_quote">
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p> <br>
                        Then that scheme would be used for worst cases:<br>
                        <br>
                        1. for (CU : CU1...CU100) {<br>
                             load CU.<br>
                             analyse CU.<br>
                             unload CU.<br>
                           }    <br>
                        2. for (CU : CU1...CU100) {<br>
                             load CU.<br>
                             clone CU.<br>
                             unload CU.<br>
                           }    <br>
                        3. fixup forward references.<br>
                        <br>
                        and that scheme for light cases:<br>
                        <br>
                        1. for (CU : CU1...CU100) {<br>
                             load CU.<br>
                             analyse CU.<br>
                             clone CU.<br>
                             unload CU.<br>
                           }<br>
                        2. fixup forward references.<br>
                      </p>
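                      <p>The two schemes above can be written down as a
                        small Python sketch that only records the order
                        of operations. The names run_passes and
                        count_loads are made up for this illustration,
                        and the has_inter_cu_refs flag is assumed to
                        come from some earlier check of the input.</p>

```python
def run_passes(cus, has_inter_cu_refs):
    """Return the ordered (action, CU) steps for the heavy or light scheme."""
    steps = []
    if has_inter_cu_refs:
        # Heavy scheme: a full analysis pass, then a separate cloning pass.
        for cu in cus:
            steps += [("load", cu), ("analyse", cu), ("unload", cu)]
        for cu in cus:
            steps += [("load", cu), ("clone", cu), ("unload", cu)]
    else:
        # Light scheme: analyse and clone while the CU is loaded once.
        for cu in cus:
            steps += [("load", cu), ("analyse", cu), ("clone", cu), ("unload", cu)]
    steps.append(("fixup forward references", None))
    return steps


def count_loads(steps):
    """Count how many times a CU is loaded in a list of steps."""
    return sum(1 for action, _ in steps if action == "load")
```

                      <p>For N CUs the heavy scheme loads 2N times and
                        the light scheme N times, so the extra cost of
                        the worst-case path is one more DWARF loading
                        pass.</p>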
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_quote">
                            <div> </div>
                            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                              <div>Generally, I think we should not
                                assume that inter-CU references would be
                                used in a limited way.<br>
                                <br>
                                Anyway, if this scheme:  <br>
                                <br>
                                1. analyse all CUs.<br>
                                2. clone all CUs.<br>
                                <p>would work slowly then we would need to
                                  continue with the one-pass solution and
                                  not support complex closely coupled
                                  inputs.<br>
                                </p>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>yeah, certainly seeing the
                              data/experiments will be interesting, if
                              you end up implementing some different
                              strategies, etc.<br>
                              <br>
                              I guess one possibility for parallel
                              generation could be something more like
                              Microsoft's approach with a central debug
                              info server that compilers communicate
                              with - not that exact model, I mean, but
                              if you've got parallel threads generating
                              reduced DWARF into separate object files -
                              they could communicate with a single
                              thread responsible for type emission - the
                              type emitter would be given types from the
                              separate threads and compute their size,
                              queue them up to be streamed out to the
                              type CU (& keep the source CU alive
                              until that work was done) - such a central
                              type emitter could quickly determine the
                              size of the type to be emitted and compute
                              future type offsets (eg: if 5 types were
                              in the queue, it could've figured out the
                              offset of those types already) to answer
                              type offset queries quickly and unblock
                              the parallel threads to continue emitting
                              their CUs containing type references.<br>
                            </div>
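                            <div>To make the offset idea concrete, here
                              is a minimal Python sketch of such a
                              central emitter. The class and method
                              names are hypothetical, type sizes are
                              assumed to be known when a type is handed
                              over, and the 11-byte start offset assumes
                              a single type CU with a 32-bit DWARF v4
                              header at the start of the section.</div>

```python
import threading


class TypeEmitter:
    """Toy central emitter: hands out offsets inside a single type CU."""

    def __init__(self, header_size=11):
        self._lock = threading.Lock()
        self.next_offset = header_size   # first free byte after the CU header
        self.offsets = {}                # type name to offset already promised
        self.queue = []                  # (offset, name, size) not yet written

    def request(self, name, size):
        """Return the offset for a type, queueing it the first time it is seen."""
        with self._lock:
            if name not in self.offsets:
                self.offsets[name] = self.next_offset
                self.queue.append((self.next_offset, name, size))
                self.next_offset += size
            return self.offsets[name]

    def drain(self):
        """Hand back everything queued so far; a real emitter would write it out."""
        with self._lock:
            done, self.queue = self.queue, []
            return done
```

                            <div>The offset is fixed as soon as a type
                              is queued, so a worker thread gets its
                              answer without waiting for the type to be
                              streamed out.</div>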
                          </div>
                        </div>
                      </blockquote>
                      <p>Yes, thank you. I will think about it.<br>
                      </p>
                      <p>Alexey.<br>
                      </p>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_quote">
                            <div><br>
                              - Dave </div>
                          </div>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>