<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 28.10.2020 01:49, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Tue, Oct 27, 2020 at
            12:34 PM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p><br>
              </p>
              <div>On 27.10.2020 20:32, David Blaikie wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div dir="ltr"><br>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Tue, Oct 27,
                      2020 at 1:23 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p><br>
                        </p>
                        <div>On 26.10.2020 22:38, David Blaikie wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div dir="ltr"><br>
                            </div>
                            <br>
                            <div class="gmail_quote">
                              <div dir="ltr" class="gmail_attr">On Sun,
                                Oct 25, 2020 at 9:31 AM Alexey Lapshin
                                <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                                wrote:<br>
                              </div>
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p><br>
                                  </p>
                                  <div>On 23.10.2020 19:43, David
                                    Blaikie wrote:<br>
                                  </div>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div class="gmail_quote">
                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                                  <div><br>
                                                    <br>
                                                  </div>
                                                </blockquote>
                                                <div><br>
                                                </div>
                                                <div>Ah, yeah - that
                                                  seems like a missed
                                                  opportunity -
                                                  duplicating the whole
                                                  type DIE. LTO does
                                                  this by making
                                                  monolithic types -
                                                  merging all the
                                                  members from different
                                                  definitions of the
                                                  same type into one,
                                                  but that's maybe too
                                                  expensive for dsymutil
                                                  (might still be
                                                  interesting to know
                                                  how much more
                                                  expensive, etc). But I
                                                  think the other way to
                                                  go would be to produce
                                                  a declaration of the
                                                  type, with the
                                                  relevant members - and
                                                  let the DWARF consumer
                                                  identify this
                                                  declaration as
                                                  matching up with the
                                                  earlier definition.
                                                  That's the sort of
                                                  DWARF you get from the
                                                  non-MachO default
                                                  -fno-standalone-debug
                                                  anyway, so it's
                                                  already pretty well
                                                  tested/supported
                                                  (support in lldb's a
                                                  bit younger/more
                                                  work-in-progress,
                                                  admittedly). I wonder
                                                  how much the dSYM size
                                                  could be reduced by such an
                                                  implementation.</div>
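<p>A minimal sketch of the two options above, assuming a type definition can be modeled as just a name plus a list of member names (real DIEs also carry types, offsets, template parameters, and so on). <code>TypeDef</code>, <code>merge_monolithic</code> and <code>declaration_with_extra_members</code> are hypothetical names used only for illustration, not LLVM APIs.</p>

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Toy stand-in for a structure type DIE: a name and its members."""
    name: str
    members: list = field(default_factory=list)
    is_declaration: bool = False


def merge_monolithic(defs):
    """LTO-style: fold the members of every definition of one type into one."""
    name = defs[0].name
    merged = []
    for d in defs:
        assert d.name == name, "only definitions of the same type can be merged"
        for m in d.members:
            if m not in merged:  # keep first-seen order, drop duplicates
                merged.append(m)
    return TypeDef(name, merged)


def declaration_with_extra_members(first, later):
    """-fno-standalone-debug style: instead of re-emitting the whole type,
    emit a declaration carrying only the members the first definition lacks."""
    extra = [m for m in later.members if m not in first.members]
    return TypeDef(later.name, extra, is_declaration=True)


cu1 = TypeDef("S", ["x", "y", "f()"])
cu2 = TypeDef("S", ["x", "y", "g()"])  # same type, different member function used

mono = merge_monolithic([cu1, cu2])
decl = declaration_with_extra_members(cu1, cu2)
print(mono.members)                      # ['x', 'y', 'f()', 'g()']
print(decl.is_declaration, decl.members)  # True ['g()']
```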
                                              </div>
                                            </div>
                                          </blockquote>
                                          <p>I see. Yes, that could be
                                            done, and I think it would
                                            result in a noticeable size
                                            reduction (I do not know the
                                            exact numbers at the
                                            moment).</p>
                                          <p>I am working on a multi-threaded
                                            DWARFLinker now, and its
                                            first version will do
                                            exactly the same type
                                            processing as the current
                                            dsymutil.</p>
                                        </blockquote>
                                        <div>Yeah, best to keep the
                                          behavior the same through that</div>
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>The above scheme could be
                                              implemented as a next step,
                                              and it would result in a
                                              better size reduction
                                              (better than the current
                                              state).</p>
                                            <p>But I think an even better
                                              scheme could also be done,
                                              and it would result in an
                                              even bigger size reduction
                                              and faster execution.
                                              This scheme is similar to
                                              what you've described
                                              above: "LTO does this by
                                              making monolithic types
                                              - merging all the members
                                              from different definitions
                                              of the same type into
                                              one".</p>
                                          </div>
                                        </blockquote>
                                        <div>I believe the reason that's
                                          probably not been done is that
                                          it can't be streamed - it'd
                                          lead to buffering more of the
                                          output </div>
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>Yes. The fact that DWARF has to be
                                    streamed into AsmPrinter complicates
                                    parallel DWARF generation. In my
                                    prototype, I generate several
                                    resulting files (one for each
                                    source compilation unit) and then
                                    sequentially glue them into the
                                    final resulting file.<br>
                                  </p>
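<p>A minimal sketch of the prototype idea, assuming each CU can be cloned independently: every CU body is produced into its own buffer (standing in for a separate AsmPrinter and output file) on a worker thread, and the buffers are then glued together in input order. Gluing is also the point where each CU's final start offset becomes known. <code>emit_cu</code> and <code>link_in_parallel</code> are hypothetical placeholders.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def emit_cu(cu_name: str) -> bytes:
    """Hypothetical stand-in for cloning one CU into its own output buffer."""
    return f"[{cu_name}]".encode()


def link_in_parallel(cu_names):
    # Generate CU bodies concurrently; Executor.map yields results in input order.
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(emit_cu, cu_names))

    # Glue sequentially: a CU's absolute offset is only known at this point.
    out = bytearray()
    offsets = {}
    for name, body in zip(cu_names, bodies):
        offsets[name] = len(out)
        out += body
    return bytes(out), offsets


data, offsets = link_in_parallel(["cu1", "cu22", "cu3"])
print(data)     # b'[cu1][cu22][cu3]'
print(offsets)  # {'cu1': 0, 'cu22': 5, 'cu3': 11}
```

<p>Because the absolute offsets only exist after the gluing step, anything written during the parallel phase that refers to another CU would still need patching afterwards.</p>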
                                </div>
                              </blockquote>
                              <div>How does that help? Do you use
                                relocations in those intermediate object
                                files so the DWARF in them can refer
                                across files? <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>It does not help with referring across
                          files. It helps to parallelize the generation
                          of CU bodies. <br>
                          It is not possible to write two CUs into one
                          AsmPrinter in parallel. To make parallel
                          generation possible, I stream them into
                          different AsmPrinters (this comment is about "I
                          believe the reason that's probably not been
                          done is that it can't be streamed", which was
                          initially about referring across files, but it
                          seems I took it in another direction).<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Oh, I see - thanks for explaining, essentially
                      buffering on-disk. <br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> </p>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p> </p>
                                  <p><br>
                                  </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <div>(if two of these expandable
                                          types were in one CU - the
                                          start of the second type
                                          couldn't be known until the
                                          end because it might keep
                                          getting pushed later due to
                                          expansion of the first type)
                                          and/or having to revisit all
                                          the type references (the
                                          offset to the second type
                                          wouldn't be known until the
                                          end - so writing the offsets
                                          to refer to the type would
                                          have to be deferred until
                                          then).<br>
                                        </div>
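<p>To illustrate with made-up DIE sizes: if two types are laid out back to back in one CU and the first one grows as members from later definitions are merged into it, the second type's offset moves, so a reference to it written before the growth would be stale. <code>layout</code> is a hypothetical helper.</p>

```python
def layout(sizes):
    """Return each DIE's start offset when DIEs are laid out back to back."""
    offsets, pos = [], 0
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets


# Made-up CU layout: an 11-byte CU header, type T1, then type T2.
sizes = [11, 40, 24]
before = layout(sizes)

sizes[1] += 16  # T1 grows once a later definition contributes more members
after = layout(sizes)

print(before[2], after[2])  # 51 67
```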
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>That is the second problem: offsets
                                    are not known until the end of the file.<br>
                                    dsymutil already has that situation
                                    for inter-CU references, so it has
                                    an extra pass to fix up offsets. </p>
                                </div>
                              </blockquote>
                              <div>Oh, it does? I figured it was
                                one-pass, and that it only ever refers
                                back to types in previous CUs? So it
                                doesn't have to go back and do a second
                                pass. But I guess if it sees a declaration
                                of T1 in CU1, then later on sees a
                                definition of T1 in CU2, does it somehow
                                go back to CU1 and remove the
                                declaration/make references refer to the
                                definition in CU2? I figured it'd just
                                leave the declaration and references to
                                it as-is, then add the definition and
                                use that from CU2 onwards? <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>For the processing of the types, it does not
                          go back. <br>
                          Both "I figured it was one-pass, and that it
                          only ever refers back to types in previous
                          CUs" <br>
                          and "I figured it'd just leave the
                          declaration and references to it as-is, then
                          add the definition and use that from CU2
                          onwards" are correct. <br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Great - thanks for explaining/confirming! </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> <br>
                        </p>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p>With a multi-threaded implementation,
                                    such situations would arise more
                                    often for type references, and so
                                    more offsets would need to be fixed
                                    during the additional pass.<br>
                                  </p>
                                  <blockquote type="cite">
                                    <div dir="ltr">
                                      <div class="gmail_quote">
                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                          <div>
                                            <p>DWARFLinker could create an
                                              additional artificial
                                              compile unit and put all
                                              merged types there, and
                                              later patch all type references
                                              to point into this
                                              additional compilation
                                              unit. No bits would be
                                              duplicated in that case.
                                              The performance
                                              improvement would come
                                              from the smaller amount of
                                              copied DWARF and from the
                                              fact that type references
                                              could be updated when
                                              DWARF is cloned (no need
                                              for an additional pass for
                                              that).<br>
                                            </p>
                                          </div>
                                        </blockquote>
                                        <div>"later patch all type
                                          references to point into this
                                          additional compilation unit" -
                                          that's the additional pass
                                          that people are probably
                                          talking/concerned about.
                                          Rewalking all the DWARF. The
                                          current dsymutil approach, as
                                          far as I know, is single pass
                                          - it knows the final, absolute
                                          offset to the type from the
                                          moment it emits that
                                          type/needs to refer to it. <br>
                                        </div>
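<p>A minimal sketch of that single-pass property, assuming types are deduplicated by ODR name and that every CU defines a type before referring to it: the first CU that emits a type records its position, and later CUs write that already-final position directly, so these references never need a fixup. Record positions stand in for absolute DIE offsets; <code>clone_single_pass</code> and the record format are hypothetical.</p>

```python
def clone_single_pass(cus):
    """Clone CUs in order, keeping only the first definition of each type.

    Each CU is a list of ("type", name) or ("ref", name) entries. Returns the
    emitted records and the position of every kept type definition.
    """
    type_offsets = {}  # ODR name -> position of the one kept definition
    out = []
    for cu in cus:
        for kind, name in cu:
            if kind == "type":
                if name in type_offsets:
                    continue  # duplicate definition: dropped, the earlier one is reused
                type_offsets[name] = len(out)
                out.append(("type", name))
            else:
                # The definition was emitted earlier (this CU or a previous one),
                # so its final position is already known: no fixup is needed.
                out.append(("ref", type_offsets[name]))
    return out, type_offsets


cus = [
    [("type", "S"), ("ref", "S")],
    [("type", "S"), ("type", "T"), ("ref", "S"), ("ref", "T")],
]
out, type_offsets = clone_single_pass(cus)
print(out)  # [('type', 'S'), ('ref', 0), ('type', 'T'), ('ref', 0), ('ref', 2)]
```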
                                      </div>
                                    </div>
                                  </blockquote>
                                  <p>Right. The current dsymutil approach
                                    is single pass. And from that point of
                                    view, the solution which you've
                                    described (to produce a declaration of
                                    the type, with the relevant members)
                                    allows keeping that single-pass
                                    implementation.<br>
                                    <br>
                                    But there is a restriction in the
                                    current dsymutil approach: to
                                    process inter-CU references it needs
                                    to load all DWARF into memory (while
                                    it analyzes which part of DWARF is
                                    live, it needs to have all CUs loaded
                                    into memory).</p>
                                </div>
                              </blockquote>
                              <div>All DWARF for a single file (which
                                for dsymutil is mostly a single CU,
                                except with LTO I guess?), not all DWARF
                                for all inputs in memory at once, yeah?
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>Right. In the dsymutil case, it is all DWARF
                          for a single file (not all DWARF for all inputs
                          in memory at once).<br>
                          But in the llvm-dwarfutil case, a single file
                          contains DWARF for all original input object
                          files, and it all gets loaded into memory.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Yeah, it would be great to try to go CU-by-CU. </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p>That leads to huge memory usage. <br>
                                    It is less important when the source
                                    is a set of object files (as in the
                                    dsymutil case), and it becomes a real
                                    problem for the llvm-dwarfutil utility
                                    when the source is a single file (with
                                    the current implementation it needs
                                    30G of memory for processing the clang
                                    binary).<br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Yeah, that's where I think you'd need
                                a fixup pass one way or another -
                                because cross-CU references can mean
                                that when you figure out a new layout
                                for CU5 (because it has a duplicate type
                                definition of something in CU1) then you
                                might have to touch CU4 that had an
                                absolute/cross-CU forward reference to
                                CU5. Once you've got such a fixup pass
                                (if dsymutil already has one? Which,
                                like I said, I'm confused why it would
                                have one/that doesn't match my very
                                vague understanding) then I think you
                                could make dsymutil work on a per-CU
                                basis streaming things out, then fixing
                                up a few offsets.<br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>When dsymutil deduplicates types, it changes
                          a local CU reference into an inter-CU reference
                          (so that the next CU, CU2, can reference the
                          type definition from the previous CU, CU1). To
                          make this change it currently does not need to
                          do any fixups.<br>
                          <br>
                          When dsymutil meets an already existing
                          (located in the input object file) inter-CU
                          reference pointing into a CU which has not been
                          processed yet (and whose offset is therefore
                          unknown), it marks it as a "forward reference"
                          and patches it later, during the additional
                          "fixup forward references" pass, at a time when
                          the offsets are known.<br>
                        </p>
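<p>A minimal sketch of such a "fixup forward references" pass, assuming every DIE is a single 4-byte field and that a reference is stored as a 4-byte little-endian absolute offset (real DWARF forms and DIE layouts are more varied). A reference to an already emitted DIE is written immediately; a reference into a CU that has not been processed yet gets a placeholder and a fixup record, and the placeholders are patched after the whole input file has been emitted. <code>link_file</code> is hypothetical.</p>

```python
DIE_SIZE = 4  # assumption: every DIE is one 4-byte field


def link_file(cus):
    """Emit all CUs of one input file, then patch forward references.

    cus[i] is a list of DIEs; a DIE is None (plain data) or a (cu, die) pair
    naming the DIE it references. Returns the bytes and the number of fixups.
    """
    out = bytearray()
    die_offsets = {}  # (cu, die) -> absolute offset, known once the DIE is emitted
    fixups = []       # (position in out, referenced (cu, die)) still to patch

    for cu_idx, dies in enumerate(cus):
        for die_idx, ref in enumerate(dies):
            die_offsets[(cu_idx, die_idx)] = len(out)
            if ref is None:
                out += b"\xaa" * DIE_SIZE  # plain payload
            elif ref in die_offsets:
                # Backward reference: the target's offset is already final.
                out += die_offsets[ref].to_bytes(DIE_SIZE, "little")
            else:
                # Forward reference: write a placeholder and remember it.
                fixups.append((len(out), ref))
                out += bytes(DIE_SIZE)

    # The limited second pass, run once the whole input file has been emitted.
    for pos, target in fixups:
        out[pos:pos + DIE_SIZE] = die_offsets[target].to_bytes(DIE_SIZE, "little")
    return bytes(out), len(fixups)


cus = [
    [None, (1, 0)],  # CU0: DIE 1 refers forward to the first DIE of CU1
    [None, (0, 0)],  # CU1: DIE 1 refers back to the first DIE of CU0
]
data, num_fixups = link_file(cus)
print(num_fixups, data.hex())  # 1 aaaaaaaa08000000aaaaaaaa00000000
```

<p>With CUs processed in parallel, references to deduplicated types would simply become more entries in <code>fixups</code>; the mechanism itself would not change.</p>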
                      </div>
                    </blockquote>
                    <div>OK, so a limited 2-pass system. Does it do that
                      second pass once at the end of the whole dsymutil
                      run, or at the end of each input file? (So if an
                      input file has two CUs and the first CU references
                      a type in the second CU, it could write the first
                      CU with a "forward reference", then write the
                      second CU, then fix up the forward reference, and
                      then go on to the next file and its CUs. This
                      could improve performance by touching only recently
                      used memory/disk pages, rather than going all
                      the way back to the start later on when those
                      pages have become cold.)</div>
                  </div>
                </div>
              </blockquote>
              <p>Yes, it does it at the end of each input file.</p>
              <p><br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> <br>
                          If CUs were processed in parallel, their
                          offsets would not be known at the moment when
                          a local type reference is changed into an
                          inter-CU reference. So we would need to do the
                          same fix-up processing for all references to
                          types as we already do for other inter-CU
                          references.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Yeah, though given the existence of this second
                      "fixup forward references" system, it could just
                      be used much more generally, as you say. Not an
                      extra pass, just the existing second pass, but
                      with way more fixups to fix up in that pass.</div>
                  </div>
                </div>
              </blockquote>
              If we were able to change the algorithm in this way:
              <br>
              <br>
              1. analyse all CUs;<br>
              2. clone all CUs,<br>
              <br>
              then we could create a merged type table (an artificial CU
              containing types) during step 1. <br>
              If that type table were written first, then all
              following CUs could use known offsets
              to the types, and we would not need additional fix-up
              processing for type references. <br>
              It would still be necessary to fix up other inter-CU
              references, but it would not be necessary
              to fix up type references (which constitute the vast
              majority).<br>
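<p>A minimal sketch of that ordering, assuming types are matched by ODR name and that merging a type means taking the union of its members: step 1 builds the merged type table from every CU, the table is written first as the artificial type CU, and step 2 clones each CU with final type offsets already known. Non-type inter-CU references, which would still go through the fix-up pass, are not modeled. <code>analyse</code> and <code>clone</code> are hypothetical.</p>

```python
def analyse(cus):
    """Step 1: merge the members of every definition of each type, across all CUs."""
    table = {}  # ODR name -> merged member list
    for cu in cus:
        for name, members in cu["types"].items():
            merged = table.setdefault(name, [])
            for m in members:
                if m not in merged:
                    merged.append(m)
    return table


def clone(cus, table):
    """Step 2: write the merged type table first, then clone the CUs."""
    out, type_offsets = [], {}
    for name in sorted(table):  # the artificial "type CU" comes first
        type_offsets[name] = len(out)
        out.append(("type", name, tuple(table[name])))
    for cu in cus:
        for name in cu["refs"]:
            # Every type offset is already final, so no fix-up is recorded.
            out.append(("ref", type_offsets[name]))
    return out


cus = [
    {"types": {"S": ["x", "f()"]}, "refs": ["S"]},
    {"types": {"S": ["x", "g()"], "T": ["y"]}, "refs": ["S", "T"]},
]
records = clone(cus, analyse(cus))
for record in records:
    print(record)
```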
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>To me, that sounds more expensive than the fixup forward
            references pass.</div>
        </div>
      </div>
    </blockquote>
    <p>If we speak about a direct comparison, then yes, loading DWARF
      one more time looks more expensive than the fixup forward references
      pass. But if we look at the general picture, then it could
      probably be beneficial:<br>
      <br>
      1. Merging types would lead to a smaller size of the resulting
      DWARF, which would speed up the process.<br>
         E.g., if we switched "ODR types deduplication" off in the
      current implementation, the execution time would double. That is
      because more DWARF would have to be cloned and written to the
      result. Implementing "merging types" would probably have a similar
      effect in the other direction: it would speed up the overall
      process. So, on one side, the additional step for loading DWARF
      would decrease performance, but on the other side, the smaller
      amount of resulting data would increase performance.<br>
         <br>
      2. If types were put in the first CU, we would have a simple
      strategy for our liveness analysis algorithm: just always keep the
      first CU in memory. This would allow us to speed up the liveness
      analysis step.<br>
      <br>
      Anyway, all of the above is just an idea for future work.
      Currently, I am going to implement multi-threaded processing for
      CUs loaded into memory, with the same type processing as it is now
      (which means that the "fixup forward references" pass will do more
      work by also fixing type references).<br>
    </p>
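<p>A rough sketch of point 2, assuming the merged types live in CU 0 and that every other CU references only itself and CU 0: liveness can then walk one CU at a time while keeping only CU 0 resident. <code>analyse_liveness</code>, <code>load_cu</code> and the DIE maps are hypothetical.</p>

```python
def analyse_liveness(num_cus, load_cu):
    """Mark live DIEs while keeping only the type CU (CU 0) permanently loaded.

    load_cu(i) returns (refs, roots): refs maps a DIE name to the (cu, die)
    pairs it references, roots lists the DIEs that describe live code.
    """
    type_refs, _ = load_cu(0)  # pinned for the whole run
    live = set()
    for i in range(1, num_cus):
        refs, roots = load_cu(i)  # the only other CU held at this point
        work = [(i, die) for die in roots]
        while work:
            cu_idx, die = work.pop()
            if (cu_idx, die) in live:
                continue
            # Assumption: a CU references only itself and the type CU.
            assert cu_idx in (0, i), "reference into another non-type CU"
            live.add((cu_idx, die))
            work.extend((type_refs if cu_idx == 0 else refs).get(die, []))
    return live


CUS = {
    0: ({"S": [(0, "int")], "T": [], "int": []}, []),
    1: ({"main": [(0, "S"), (1, "helper")], "helper": [], "dead": [(0, "T")]}, ["main"]),
    2: ({"foo": [(0, "int")]}, ["foo"]),
}
loads = []


def load_cu(i):
    loads.append(i)  # record load order to show each CU is loaded once
    return CUS[i]


live = analyse_liveness(3, load_cu)
print(sorted(live))
```

<p>Each CU is loaded exactly once here; without the assumption that cross-CU references only target the type CU, the walk would need other CUs to be resident as well.</p>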
    <p><br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div> <br>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> <br>
                        </p>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                                <div>
                                  <p>Without loading all CUs into
                                    memory, it would require a two-pass
                                    solution: first to analyze which
                                    part of DWARF relates to live code,
                                    and then a second pass to generate
                                    the result. <br>
                                  </p>
                                </div>
                              </blockquote>
                              <div>Not sure it'd require any more of a
                                second pass than a "fixup" pass, which it
                                sounds like you're saying it already
                                has? <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>It looks like it would need an additional
                          pass to process inter-CU references (those
                          present in the incoming file) if we do not want
                          to load all CUs into memory.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Usually inter-CU references aren't used, except
                      in LTO - and in LTO all the DWARF deduplication
                      and function discarding is already done by the IR
                      linker anyway. (ThinLTO is a bit different, but
                      really we'd be better off teaching it the extra
                      tricks anyway (some can't be fixed in ThinLTO -
                      like emitting a "Home" definition of an inline
                      function, only to find out other ThinLTO
                      backend/shards managed to optimize away all uses
                      of the function... so some cleanup may be useful
                      there)). It might be possible to do a more
                      dynamic/rolling cache - keep only the CUs with
                      unresolved cross-CU references alive and only keep
                      them alive until their cross-CU references are
                      found/marked alive. This should make things no
                      worse than the traditional dsymutil case - since
                      cross-CU references are only effective/generally
                      used within a single object file (it's possible to
                      create relocations for them into other files - but
                      I know LLVM doesn't currently do this and I don't
                      think GCC does it) with multiple CUs anyway - so
                      at most you'd keep all the CUs from a single
                      original input file alive together.<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              But since this case is documented by DWARF, the tool
              should be ready for it (i.e. when inter-CU references are
              heavily used).</div>
          </blockquote>
          <div><br>
            Sure - but by implementing a CU liveness window like that
            (keeping CUs live only as long as they need to be, rather
            than an all-or-nothing approach), only especially quirky
            inputs would hit the worst case, while more typical inputs
            could perform better.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>It is not clear what should be put into such a CU liveness window.
      If CU100 references CU1, how could we know that we need to put CU1
      into the CU liveness window before we have processed CU100?<br></p></div></blockquote><div>Fair point: there are not just forward references to worry about, but backward references too. I wonder how much is saved by the liveness analysis compared to "keep one copy of everything, no matter whether it's live or not"; then it could be a pure forward-progress situation (with the quirk that you might emit a declaration for an entity once, then a definition for it later; alternatively, if a declaration is seen, it could be skipped on the assumption that a definition will follow (using a forward ref fixup), and if none is found, splat some stub declarations into a trailing CU at the end). <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
    <p><br>
    </p>
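The rolling-window idea and the backward-reference problem raised above can be seen in a small simulation. This is a hypothetical toy model, not DWARFLinker code: each CU is reduced to the list of CU indices it references, a CU stays loaded only until every CU it references forward has been processed, and any backward reference to a CU that was already unloaded is counted as a miss (the CU100 to CU1 case), which a real implementation would have to handle with a reload or an extra pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Toy model (hypothetical): Refs[I] lists the CUs that CU I references.
struct WindowStats {
  std::size_t MaxLiveCUs = 0;        // peak number of CUs held in memory
  std::size_t BackwardRefMisses = 0; // refs to CUs already unloaded
};

// Processes CUs in order and keeps a CU loaded only until every CU it
// references forward has been processed (the rolling window idea).
WindowStats simulateLivenessWindow(
    const std::vector<std::vector<std::size_t>> &Refs) {
  const std::size_t N = Refs.size();
  // Index of the last CU that CU I still waits for (I itself if none).
  std::vector<std::size_t> PendingUntil(N);
  for (std::size_t I = 0; I < N; ++I) {
    PendingUntil[I] = I;
    for (std::size_t T : Refs[I]) {
      assert(T < N && "reference to a nonexistent CU");
      PendingUntil[I] = std::max(PendingUntil[I], T);
    }
  }

  WindowStats Stats;
  std::set<std::size_t> Live;
  for (std::size_t I = 0; I < N; ++I) {
    Live.insert(I); // load CU I
    for (std::size_t T : Refs[I])
      // Nothing seen while processing CU T said it would be needed later,
      // so a backward reference to an unloaded CU is only found now.
      if (T < I && !Live.count(T))
        ++Stats.BackwardRefMisses;
    Stats.MaxLiveCUs = std::max(Stats.MaxLiveCUs, Live.size());
    // Unload every CU that no longer waits for a later CU.
    for (auto It = Live.begin(); It != Live.end();)
      It = PendingUntil[*It] <= I ? Live.erase(It) : std::next(It);
  }
  return Stats;
}
```

For example, `simulateLivenessWindow({{1}, {}, {}})` keeps at most two CUs alive with no misses, while `simulateLivenessWindow({{}, {}, {0}})` keeps only one CU alive but reports one miss: the window cannot know in advance that CU0 should stay loaded for CU2.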
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div> Moreover, llvm-dwarfutil would be the tool producing
              exactly such a situation. The resulting file (produced by
              llvm-dwarfutil) would contain a lot of inter-CU
              references. There is probably no practical reason to apply
              llvm-dwarfutil to the same file twice, but it would be a
              good test for the tool.<br>
            </div>
          </blockquote>
          <div><br>
            It'd be a good stress test, but not necessarily something
            that would need to perform the best because it wouldn't be a
            common use case.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>I agree that we should not slow down DWARFLinker in common
      cases just because we need to support the worst cases.<br>
      But we also need to implement a solution that works in an
      acceptable manner for the worst case. </p></div></blockquote><div>I think that depends on what "acceptable" means. Correct, yes. Practical to run in reasonable time/memory? Not necessarily, in my opinion. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>The current solution
      (loading everything into memory) makes it hard to use in a
      non-dsymutil scenario (llvm-dwarfutil).<br></p></div></blockquote><div>I agree it's worth exploring the non-dsymutil scenario, as you are. I'm just saying we don't necessarily need to support high usability (fast, low memory usage, etc.) for llvm-dwarfutil on an already dwarfutil'd binary (but, as you've pointed out, the "window" is unknowable because of backward references, so this whole subthread is perhaps irrelevant).<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
    <p>There are several things that could be used to decide
      whether to take the light or the heavy path:<br>
      <br>
      1. If the input contains only a single CU, we do not need to unload
      it from memory. Thus, we would not need an extra DWARF
      loading pass.<br>
      2. If the abbreviations from the whole input file do not contain
      inter-CU references, then during liveness analysis we do not
      need to wait until other CUs are processed.<br></p></div></blockquote><div>(2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd still wonder if a more general implementation strategy could be found that would make it easier to get a sliding scale of efficiency depending on how many inter-CU references there were, rather than an "if there are none it's good; if there are any, it's bad or otherwise very different to implement" split. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
      <br>
      Then this scheme would be used for the worst cases:<br>
      <br>
      1. for (CU : CU1...CU100) {<br>
           load CU.<br>
           analyse CU.<br>
           unload CU.<br>
         }    <br>
      2. for (CU : CU1...CU100) {<br>
           load CU.<br>
           clone CU.<br>
           unload CU.<br>
         }    <br>
      3. fixup forward references.<br>
      <br>
      and this scheme for the light cases:<br>
      <br>
      1. for (CU : CU1...CU100) {<br>
           load CU.<br>
           analyse CU.<br>
           clone CU.<br>
           unload CU.<br>
         }<br>
      2. fixup forward references.<br>
    </p>
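The two heuristics and the two schemes above could be wired together roughly as follows. This is a sketch with hypothetical names (`mayHaveInterCURefs`, `CUOps`, `linkCUs`), not an existing DWARFLinker interface. It assumes the abbreviations are available as a flat list of attribute forms per abbreviation, and that DW_FORM_ref_addr (0x10) is the only form used for inter-CU references; since an abbreviation using that form may still point within the same CU, the check is conservative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// DW_FORM_ref_addr (0x10) is the DWARF form for references that may point
// outside the current CU. Assumption: it is the only inter-CU form used.
constexpr uint16_t kDwFormRefAddr = 0x10;

// Heuristic (2): a conservative check over the attribute forms of every
// abbreviation in the input (hypothetical flattened representation).
bool mayHaveInterCURefs(const std::vector<std::vector<uint16_t>> &AbbrevForms) {
  for (const auto &Forms : AbbrevForms)
    for (uint16_t Form : Forms)
      if (Form == kDwFormRefAddr)
        return true;
  return false;
}

// Hypothetical per-CU operations supplied by the linker.
struct CUOps {
  std::function<void(std::size_t)> Load, Analyse, Clone, Unload;
  std::function<void()> FixupForwardRefs;
};

void linkCUs(std::size_t NumCUs, bool InterCURefs, const CUOps &Ops) {
  assert(Ops.Load && Ops.Analyse && Ops.Clone && Ops.Unload &&
         Ops.FixupForwardRefs);
  // Heuristics (1) and (2): take the light path whenever possible.
  bool Light = NumCUs == 1 || !InterCURefs;
  if (Light) {
    // Light scheme: one load per CU, analyse and clone together.
    for (std::size_t CU = 0; CU < NumCUs; ++CU) {
      Ops.Load(CU); Ops.Analyse(CU); Ops.Clone(CU); Ops.Unload(CU);
    }
  } else {
    // Heavy scheme: a full analysis pass before any cloning.
    for (std::size_t CU = 0; CU < NumCUs; ++CU) {
      Ops.Load(CU); Ops.Analyse(CU); Ops.Unload(CU);
    }
    for (std::size_t CU = 0; CU < NumCUs; ++CU) {
      Ops.Load(CU); Ops.Clone(CU); Ops.Unload(CU);
    }
  }
  Ops.FixupForwardRefs();
}
```

The price of the heavy path is visible directly in the call sequence: every CU is loaded twice, which is the extra DWARF loading pass that heuristic (1) and (2) try to avoid.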
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>Generally, I think we should not assume that inter-CU
              references would be used in a limited way.<br>
              <br>
              Anyway, if this scheme:  <br>
              <br>
              1. analyse all CUs.<br>
              2. clone all CUs.<br>
              <p>turns out to be slow, then we would need to continue
                with the one-pass solution and not support complex,
                closely coupled inputs.<br>
              </p>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>yeah, certainly seeing the data/experiments will be
            interesting, if you end up implementing some different
            strategies, etc.<br>
            <br>
            I guess one possibility for parallel generation could be
            something more like Microsoft's approach with a central
            debug info server that compilers communicate with - not that
            exact model, I mean, but if you've got parallel threads
            generating reduced DWARF into separate object files - they
            could communicate with a single thread responsible for type
            emission - the type emitter would be given types from the
            separate threads and compute their size, queue them up to be
            streamed out to the type CU (& keep the source CU alive
            until that work was done) - such a central type emitter
            could quickly determine the size of the type to be emitted
            and compute future type offsets (eg: if 5 types were in the
            queue, it could've figured out the offset of those types
            already) to answer type offset queries quickly and unblock
            the parallel threads to continue emitting their CUs
            containing type references.<br>
          </div>
        </div>
      </div>
    </blockquote>
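The central type emitter suggested above could start from something like the following: worker threads hand over an already-encoded type, the emitter reserves space for it and returns its future output offset immediately (deduplicating by a key), and the queued bytes are streamed out to the type CU later. This is a hypothetical sketch: `TypeEmitter`, `getOrQueueType` and the string key are illustrative names, and it assumes a type's bytes are final when queued (i.e. they contain no references that still need fixing).

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical central type emitter shared by parallel CU-cloning threads.
class TypeEmitter {
public:
  explicit TypeEmitter(uint64_t TypeCUStart) : NextOffset(TypeCUStart) {}

  // Returns the output offset of the type identified by Key. The first
  // caller reserves space for its bytes; later callers get the same offset.
  uint64_t getOrQueueType(const std::string &Key, std::vector<uint8_t> Bytes) {
    assert(!Bytes.empty() && "an encoded DIE is never empty");
    std::lock_guard<std::mutex> Lock(Mutex);
    auto It = Offsets.find(Key);
    if (It != Offsets.end())
      return It->second;
    // The offset is known as soon as the size is, before any bytes are
    // written, so the calling thread can continue emitting its CU.
    uint64_t Offset = NextOffset;
    NextOffset += Bytes.size();
    Offsets.emplace(Key, Offset);
    Queue.push_back(std::move(Bytes));
    return Offset;
  }

  // Drains the queue in offset order (the bytes for the type CU).
  std::vector<uint8_t> flush() {
    std::lock_guard<std::mutex> Lock(Mutex);
    std::vector<uint8_t> Out;
    for (const auto &B : Queue)
      Out.insert(Out.end(), B.begin(), B.end());
    Queue.clear();
    return Out;
  }

private:
  std::mutex Mutex;
  uint64_t NextOffset;
  std::unordered_map<std::string, uint64_t> Offsets;
  std::vector<std::vector<uint8_t>> Queue;
};
```

A single mutex keeps the sketch simple; because the critical section only does a hash lookup and an offset bump, the threads block for a short time rather than waiting for type bytes to be written.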
    <p>Yes, thank you. I will think about it.<br>
    </p>
    <p>Alexey.<br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
            - Dave </div>
        </div>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>