<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  <div>

    <p><br>

    </p>

    <div>On 28.10.2020 01:49, David Blaikie

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr">

        <div dir="ltr"><br>

        </div>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Tue, Oct 27, 2020 at

            12:34 PM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p><br>

              </p>

              <div>On 27.10.2020 20:32, David Blaikie wrote:<br>

              </div>

              <blockquote type="cite">

                <div dir="ltr">

                  <div dir="ltr"><br>

                  </div>

                  <br>

                  <div class="gmail_quote">

                    <div dir="ltr" class="gmail_attr">On Tue, Oct 27,

                      2020 at 1:23 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>

                      wrote:<br>

                    </div>

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <p><br>

                        </p>

                        <div>On 26.10.2020 22:38, David Blaikie wrote:<br>

                        </div>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div dir="ltr"><br>

                            </div>

                            <br>

                            <div class="gmail_quote">

                              <div dir="ltr" class="gmail_attr">On Sun,

                                Oct 25, 2020 at 9:31 AM Alexey Lapshin

                                <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>

                                wrote:<br>

                              </div>

                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p><br>

                                  </p>

                                  <div>On 23.10.2020 19:43, David

                                    Blaikie wrote:<br>

                                  </div>

                                  <blockquote type="cite">

                                    <div dir="ltr">

                                      <div class="gmail_quote">

                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                          <blockquote type="cite">

                                            <div dir="ltr">

                                              <div class="gmail_quote">

                                                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                                  <div><br>

                                                    <br>

                                                  </div>

                                                </blockquote>

                                                <div><br>

                                                </div>

                                                <div>Ah, yeah - that

                                                  seems like a missed

                                                  opportunity -

                                                  duplicating the whole

                                                  type DIE. LTO does

                                                  this by making

                                                  monolithic types -

                                                  merging all the

                                                  members from different

                                                  definitions of the

                                                  same type into one,

                                                  but that's maybe too

                                                  expensive for dsymutil

                                                  (might still be

                                                  interesting to know

                                                  how much more

                                                  expensive, etc). But I

                                                  think the other way to

                                                  go would be to produce

                                                  a declaration of the

                                                  type, with the

                                                  relevant members - and

                                                  let the DWARF consumer

                                                  identify this

                                                  declaration as

                                                  matching up with the

                                                  earlier definition.

                                                  That's the sort of

                                                  DWARF you get from the

                                                  non-MachO default

                                                  -fno-standalone-debug

                                                  anyway, so it's

                                                  already pretty well

                                                  tested/supported

                                                  (support in lldb's a

                                                  bit younger/more

                                                  work-in-progress,

                                                  admittedly). I wonder

                                                  how much dsym size

                                                  there is that could be

                                                  reduced by such an

                                                  implementation.</div>

                                              </div>

                                            </div>

                                          </blockquote>

                                          <p>I see. Yes, that could be

                                            done and I think it would

                                            result in noticeable size

                                            reduction(I do not know

                                            exact numbers at the

                                            moment).</p>

                                          <p>I work on multi-thread

                                            DWARFLinker now and it`s

                                            first version will do

                                            exactly the same type

                                            processing like current

                                            dsymutil.</p>

                                        </blockquote>

                                        <div>Yeah, best to keep the

                                          behavior the same through that</div>

                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                          <div>

                                            <p>Above scheme could be

                                              implemented as a next step

                                              and it would result in

                                              better size

                                              reduction(better than

                                              current state).</p>

                                            <p>But I think the better

                                              scheme could be done also

                                              and it would result in

                                              even bigger size reduction

                                              and in faster execution.

                                              This scheme is something

                                              similar to what you`ve

                                              described above: "LTO does

                                              - making monolithic types

                                              - merging all the members

                                              from different definitions

                                              of the same type into

                                              one".</p>

                                          </div>

                                        </blockquote>

                                        <div>I believe the reason that's

                                          probably not been done is that

                                          it can't be streamed - it'd

                                          lead to buffering more of the

                                          output </div>

                                      </div>

                                    </div>

                                  </blockquote>

                                  <p>yes. The fact that DWARF should be

                                    streamed into AsmPrinter complicates

                                    parallel dwarf generation. In my

                                    prototype, I generate <br>

                                    several resulting files(each for one

                                    source compilation unit) and then

                                    sequentially glue them into the

                                    final resulting file.<br>

                                  </p>

                                </div>

                              </blockquote>

                              <div>How does that help? Do you use

                                relocations in those intermediate object

                                files so the DWARF in them can refer

                                across files? <br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        <p>It does not help with referring across the

                          file. It helps to parallel the generation of

                          CU bodies. <br>

                          It is not possible to write two CUs in

                          parallel into AsmPrinter. To make possible

                          parallel generation I stream them into

                          different AsmPrinters(this comment is for "I

                          believe the reason that's probably not been

                          done is that it can't be streamed". which

                          initially was about referring across the file,

                          but it seems I added another direction).<br>

                        </p>

                      </div>

                    </blockquote>

                    <div>Oh, I see - thanks for explaining, essentially

                      buffering on-disk. <br>

                    </div>

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <p> </p>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div class="gmail_quote">

                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p> </p>

                                  <p><br>

                                  </p>

                                  <blockquote type="cite">

                                    <div dir="ltr">

                                      <div class="gmail_quote">

                                        <div>(if two of these expandable

                                          types were in one CU - the

                                          start of the second type

                                          couldn't be known until the

                                          end because it might keep

                                          getting pushed later due to

                                          expansion of the first type)

                                          and/or having to revisit all

                                          the type references (the

                                          offset to the second type

                                          wouldn't be known until the

                                          end - so writing the offsets

                                          to refer to the type would

                                          have to be deferred until

                                          then).<br>

                                        </div>

                                      </div>

                                    </div>

                                  </blockquote>

                                  <p>That is the second problem: offsets

                                    are not known until the end of file.<br>

                                    dsymutil already has that situation

                                    for inter-CU references, so it has

                                    extra pass to<br>

                                    fixup offsets. </p>

                                </div>

                              </blockquote>

                              <div>Oh, it does? I figured it was

                                one-pass, and that it only ever refers

                                back to types in previous CUs? So it

                                doesn't have to go back and do a second

                                pass. But I guess if sees a declaration

                                of T1 in CU1, then later on sees a

                                definition of T1 in CU2, does it somehow

                                go back to CU1 and remove the

                                declaration/make references refer to the

                                definition in CU2? I figured it'd just

                                leave the declaration and references to

                                it as-is, then add the definition and

                                use that from CU2 onwards? <br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        <p>For the processing of the types, it do not go

                          back. <br>

                          This "I figured it was one-pass, and that it

                          only ever refers back to types in previous

                          CUs" <br>

                          and this "I figured it'd just leave the

                          declaration and references to it as-is, then

                          add the definition and use that from CU2

                          onwards" are correct. <br>

                        </p>

                      </div>

                    </blockquote>

                    <div>Great - thanks for explaining/confirming! </div>

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <p> <br>

                        </p>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div class="gmail_quote">

                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p>With multi-thread implementation

                                    such situation would arise more

                                    often <br>

                                    for type references and so more

                                    offsets should be fixed during

                                    additional pass.<br>

                                  </p>

                                  <blockquote type="cite">

                                    <div dir="ltr">

                                      <div class="gmail_quote">

                                        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                          <div>

                                            <p>DWARFLinker could create

                                              additional artificial

                                              compile unit and put all

                                              merged types there. Later

                                              patch all type references

                                              to point into this

                                              additional compilation

                                              unit.  No any bits would

                                              be duplicated in that

                                              case. The performance

                                              improvement could be

                                              achieved due to less

                                              amount of the copied DWARF

                                              and due to the fact that

                                              type references could be

                                              updated when DWARF is

                                              cloned(no need in

                                              additional pass for that).<br>

                                            </p>

                                          </div>

                                        </blockquote>

                                        <div>"later patch all type

                                          references to point into this

                                          additional compilation unit" -

                                          that's the additional pass

                                          that people are probably

                                          talking/concerned about.

                                          Rewalking all the DWARF. The

                                          current dsymutil approach, as

                                          far as I know, is single pass

                                          - it knows the final, absolute

                                          offset to the type from the

                                          moment it emits that

                                          type/needs to refer to it. <br>

                                        </div>

                                      </div>

                                    </div>

                                  </blockquote>

                                  <p>Right. Current dsymutil approach is

                                    single pass. And from that point of

                                    view, solution <br>

                                    which you`ve described(to produce a

                                    declaration of the type, with the

                                    relevant members) <br>

                                    allows to keep that single pass

                                    implementation.<br>

                                    <br>

                                    But there is a restriction for

                                    current dsymutil approach: To

                                    process inter-CU references <br>

                                    it needs to load all DWARF into the

                                    memory(While it analyzes which part

                                    of DWARF is live, <br>

                                    it needs to have all CUs loaded into

                                    the memory).</p>

                                </div>

                              </blockquote>

                              <div>All DWARF for a single file (which

                                for dsymutil is mostly a single CU,

                                except with LTO I guess?), not all DWARF

                                for all inputs in memory at once, yeah?

                                <br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        <p>right. In dsymutil case - all DWARF for a

                          single file(not all DWARF for all inputs in

                          memory at once).<br>

                          But in llvm-dwarfutil case single file

                          contains DWARF for all original input object

                          files and it all becomes<br>

                          loaded into memory.<br>

                        </p>

                      </div>

                    </blockquote>

                    <div>Yeha, would be great to try to go CU-by-CU. </div>

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div class="gmail_quote">

                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p>That leads to huge memory usage. <br>

                                    It is less important when source is

                                    a set of object files(like in

                                    dsymutil case) and this <br>

                                    become a real problem for

                                    llvm-dwarfutil utility when source

                                    is a single file(With current <br>

                                    implementation it needs 30G of

                                    memory for compiling clang binary).<br>

                                  </p>

                                </div>

                              </blockquote>

                              <div>Yeah, that's where I think you'd need

                                a fixup pass one way or another -

                                because cross-CU references can mean

                                that when you figure out a new layout

                                for CU5 (because it has a duplicate type

                                definition of something in CU1) then you

                                might have to touch CU4 that had an

                                absolute/cross-CU forward reference to

                                CU5. Once you've got such a fixup pass

                                (if dsymutil already has one? Which,

                                like I said, I'm confused why it would

                                have one/that doesn't match my very

                                vague understanding) then I think you

                                could make dsymutil work on a per-CU

                                basis streaming things out, then fixing

                                up a few offsets.<br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        <p>When dsymutil deduplicates types it changes

                          local CU reference into inter-CU reference(so

                          that CU2(next) could reference type definition

                          from CU1(prev)). To do this change it does not

                          need to do any fixups currently.<br>

                          <br>

                          When dsymutil meets already existed(located in

                          the input object file) inter-CU reference

                          pointing into the CU which has not been

                          processed yet(and then its offset is unknown)

                          it marks it as "forward reference" and patches

                          later during additional pass "fixup forward

                          references" at a time when offsets are known.

                          <br>

                        </p>

                      </div>

                    </blockquote>

                    <div>OK, so limited 2 pass system. (does it do that

                      second pass once at the end of the whole dsymutil

                      run, or at the end of each input file? (so if an

                      input file has two CUs and the first CU references

                      a type in the second CU - it could write the first

                      CU with a "forward reference", then write the

                      second CU, then fixup the forward reference - and

                      then go on to the next file and its CUs - this

                      could improve performance by touching recently

                      used memory/disk pages only, rather than going all

                      the way back to the start later on when those

                      pages have become cold)</div>

                  </div>

                </div>

              </blockquote>

              <p>yes, It does it in the end of each input file.</p>

              <p><br>

              </p>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <p> <br>

                          If CUs would be processed in parallel their

                          offsets would not be known at the moment when

                          local type reference would be changed into

                          inter-CU reference. So we would need to do the

                          same fix-up processing for all references to

                          the types like we already do for other

                          inter-CU references.<br>

                        </p>

                      </div>

                    </blockquote>

                    <div>Yeah - though the existence of this second

                      "fixup forward references" system - yeah, could

                      just use it much more generally as you say. Not an

                      extra pass, just the existing second pass but

                      having way more fixups to fixup in that pass.</div>

                  </div>

                </div>

              </blockquote>

              If we would be able to change the algorithm in such way :

              <br>

              <br>

              1. analyse all CUs.<br>

              2. clone all CUs.<br>

              <br>

              Then we could create a merged type table(artificial CU

              containing types) during step1. <br>

              If that type table would be written first, then all

              following CUs could use known offsets <br>

              to the types and we would not need additional fix-up

              processing for type references. <br>

              It would still be necessary to fix-up other inter-CU

              references. But it would not be necessary <br>

              to fix-up type references (which constitute the vast

              majority).<br>

            </div>

          </blockquote>

          <div><br>

          </div>

          <div>To me, that sounds more expensive than the fixup forward

            references pass.</div>

        </div>

      </div>

    </blockquote>

    <p>If we would speak about direct comparison then yes loading DWARF

      one more time looks more expensive than fixup forward references

      pass. But if we would speak about the general picture then it

      could probably be beneficial:<br>

      <br>

      1. merging types would lead to a smaller size of resulting DWARF.

      This would speed up the process.<br>

         f.e. If we would switch "odr types deduplication" off in

      current implementation then it would increase execution time two

      times. That is because more DWARF should be cloned and written in

      the result. Implementation of "merging types" would probably have

      a similar effect <br>

         - It would speed-up the overall process. So from one side

      additional step for loading DWARF would <br>

         decrease performance but a smaller amount of resulting data

      would increase performance.<br>

         <br>

      2. When types would be put in the first CU then we would have a

      simple strategy for our liveness analysis algorithm: just always

      keep the first CU in memory. This allows us to speed up our

      liveness analysis step.<br>

      <br>

      Anyway, all the above is just an idea for future work. Currently,

      I am going to implement multithread processing for CUs loaded into

      memory and having the same type of processing as it currently

      is(Which assumes that "fixup forward references pass" started to

      do more work by fixing types references).<br>

    </p>

    <p><br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div> <br>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div>

                        <p> <br>

                        </p>

                        <blockquote type="cite">

                          <div dir="ltr">

                            <div class="gmail_quote">

                              <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                                <div>

                                  <p>Without loading all CU into the

                                    memory it would require two passes

                                    solution. First to analyze <br>

                                    which part of DWARF relates to live

                                    code and then second pass to

                                    generate the result. <br>

                                  </p>

                                </div>

                              </blockquote>

                              <div>Not sure it'd require any more second

                                pass than a "fixup" pass, which it

                                sounds like you're saying it already

                                has? <br>

                              </div>

                            </div>

                          </div>

                        </blockquote>

                        <p>It looks like it would need an additional

                          pass to process inter-CU references(existed in

                          incoming file) if we do not want to load all

                          CUs into memory.<br>

                        </p>

                      </div>

                    </blockquote>

                    <div>Usually inter-CU references aren't used, except

                      in LTO - and in LTO all the DWARF deduplication

                      and function discarding is already done by the IR

                      linker anyway. (ThinLTO is a bit different, but

                      really we'd be better off teaching it the extra

                      tricks anyway (some can't be fixed in ThinLTO -

                      like emitting a "Home" definition of an inline

                      function, only to find out other ThinLTO

                      backend/shards managed to optimize away all uses

                      of the function... so some cleanup may be useful

                      there)). It might be possible to do a more

                      dynamic/rolling cache - keep only the CUs with

                      unresolved cross-CU references alive and only keep

                      them alive until their cross-CU references are

                      found/marked alive. This should make things no

                      worse than the traditional dsymutil case - since

                      cross-CU references are only effective/generally

                      used within a single object file (it's possible to

                      create relocations for them into other files - but

                      I know LLVM doesn't currently do this and I don't

                      think GCC does it) with multiple CUs anyway - so

                      at most you'd keep all the CUs from a single

                      original input file alive together.<br>

                    </div>

                  </div>

                </div>

              </blockquote>

              But, since it is a DWARF documented case the tool should

              be ready for such case(when inter-CU <br>

              references are heavily used).</div>

          </blockquote>

          <div><br>

            Sure - but by implementing a CU liveness window like that

            (keeping CUs live only so long as they need to be rather

            than an all-or-nothing approach) only especially quirky

            inputs would hit the worst case while the more normal inputs

            could perform better.<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>It is not clear what should be put in such CU liveness window. If

      CU100 references CU1 - how could we know that we need to put CU1

      into CU liveness window before we processed CU100?<br></p></div></blockquote><div>Fair point, not just forward references to worry about but backward references too. I wonder how much savings there is in the liveness analysis compared to "keep one copy of everything, no matter whether it's live or not", then it can be a pure forward progress situation. (with the quirk that you might emit a declaration for an entity once, then a definition for it later - alternatively if a declaration is seen it could be skipped under the assumption that a definition will follow (& use a forward ref fixup) - and if none is found, splat some stub declarations into a trailing CU at the end) <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>

    </p>

    <p><br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div> Moreover, llvm-dwarfutil would be the tool producing <br>

              exactly such situation. The resulting file(produced by

              llvm-dwarfutil) would contain a lot of <br>

              inter-CU references. Probably, there is no practical

              reasons to apply llvm-dwarfutil to the same <br>

              file twice but it would be a good test for the tool.<br>

            </div>

          </blockquote>

          <div><br>

            It'd be a good stress test, but not necessarily something

            that would need to perform the best because it wouldn't be a

            common use case.<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>I agree that we should not slow down the DWARFLinker in common

      cases only because we need to support the worst cases.<br>

      But we also need to implement a solution which works in some

      acceptable manner for the worst case. </p></div></blockquote><div>I think that depends on "acceptable" - correct, yes. Practical to run in reasonable time/memory? Not necessarily, in my opinion. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>The current solution -

      loading everything in memory - makes it hard to use in a

      non-dsymutil scenario(llvm-dwarfutil).<br></p></div></blockquote><div>I agree it's worth exploring the non-dsymutil scenario, as you are - I'm just saying we don't necessarily need to support high usability (fast/low memory usage/etc) llvm-dwarfutil on an already dwarfutil'd binary (but as you've pointed out, the "window" is unknowable because of backward references, so this whole subthread is perhaps irrelevant).<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>

    </p>

    <p>There could be several things which could be used to decide

      whether we need to go on a light or heavy path:<br>

      <br>

      1. If the input contains only a single CU we do not need to unload

      it from memory. Thus - we would not need to do an extra DWARF

      loading pass.<br>

      2. If abbreviations from the whole input file do not contain

      inter-CU references then while doing liveness analysis,  we do not

      need to wait until other CUs are processed.<br></p></div></blockquote><div>(2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd still wonder if a more general implementation strategy could be found that would make it easier to get a sliding scale of efficiency depending on how much inter-CU references where were, not a "if there are none it's good, if there are any it's bad or otherwise very different to implement". <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>

      <br>

      Then that scheme would be used for worst cases:<br>

      <br>

      1. for (CU : CU1...CU100) {<br>

           load CU.<br>

           analyse CU.<br>

           unload CU.<br>

         }    <br>

      2. for (CU : CU1...CU100) {<br>

           load CU.<br>

           clone CU.<br>

           unload CU.<br>

         }    <br>

      3. fixup forward references.<br>

      <br>

      and that scheme for light cases:<br>

      <br>

      1. for (CU : CU1...CU100) {<br>

           load CU.<br>

           analyse CU.<br>

           clone CU.<br>

           unload CU.<br>

         }<br>

      2. fixup forward references.<br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>Generally, I think we should not assume that inter-CU

              references would be used in a limited way.<br>

              <br>

              Anyway, if this scheme:  <br>

              <br>

              1. analyse all CUs.<br>

              2. clone all CUs.<br>

              <p>would work slow then we would need to continue with

                one-pass solution and not support complex closely

                coupled inputs.<br>

              </p>

            </div>

          </blockquote>

          <div><br>

          </div>

          <div>yeah, certainly seeing the data/experiments will be

            interesting, if you end up implementing some different

            strategies, etc.<br>

            <br>

            I guess one possibility for parallel generation could be

            something more like Microsoft's approach with a central

            debug info server that compilers communicate with - not that

            exact model, I mean, but if you've got parallel threads

            generating reduced DWARF into separate object files - they

            could communicate with a single thread responsible for type

            emission - the type emitter would be given types from the

            separate threads and compute their size, queue them up to be

            streamed out to the type CU (& keep the source CU alive

            until that work was done) - such a central type emitter

            could quickly determine the size of the type to be emitted

            and compute future type offsets (eg: if 5 types were in the

            queue, it could've figured out the offset of those types

            already) to answer type offset queries quickly and unblock

            the parallel threads to continue emitting their CUs

            containing type references.<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>yes. Thank you. Would think about it.<br>

    </p>

    <p>Alexey.<br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div><br>

            - Dave </div>

        </div>

      </div>

    </blockquote>

  </div>

</blockquote></div></div>