<div dir="ltr">I think if we're in the realm of DWARF extensions a whole bunch of other considerations come into it (& indeed, your suggested proposal may be a good one - but I think it's a very wide problem space once we're considering DWARF extensions). Mostly I was making arguments/suggestions/thoughts on the basis of being compatible with all existing DWARF producers.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Nov 1, 2020 at 2:05 PM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 20:38, David Blaikie
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020 at 6:01
AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 01:49, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Oct 27,
2020 at 12:34 PM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 27.10.2020 20:32, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue,
Oct 27, 2020 at 1:23 AM Alexey Lapshin
<<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 26.10.2020 22:38, David
Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Oct
25, 2020 at 9:31 AM Alexey
Lapshin <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 23.10.2020 19:43,
David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div><br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Ah, yeah - that seems like a missed opportunity - duplicating the whole type DIE. LTO does this by making monolithic types - merging all the members from different definitions of the same type into one - but that's maybe too expensive for dsymutil (might still be interesting to know how much more expensive, etc.). But I think the other way to go would be to produce a declaration of the type, with the relevant members, and let the DWARF consumer identify this declaration as matching up with the earlier definition. That's the sort of DWARF you get from the non-MachO default -fno-standalone-debug anyway, so it's already pretty well tested/supported (support in lldb's a bit younger/more work-in-progress, admittedly). I wonder how much dsym size could be reduced by such an implementation.</div>
</div>
</div>
</blockquote>
<p>I see. Yes, that could be done, and I think it would result in a noticeable size reduction (I do not know the exact numbers at the moment).</p>
<p>I am working on the multi-threaded DWARFLinker now, and its first version will do exactly the same type processing as the current dsymutil.</p>
</blockquote>
<div>Yeah, best to
keep the behavior
the same through
that</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>The above scheme could be implemented as a next step, and it would result in a better size reduction (better than the current state).</p>
<p>But I think an even better scheme could be done as well, one that would result in a bigger size reduction and faster execution. This scheme is similar to what you've described above: "LTO does - making monolithic types - merging all the members from different definitions of the same type into one".</p>
</div>
</blockquote>
<div>I believe the reason that's probably not been done is that it can't be streamed - it'd lead to buffering more of the output </div>
</div>
</div>
</blockquote>
<p>Yes. The fact that DWARF should be streamed into AsmPrinter complicates parallel DWARF generation. In my prototype, I generate several resulting files (one for each source compilation unit) and then sequentially glue them into the final resulting file.<br>
</p>
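A rough sketch of this glue-the-buffers approach (the function names and byte format here are purely illustrative, not actual DWARFLinker code): each CU body is emitted independently into its own buffer, and the buffers are then concatenated in the original order.

```python
# Illustrative sketch only: emit each compile unit into its own buffer
# in parallel, then glue the buffers together in the original order.
from concurrent.futures import ThreadPoolExecutor

def emit_cu(cu_data: bytes) -> bytes:
    # Stand-in for streaming one CU through its own AsmPrinter.
    return b"<CU:" + cu_data + b">"

def link(cus):
    # Each CU body is generated independently (the parallelizable step),
    # then the per-CU outputs are concatenated sequentially; pool.map
    # preserves the input order.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(emit_cu, cus))
    return b"".join(parts)

print(link([b"a", b"b", b"c"]))  # b'<CU:a><CU:b><CU:c>'
```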
</div>
</blockquote>
<div>How does that help? Do you use relocations in those intermediate object files so the DWARF in them can refer across files?<br>
</div>
</div>
</div>
</blockquote>
<p>It does not help with referring across files. It helps to parallelize the generation of CU bodies.<br>
It is not possible to write two CUs into one AsmPrinter in parallel. To make parallel generation possible, I stream them into different AsmPrinters (this comment is for "I believe the reason that's probably not been done is that it can't be streamed", which was initially about referring across files, but it seems I added another direction).<br>
</p>
</div>
</blockquote>
<div>Oh, I see - thanks for explaining,
essentially buffering on-disk. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div>(if two of these expandable types were in one CU - the start of the second type couldn't be known until the end, because it might keep getting pushed later due to expansion of the first type) and/or having to revisit all the type references (the offset to the second type wouldn't be known until the end - so writing the offsets to refer to the type would have to be deferred until then).<br>
</div>
</div>
</div>
</blockquote>
<p>That is the second problem: offsets are not known until the end of the file.<br>
dsymutil already has that situation for inter-CU references, so it has an extra pass to fix up offsets. </p>
</div>
</blockquote>
<div>Oh, it does? I figured it was one-pass, and that it only ever refers back to types in previous CUs, so it doesn't have to go back and do a second pass. But I guess if it sees a declaration of T1 in CU1, then later on sees a definition of T1 in CU2, does it somehow go back to CU1 and remove the declaration/make references refer to the definition in CU2? I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards?<br>
</div>
</div>
</div>
</blockquote>
<p>For the processing of the types, it does not go back.<br>
This "I figured it was one-pass, and that it only ever refers back to types in previous CUs" and this "I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards" are both correct.<br>
</p>
</div>
</blockquote>
<div>Great - thanks for
explaining/confirming! </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>With a multi-threaded implementation, such a situation would arise more often for type references, and so more offsets would have to be fixed up during the additional pass.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>DWARFLinker could create an additional artificial compile unit and put all merged types there, then patch all type references to point into this additional compilation unit. No bits would be duplicated in that case. The performance improvement could be achieved due to the smaller amount of copied DWARF and due to the fact that type references could be updated while the DWARF is cloned (no additional pass needed for that).<br>
</p>
</div>
</blockquote>
<div>"Later patch all type references to point into this additional compilation unit" - that's the additional pass people are probably talking/concerned about: rewalking all the DWARF. The current dsymutil approach, as far as I know, is single pass - it knows the final, absolute offset to the type from the moment it emits that type/needs to refer to it.<br>
</div>
</div>
</div>
</blockquote>
<p>Right. The current dsymutil approach is single pass. And from that point of view, the solution you've described (to produce a declaration of the type, with the relevant members) allows us to keep that single-pass implementation.<br>
<br>
But there is a restriction for the current dsymutil approach: to process inter-CU references it needs to load all DWARF into memory (while it analyzes which part of the DWARF is live, it needs to have all CUs loaded into memory).</p>
</div>
</blockquote>
<div>All DWARF for a single file (which for dsymutil is mostly a single CU, except with LTO I guess?), not all DWARF for all inputs in memory at once, yeah?<br>
</div>
</div>
</div>
</blockquote>
<p>Right. In the dsymutil case - all DWARF for a single file (not all DWARF for all inputs in memory at once).<br>
But in the llvm-dwarfutil case, the single file contains DWARF for all of the original input object files, and it all gets loaded into memory.<br>
</p>
</div>
</blockquote>
<div>Yeah, it would be great to try to go CU-by-CU. </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>That leads to huge memory usage.<br>
It is less important when the source is a set of object files (as in the dsymutil case), but this becomes a real problem for the llvm-dwarfutil utility when the source is a single file (with the current implementation it needs 30G of memory to process the clang binary).<br>
</p>
</div>
</blockquote>
<div>Yeah, that's where I think you'd need a fixup pass one way or another - because cross-CU references can mean that when you figure out a new layout for CU5 (because it has a duplicate type definition of something in CU1), then you might have to touch CU4, which had an absolute/cross-CU forward reference to CU5. Once you've got such a fixup pass (if dsymutil already has one? Which, like I said, I'm confused why it would have one/that doesn't match my very vague understanding), then I think you could make dsymutil work on a per-CU basis, streaming things out, then fixing up a few offsets.<br>
</div>
</div>
</div>
</blockquote>
<p>When dsymutil deduplicates types, it changes a local CU reference into an inter-CU reference (so that CU2 (next) can reference the type definition from CU1 (prev)). To do this change it does not currently need to do any fixups.<br>
<br>
When dsymutil meets a pre-existing inter-CU reference (located in the input object file) pointing into a CU which has not been processed yet (and whose offset is therefore unknown), it marks it as a "forward reference" and patches it later, during the additional "fixup forward references" pass, at a time when the offsets are known.<br>
</p>
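A minimal sketch of that two-pass behaviour (hypothetical data model, not dsymutil's actual code): backward references are resolved as soon as they are seen, while forward references are recorded and patched in a final "fixup forward references" pass once every CU's offset is known.

```python
# Illustrative sketch of the "fixup forward references" scheme.
def link_cus(cus, refs):
    """cus: ordered list of (name, size) pairs.
    refs: list of (source_cu, target_cu) inter-CU references.
    Returns {ref_index: resolved_target_offset}."""
    offsets, resolved, fixups, pos = {}, {}, [], 0
    for name, size in cus:
        offsets[name] = pos   # final offset of this CU is known on emission
        pos += size
        for j, (src, dst) in enumerate(refs):
            if src != name:
                continue
            if dst in offsets:
                resolved[j] = offsets[dst]  # backward ref: resolvable now
            else:
                fixups.append(j)            # forward ref: patch later
    for j in fixups:  # second pass, run once all offsets are known
        resolved[j] = offsets[refs[j][1]]
    return resolved

print(link_cus([("CU1", 10), ("CU2", 20)],
               [("CU1", "CU2"), ("CU2", "CU1")]))  # {0: 10, 1: 0}
```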
</div>
</blockquote>
<div>OK, so a limited two-pass system. (Does it do that second pass once at the end of the whole dsymutil run, or at the end of each input file? So if an input file has two CUs and the first CU references a type in the second CU, it could write the first CU with a "forward reference", then write the second CU, then fix up the forward reference, and then go on to the next file and its CUs. This could improve performance by touching recently used memory/disk pages only, rather than going all the way back to the start later on when those pages have become cold.)</div>
</div>
</div>
</blockquote>
<p>Yes, it does it at the end of each input file.</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>If CUs were processed in parallel, their offsets would not be known at the moment a local type reference is changed into an inter-CU reference. So we would need to do the same fix-up processing for all references to types, as we already do for other inter-CU references.<br>
</p>
</div>
</blockquote>
<div>Yeah - though given the existence of this second "fixup forward references" system, yeah, it could just be used much more generally, as you say. Not an extra pass - just the existing second pass, but with way more fixups to fix up.</div>
</div>
</div>
</blockquote>
If we were able to change the algorithm in this way:<br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<br>
then we could create a merged type table (an artificial CU containing the types) during step 1.<br>
If that type table were written first, then all following CUs could use known offsets to the types, and we would not need additional fix-up processing for type references.<br>
It would still be necessary to fix up other inter-CU references, but it would not be necessary to fix up type references (which constitute the vast majority).<br>
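The idea can be sketched roughly like this (illustrative type names and sizes, not real DWARF encoding): the analysis pass builds the merged type table and assigns every type its final offset, so the clone pass can resolve type references immediately.

```python
# Illustrative sketch: build the merged type table during the analysis
# pass and emit it first, so the clone pass already knows every type's
# final offset in the artificial types CU.
def analyse(cus):
    type_offsets, pos = {}, 0
    for cu in cus:
        for name, size in cu["types"]:
            if name not in type_offsets:   # merge duplicate definitions
                type_offsets[name] = pos
                pos += size
    return type_offsets, pos

def clone(cus, type_offsets):
    # Type references resolve immediately against the table emitted first;
    # only other inter-CU references would still need a fixup pass.
    return [[type_offsets[t] for t in cu["refs"]] for cu in cus]

cus = [{"types": [("T1", 8)], "refs": ["T1", "T2"]},
       {"types": [("T2", 4), ("T1", 8)], "refs": ["T2"]}]
offsets, table_size = analyse(cus)
print(offsets, clone(cus, offsets))  # {'T1': 0, 'T2': 8} [[0, 8], [8]]
```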
</div>
</blockquote>
<div><br>
</div>
<div>To me, that sounds more expensive than the
fixup forward references pass.</div>
</div>
</div>
</blockquote>
<p>If we are speaking about a direct comparison, then yes, loading DWARF one more time looks more expensive than the fixup forward references pass. But if we are speaking about the general picture, then it could still be beneficial:<br>
<br>
1. Merging types would lead to a smaller size of the resulting DWARF, which would speed up the process.<br>
E.g., if we switched "ODR types deduplication" off in the current implementation, it would double the execution time. That is because more DWARF has to be cloned and written into the result. An implementation of "merging types" would probably have a similar effect - it would speed up the overall process. So on the one hand the additional step for loading DWARF would decrease performance, but the smaller amount of resulting data would increase performance.<br>
<br>
2. When types are put in the first CU, we would have a simple strategy for our liveness analysis algorithm: just always keep the first CU in memory. This allows us to speed up the liveness analysis step.<br>
<br>
Anyway, all of the above is just an idea for future work. Currently, I am going to implement multi-threaded processing for CUs loaded into memory, with the same type processing as it currently is (which assumes that the "fixup forward references" pass starts to do more work by fixing type references).<br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Without loading all CU
into the memory it would
require two passes
solution. First to analyze
<br>
which part of DWARF
relates to live code and
then second pass to
generate the result. <br>
</p>
</div>
</blockquote>
<div>Not sure it'd require any
more second pass than a
"fixup" pass, which it sounds
like you're saying it already
has? <br>
</div>
</div>
</div>
</blockquote>
<p>It looks like it would need an additional pass to process inter-CU references (ones that already exist in the incoming file) if we do not want to load all CUs into memory.<br>
</p>
</div>
</blockquote>
<div>Usually inter-CU references aren't used, except in LTO - and in LTO all the DWARF deduplication and function discarding is already done by the IR linker anyway. (ThinLTO is a bit different, but really we'd be better off teaching it the extra tricks anyway (some can't be fixed in ThinLTO - like emitting a "Home" definition of an inline function, only to find out other ThinLTO backend/shards managed to optimize away all uses of the function... so some cleanup may be useful there).) It might be possible to do a more dynamic/rolling cache - keep only the CUs with unresolved cross-CU references alive, and only keep them alive until their cross-CU references are found/marked alive. This should make things no worse than the traditional dsymutil case - since cross-CU references are only effective/generally used within a single object file (it's possible to create relocations for them into other files - but I know LLVM doesn't currently do this and I don't think GCC does it) with multiple CUs anyway - so at most you'd keep all the CUs from a single original input file alive together.<br>
</div>
</div>
</div>
</blockquote>
But since it is a case documented by DWARF, the tool should be ready for it (when inter-CU references are heavily used).</div>
</blockquote>
<div><br>
Sure - but by implementing a CU liveness window like that (keeping CUs live only as long as they need to be, rather than an all-or-nothing approach), only especially quirky inputs would hit the worst case, while more normal inputs could perform better.<br>
</div>
</div>
</div>
</blockquote>
<p>It is not clear what should be put in such a CU liveness window. If CU100 references CU1 - how could we know that we need to put CU1 into the CU liveness window before we have processed CU100?<br>
</p>
</div>
</blockquote>
<div>Fair point - not just forward references to worry about, but backward references too. I wonder how much savings there is in the liveness analysis compared to "keep one copy of everything, no matter whether it's live or not"; then it can be a pure forward-progress situation. (With the quirk that you might emit a declaration for an entity once, then a definition for it later - alternatively, if a declaration is seen, it could be skipped under the assumption that a definition will follow (& use a forward ref fixup) - and if none is found, splat some stub declarations into a trailing CU at the end.)<br>
</div>
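A toy sketch of that forward-progress idea (the names and data model are made up for illustration): declarations are skipped on the assumption that a definition will follow, and anything never defined gets a stub declaration in a trailing CU.

```python
# Toy sketch: definitions are emitted when seen; declarations are skipped
# (recorded as pending) on the assumption a definition will follow, and
# stub declarations are splatted into a trailing CU for the rest.
def link(dies):
    """dies: ordered (name, kind) pairs, kind is "decl" or "def"."""
    emitted, pending = [], {}
    for name, kind in dies:
        if kind == "decl":
            pending[name] = True          # assume a definition follows
        else:
            emitted.append((name, "def"))
            pending.pop(name, None)       # definition arrived, no stub needed
    # Entities that never got a definition end up as stubs in a trailing CU.
    trailing = [(name, "stub_decl") for name in pending]
    return emitted + trailing

print(link([("T1", "decl"), ("T1", "def"), ("T2", "decl")]))
# [('T1', 'def'), ('T2', 'stub_decl')]
```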
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div> Moreover, llvm-dwarfutil would be a tool producing exactly such a situation. The resulting file (produced by llvm-dwarfutil) would contain a lot of inter-CU references. There is probably no practical reason to apply llvm-dwarfutil to the same file twice, but it would be a good test for the tool.<br>
</div>
</blockquote>
<div><br>
It'd be a good stress test, but not necessarily
something that would need to perform the best
because it wouldn't be a common use case.<br>
</div>
</div>
</div>
</blockquote>
<p>I agree that we should not slow down the DWARFLinker in
common cases only because we need to support the worst
cases.<br>
But we also need to implement a solution which works in
some acceptable manner for the worst case. </p>
</div>
</blockquote>
<div>I think that depends on "acceptable" - correct, yes.
Practical to run in reasonable time/memory? Not necessarily,
in my opinion. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>The current solution - loading everything into memory - makes it hard to use in a non-dsymutil scenario (llvm-dwarfutil).<br>
</p>
</div>
</blockquote>
<div>I agree it's worth exploring the non-dsymutil scenario,
as you are - I'm just saying we don't necessarily need to
support high usability (fast/low memory usage/etc)
llvm-dwarfutil on an already dwarfutil'd binary (but as
you've pointed out, the "window" is unknowable because of
backward references, so this whole subthread is perhaps
irrelevant).<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>There could be several things used to decide whether we need to go down a light or a heavy path:<br>
<br>
1. If the input contains only a single CU, we do not need to unload it from memory. Thus we would not need an extra DWARF loading pass.<br>
2. If the abbreviations from the whole input file do not contain inter-CU references, then while doing liveness analysis we do not need to wait until other CUs are processed.<br>
</p>
</div>
</blockquote>
<div>(2) Yeah, that /may/ be a good idea - cheap to test, etc. Though I'd still wonder if a more general implementation strategy could be found that would make it easier to get a sliding scale of efficiency depending on how many inter-CU references there were - not "if there are none it's good, if there are any it's bad or otherwise very different to implement".<br>
</div>
</div>
</div>
</blockquote>
<p>I think there is a scenario which would make it possible to process each unreferenced CU once and handle inter-CU references in a scalable way (even for a dwarfutil'd binary):<br>
<br>
1. Implement a global type table and type merging. This allows us to have all types in memory; all inter-CU type references would then point into that in-memory type table. (We do not know which CUs should be put into a CU liveness window, and we cannot put all CUs into memory, but we can put all types into memory.)<br>
<br>
2. If there are no other inter-CU references, then all CUs can be handled in one pass.<br>
<br>
3. If there are other inter-CU references, then after all CUs are processed by the first pass we would have a list of referenced CUs. We could then delete the already-cloned data (for the referenced CUs) and start the process again: load CU, mark liveness, clone data. This second pass would be done only for the referenced CUs. For non-complex, not closely coupled cases it would work relatively fast.<br>
<br>
4. Put the in-memory type table into an artificial CU and update all type references.<br>
</p>
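Steps 2-3 could be sketched roughly like this (hypothetical structure, for illustration only): the first pass processes every CU once while recording which CUs are targets of non-type inter-CU references, and only those CUs are re-processed afterwards.

```python
# Rough sketch: one pass over all CUs, recording which CUs are targets of
# (non-type) inter-CU references; only those CUs get a second pass.
def process(cus):
    processed, referenced = [], set()
    for cu in cus:
        processed.append(cu["name"])            # load, mark liveness, clone
        referenced.update(cu["inter_cu_refs"])  # CUs referenced from here
    # Second pass only for CUs that other CUs pointed into.
    second_pass = [name for name in processed if name in referenced]
    return processed, second_pass

cus = [{"name": "CU1", "inter_cu_refs": []},
       {"name": "CU2", "inter_cu_refs": ["CU1"]},
       {"name": "CU3", "inter_cu_refs": []}]
print(process(cus))  # (['CU1', 'CU2', 'CU3'], ['CU1'])
```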
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Then this scheme would be used for the worst cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
unload CU.<br>
}<br>
2. for (CU : CU1...CU100) {<br>
load CU.<br>
clone CU.<br>
unload CU.<br>
}<br>
3. fixup forward references.<br>
<br>
and this scheme for the light cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
clone CU.<br>
unload CU.<br>
}<br>
2. fixup forward references.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>Generally, I think we should not assume that inter-CU references will be used only in a limited way.<br>
<br>
Anyway, if this scheme:<br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<p>works too slowly, then we would need to continue with the one-pass solution and not support complex, closely coupled inputs.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>Yeah, certainly seeing the data/experiments will be interesting, if you end up implementing some different strategies, etc.<br>
<br>
I guess one possibility for parallel generation could be something more like Microsoft's approach with a central debug info server that compilers communicate with - not that exact model, I mean, but if you've got parallel threads generating reduced DWARF into separate object files, they could communicate with a single thread responsible for type emission. The type emitter would be given types from the separate threads and compute their size, queue them up to be streamed out to the type CU (& keep the source CU alive until that work was done) - such a central type emitter could quickly determine the size of the type to be emitted and compute future type offsets (e.g., if 5 types were in the queue, it could have figured out the offsets of those types already) to answer type offset queries quickly and unblock the parallel threads to continue emitting their CUs containing type references.<br>
</div>
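A small sketch of such a central type emitter (illustrative only - this is neither Microsoft's actual design nor LLVM code): parallel CU-emitting threads intern types with a single emitter, which assigns each type its final offset immediately, so the threads never block on the actual streaming of the type bytes.

```python
# Illustrative sketch of a central type emitter: offsets are assigned the
# moment a type is first interned, so CU-emitting threads get their answer
# immediately; the bytes can be streamed out later from the queue.
import queue
import threading

class TypeEmitter:
    def __init__(self):
        self.offsets, self.next_off = {}, 0
        self.lock = threading.Lock()
        self.out = queue.Queue()  # types queued for streaming to the type CU

    def intern(self, name, size):
        # Called from parallel CU-emitting threads; returns the final
        # offset of the type in the merged type CU without waiting for
        # the type to actually be written out.
        with self.lock:
            if name not in self.offsets:
                self.offsets[name] = self.next_off
                self.next_off += size
                self.out.put(name)
            return self.offsets[name]

emitter = TypeEmitter()
results = {}

def worker(cu, types):
    # A CU-emitting thread: asks the emitter for each type's final offset.
    results[cu] = [emitter.intern(n, s) for n, s in types]

threads = [threading.Thread(target=worker, args=("CU1", [("T1", 8), ("T2", 4)])),
           threading.Thread(target=worker, args=("CU2", [("T1", 8)]))]
for t in threads: t.start()
for t in threads: t.join()
print(results["CU1"], results["CU2"])  # [0, 8] [0] - T1 deduplicated
```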
</div>
</div>
</blockquote>
<p>Yes, thank you - I will think about it.<br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
- Dave </div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote></div>