<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 02.11.2020 04:11, David Blaikie
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAENS6EtSYim_Cpdkr6gFGmfjRbpwyaf+RegitKmHZq5+edG+7w@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">I think if we're in the realm of DWARF extensions a
whole bunch of other considerations come into it (& indeed,
your suggested proposal may be a good one - but I think it's a
very wide problem space once we're considering DWARF
extensions). Mostly I was making arguments/suggestions/thoughts
on the basis of being compatible with all existing DWARF
producers.</div>
</blockquote>
<p>The described scenario does not assume DWARF extensions. The global
type table is not a new DWARF construct; it is an artificial CU that
keeps all the types. That solution would be compatible with existing
DWARF consumers/producers.<br>
</p>
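<p>A minimal sketch of that idea, using illustrative names only (not the actual DWARFLinker API): a table that keeps one copy of each type, so per-CU duplicates collapse into references to a single offset inside the artificial CU:</p>

```python
# Sketch of a "global type table": an artificial unit keeping one copy
# of each type, so per-CU type DIEs become references into it.
# All names here are illustrative, not actual DWARFLinker code.

class TypeTable:
    def __init__(self):
        self._offsets = {}   # type key -> offset inside the artificial CU
        self._size = 0

    def intern(self, type_key, die_size):
        """Return the offset of the single kept copy of this type,
        emitting it only the first time it is seen."""
        if type_key not in self._offsets:
            self._offsets[type_key] = self._size
            self._size += die_size
        return self._offsets[type_key]

table = TypeTable()
# Two CUs defining the same type share one entry:
off1 = table.intern("struct Foo", die_size=24)
off2 = table.intern("struct Foo", die_size=24)
assert off1 == off2 == 0
off3 = table.intern("class Bar", die_size=40)
assert off3 == 24
```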
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAENS6EtSYim_Cpdkr6gFGmfjRbpwyaf+RegitKmHZq5+edG+7w@mail.gmail.com"><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Nov 1, 2020 at 2:05 PM
Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com"
moz-do-not-send="true">avl.lapshin@gmail.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 20:38, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Oct 28, 2020
at 6:01 AM Alexey Lapshin <<a
href="mailto:avl.lapshin@gmail.com"
target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 28.10.2020 01:49, David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue,
Oct 27, 2020 at 12:34 PM Alexey Lapshin
<<a href="mailto:avl.lapshin@gmail.com"
target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 27.10.2020 20:32, David Blaikie
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On
Tue, Oct 27, 2020 at 1:23 AM
Alexey Lapshin <<a
href="mailto:avl.lapshin@gmail.com"
target="_blank"
moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 26.10.2020 22:38,
David Blaikie wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr"
class="gmail_attr">On
Sun, Oct 25, 2020 at
9:31 AM Alexey Lapshin
<<a
href="mailto:avl.lapshin@gmail.com"
target="_blank"
moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div>
<p><br>
</p>
<div>On 23.10.2020
19:43, David
Blaikie wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<blockquote
type="cite">
<div dir="ltr">
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div><br>
<br>
</div>
</blockquote>
<div><br>
</div>
<div>Ah, yeah
- that seems
like a missed
opportunity -
duplicating
the whole type
DIE. LTO does
this by making
monolithic
types -
merging all
the members
from different
definitions of
the same type
into one, but
that's maybe
too expensive
for dsymutil
(might still
be interesting
to know how
much more
expensive,
etc). But I
think the
other way to
go would be to
produce a
declaration of
the type, with
the relevant
members - and
let the DWARF
consumer
identify this
declaration as
matching up
with the
earlier
definition.
That's the
sort of DWARF
you get from
the non-MachO
default
-fno-standalone-debug
anyway, so
it's already
pretty well
tested/supported
(support in
lldb's a bit
younger/more
work-in-progress,
admittedly). I
wonder how
much dsym size
there is that
could be
reduced by
such an
implementation.</div>
</div>
</div>
</blockquote>
<p>I see. Yes, that could be done, and I think it would result in a noticeable size reduction (I do not know exact numbers at the moment).</p>
<p>I am working on a multi-threaded DWARFLinker now, and its first version will do exactly the same type processing as the current dsymutil.</p>
</blockquote>
<div>Yeah,
best to keep
the behavior
the same
through that</div>
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The above scheme could be implemented as a next step, and it would result in a better size reduction (better than the current state).</p>
<p>But I think an even better scheme could be done, one that would result in a bigger size reduction and faster execution. This scheme is similar to what you described above: "LTO does - making monolithic types - merging all the members from different definitions of the same type into one".</p>
</div>
</blockquote>
<div>I believe
the reason
that's
probably not
been done is
that it can't
be streamed -
it'd lead to
buffering more
of the output
</div>
</div>
</div>
</blockquote>
<p>Yes. The fact that DWARF should be streamed into AsmPrinter complicates parallel DWARF generation. In my prototype, I generate several resulting files (one per source compilation unit) and then sequentially glue them into the final resulting file.<br>
</p>
</div>
</blockquote>
<div>How does that help?
Do you use relocations
in those intermediate
object files so the
DWARF in them can
refer across files? <br>
</div>
</div>
</div>
</blockquote>
<p>It does not help with referring across the file. It helps to parallelize the generation of CU bodies.<br>
It is not possible to write two CUs into AsmPrinter in parallel. To make parallel generation possible, I stream them into different AsmPrinters. (This comment was for "I believe the reason that's probably not been done is that it can't be streamed", which was initially about referring across the file, but it seems I added another direction.)<br>
</p>
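<p>The buffering scheme might be sketched like this (illustrative only; the prototype uses separate AsmPrinters and temporary files, here stand-in byte buffers):</p>

```python
# Sketch: clone CUs in parallel into separate buffers (standing in for
# per-CU AsmPrinters / temporary files), then glue them sequentially
# into the final output. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def clone_cu(cu_name):
    # Stand-in for cloning one compile unit's DWARF.
    return f"<CU {cu_name}>".encode()

cus = ["a.cpp", "b.cpp", "c.cpp"]
with ThreadPoolExecutor() as pool:
    bodies = list(pool.map(clone_cu, cus))  # parallel generation, order kept

final = b"".join(bodies)  # sequential glue preserves CU order
assert final == b"<CU a.cpp><CU b.cpp><CU c.cpp>"
```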
</div>
</blockquote>
<div>Oh, I see - thanks for
explaining, essentially
buffering on-disk. <br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote
type="cite">
<div dir="ltr">
<div
class="gmail_quote">
<div>(if two
of these
expandable
types were in
one CU - the
start of the
second type
couldn't be
known until
the end
because it
might keep
getting pushed
later due to
expansion of
the first
type) and/or
having to
revisit all
the type
references
(the offset to
the second
type wouldn't
be known until
the end - so
writing the
offsets to
refer to the
type would
have to be
deferred until
then).<br>
</div>
</div>
</div>
</blockquote>
<p>That is the second problem: offsets are not known until the end of the file.<br>
dsymutil already has that situation for inter-CU references, so it has an extra pass to<br>
fix up offsets. </p>
</div>
</blockquote>
<div>Oh, it does? I
figured it was
one-pass, and that it
only ever refers back
to types in previous
CUs? So it doesn't
have to go back and do
a second pass. But I
guess if sees a
declaration of T1 in
CU1, then later on
sees a definition of
T1 in CU2, does it
somehow go back to CU1
and remove the
declaration/make
references refer to
the definition in CU2?
I figured it'd just
leave the declaration
and references to it
as-is, then add the
definition and use
that from CU2 onwards?
<br>
</div>
</div>
</div>
</blockquote>
<p>For the processing of the types, it does not go back.<br>
This "I figured it was one-pass, and that it only ever refers back to types in previous CUs"<br>
and this "I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards"<br>
are both correct.<br>
</p>
</div>
</blockquote>
<div>Great - thanks for
explaining/confirming! </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>With a multi-threaded implementation, such situations would arise more often<br>
for type references, so more offsets would need to be fixed during the additional pass.<br>
</p>
<blockquote
type="cite">
<div dir="ltr">
<div
class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>DWARFLinker could create an additional artificial compile unit and put all merged types there, and later patch all type references to point into this additional compilation unit. No bits would be duplicated in that case. The performance improvement could be achieved due to the smaller amount of copied DWARF, and due to the fact that type references could be updated while the DWARF is cloned (no additional pass needed for that).<br>
</p>
</div>
</blockquote>
<div>"later
patch all type
references to
point into
this
additional
compilation
unit" - that's
the additional
pass that
people are
probably
talking/concerned
about.
Rewalking all
the DWARF. The
current
dsymutil
approach, as
far as I know,
is single pass
- it knows the
final,
absolute
offset to the
type from the
moment it
emits that
type/needs to
refer to it. <br>
</div>
</div>
</div>
</blockquote>
<p>Right. The current dsymutil approach is single-pass. And from that point of view, the solution<br>
you described (producing a declaration of the type, with the relevant members)<br>
allows keeping that single-pass implementation.<br>
<br>
But there is a restriction in the current dsymutil approach: to process inter-CU references,<br>
it needs to load all DWARF into memory (while it analyzes which part of the DWARF is live,<br>
it needs to have all CUs loaded into memory).</p>
</div>
</blockquote>
<div>All DWARF for a
single file (which for
dsymutil is mostly a
single CU, except with
LTO I guess?), not all
DWARF for all inputs
in memory at once,
yeah? <br>
</div>
</div>
</div>
</blockquote>
<p>Right. In the dsymutil case - all DWARF for a single file (not all DWARF for all inputs in memory at once).<br>
But in the llvm-dwarfutil case, a single file contains the DWARF for all the original input object files, and it all gets<br>
loaded into memory.<br>
</p>
</div>
</blockquote>
<div>Yeah, it would be great to try to go CU-by-CU. </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>That leads to huge memory usage.<br>
It is less important when the source is a set of object files (as in the dsymutil case), but it<br>
becomes a real problem for the llvm-dwarfutil utility, where the source is a single file (with the current<br>
implementation it needs 30 GB of memory for the clang binary).<br>
</p>
</div>
</blockquote>
<div>Yeah, that's where
I think you'd need a
fixup pass one way or
another - because
cross-CU references
can mean that when you
figure out a new
layout for CU5
(because it has a
duplicate type
definition of
something in CU1) then
you might have to
touch CU4 that had an
absolute/cross-CU
forward reference to
CU5. Once you've got
such a fixup pass (if
dsymutil already has
one? Which, like I
said, I'm confused why
it would have one/that
doesn't match my very
vague understanding)
then I think you could
make dsymutil work on
a per-CU basis
streaming things out,
then fixing up a few
offsets.<br>
</div>
</div>
</div>
</blockquote>
<p>When dsymutil deduplicates types, it changes a local CU reference into an inter-CU reference (so that CU2 (next) can reference the type definition from CU1 (prev)). To make this change it currently does not need any fixups.<br>
<br>
When dsymutil meets a pre-existing inter-CU reference (located in the input object file) pointing into a CU which has not been processed yet (so its offset is unknown), it marks it as a "forward reference" and patches it later, during the additional "fixup forward references" pass, once the offsets are known.<br>
</p>
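<p>That forward-reference mechanism might be sketched roughly as follows (hypothetical structures, not dsymutil's actual classes): unknown offsets are written as placeholders and patched once every CU's final offset is known.</p>

```python
# Sketch of a "fixup forward references" pass: references into CUs not
# yet emitted get a placeholder now and a patch at the end, once every
# CU's final offset is known. Illustrative names only.

output = bytearray()
cu_offsets = {}      # CU id -> final offset in output
fixups = []          # (position in output, target CU id)

def emit_cu(cu_id, body, ref_target=None):
    cu_offsets[cu_id] = len(output)
    output.extend(body)
    if ref_target is not None:
        if ref_target in cu_offsets:          # backward ref: offset known
            output.extend(cu_offsets[ref_target].to_bytes(4, "little"))
        else:                                 # forward ref: placeholder
            fixups.append((len(output), ref_target))
            output.extend(b"\x00\x00\x00\x00")

emit_cu(1, b"CU1-", ref_target=2)   # refers forward into CU2
emit_cu(2, b"CU2-")

for pos, target in fixups:          # the fixup pass, offsets now known
    output[pos:pos + 4] = cu_offsets[target].to_bytes(4, "little")

assert cu_offsets[2] == 8
assert output[4:8] == (8).to_bytes(4, "little")
```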
</div>
</blockquote>
<div>OK, so limited 2 pass system.
(does it do that second pass
once at the end of the whole
dsymutil run, or at the end of
each input file? (so if an input
file has two CUs and the first
CU references a type in the
second CU - it could write the
first CU with a "forward
reference", then write the
second CU, then fixup the
forward reference - and then go
on to the next file and its CUs
- this could improve performance
by touching recently used
memory/disk pages only, rather
than going all the way back to
the start later on when those
pages have become cold)</div>
</div>
</div>
</blockquote>
<p>Yes, it does it at the end of each input file.</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
If CUs were processed in parallel, their offsets would not be known at the moment when a local type reference is changed into an inter-CU reference. So we would need to do the same fix-up processing for all references to types, as we already do for other inter-CU references.<br>
</p>
</div>
</blockquote>
<div>Yeah - though the existence
of this second "fixup forward
references" system - yeah, could
just use it much more generally
as you say. Not an extra pass,
just the existing second pass
but having way more fixups to
fixup in that pass.</div>
</div>
</div>
</blockquote>
If we were able to change the algorithm in this way:<br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<br>
Then we could create a merged type table (an artificial CU containing the types) during step 1.<br>
If that type table were written first, then all following CUs could use known offsets<br>
to the types, and we would not need additional fix-up processing for type references.<br>
It would still be necessary to fix up other inter-CU references, but it would not be necessary<br>
to fix up type references (which constitute the vast majority).<br>
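<p>The point about writing the type table first can be illustrated with a small hypothetical sketch: once the analysis pass has fixed each type's final offset, the cloning pass can emit final type references immediately, with no fixups for types.</p>

```python
# Sketch: lay out the merged type table during the analysis pass and
# write it first; every later CU then already knows each type's final
# offset, so type references need no fixup pass. Illustrative only.

# Step 1: analysis assigns final offsets to merged types up front.
type_offsets, cursor = {}, 0
for name, size in [("int", 8), ("struct Foo", 24)]:
    type_offsets[name] = cursor
    cursor += size

# Step 2: cloning each CU writes final type references immediately.
def clone_cu_refs(types_used):
    return [type_offsets[t] for t in types_used]  # no deferred patching

assert clone_cu_refs(["struct Foo", "int"]) == [8, 0]
```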
</div>
</blockquote>
<div><br>
</div>
<div>To me, that sounds more expensive than
the fixup forward references pass.</div>
</div>
</div>
</blockquote>
<p>If we are speaking about a direct comparison, then yes: loading the DWARF one more time looks more expensive than the "fixup forward references" pass. But looking at the general picture, it could well be beneficial:<br>
<br>
1. Merging types would lead to a smaller resulting DWARF, which would speed up the process.<br>
For example, if we switched "ODR types deduplication" off in the current implementation, execution time would double, because more DWARF would have to be cloned and written into the result. Implementing "merging types" would probably have a similar effect<br>
- it would speed up the overall process. So on one side the additional DWARF-loading step would<br>
decrease performance, but the smaller amount of resulting data would increase performance.<br>
<br>
2. With the types put in the first CU, we would have a simple strategy for our liveness analysis algorithm: just always keep the first CU in memory. This would speed up the liveness analysis step.<br>
<br>
Anyway, all the above is just an idea for future work. Currently, I am going to implement multi-threaded processing for CUs loaded into memory, with the same type processing as now (which assumes that the "fixup forward references" pass starts doing more work, by fixing type references as well).<br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0px 0px
0px
0.8ex;border-left:1px
solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Without loading all CUs into memory, it would require a two-pass solution: a first pass to analyze<br>
which part of the DWARF relates to live code, and then a second pass to generate the result.<br>
</p>
</div>
</blockquote>
<div>Not sure it'd
require any more
second pass than a
"fixup" pass, which it
sounds like you're
saying it already has?
<br>
</div>
</div>
</div>
</blockquote>
<p>It looks like it would need an additional pass to process inter-CU references (existing in the incoming file) if we do not want to load all CUs into memory.<br>
</p>
</div>
</blockquote>
<div>Usually inter-CU references
aren't used, except in LTO - and
in LTO all the DWARF
deduplication and function
discarding is already done by
the IR linker anyway. (ThinLTO
is a bit different, but really
we'd be better off teaching it
the extra tricks anyway (some
can't be fixed in ThinLTO - like
emitting a "Home" definition of
an inline function, only to find
out other ThinLTO backend/shards
managed to optimize away all
uses of the function... so some
cleanup may be useful there)).
It might be possible to do a
more dynamic/rolling cache -
keep only the CUs with
unresolved cross-CU references
alive and only keep them alive
until their cross-CU references
are found/marked alive. This
should make things no worse than
the traditional dsymutil case -
since cross-CU references are
only effective/generally used
within a single object file
(it's possible to create
relocations for them into other
files - but I know LLVM doesn't
currently do this and I don't
think GCC does it) with multiple
CUs anyway - so at most you'd
keep all the CUs from a single
original input file alive
together.<br>
</div>
</div>
</div>
</blockquote>
But since it is a case documented by DWARF, the tool should be ready for it (inter-CU<br>
references being heavily used).</div>
</blockquote>
<div><br>
Sure - but by implementing a CU liveness
window like that (keeping CUs live only so
long as they need to be rather than an
all-or-nothing approach) only especially
quirky inputs would hit the worst case
while the more normal inputs could perform
better.<br>
</div>
</div>
</div>
</blockquote>
<p>It is not clear what should be put in such a CU liveness window. If CU100 references CU1, how could we know that we need to put CU1 into the CU liveness window before we have processed CU100?<br>
</p>
</div>
</blockquote>
<div>Fair point, not just forward references to worry
about but backward references too. I wonder how much
savings there is in the liveness analysis compared
to "keep one copy of everything, no matter whether
it's live or not", then it can be a pure forward
progress situation. (with the quirk that you might
emit a declaration for an entity once, then a
definition for it later - alternatively if a
declaration is seen it could be skipped under the
assumption that a definition will follow (& use
a forward ref fixup) - and if none is found, splat
some stub declarations into a trailing CU at the
end) <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div> Moreover, llvm-dwarfutil would be the tool producing<br>
exactly such a situation. The resulting file (produced by llvm-dwarfutil) would<br>
contain a lot of inter-CU references. There is probably no practical reason to apply<br>
llvm-dwarfutil to the same file twice, but it would be a good test<br>
for the tool.<br>
</div>
</blockquote>
<div><br>
It'd be a good stress test, but not
necessarily something that would need to
perform the best because it wouldn't be a
common use case.<br>
</div>
</div>
</div>
</blockquote>
<p>I agree that we should not slow down the DWARFLinker in common cases only because we need to support the worst cases.<br>
But we also need to implement a solution that works in some acceptable manner for the worst case. </p>
</div>
</blockquote>
<div>I think that depends on "acceptable" - correct,
yes. Practical to run in reasonable time/memory? Not
necessarily, in my opinion. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>The current solution - loading everything into memory - makes it hard to use in a non-dsymutil scenario (llvm-dwarfutil).<br>
</p>
</div>
</blockquote>
<div>I agree it's worth exploring the non-dsymutil
scenario, as you are - I'm just saying we don't
necessarily need to support high usability (fast/low
memory usage/etc) llvm-dwarfutil on an already
dwarfutil'd binary (but as you've pointed out, the
"window" is unknowable because of backward
references, so this whole subthread is perhaps
irrelevant).<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>There are several things which could be used to decide whether we need to go down a light or a heavy path:<br>
<br>
1. If the input contains only a single CU, we do not need to unload it from memory. Thus, we would not need an extra DWARF-loading pass.<br>
2. If the abbreviations from the whole input file do not contain inter-CU references, then while doing liveness analysis we do not need to wait until other CUs are processed.<br>
</p>
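<p>A rough sketch of that decision, under the assumption that inter-CU references show up in the abbreviations as DWARF's DW_FORM_ref_addr (the data layout here is illustrative):</p>

```python
# Sketch of the "light vs heavy path" check: if no abbreviation in the
# input uses an inter-CU reference form (DW_FORM_ref_addr in DWARF),
# each CU can be analysed independently. Input layout is illustrative:
# one list of form codes per CU.

DW_FORM_ref_addr = 0x10  # DWARF's inter-CU reference form

def pick_path(abbrev_forms_per_cu):
    uses_inter_cu = any(DW_FORM_ref_addr in forms
                        for forms in abbrev_forms_per_cu)
    single_cu = len(abbrev_forms_per_cu) == 1
    return "light" if single_cu or not uses_inter_cu else "heavy"

assert pick_path([[0x0e, 0x13], [0x13]]) == "light"   # no ref_addr anywhere
assert pick_path([[0x0e], [DW_FORM_ref_addr]]) == "heavy"
assert pick_path([[DW_FORM_ref_addr]]) == "light"     # single CU stays loaded
```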
</div>
</blockquote>
<div>(2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd still wonder if a more general implementation strategy could be found that would make it easier to get a sliding scale of efficiency depending on how many inter-CU references there were, not a "if there are none it's good, if there are any it's bad or otherwise very different to implement". <br>
</div>
</div>
</div>
</blockquote>
<p>I think there is a scenario which would make it possible to process unreferenced CUs once and handle inter-CU references in a scalable way (even for a dwarfutil'd binary):<br>
<br>
1. Implement a global type table and type merging. This allows us to have all types in memory.<br>
Then, all inter-CU type references would point into that in-memory type table.<br>
(We do not know which CUs should be put into a CU liveness window, and we cannot put all CUs into memory, but we can put all the types into memory.)<br>
<br>
2. If there are no other inter-CU references, then all CUs are handled in one pass.<br>
<br>
3. If there are other inter-CU references, then after all CUs are processed by the first pass we have a list of referenced CUs. Then, we can delete the already cloned data (for the referenced CUs) and start the process again:<br>
load CU, mark liveness, clone data. This second pass is done only for the referenced CUs.<br>
For not-complex, not closely coupled cases it would work relatively fast.<br>
<br>
4. Put the in-memory type table into an artificial CU. Update all type references.<br>
</p>
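<p>The control flow of steps 2-3 might look roughly like this (purely illustrative names; the "clone" and "liveness" work is elided):</p>

```python
# Sketch of the scalable scheme: one pass handles every CU; only CUs
# that turn out to be targets of other inter-CU references are queued
# for a second pass (reload, mark liveness, re-clone). Illustrative.

def link(cus, inter_cu_targets):
    # inter_cu_targets: CU ids that other CUs point into
    first_pass, second_pass = [], []
    for cu in cus:                  # pass 1: load, analyse, clone, unload
        first_pass.append(cu)
        if cu in inter_cu_targets:
            second_pass.append(cu)  # its cloned data must be redone
    for cu in second_pass:          # pass 2: only the referenced CUs
        pass                        # load CU, mark liveness, clone data
    return first_pass, second_pass

once, redo = link(["CU1", "CU2", "CU3"], {"CU2"})
assert once == ["CU1", "CU2", "CU3"] and redo == ["CU2"]
```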
<p><br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> <br>
Then this scheme would be used for the worst cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
unload CU.<br>
} <br>
2. for (CU : CU1...CU100) {<br>
load CU.<br>
clone CU.<br>
unload CU.<br>
} <br>
3. fixup forward references.<br>
<br>
and this scheme for the light cases:<br>
<br>
1. for (CU : CU1...CU100) {<br>
load CU.<br>
analyse CU.<br>
clone CU.<br>
unload CU.<br>
}<br>
2. fixup forward references.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>Generally, I think we should not assume that inter-CU references will be used only in a limited way.<br>
<br>
Anyway, if this scheme:<br>
<br>
1. analyse all CUs.<br>
2. clone all CUs.<br>
<p>works slowly, then we would need to continue with the one-pass solution and not support complex, closely coupled inputs.<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>yeah, certainly seeing the
data/experiments will be interesting, if
you end up implementing some different
strategies, etc.<br>
<br>
I guess one possibility for parallel
generation could be something more like
Microsoft's approach with a central debug
info server that compilers communicate
with - not that exact model, I mean, but
if you've got parallel threads generating
reduced DWARF into separate object files -
they could communicate with a single
thread responsible for type emission - the
type emitter would be given types from the
separate threads and compute their size,
queue them up to be streamed out to the
type CU (& keep the source CU alive
until that work was done) - such a central
type emitter could quickly determine the
size of the type to be emitted and compute
future type offsets (eg: if 5 types were
in the queue, it could've figured out the
offset of those types already) to answer
type offset queries quickly and unblock
the parallel threads to continue emitting
their CUs containing type references.<br>
</div>
</div>
</div>
</blockquote>
<p>Yes, thank you. I will think about it.<br>
</p>
<p>Alexey.<br>
</p>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
- Dave </div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>