<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Jonas, <br>

      <br>

      Thank you for the comments. Please find my answers below.<br>

    </p>

    <div class="moz-cite-prefix">On 06.08.2020 20:39, Jonas Devlieghere

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">

        <div dir="ltr">

          <div>Hi Alexey,</div>

          <div><br>

          </div>

          <div>I should've looked at this earlier. I went through the

            thread again and I've</div>

          <div>made some comments, mostly from the dsymutil point of

            view.</div>

          <div><br>

          </div>

          <div>> Current DWARFEmitter/DWARFStreamer has an

            implementation for DWARF</div>

          <div>> generation, which does not support DWARF5(only

            debug_names table). At the</div>

          <div>> same time, there already exists code in

            CodeGen/AsmPrinter/DwarfDebug.h,</div>

          <div>> which implements most of DWARF5. It seems that

            DWARFEmitter/DWARFStreamer</div>

          <div>> should be rewritten using DwarfDebug/DwarfFile.

            Though I am not sure</div>

          <div>> whether it would be easy to re-use

            DwarfDebug/DwarfFile. It would probably</div>

          <div>> be necessary to separate some intermediate level of

            DwarfDebug/DwarfFile.</div>

          <div><br>

          </div>

          <div>These classes serve very different purposes. Last time I

            looked at them there</div>

          <div>was very little overlap in functionality. In the compiler

            we're mostly</div>

          <div>concerned with generating the DWARF, while in dsymutil we

            try to copy</div>

          <div>everything we don't need to parse, and fix up what we

            have to. I don't want</div>

          <div>to say it's not possible, but I think supporting DWARF5

            in those classes is</div>

          <div>going to be a lot less work than trying to reuse the

            CodeGen variants.</div>

        </div>

      </div>

    </blockquote>

    I agree: in its current state, it would be less work to write a
    separate implementation <br>
    than to reuse the CodeGen variants. The downside is that this
    leads to a lot of <br>
    code duplication:<br>

    <br>

    DwarfStreamer::emitUnitRangesEntries<br>

    DwarfDebug::emitDebugARanges<br>

    EmitGenDwarfAranges<br>

    DWARFYAML::emitDebugAranges<br>

    <br>

    Supporting a new standard would require rewriting or modifying all
    of these places. Ideally,<br>
    a single implementation of DWARF generation would let us change
    one place and <br>
    benefit everywhere else. The CodeGen classes could probably be
    rewritten, and it would then be useful<br>
    to design them for two use cases: generating from scratch and
    copying/updating <br>
    existing data. In the end, there would be a single implementation
    that could be reused in <br>
    many places. That said, it is indeed a lot of work.<br>
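    <br>
    As a rough illustration of the "single implementation" idea
    (Python for brevity, not LLVM code): one encoder for a
    .debug_aranges set that serves both a caller generating ranges from
    scratch and a caller copying ranges from existing data. It assumes
    32-bit DWARF, version 2 of the table and little-endian output; the
    function name and parameters are made up for this sketch.<br>
    <pre>
```python
def encode_aranges(cu_offset, ranges, address_size=8):
    """Encode one .debug_aranges set (32-bit DWARF, version 2, little-endian)."""

    def le(value, size):
        return value.to_bytes(size, "little")

    # version (2 bytes), debug_info_offset (4), address_size (1), segment_size (1)
    header_tail = le(2, 2) + le(cu_offset, 4) + le(address_size, 1) + le(0, 1)

    # The first tuple starts at a multiple of the tuple size, counted from
    # the start of the set (including the 4-byte unit_length field).
    tuple_size = 2 * address_size
    padding = -(4 + len(header_tail)) % tuple_size

    body = header_tail + bytes(padding)
    for start, length in ranges:
        body += le(start, address_size) + le(length, address_size)
    body += bytes(tuple_size)  # terminating (0, 0) tuple

    # unit_length does not count its own 4 bytes
    return le(len(body), 4) + body


# Use case 1: generation from scratch.
fresh = encode_aranges(0, [(0x1000, 0x20)])

# Use case 2: copying existing ranges, shifted by a hypothetical load bias.
existing = [(0x1000, 0x20), (0x2000, 0x10)]
copied = encode_aranges(0x40, [(start + 0x400000, size) for start, size in existing])
```
    </pre>
    Both callers only prepare a list of (address, length) pairs; the
    layout rules (padding, terminator, length field) live in one place.<br>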

    <br>

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div><br>

          </div>

          <div>> Measurements show that it is spent ~10 sec in</div>

          <div>> llvm::StringMapImpl::LookupBucketFor(). The problem

            is that the same</div>

          <div>> strings, again and again, are added to the string

            pool. Two attributes</div>

          <div>> having the same string value would be analyzed (hash

            calculated) and</div>

          <div>> searched inside the string pool. Even if these

            strings are already in</div>

          <div>> string table(DW_FORM_strp, DW_FORM_strx). The

            process could be optimized</div>

          <div>> for string tables. So that if some string from the

            string table were</div>

          <div>> accessed previously then, it would keep a reference

            into the string pool.</div>

          <div>> This would eliminate a lot of string pool searches.</div>

          <div><br>

          </div>

          <div>I'm not sure I fully understand the optimization, but I'd

            love to speed this</div>

          <div>up, if only for dsymutil's sake. I'd love to talk about

            this in a separate</div>

          <div>thread or offline.</div>

          <div><br>

          </div>

        </div>

      </div>

    </blockquote>

    The measurements show that a significant amount of time is spent <br>
    in llvm::StringMapImpl::LookupBucketFor(), i.e. in searching the
    string <br>
    pool. The idea of the optimization was to <br>
    reduce the number of string pool searches by remembering previous <br>
    results. The DW_FORM_strp and DW_FORM_strx forms do not hold the
    string itself <br>
    but reference a string in a separate table, by offset (DW_FORM_strp)
    or by index (DW_FORM_strx). <br>
    Currently, if the same DW_FORM_strp/DW_FORM_strx string is
    referenced several times, <br>
    the string pool is searched once per reference <br>
    (llvm::StringMapImpl::LookupBucketFor() is called each time). If the
    position <br>
    in the pool were remembered for the offset or index of the first
    occurrence, <br>
    there would be no need to call
    llvm::StringMapImpl::LookupBucketFor() for the later occurrences.<br>

    <br>

    However, a prototype of that idea did not show any worthwhile
    performance improvement. <br>
    <br>
    A small performance improvement could be achieved if the string
    pools used <br>
    llvm::hash_value(StringRef S) instead of llvm::djbHash().
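    <br>
    <br>
    For reference, the caching idea described above can be sketched as
    follows. This is Python for brevity and does not use LLVM's
    StringMap; the class names are invented, and the "lookups" counter
    merely stands in for calls to LookupBucketFor(). As noted, the real
    prototype did not give a worthwhile speed-up, so this only
    illustrates the mechanism.<br>
    <pre>
```python
class StringPool:
    """Output string pool: every get_offset() call costs one hash lookup."""

    def __init__(self):
        self._offsets = {}
        self._size = 0
        self.lookups = 0  # stands in for LookupBucketFor() calls

    def get_offset(self, text):
        self.lookups += 1
        if text not in self._offsets:
            self._offsets[text] = self._size
            self._size += len(text.encode("utf-8")) + 1  # NUL terminator
        return self._offsets[text]


class CachedStringTable:
    """Input string table that remembers the pool offset per input offset."""

    def __init__(self, input_strings, pool):
        self._input = input_strings  # input offset (or index) to string
        self._pool = pool
        self._cache = {}  # a real implementation could use a plain array

    def translate(self, input_offset):
        cached = self._cache.get(input_offset)
        if cached is None:
            # First time this input string is seen: search the pool once.
            cached = self._pool.get_offset(self._input[input_offset])
            self._cache[input_offset] = cached
        return cached


pool = StringPool()
table = CachedStringTable({0: "int", 4: "main"}, pool)
offsets = [table.translate(o) for o in (0, 4, 0, 0, 4)]
```
    </pre>
    Five references are resolved with only two pool searches; the
    remaining three are answered from the per-table cache.<br>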

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div>> Currently, all object files are analyzed

            sequentially and cloned</div>

          <div>> sequentially. Cloning is started in parallel with

            analyzing. That scheme</div>

          <div>> could be changed: analyzing and cloning could be

            done in parallel for each</div>

          <div>> object file. That requires refactoring of

            DWARFLinker and making string</div>

          <div>> pools and DeclContextTree thread-safe.</div>

          <div><br>

          </div>

          <div>I'm less familiar with the way that LLD uses the

            DWARFOptimizer but this is</div>

          <div>not possible for dsymutil as it is trying to deduplicate

            DIEs from different</div>

          <div>compile units.</div>

        </div>

      </div>

    </blockquote>

    Right, dsymutil is trying to de-duplicate DIEs from different<br>
    compile units. That probably does not rule out a multi-threaded
    implementation: <br>
    <br>
    1. DeclContextTree.getChildDeclContext() should be made thread-safe.<br>
    &nbsp;&nbsp;&nbsp; Then, even if CUs are processed in parallel, DIEs
    can still be de-duplicated<br>
    &nbsp;&nbsp;&nbsp; based on DeclContext. <br>
    2. UniquingStringPool and OffsetsStringPool should also be made
    thread-safe.<br>
    3. Since compilation units would be processed in parallel,<br>
    &nbsp;&nbsp;&nbsp; the size of a compilation unit would not be known
    until it is fully processed. <br>
    &nbsp;&nbsp;&nbsp; That means that all of the compilation unit's
    references should be patched after <br>
    &nbsp;&nbsp;&nbsp; the CU content is generated, in the same way that
    forward references <br>
    &nbsp;&nbsp;&nbsp; are currently patched (fixupForwardReferences).<br>
    4. DWARFStreamer provides a sequential interface. Instead of a
    single output stream, <br>
    &nbsp;&nbsp;&nbsp; a separate output could be generated for each
    CU. <br>
    &nbsp;&nbsp;&nbsp; They would be glued together at the end.<br>
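    <br>
    Points 3 and 4 above can be sketched roughly as follows. This is
    Python for brevity, not DWARFLinker code: the unit layout, the
    function names and the reference tuples are hypothetical, and the
    sketch says nothing about how DeclContextTree or the string pools
    would be made thread-safe.<br>
    <pre>
```python
from concurrent.futures import ThreadPoolExecutor


def clone_unit(unit):
    """Stand-in for cloning one CU into its own output buffer.

    Returns the buffer plus unresolved references, each given as
    (position in this buffer, target unit index, offset inside target).
    """
    payload, references = unit
    return bytearray(payload), references


def link_units(units, ref_size=4):
    # Each CU is cloned independently; map() keeps the input order.
    with ThreadPoolExecutor() as executor:
        cloned = list(executor.map(clone_unit, units))

    # Unit sizes are known only now, so start offsets are computed late.
    starts = []
    position = 0
    for buffer, _ in cloned:
        starts.append(position)
        position += len(buffer)

    # Patch every cross-unit reference with its final absolute offset.
    for buffer, references in cloned:
        for where, target_unit, target_offset in references:
            absolute = starts[target_unit] + target_offset
            buffer[where:where + ref_size] = absolute.to_bytes(ref_size, "little")

    # Glue the per-unit outputs together.
    return b"".join(bytes(buffer) for buffer, _ in cloned)


units = [
    (bytes(8), [(0, 1, 2)]),  # unit 0 refers to offset 2 inside unit 1
    (bytes(6), [(0, 0, 4)]),  # unit 1 refers to offset 4 inside unit 0
]
output = link_units(units)
```
    </pre>
    The patching pass is sequential here, but it only touches the
    recorded reference positions rather than re-parsing the units.<br>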

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div><br>

          </div>

          <div>> I think improving dsymutil is a valuable thing.

            Though there are several</div>

          <div>> directions which might be considered to make it more

            robust:</div>

          <div>></div>

          <div>> 1. support of latest DWARF - DWARF5/DWARF64...</div>

          <div><br>

          </div>

          <div>Strong +1 on DWARF5. I haven't had the bandwidth yet to

            really look at this.</div>

          <div>Right now we can't find (at least some) relocations so

            we bail out. I'd need</div>

          <div>to fix that to assess the current state of things and

            figure out how much</div>

          <div>work would be needed.</div>

          <div><br>

          </div>

          <div>I don't think anything in LLVM supports generating

            DWARF64 though.</div>

          <div><br>

          </div>

          <div>> 2. implement multi-threaded execution.</div>

          <div><br>

          </div>

          <div>See my earlier comment. At least for the dsymutil case,

            the current approach</div>

          <div>is the best we can do, but I'd love to be proven wrong.

            :-)</div>

          <div><br>

          </div>

          <div>> 3. support of split DWARF.</div>

          <div>> 4. implement dsymutil for non-darwin platform.</div>

          <div><br>

          </div>

          <div>These two seem to go together. Given the work you did to

            split off the DWARF</div>

          <div>optimization part I think we're closer to this than ever.

            Thanks again for</div>

          <div>doing that.</div>

          <div><br>

          </div>

          <div>> We considered three options:</div>

          <div>></div>

          <div>> 1. add new functionality into dsymutil. So that

            dsymutil behaves</div>

          <div>> differently on a non-darwin platform and supports

            another set of</div>

          <div>> command-line options.</div>

          <div>></div>

          <div>> 2. add new functionality into llvm-objcopy.

            llvm-objcopy already supports</div>

          <div>> various binary objects formats(MachO,ELF,COFF,wasm).

            It also has several</div>

          <div>> options to work with debug-info.</div>

          <div>></div>

          <div>> 3. create new utility llvm-dwarfutil which would

            implement the above</div>

          <div>> functionality and reuse DWARFLinker(extracted from

            dsymutil) library and</div>

          <div>> new library ObjectCopy(extracted from llvm-objcopy).</div>

          <div>></div>

          <div>> So far our preference is number three. The reason

            for this is that separate</div>

          <div>> utility specifically working with debug info looks

            as good separation of</div>

          <div>> concepts. Adding another behavior to dsymutil looks

            not very good.</div>

          <div><br>

          </div>

          <div>In its current state dsymutil itself is a pretty small

            tool on top of the</div>

          <div>DWARFOptimizer/Linker. I'm curious what the benefits of

            another tool are</div>

          <div>compared to a different frontend (like objcopy) for MachO

            and ELF. It seems</div>

          <div>like that would allow for separation of concerns, while

            still being able to</div>

          <div>share common code without having to push it all the way

            up into LLVM.</div>

        </div>

      </div>

    </blockquote>

    My concern is that this tool would have different input data and a
    different set of options.<br>
    Since handling different input data and
    a different set of options <br>
    means writing another frontend anyway, it would probably be better
    not to make dsymutil more complex but<br>
    to create another small tool. But if extending dsymutil looks OK,
    I am OK with it. <br>
    Let's discuss this approach in the proposal thread.

    <p><br>

    </p>

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div><br>

          </div>

          <div>> Extending the already rich interface of llvm-objcopy

            looks also not very</div>

          <div>> good. Having in mind that actual implementation

            would be shared by</div>

          <div>> libraries, the separate utility, working

            specifically with debug info,</div>

          <div>> looks like the right choice. That is our current

            idea.</div>

          <div><br>

          </div>

          <div>> My personal thought would be that extending dsymutil

            should be ok as the</div>

          <div>> functionality goes well with everything else

            dsymutil does (other than not</div>

          <div>> support ELF which the dsymutil maintainers are on

            board with last I</div>

          <div>> checked). That said, I definitely think a write-up

            will be helpful. No</div>

          <div>> matter what I support extracting all of the behavior

            into libraries and</div>

          <div>> using that somewhere :)</div>

          <div><br>

          </div>

          <div>Ha, so basically what I was trying to say above.</div>

          <div><br>

          </div>

          <div>I look forward to seeing the proposal!</div>

        </div>

      </div>

    </blockquote>

    <p>Yep, I will publish it soon.<br>

    </p>

    <p>Thank you, Alexey.<br>

    </p>

    <blockquote type="cite"

cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">

      <div dir="ltr">

        <div dir="ltr">

          <div><br>

          </div>

          <div>Cheers,</div>

          <div>Jonas</div>

          <div><br>

          </div>

        </div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, Aug 4, 2020 at 11:33

          PM Eric Christopher &lt;<a href="mailto:echristo@gmail.com"
            moz-do-not-send="true">echristo@gmail.com</a>&gt; wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

          <div dir="ltr">

            <div dir="ltr">Hi Alexey,

              <div><br>

              </div>

              <div><br>

              </div>

            </div>

            <br>

            <div class="gmail_quote">

              <div dir="ltr" class="gmail_attr">On Mon, Aug 3, 2020 at

                8:32 AM Alexey Lapshin &lt;<a
                  href="mailto:avl.lapshin@gmail.com" target="_blank"
                  moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;

                wrote:<br>

              </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                <div>

                  <p>Hi Eric, please <br>

                  </p>

                  <div>On 31.07.2020 22:02, Eric Christopher wrote:<br>

                  </div>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div>Hi Alexey,</div>

                      <br>

                      <div class="gmail_quote">

                        <div dir="ltr" class="gmail_attr">On Fri, Jul

                          31, 2020 at 4:02 AM Alexey Lapshin via

                          llvm-dev &lt;<a
                            href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;

                          wrote:<br>

                        </div>

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>

                          On 28.07.2020 19:28, David Blaikie wrote:<br>

                          > On Tue, Jul 28, 2020 at 8:55 AM Alexey

                          Lapshin &lt;<a
                            href="mailto:avl.lapshin@gmail.com"
                            target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;

                          wrote:<br>

                          >><br>

                          >> On 28.07.2020 10:29, David Blaikie

                          via llvm-dev wrote:<br>

                          >>> On Fri, Jun 26, 2020 at 9:28 AM

                          Alexey Lapshin<br>
                          >>> &lt;<a
                            href="mailto:alapshin@accesssoftek.com"
                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>&gt;

                          wrote:<br>

>>>>>>>>>>>> This idea goes in

                          another direction than fragmenting dwarf<br>

>>>>>>>>>>>> using elf

                          sections&tricks. It seems to me that the

                          cost of fragmenting is too high.<br>

                          >>>>>>>>>>> I

                          tend to agree - but I'm sort of leaning

                          towards trying to use object<br>

                          >>>>>>>>>>>

                          features as much as possible, then

                          implementing just enough custom<br>

                          >>>>>>>>>>>

                          handling in the linker to recoup overhead,

                          etc. (eg: add some kind of<br>

                          >>>>>>>>>>>

                          small header/brief description that makes it

                          easy for the linker to<br>

                          >>>>>>>>>>>

                          slice-and-dice - but hopefully a

                          domain-specific such header can be a<br>

                          >>>>>>>>>>>

                          bit more compact than the fully general ELF

                          form)<br>

                          >>>>>>>>>> I

                          think this indeed should be implemented and

                          evaluated.<br>

                          >>>>>>>>>> So

                          that various approaches could be compared.<br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> It is not only the

                          sizes of structures describing fragments but

                          also the complexity<br>

>>>>>>>>>>>> of tools that should be

                          taught to work with fragmented DWARF.<br>

>>>>>>>>>>>> (f.e. llvm-dwarfdump

                          applied to object file should be able to read

                          fragmented DWARF,<br>

>>>>>>>>>>>> but applied to linked

                          executable it should work with non-fragmented

                          DWARF).<br>

>>>>>>>>>>>> That idea is for the

                          tool which works the same way as dsymutil ODR.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> I will shortly describe

                          the idea of making DWARF be easier processed

                          by dsymutil/DWARFLinker:<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> The idea is to have

                          only one "type table" per object file(special

                          section .debug_types_table).<br>

>>>>>>>>>>>> This "type table" would

                          contain all types.<br>

>>>>>>>>>>>> There could be a

                          special type of reference - type_offset - that

                          offset points into the type table.<br>

>>>>>>>>>>>> Basic types could

                          always be placed into the start of "type

                          table" thus, offsets to basic types<br>

>>>>>>>>>>>> most often would be 1

                          byte. There also would be a special kind of

                          reference - reference inside the type.<br>

>>>>>>>>>>>> Type units sig8 system

                          - would not be used to reference types.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> Types deduplication is

                          assumed to be done, not by linker mechanism

                          for COMDAT,<br>

>>>>>>>>>>>> but by a tool like

                          dsymutil. This tool would create resulting

                          .debug_types_table by putting there<br>

>>>>>>>>>>>> types from source

                          .debug_types_table-s. Only one copy of the

                          type would be placed into the<br>

>>>>>>>>>>>> resulting table. All

                          references pointing to the deleted copy would

                          be corrected to point<br>

>>>>>>>>>>>> to the single copy

                          inside "type table". (that is how dsymutil

                          works currently)<br>

                          >>>>>>>>>>> ^

                          that's the step that's probably a bit

                          expensive for a general-use<br>

                          >>>>>>>>>>>

                          tool - it implies parsing all the DWARF to

                          find those references and<br>

                          >>>>>>>>>>>

                          rewrite them, I think. For a high-performance

                          solution that could be<br>

                          >>>>>>>>>>>

                          run by the linker I think it'd be necessary to

                          have a solution that<br>

                          >>>>>>>>>>>

                          doesn't involve parsing all the DIEs.<br>

                          >>>>>>>>>>

                          According to the current dsymutil processing,<br>

                          >>>>>>>>>>

                          exactly this process is not the most

                          time-consuming.<br>

                          >>>>>>>>>> That

                          could be done relatively fast.<br>

                          >>>>>>>>> Fair

                          enough - though I'd still imagine any solution

                          that involves<br>

                          >>>>>>>>> parsing

                          all the DIEs still wouldn't be fast enough

                          (maybe an order of<br>

                          >>>>>>>>> magnitude

                          faster than the current solution even - but

                          that's still,<br>

                          >>>>>>>>> what, 6

                          or 7x slower than linking without the

                          feature?) for most users<br>

                          >>>>>>>>> to

                          consider it a good trade-off.<br>

                          >>>>>>>> It seems to

                          me that even the current 6x-7x slowdown could

                          be useful.<br>

                          >>>>>>>> Users who

                          already use dsymutil or llvm-dwp(assuming

                          DWARFLinker<br>

                          >>>>>>>> would be

                          taught to work with a split dwarf) tools spend

                          this time and,<br>

                          >>>>>>>> in some

                          scenarios, waste disk space by intermediate

                          files.<br>

                          >>>>>>> FWIW, dwp

                          (llvm-dwp hasn't really been optimized

                          compared to binutils<br>

                          >>>>>>> dwp) is designed

                          to be very quick - by not needing to do a lot

                          of<br>

                          >>>>>>> parsing/fixups.

                          Which, yes, means larger output files than

                          would be<br>

                          >>>>>>> possible with

                          more parsing/etc. It also doesn't take any

                          input from<br>

                          >>>>>>> the linker (so it

                          can run in parallel with the linker) - so it

                          can't<br>

                          >>>>>>> remove dead

                          subprograms. Given Google's the major (perhaps

                          only<br>

                          >>>>>>> significant?)

                          user of Split DWARF - I can say that the needs

                          don't<br>

                          >>>>>>> necessarily

                          overlap well with something that would take

                          significantly<br>

                          >>>>>>> longer to run or

                          use significantly more memory.

                          Faster/cheaper/with<br>

                          >>>>>>> somewhat bigger

                          output files is probably the right tradeoff

                          for<br>

                          >>>>>>> Google's use

                          case, at least.<br>

                          >>>>>>><br>

                          >>>>>>> I imagine Apple's

                          use for dsymutil is somewhat similar - it's

                          not used<br>

                          >>>>>>> in the iterative

                          development cycle, only in final releases -

                          well,<br>

                          >>>>>>> maybe their

                          situation is more "neutral" - not a major pain

                          point in<br>

                          >>>>>>> any case I'd

                          guess.<br>

                          >>>>>>><br>

                          >>>>>>><br>

                          >>>>>> I see. FWIW,

                          Comparison splitdwarf+dwp and DWARFLinker from

                          lld:<br>

                          >>>>>><br>

                          >>>>>> 1.

                          split-dwarf+llvm-dwp = linking time for clang

                          6 sec,<br>

                          >>>>>>       generating time

                          for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>

                          >>>>> FWIW, llvm-dwp is not

                          very well optimized (which is to say: it is

                          not<br>

                          >>>>> optimized), binutils dwp

                          might be a better comparison (& even that<br>

                          >>>>> doesn't have the

                          parallelism & some potential further

                          memory savings<br>

                          >>>>> that lld has that we

                          could take advantage of in a dwp-like tool)<br>

                          >>>>><br>

                          >>>>> What build mode was the

                          clang binary built in? Optimized or

                          unoptimized?<br>

                          >>>> right, that is unoptimized

                          build with -ffunction-sections.<br>

                          >>>><br>

                          >>>>>> 2. DWARFLinker from

                          lld = linking time for clang 72 sec,

                          clang=760M.<br>

                          >>> And this is without Split DWARF?

                          Without linker DWARF compression? -<br>

                          >>> that seems quite a bit

                          surprising, that the deduplication of DWARF<br>

                          >>> could fit into less space than

                          the wasted/reclaimed space in ranges (&<br>

                          >>> line)?<br>

                          >> that was without split dwarf, without

                          linker compression.<br>

                          >><br>

                          >>> Could you double check these

                          numbers & provide a clearer summary?<br>

                          >> sure, I would re-check it.<br>

                          >><br>

                          >>> Here's my attempt at numbers (all

                          with function-sections+gc-sections)...<br>

                          >>><br>

                          >>> Split DWARF tests didn't seem

                          meaningful - gc-debuginfo + split DWARF<br>

                          >>> seemed to drop all the debug info

                          (except gdb_index) so wasn't<br>

                          >>> working/comparison wasn't

                          meaningful for Apples to Apples, but<br>

                          >>> included it for comparing gc'd

                          non-split to non-gc'd split (disabled<br>

                          >>> gnu-pubnames/gdb-index

                          (-gsplit-dwarf -gno-gnu-pubnames) (which turns<br>

                          >>> on by default with Split DWARF

                          because gdb needs it - but a bit of an<br>

                          >>> unfair comparison without turning

                          on gnu-pubnames/gdb-index in other<br>

                          >>> build modes too, since it...

                          /shouldn't/ be necessary) which might've<br>

                          >>> been a factor in the data you

                          were looking at)<br>

                          >> that might be the case. i.e.

                          clang=997M for split dwarf(from my previous<br>

                          >> measurement) might include

                          gnu-pubnames.<br>

                          >><br>

                          >> would recheck it and if that is the

                          case then it is an unfair comparison.<br>

                          >><br>

                          >><br>

                          >> My point was that "DWARFLinker from

                          lld" takes less space than singleton<br>

                          >> split dwarf file+.dwp file.<br>

                          >><br>

                          >> for -O0 uncompressed:<br>

                          >><br>

                          >> - .dwp took 1.1G(if I built it

                          correctly), singleton clang(from your<br>

                          >> measurements) 566 MB<br>

                          >><br>

                          >>      overall 1.6G.<br>

                          > Oh, yeah, even if there are some

                          measurement issues, linked executable<br>

                          > + .dwp is going to be larger than a

                          linked executable using non-split<br>

                          > DWARF (in v5), since v5 uses all the same

                          representations as non-split<br>

                          > DWARF, and split DWARF adds the

                          indirection overhead of a split file,<br>

                          > etc.<br>

                          ><br>

                          > Even without DWARF linking, it's true

                          that split DWARF has overhead<br>

                          > (dwp+executable will be larger than

                          executable non-split).<br>

                          ><br>

                          > But maybe we've ended up down a bit of a

                          tangent in any case.<br>

                          ><br>

                          > Trying to bring this back to "should this

                          be committed to lld" seems<br>

                          > valuable, and I'm not sure what the right

                          criteria are for that.<br>

                          I think it would be useful to do "removing

                          obsolete debug info"<br>

                          in the linker. First thing is that it would be

                          the fastest way(no need<br>

                          to copy data/create temp files/build address

                          map...) Second thing<br>

                          is that it would be a good separation of

                          concerns. All debug info<br>

                          processing, currently done in the

                          linker(gdb_index, upcoming<br>

                          debug_names), could be moved into separate

                          library processing<br>

                          debug info. When gdb_index/debug_names should

                          be built without<br>

                          "removing of obsolete debug info" it would

                          have the same<br>

                          performance results as it currently has.<br>

                          <br>

                          We decided to give the idea of "removing of

                          obsolete debug info"<br>

                          another try and are going to implement it as a

                          separate utility<br>

                          working with the built binary. Making it

                          multi-threaded would<br>

                          probably show better performance results and

                          then it could<br>

                          probably be considered as acceptable to use

                          from the linker.<br>

                          <br>

                        </blockquote>

                        <div><br>

                        </div>

                        <div>I'm quite interested in this direction. One

                          thought I had was to incorporate such a

                          library into dsymutil but with support for

                          ELF. If you get a proposal written up I'd love

                          to take a look and comment.</div>

                        <div><br>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                  <p><br>

                  </p>

                  Yes, I will share the proposal in a separate thread

                  within a week or two.<br>

                  <br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div>Excellent, thanks :)</div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                <div> In short: we decided to move in a slightly different

                  direction than adding this functionality <br>

                  into dsymutil. Though if there is a preference to

                  implement it as part of dsymutil <br>

                  we are OK with doing it that way.<br>

                  <br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div>I have a vague preference since a lot of

                functionality already exists there on one platform and

                extending that seems straightforward, however...</div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                <div> In its first version, this new utility is supposed to

                  receive a built binary with debug info <br>

                  as input(with the new marking for references to

                  removed code sections -1/-2 <br>

                  -<a href="https://reviews.llvm.org/D84825"

                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D84825</a>)

                  and create a new binary with the obsolete <br>

                  debug info removed according to the above marking. In the next

                  versions, it could be extended <br>

                  with other debug info optimization tasks, e.g.

                  generating new index tables, optimizing debug info, <br>

                  etc.<br>

                  <br>

                  We considered three options:<br>

                  <br>

                  1. add new functionality into dsymutil. So that

                  dsymutil behaves differently <br>

                      on a non-darwin platform and supports another set

                  of command-line options.<br>

                  <br>

                  2. add new functionality into llvm-objcopy.

                  llvm-objcopy already supports various <br>

                       binary objects formats(MachO,ELF,COFF,wasm). It

                  also has several options <br>

                       to work with debug-info.<br>

                  <br>

                  3. create new utility llvm-dwarfutil which would

                  implement the above functionality <br>

                       and reuse DWARFLinker(extracted from dsymutil)

                  library and new library <br>

                       ObjectCopy(extracted from llvm-objcopy).<br>

                  <br>

                  So far our preference is number three. The reason for

                  this is that a separate <br>

                  utility working specifically with debug info looks like

                  a good separation of concerns. <br>

                  Adding another behavior to dsymutil does not look very

                  good. Extending the already <br>

                  rich interface of llvm-objcopy does not look very

                  good either. Keeping in mind that the actual <br>

                  implementation would be shared through libraries, the

                  separate utility, working specifically <br>

                  with debug info, looks like the right choice. That is

                  our current idea. <br>

                  <p>I will publish the proposal shortly to discuss it.<br>

                  </p>
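                  <p>A minimal sketch of the marking-based cleanup described above, under stated assumptions: the <code>Subprogram</code> record, the helper names, and the 64-bit address size are hypothetical and chosen only for illustration. This is not the DWARFLinker API. Entries whose start address equals one of the two marker values (-1 or -2, read as unsigned addresses) are treated as references to removed code and dropped.</p>

```python
from dataclasses import dataclass
from typing import List

ADDRESS_SIZE = 8  # assumption: 64-bit target
MAX_ADDR = 2 ** (8 * ADDRESS_SIZE) - 1

# -1 and -2 interpreted as unsigned addresses of ADDRESS_SIZE bytes.
MARKERS = {MAX_ADDR, MAX_ADDR - 1}


@dataclass
class Subprogram:
    """Hypothetical stand-in for a debug info entry that describes code."""
    name: str
    low_pc: int
    size: int


def is_removed(entry: Subprogram) -> bool:
    """True if the entry's start address carries one of the marker values."""
    return entry.low_pc in MARKERS


def drop_obsolete(entries: List[Subprogram]) -> List[Subprogram]:
    """Keep only entries that still refer to code present in the binary."""
    return [e for e in entries if not is_removed(e)]
```

                  <p>A real tool would also have to fix up references between the remaining entries and rebuild any index tables; the sketch only shows the filtering step.</p>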

                  <br>

                </div>

              </blockquote>

              <div><br>

              </div>

              <div>These are solid arguments - in particular, I agree

                with not extending llvm-objcopy :)</div>

              <div><br>

              </div>

              <div><a class="gmail_plusreply"

                  id="gmail-m_144640436407649066plusReplyChip-0"

                  href="mailto:jonas@devlieghere.com" target="_blank"

                  moz-do-not-send="true">+Jonas Devlieghere</a> and <a

                  class="gmail_plusreply"

                  id="gmail-m_144640436407649066plusReplyChip-1"

                  href="mailto:aprantl@apple.com" target="_blank"

                  moz-do-not-send="true">+Adrian Prantl</a> for dsymutil

                comments.</div>

              <div><br>

              </div>

              <div>My personal thought would be that extending dsymutil

                should be ok as the functionality goes well with

                everything else dsymutil does (other than not supporting

                ELF which the dsymutil maintainers are on board with

                last I checked). That said, I definitely think a

                write-up will be helpful. No matter what I support

                extracting all of the behavior into libraries and using

                that somewhere :)</div>

              <div><br>

              </div>

              <div>Thanks!</div>

              <div><br>

              </div>

              <div>-eric</div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                <div> Thank you, Alexey.<br>

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_quote">

                        <div>Thanks!</div>

                        <div><br>

                        </div>

                        <div>-eric</div>

                        <div> </div>

                        <blockquote class="gmail_quote"

                          style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

                          Alexey.<br>

                          <br>

                          ><br>

                          > Ray's the best person to weigh in on

                          that. My 2c is that I think it<br>

                          > probably is worthwhile, even just as an

                          experiment, assuming it's not<br>

                          > too intrusive to lld.<br>

                          ><br>

                          >> - The "DWARFLinker from lld" 820

                          MB(from your measurements).<br>

                          >><br>

                          >><br>

                          >> So "DWARFLinker from lld" looks two

                          times better.<br>

                          >><br>

                          >><br>

                          >> Anyway, thank you for pointing me to

                          possible mistake. I would recheck<br>

                          >> it and update results.<br>

                          >><br>

                          >><br>

                          >> Alexey.<br>

                          >><br>

                          >><br>

                          >>> * -O0: (baseline, just using

                          strip -g: 356 MB)<br>

                          >>>     * compressed: 25% smaller

                          with gc-debuginfo (481 MB / 641 MB) (407<br>

                          >>> MB split/non-gc)<br>

                          >>>     * uncompressed: 30% smaller

                          (820 MB / 1.2 GB) (566 MB split/non-gc)<br>

                          >>> * -O3: (baseline: 116 MB)<br>

                          >>>     * compressed: 16% smaller

                          (361 MB / 462 MB) (283 MB split/non-gc)<br>

                          >>>     * uncompressed: 22% smaller

                          (1022 MB / 1.2 GB) (156 MB split/non-gc)<br>

                          >>><br>

                          >>><br>

                          >>><br>

                          >>><br>

                          >>> On Fri, Jun 26, 2020 at 9:28 AM

                          Alexey Lapshin<br>

                          >>> <<a

                            href="mailto:alapshin@accesssoftek.com"

                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>

                          wrote:<br>

>>>>>>>>>>>> This idea goes in

                          another direction than fragmenting dwarf<br>

>>>>>>>>>>>> using elf

                          sections&tricks. It seems to me that the

                          cost of fragmenting is too high.<br>

                          >>>>>>>>>>> I

                          tend to agree - but I'm sort of leaning

                          towards trying to use object<br>

                          >>>>>>>>>>>

                          features as much as possible, then

                          implementing just enough custom<br>

                          >>>>>>>>>>>

                          handling in the linker to recoup overhead,

                          etc. (eg: add some kind of<br>

                          >>>>>>>>>>>

                          small header/brief description that makes it

                          easy for the linker to<br>

                          >>>>>>>>>>>

                          slice-and-dice - but hopefully a

                          domain-specific such header can be a<br>

                          >>>>>>>>>>>

                          bit more compact than the fully general ELF

                          form)<br>

                          >>>>>>>>>> I

                          think this indeed should be implemented and

                          evaluated.<br>

                          >>>>>>>>>> So

                          that various approaches could be compared.<br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> It is not only the

                          sizes of structures describing fragments but

                          also the complexity<br>

>>>>>>>>>>>> of tools that should be

                          taught to work with fragmented DWARF.<br>

>>>>>>>>>>>> (f.e. llvm-dwarfdump

                          applied to object file should be able to read

                          fragmented DWARF,<br>

>>>>>>>>>>>> but applied to linked

                          executable it should work with non-fragmented

                          DWARF).<br>

>>>>>>>>>>>> That idea is for the

                          tool which works the same way as dsymutil ODR.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> I will shortly describe

                          the idea of making DWARF easier to process

                          by dsymutil/DWARFLinker:<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> The idea is to have

                          only one "type table" per object file(special

                          section .debug_types_table).<br>

>>>>>>>>>>>> This "type table" would

                          contain all types.<br>

>>>>>>>>>>>> There could be a

                          special type of reference - type_offset - that

                          offset points into the type table.<br>

>>>>>>>>>>>> Basic types could

                          always be placed into the start of "type

                          table" thus, offsets to basic types<br>

>>>>>>>>>>>> most often would be 1

                          byte. There also would be a special kind of

                          reference - reference inside the type.<br>

>>>>>>>>>>>> Type units sig8 system

                          - would not be used to reference types.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> Types deduplication is

                          assumed to be done, not by linker mechanism

                          for COMDAT,<br>

>>>>>>>>>>>> but by a tool like

                          dsymutil. This tool would create resulting

                          .debug_types_table by putting there<br>

>>>>>>>>>>>> types from source

                          .debug_types_table-s. Only one copy of the

                          type would be placed into the<br>

>>>>>>>>>>>> resulting table. All

                          references pointing to the deleted copy would

                          be corrected to point<br>

>>>>>>>>>>>> to the single copy

                          inside "type table". (that is how dsymutil

                          works currently)<br>

                          >>>>>>>>>>> ^

                          that's the step that's probably a bit

                          expensive for a general-use<br>

                          >>>>>>>>>>>

                          tool - it implies parsing all the DWARF to

                          find those references and<br>

                          >>>>>>>>>>>

                          rewrite them, I think. For a high-performance

                          solution that could be<br>

                          >>>>>>>>>>>

                          run by the linker I think it'd be necessary to

                          have a solution that<br>

                          >>>>>>>>>>>

                          doesn't involve parsing all the DIEs.<br>

                          >>>>>>>>>>

                          According to the current dsymutil processing,<br>

                          >>>>>>>>>>

                          exactly this process is not the most

                          time-consuming.<br>

                          >>>>>>>>>> That

                          could be done relatively fast.<br>

                          >>>>>>>>> Fair

                          enough - though I'd still imagine any solution

                          that involves<br>

                          >>>>>>>>> parsing

                          all the DIEs still wouldn't be fast enough

                          (maybe an order of<br>

                          >>>>>>>>> magnitude

                          faster than the current solution even - but

                          that's still,<br>

                          >>>>>>>>> what, 6

                          or 7x slower than linking without the

                          feature?) for most users<br>

                          >>>>>>>>> to

                          consider it a good trade-off.<br>

                          >>>>>>>> It seems to

                          me that even the current 6x-7x slowdown could

                          be useful.<br>

                          >>>>>>>> Users who

                          already use dsymutil or llvm-dwp(assuming

                          DWARFLinker<br>

                          >>>>>>>> would be

                          taught to work with a split dwarf) tools spend

                          this time and,<br>

                          >>>>>>>> in some

                          scenarios, waste disk space on intermediate

                          files.<br>

                          >>>>>>> FWIW, dwp

                          (llvm-dwp hasn't really been optimized

                          compared to binutils<br>

                          >>>>>>> dwp) is designed

                          to be very quick - by not needing to do a lot

                          of<br>

                          >>>>>>> parsing/fixups.

                          Which, yes, means larger output files than

                          would be<br>

                          >>>>>>> possible with

                          more parsing/etc. It also doesn't take any

                          input from<br>

                          >>>>>>> the linker (so it

                          can run in parallel with the linker) - so it

                          can't<br>

                          >>>>>>> remove dead

                          subprograms. Given Google's the major (perhaps

                          only<br>

                          >>>>>>> significant?)

                          user of Split DWARF - I can say that the needs

                          don't<br>

                          >>>>>>> necessarily

                          overlap well with something that would take

                          significantly<br>

                          >>>>>>> longer to run or

                          use significantly more memory.

                          Faster/cheaper/with<br>

                          >>>>>>> somewhat bigger

                          output files is probably the right tradeoff

                          for<br>

                          >>>>>>> Google's use

                          case, at least.<br>

                          >>>>>>><br>

                          >>>>>>> I imagine Apple's

                          use for dsymutil is somewhat similar - it's

                          not used<br>

                          >>>>>>> in the iterative

                          development cycle, only in final releases -

                          well,<br>

                          >>>>>>> maybe their

                          situation is more "neutral" - not a major pain

                          point in<br>

                          >>>>>>> any case I'd

                          guess.<br>

                          >>>>>>><br>

                          >>>>>>><br>

                          >>>>>> I see. FWIW,

                          Comparison splitdwarf+dwp and DWARFLinker from

                          lld:<br>

                          >>>>>><br>

                          >>>>>> 1.

                          split-dwarf+llvm-dwp = linking time for clang

                          6 sec,<br>

                          >>>>>>       generating time

                          for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>

                          >>>>> FWIW, llvm-dwp is not

                          very well optimized (which is to say: it is

                          not<br>

                          >>>>> optimized), binutils dwp

                          might be a better comparison (& even that<br>

                          >>>>> doesn't have the

                          parallelism & some potential further

                          memory savings<br>

                          >>>>> that lld has that we

                          could take advantage of in a dwp-like tool)<br>

                          >>>>><br>

                          >>>>> What build mode was the

                          clang binary built in? Optimized or

                          unoptimized?<br>

                          >>>> right, that is unoptimized

                          build with -ffunction-sections.<br>

                          >>>><br>

                          >>>>>> 2. DWARFLinker from

                          lld = linking time for clang 72 sec,

                          clang=760M.<br>

                          >>>>> It does seem a tad

                          strange that the clang binary would be smaller<br>

                          >>>>> non-split with DWARF

                          linking than it was split. Though I could

                          imagine<br>

                          >>>>> this might be possible in

                          an optimized build (where debug_ranges<br>

                          >>>>> become quite relatively

                          expensive in the .o file contribution with<br>

                          >>>>> Split DWARF)<br>

                          >>>>> Could you compare the

                          section sizes between these two clang

                          binaries, perhaps?<br>

                          >>>> .debug_ranges is three times

                          bigger and .debug_line is twice as big.<br>

                          >>>><br>

                          >>>>>>>> Thus if they

                          would use this LLD feature in its current

                          state<br>

                          >>>>>>>> - they would

                          still receive benefits.<br>

                          >>>>>>>><br>

                          >>>>>>>> Speaking of

                          performance results - LLD is a multi-threaded

                          linker;<br>

                          >>>>>>>> it handles

                          sections in parallel. DWARFLinker generates

                          DWARF using<br>

                          >>>>>>>> AsmPrinter

                          which is a stream - so it can only produce the resulting DWARF

                          >>>>>>>> sequentially.

                          It is not surprising that the parallel

                          solution works faster.<br>

                          >>>>>>>> Making

                          DWARFLinker truly multi-threaded would

                          probably allow us<br>

                          >>>>>>>> to make

                          the slowdown fall within the 2x-4x range.<br>

                          >>>>>>> *nod* that's

                          still a really expensive link - but I

                          understand that's a<br>

                          >>>>>>> suitable tradeoff

                          for your users<br>

                          >>>>>>><br>

                          >>>>>> Btw, 2x or 7x is for

                          pure linking time. Overall compilation

                          slowdown<br>

                          >>>>>> is not so

                          significant. Building LLVM codebase has only

                          20% slowdown.<br>

                          >>>>> Understood - that's still

                          quite significant to most users, I'd imagine.<br>

                          >>>> I see.<br>

                          >>>><br>

                          >>>>>>>>>>

                          Anyway, I think the dsymutil approach is still

                          valuable, and it<br>

                          >>>>>>>>>> would

                          be useful to optimize it.<br>

                          >>>>>>>>>> Do

                          you think it would be useful to make

                          dsymutil/DWARFLinker truly multi-threaded?<br>

                          >>>>>>>>>> (To

                          make dsymutil/DWARFLinker able to process each

                          object file in a separate thread)<br>

                          >>>>>>>>> Perhaps -

                          that I'd probably leave up to the folks who

                          are more<br>

                          >>>>>>>>> invested

                          in dsymutil (Adrian Prantl et al). Maybe one

                          day we'll get it<br>

                          >>>>>>>>>

                          integrated into llvm-dwp and then I'll be

                          interested in getting as<br>

                          >>>>>>>>> much

                          performance out of it as lld - so

                          multithreading and things would<br>

                          >>>>>>>>> be on the

                          books.<br>

                          >>>>>>>> I think

                          improving dsymutil is a valuable thing.<br>

                          >>>>>>>> Though there

                          are several directions which might be

                          considered<br>

                          >>>>>>>> to make it

                          more robust:<br>

                          >>>>>>>><br>

                          >>>>>>>> 1. support of

                          latest DWARF - DWARF5/DWARF64...<br>

                          >>>>>>> I expect/thought

                          some of the Apple folks had already worked on

                          DWARF5 support?<br>

                          >>>>>>> DWARF64 - that's

                          been around for a while, and just hasn't been

                          needed<br>

                          >>>>>>> by LLVM users

                          thus far, it seems (until recently - where

                          some<br>

                          >>>>>>> developers have

                          started working on that)<br>

                          >>>>>> The debug_names table is

                          already implemented, but

                          debug_rnglists,<br>

                          >>>>>> debug_loclists, type

                          units - are not implemented yet.<br>

                          >>>>> Superficially, type units

                          wouldn't be on the list of features (like<br>

                          >>>>> DWARF64 - it's optional)

                          I'd try to support in dsymutil - since their<br>

                          >>>>> size overhead is more

                          justified for a DWARF-agnostic linker that's<br>

                          >>>>> using comdat groups. With

                          a DWARF-aware linker I'd be specifically<br>

                          >>>>> hoping to avoid using

                          type units to help<br>

                          >>>>>> The thing which<br>

                          >>>>>> should probably be

                          changed is that dsymutil should not have its

                          version<br>

                          >>>>>> of code generating

                          DWARF tables. It should call the already existing<br>

                          >>>>>> DWARF5/DWARF64

                          implementations. Then dsymutil would always<br>

                          >>>>>> use the latest DWARF

                          generators.<br>

                          >>>>> Possibly - I don't know

                          what the architectural tradeoffs for that look<br>

                          >>>>> like - I'd imagine

                          DWARFLinker has sufficiently different<br>

                          >>>>> needs/tradeoffs than

                          LLVM's DWARF generation code (rewriting

                          existing<br>

                          >>>>> DIEs compared to building

                          new ones from scratch, etc) that it might be<br>

                          >>>>> hard for them to share a

                          lot of their implementation.<br>

                          >>>> It is not easy, and would

                          require some additions, but it would benefit<br>

                          >>>> in that all format

                          implementation is in one place. Thus changing

                          that place<br>

                          >>>> would reflect in other

                          places. There are at least three

                          implementations for<br>

                          >>>> .debug_ranges, .debug_aranges

                          currently...<br>

                          >>>><br>

                          >>>><br>

                          >>>>>>>> 2. implement

                          multi-threaded execution.<br>

                          >>>>>>>> 3. support of

                          split DWARF.<br>

                          >>>>>>> Maybe, though I'm

                          still not sure it'd be the right tradeoff -<br>

                          >>>>>>> especially if it

                          involved having to wait to run the .dwo merger

                          (call<br>

                          >>>>>>> it DWARF-aware

                          dwp, or dsymutil with dwp support) until after

                          the<br>

                          >>>>>>> linker ran.<br>

                          >>>>>>><br>

                          >>>>>>>> 4. implement

                          dsymutil for non-darwin platform.<br>

                          >>>>>>> That's probably,

                          essentially (3), more-or-less. Split DWARF is<br>

                          >>>>>>> somewhat of a

                          formalization of Apple's/MachO DWARF

                          distribution model<br>

                          >>>>>>> (leave DWARF

                          in files that aren't linked/use them from a

                          debugger,<br>

                          >>>>>>> but also be able

                          to merge them into some final file (dsym or

                          dwp) for<br>

                          >>>>>>> archival

                          purposes)<br>

                          >>>>>>><br>

                          >>>>>>>> All of this

                          is a massive piece of work.<br>

                          >>>>>>>> Our original

                          investment was to solve two problems:<br>

                          >>>>>>>><br>

                          >>>>>>>> 1. Overlapped

                          address ranges, which is currently close to

                          being solved. Thank you for helping with that!<br>

                          >>>>>>> Yeah, again,

                          sorry that's taken quite so long/somewhat

                          circuitous route.<br>

                          >>>>>>><br>

                          >>>>>>>> 2. Size of

                          debug info. That still becomes an issue, but

                          we are unsure whether we are ready to<br>

                          >>>>>>>>      invest

                          in solving all of problems 1-4 above, or how

                          much the community is interested in it.<br>

                          >>>>>>> Fair, for sure -

                          I don't think you'd need to sign up to solve

                          all of<br>

                          >>>>>>> them (don't think

                          they necessarily need solving). Potentially

                          moving<br>

                          >>>>>>> the logic out

                          into a separate tool as Fangrui's considering

                          - a<br>

                          >>>>>>> post-link DWARF

                          optimizer, rather than in-linker DWARF

                          optimization.<br>

                          >>>>>>><br>

                          >>>>>>> I really don't

                          want to give you the runaround like this - but

                          multiple<br>

                          >>>>>>> times slower

                          links is something that seems pretty

                          problematic for most<br>

                          >>>>>>> users, to the

                          point of weighing the maintainability of lld

                          against the<br>

                          >>>>>>> convenience of

                          having this functionality in-linker rather

                          than in a<br>

                          >>>>>>> post-link

                          optimizer.<br>

                          >>>>>>><br>

                          >>>>>>> (I know you've

                          spoken a bit before about your users needs -

                          but if<br>

                          >>>>>>> it's possible,

                          could you explain (again :/) why they have

                          such a<br>

                          >>>>>>> strong need for

                          smaller DWARF? While DWARF size is an ongoing

                          concern<br>

                          >>>>>>> for many users

                          (Google certainly - hence the invention of

                          Split DWARF,<br>

                          >>>>>>> use of type units

                          and compressed DWARF, etc) - usually it's in

                          rather<br>

                          >>>>>>> large programs,

                          but it sounds like you're dealing with

                          relatively<br>

                          >>>>>>> small ones

                          (otherwise the increase in link time, I'd

                          imagine, would be<br>

                          >>>>>>> prohibitive for

                          your users?)?<br>

                          >>>>>> We have many large

                          programs and keep Daily/Nightly debug builds,<br>

                          >>>>>> which takes a lot of

                          disk space. Compilation time for these

                          programs is big.<br>

                          >>>>>> The scenario is

                          "compile once" (not

                          compile-debug-compile-debug).<br>

                          >>>>>> So we think that

                          a solution (like dsymutil/DWARFLinker) would not

                          slow down<br>

                          >>>>>> the compilation time

                          of the overall build significantly (see above

                          numbers for<br>

                          >>>>>> llvm codebase) and

                          would allow us to reduce disk space required

                          to keep<br>

                          >>>>>> all of these builds.<br>

                          >>>>> Ah, OK - for archival

                          purposes. So the interactive developers

                          wouldn't<br>

                          >>>>> necessarily be using this

                          feature. Makes sense - similar to dsymutil<br>

                          >>>>> and dwp, mostly used for

                          archival purposes & you can debug straight<br>

                          >>>>> from .o/.dwos for

                          interactive/iterative development.<br>

                          >>>><br>

                          >>>>> In that case, it seems

                          more likely that a separate tool might

                          suffice.<br>

                          >>>> Agreed: if work on this

                          continues, then it makes sense to<br>

                          >>>> do it as a separate tool and make

                          it fast enough. If there is interest<br>

                          >>>> in it, then it would

                          probably be possible to return to the idea of calling

                          it from the linker.<br>

                          >>>><br>

                          >>>>> Also, out of curiosity -

                          have you tried just compressing the output<br>

                          >>>>> (-gz (I think that does

                          the right thing for the linker level<br>

                          >>>>> compression too,

                          otherwise -Wl,-compress-debug-sections might

                          do it))<br>

                          >>>>> or are you already doing

                          that in addition?<br>

                          >>>> Sure, we use

                          -Wl,-compress-debug-sections.<br>

                          >>>><br>

                          >>>> Thank you, Alexey.<br>

                          >>>><br>

                          >>>>>>> You mentioned

                          that the usability cost of<br>

                          >>>>>>> Split DWARF for

                          your users was too high (or high enough to

                          justify<br>

                          >>>>>>> this alternative

                          work of DWARF-aware linking)? That all seems a

                          bit<br>

                          >>>>>>> surprising to me

                          - though I understand the deployment issues of

                          Split<br>

                          >>>>>>> DWARF do present

                          some challenges to users in more heterogeneous<br>

                          >>>>>>> environments than

                          Google's... still, I'd have thought there was

                          some<br>

                          >>>>>>> hope there)<br>

                          >>>>>> Our tools do not

                          support Split DWARF yet, though we plan to

                          implement it.<br>

                          >>>>>> Once we have

                          support for Split DWARF, it would be<br>

                          >>>>>> convenient to have an

                          easy way to share built debug binaries.

                          llvm-dwp is the<br>

                          >>>>>> answer to this.

                          DWARFLinker could probably be another answer.<br>

                          >>>>> Ah, fair enough - thanks

                          for the context!<br>

                          >>>>>>>>> One way

                          to do that would be to have a CU-local type

                          indirection table.<br>

                          >>>>>>>>> DIEs

                          reference local type numbers (like local

                          address/string numbers -<br>

                          >>>>>>>>>

                          addrx/strx/rnglistx) and that table contains

                          either sig8 (no linker<br>

                          >>>>>>>>> fixups

                          required) or the local type offsets you

                          describe - the linker<br>

                          >>>>>>>>> would

                          then only need to read this type number

                          indirection table and<br>

                          >>>>>>>>> rewrite

                          them to the final type numbers.<br>
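                          A minimal sketch of the table rewrite described above, in Python. The entry kinds ("sig8", "local", "final") and the final_numbers mapping are hypothetical names chosen for illustration; they are not an existing DWARF or LLVM format.<br>
<pre>
```python
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # (kind, value); the kind labels are hypothetical


def rewrite_type_table(table: List[Entry], final_numbers: Dict[int, int]) -> List[Entry]:
    """Rewrite CU-local type offsets to final type numbers.

    "sig8" entries already identify the type by signature, so they need no fixup.
    "local" entries hold a CU-local offset that the linker maps to a final number.
    """
    rewritten: List[Entry] = []
    for kind, value in table:
        if kind == "local":
            rewritten.append(("final", final_numbers[value]))
        else:
            rewritten.append((kind, value))
    return rewritten
```
</pre>
                          Only the table is read and rewritten; under this assumption the DIEs that refer to table slots by index stay untouched.<br>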

                          >>>>>>>> Yes, that

                          could be additionally done if this process

                          would be time-consuming.<br>

                          >>>>>>>><br>

                          >>>>>>>> David, thank

                          you for all your comments and explanations.

                          They are extremely helpful.<br>

                          >>>>>>> Sure thing -

                          really appreciate your patience with all this

                          - it's... a<br>

                          >>>>>>> lot of moving

                          parts.<br>

                          >>>>>>> - Dave<br>

                          >>>>>>> Thank you,

                          Alexey.<br>

                          >>>>>>><br>

                          >>>>>>>> sig8 hash-id

                          would be used to compare types and to

                          deduplicate them.<br>

                          >>>>>>>> It would

                          speed up the current dsymutil context

                          analysis.<br>

                          >>>>>>>> Types having

                          the same hash-id could be deduplicated.<br>

                          >>>>>>>> This would

                          allow deduplicating more types

                          than current dsymutil does.<br>

                          >>>>>>>> Incomplete

                          type definitions having a similar set of

                          members are not deduplicated by dsymutil

                          currently.<br>

                          >>>>>>>> In this case

                          they would have the same hash-id.<br>
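                          A minimal sketch of deduplicating by hash-id, in Python. It assumes each type arrives as a (sig8, description) pair and that equal sig8 values mean equal types; the function name and data layout are hypothetical.<br>
<pre>
```python
from typing import Dict, List, Tuple


def deduplicate_types(types: List[Tuple[int, str]]) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Keep the first type seen for each sig8 and build an index remap.

    Returns (unique_types, remap), where remap[i] is the index in unique_types
    that replaces input type i.
    """
    first_seen: Dict[int, int] = {}
    unique: List[Tuple[int, str]] = []
    remap: List[int] = []
    for sig8, description in types:
        if sig8 not in first_seen:
            first_seen[sig8] = len(unique)
            unique.append((sig8, description))
        remap.append(first_seen[sig8])
    return unique, remap
```
</pre>
                          Each lookup is a single dictionary probe on the sig8 value, so no structural comparison of the types is needed.<br>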

                          >>>>>>>><br>

                          >>>>>>>> This "type

                          table" would take less space than current

                          "type units" and current ODR solution.<br>

                          >>>>>>>><br>

                          >>>>>>>> The above is just

                          an idea on how to help a DWARF-aware

                          linker (based on the idea of removing obsolete debug

                          info)<br>

                          >>>>>>>> to work

                          faster (if that is interesting).<br>

                          >>>>>>>><br>

                          >>>>>>>> Alexey.<br>

                          >>>>>>>><br>

                          >>>>>>>>> From:

                          llvm-dev <<a

                            href="mailto:llvm-dev-bounces@lists.llvm.org"

                            target="_blank" moz-do-not-send="true">llvm-dev-bounces@lists.llvm.org</a>>

                          On Behalf Of James Henderson via llvm-dev<br>

                          >>>>>>>>> Sent:

                          Wednesday, June 3, 2020 3:48 AM<br>

                          >>>>>>>>> To: David

                          Blaikie <<a

                            href="mailto:dblaikie@gmail.com"

                            target="_blank" moz-do-not-send="true">dblaikie@gmail.com</a>><br>

                          >>>>>>>>> Cc: <a

                            href="mailto:llvm-dev@lists.llvm.org"

                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

                          >>>>>>>>> Subject:

                          Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove

                          obsolete debug info in lld.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> It makes

                          me sad that the linker (via a library or

                          otherwise) has to be "DWARF-aware" to be able

                          to effectively handle --gc-sections, COMDATs,

                          --icf etc for debug info, without leaving

                          large blocks of data kicking around.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> The

                          patching to -1 (or equivalent) is probably a

                          good lightweight solution (though I'd love it

                          if it could be done based on section type in

                          the future rather than section name, but

                          that's probably outside the realm of DWARF),

                          as it requires only minimal understanding in

                          the linker, but anything beyond that seems to

                          be complicated logic that is mostly due to the

                          structure of DWARF. Patching to -1 does feel a

                          bit like a sticking plaster/band aid to patch

                          over the issue rather than properly solving it

                          too - there will still be debug data

                          (potentially significant amounts in

                          COMDAT-heavy objects) that the linker has to

                          write and the debugger has to somehow know how

                          to skip (even if it knows that -1 is

                          special-case due to the standard being

                          updated, it needs to get as far as the -1),

                          which is all wasted effort.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> We've

                          already seen from Alexey's prototyping, and

                          from our own experiences with the Sony

                          proprietary linker (which tried to rewrite

                          .debug_line only) that deconstructing the

                          DWARF so that it can be more optimally

                          reassembled at link time is slow going, and

                          will probably inevitably be however much

                          effort is put into optimising it. For a start,

                          given the current standards, it's impossible

                          to know how to deconstruct it without having

                          to parse vast amounts of DWARF, which is

                          typically going to mean a lot more parsing

                          work than the linker would normally have to

                          deal with. Additionally, much of this parsing

                          work is wasted effort, since it seems unlikely

                          in many links that large amounts of the DWARF

                          will be redundant. Having an option to opt-in

                          doesn't help much there, since it just means

                          the logic exists without most people using it,

                          due to it not being good enough, or

                          potentially they don't even know it exists.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> I don't

                          have particularly concrete suggestions as to

                          how to solve the structural problems with

                          DWARF at this point. The only thing that seems

                          obvious to me is a more "blessed" approach to

                          fragmentation of sections, similar to what I

                          tried with my prototype mentioned earlier in

                          the thread, although we'd need to figure out

                          the previously stated performance issues.

                          Other ideas might tie into this, like somehow

                          sharing the various table headers a bit like

                          CIEs in .eh_frame that could be merged by the

                          linker - each object could have separate table

                          header sections, which are referenced by the

                          individual .debug_* blocks, which in turn are

                          one per function/data piece and easily

                          discardable/merged by the linker.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> Just some

                          thoughts.<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> James<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

                          >>>>>>>>> On Tue, 2

                          Jun 2020 at 19:24, David Blaikie via llvm-dev

                          <<a href="mailto:llvm-dev@lists.llvm.org"

                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>

                          wrote:<br>

                          >>>>>>>>><br>

                          >>>>>>>>> On Tue,

                          May 19, 2020 at 7:17 AM Alexey Lapshin<br>

                          >>>>>>>>> <<a

                            href="mailto:alapshin@accesssoftek.com"

                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>

                          wrote:<br>

                          >>>>>>>>>> Hi

                          David, please find my comments inside:<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

>>>>>>>>>>>>> Broad question: Do

                          you have any specific motivation/users/etc in

                          implementing this (if you can speak about it)?<br>

>>>>>>>>>>>>> - it might help

                          motivate the work, understand what tradeoffs

                          might be suitable for you/your users, etc.<br>

>>>>>>>>>>>> There are two general

                          requirements:<br>

>>>>>>>>>>>> 1) Remove (or clean)

                          invalid debug info.<br>

                          >>>>>>>>>>>

                          Perhaps a simpler direct solution for your

                          immediate needs might be a much narrower,<br>

                          >>>>>>>>>>>

                          and more efficient linker-DWARF-awareness

                          feature:<br>

                          >>>>>>>>>>><br>

                          >>>>>>>>>>>

                          With DWARFv5, rnglists present an opportunity

                          for a DWARF linker to rewrite the ranges<br>

                          >>>>>>>>>>>

                          without parsing the rest of the DWARF.

                          /technically/ this isn't guaranteed - rnglist

                          entries<br>

                          >>>>>>>>>>>

                          can be referenced either directly, or by

                          index. If all rnglists are referenced by

                          index, then<br>

                          >>>>>>>>>>> a

                          linker could parse only the debug_rnglists

                          section and rewrite ranges to remove any<br>

                          >>>>>>>>>>>

                          address ranges that refer to optimized-out

                          code.<br>

                          >>>>>>>>>>><br>

                          >>>>>>>>>>>

                          This would only be correct for rnglists that

                          had no direct references to them (that only

                          were<br>

                          >>>>>>>>>>>

                          referenced via the indexes) - but we could

                          either implement it with that assumption, or

                          could<br>

                          >>>>>>>>>>>

                          add an LLVM extension attribute on the CU that

                          would say "I promise I only referenced

                          rnglists<br>

                          >>>>>>>>>>>

                          via rnglistx forms/indexes"). If this

                          DWARF-aware linking would have to read the CU

                          DIE (not<br>

                          >>>>>>>>>>>

                          all the other DIEs) it /could/ also then

                          rewrite high/low_pc if the CU wasn't using

                          ranges...<br>

                          >>>>>>>>>>>

                          but that wouldn't come up in the

                          function-removal case, because then you'd have

                          ranges anyway,<br>

                          >>>>>>>>>>>

                          so no need for that.<br>

                          >>>>>>>>>>><br>

                          >>>>>>>>>>>

                          Such a DWARF-aware rnglist linking could also

                          simplify rnglists, in cases where functions<br>

                          >>>>>>>>>>>

                          ended up being laid out next to each other,

                          the linker could coalesce their ranges

                          together.<br>
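                          A minimal sketch of the range rewriting described above, in Python. It assumes the linker already knows which ranges refer to discarded code (the live flags are a hypothetical input) and models each range as a half-open (low_pc, high_pc) pair instead of parsing a real .debug_rnglists section.<br>
<pre>
```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open address range: (low_pc, high_pc)


def rewrite_ranges(ranges: List[Range], live: List[bool]) -> List[Range]:
    """Drop ranges that refer to discarded code and coalesce ranges that touch."""
    kept = sorted(r for r, is_live in zip(ranges, live) if is_live)
    merged: List[Range] = []
    for low, high in kept:
        if merged and merged[-1][1] >= low:
            # The previous range ends where this one starts (or overlaps it),
            # so extend the previous range rather than emitting a new entry.
            prev_low, prev_high = merged[-1]
            merged[-1] = (prev_low, max(prev_high, high))
        else:
            merged.append((low, high))
    return merged
```
</pre>
                          For example, two functions laid out back to back at 0x1000-0x1010 and 0x1010-0x1040 collapse into a single entry, while a discarded function that resolved to address 0 disappears from the list.<br>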

                          >>>>>>>>>>><br>

                          >>>>>>>>>>> I

                          imagine this could be implemented with very

                          little overhead to linking, especially

                          compared<br>

                          >>>>>>>>>>>

                          to the overhead of full DWARF-aware linking.<br>

                          >>>>>>>>>>><br>

                          >>>>>>>>>>>

                          Though none of this fixes Split DWARF, where

                          the linker doesn't get a chance to see the<br>

                          >>>>>>>>>>>

                          addresses being used - but if you only

                          want/need the CU-level ranges to be correct,

                          this<br>

                          >>>>>>>>>>>

                          might be a viable fix, and quite efficient.<br>

                          >>>>>>>>>> Yes,

                          we have thought about that alternative. This would

                          resolve our problem of invalid debug info<br>

                          >>>>>>>>>> and

                          would work much faster. Thus, if we do not

                          get good results for D74169 then we<br>

                          >>>>>>>>>> will

                          implement it. Do you think it could be useful

                          to have this solution in upstream?<br>

                          >>>>>>>>> A pure

                          rnglist rewriting - I think it'd be OK to have

                          in upstream -<br>

                          >>>>>>>>> again,

                          cost/benefit/etc would have to be weighed. I'm

                          not sure it<br>

                          >>>>>>>>> would

                          save enough space to be particularly valuable

                          beyond the<br>

                          >>>>>>>>>

                          correctness issue - and it doesn't completely

                          solve the correctness<br>

                          >>>>>>>>> issue for

                          zero-address usage or low-address usage

                          (because you could<br>

                          >>>>>>>>> still

                          have overlapping subprograms inside a CU - so

                          if you were<br>

                          >>>>>>>>>

                          symbolizing you could use the correct rnglist

                          to filter, but then go<br>

                          >>>>>>>>> look

                          inside the CU only to find two subprograms

                          that had that address<br>

                          >>>>>>>>> & not

                          know which one was the correct one and which

                          one was the<br>

                          >>>>>>>>> discarded

                          one).<br>

                          >>>>>>>>><br>

                          >>>>>>>>> rnglist

                          rewriting might be easy enough to prototype -

                          but depends what<br>

                          >>>>>>>>> you want

                          to spend your time on, I know this whole issue

                          has been a<br>

                          >>>>>>>>> huge

                          investment of your time already - but maybe

                          this recent<br>

                          >>>>>>>>>

                          revitalization of the conversation around

                          having an explicit value in<br>

                          >>>>>>>>> the

                          linker might be sufficient to address

                          everyone's needs... *fingers<br>

                          >>>>>>>>> crossed*)<br>

                          >>>>>>>>><br>

                          >>>>>>>>><br>

>>>>>>>>>>>> 2) Optimize the DWARF

                          size.<br>

                          >>>>>>>>>>>

                          Do your users care much about this? I imagine

                          if they had significant DWARF size issues,<br>

                          >>>>>>>>>>>

                          they'd have significant link time issues and

                          the kind of cost to link time this feature has

                          would<br>

                          >>>>>>>>>>>

                          be prohibitive - but perhaps they're sharing

                          linked binaries much more often than they're<br>

                          >>>>>>>>>>>

                          actually performing linking.<br>

                          >>>>>>>>>> Yes,

                          they do. They also have significant link-time

                          issues.<br>

                          >>>>>>>>>> So

                          current performance results of D74169 are not

                          very acceptable.<br>

                          >>>>>>>>>> We

                          hope to improve it.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> The specifics which our

                          users have:<br>

>>>>>>>>>>>>    - embedded platform

                          which uses 0 as start of .text section.<br>

>>>>>>>>>>>>    - custom toolset

                          which does not support all features yet (e.g.

                          split dwarf).<br>

>>>>>>>>>>>>    - tolerant of the

                          link-time increase.<br>

>>>>>>>>>>>>    - need a useful way

                          to share debug builds.<br>

                          >>>>>>>>>>>

                          Sharing two files (executable and dwp) is

                          significantly less useful than sharing one

                          file?<br>

                          >>>>>>>>>>

                          Probably not significantly, but yes, it looks

                          less useful compared to D74169.<br>

                          >>>>>>>>>>

                          Having only two files (executable and .dwp)

                          looks significantly better than having

                          executable and multiple .dwo files.<br>

                          >>>>>>>>>>

                          Having only one file (executable) with minimal

                          size looks better than the two files with a

                          bigger size.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>> clang

                          compiled with -gsplit-dwarf takes 0.9G for

                          executable and 0.9G for .dwp.<br>

                          >>>>>>>>>> clang

                          compiled with -gc-debuginfo takes only 0.76G

                          for single executable.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> For the first point: we

                          have a problem "Overlapping address ranges

                          starting from 0" (D59553).<br>

>>>>>>>>>>>> We use a custom solution,

                          but a general solution like D74169 would be

                          better here.<br>

                          >>>>>>>>>>>

                          If CU ranges are the only ones that need

                          fixing, then I think the above solution might

                          be as<br>

                          >>>>>>>>>>>

                          good/better - if more than CU ranges need

                          fixing, then I think we might want to start

                          talking about<br>

                          >>>>>>>>>>>

                          how to fix DWARF itself (split and non-split)

                          to signal certain addresses point to dead code

                          with a<br>

                          >>>>>>>>>>>

                          specific blessed value that linkers would need

                          to implement - because with Split DWARF

                          there's<br>

                          >>>>>>>>>>>

                          no way to solve the non-CU addresses at the

                          linker.<br>

                          >>>>>>>>>> I

                          think a worthwhile solution for that signal

                          value would be LowPC > HighPC.<br>

                          >>>>>>>>>> That

                          does not require additional bits in DWARF.<br>

                          >>>>>>>>>> It

                          would be natural to skip such address ranges

                          since they are explicitly marked as invalid.<br>

                          >>>>>>>>>> It

                          could be implemented in a linker very easily.

                          Probably, it would make sense to describe that<br>

                          >>>>>>>>>> usage

                          in the DWARF standard.<br>
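                          <br>
                          As a rough illustration of the rule proposed above (a sketch only, not code from any linker): assuming address ranges are plain (LowPC, HighPC) integer pairs, a hypothetical helper named live_ranges could drop the inverted ones like this:<br>
<pre>
```python
def live_ranges(ranges):
    """Keep only ranges that are not marked dead via LowPC > HighPC."""
    kept = []
    for low_pc, high_pc in ranges:
        if low_pc > high_pc:
            # Assumed convention from the proposal: an inverted range
            # signals dead code, so it is skipped.
            continue
        kept.append((low_pc, high_pc))
    return kept


# Hypothetical input: the middle range is inverted and gets dropped.
ranges = [(0x1000, 0x1040), (0x1, 0x0), (0x2000, 0x2010)]
print([(hex(lo), hex(hi)) for lo, hi in live_ranges(ranges)])
```
</pre>
                          Ranges with LowPC equal to HighPC are kept in this sketch; whether empty ranges should also be dropped is left open.<br>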

                          >>>>>>>>>><br>

                          >>>>>>>>>> As to

                          the addresses which are not seen by the

                          linker (since they are in .dwo files) - yes,<br>

                          >>>>>>>>>> they

                          need to have another solution. Could you show

                          an example of such a case, please?<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

                          >>>>>>>>>><br>

>>>>>>>>>>>>> 2. Support of type

                          units.<br>

>>>>>>>>>>>>>>    That could

                          be implemented later.<br>

>>>>>>>>>>>>> Enabling type units

                          increases object size to make it easier to

                          deduplicate at link time by a DWARF-unaware<br>

>>>>>>>>>>>>> linker. With a

                          DWARF aware linker it'd be generally desirable

                          not to have to add that object size overhead

                          to<br>

>>>>>>>>>>>>> get the linking

                          improvements.<br>

>>>>>>>>>>>> But, DWARFLinker should

                          adequately work with type units since they are

                          already implemented.<br>

                          >>>>>>>>>>>

                          Maybe - it'd be nice & all, but I don't

                          think it's an outright necessity - if someone

                          knows they're using<br>

                          >>>>>>>>>>> a

                          DWARF-aware linker, they'd probably not use

                          type units in their object files. It's

                          possible someone<br>

                          >>>>>>>>>>>

                          doesn't know for sure & maybe they have

                          pre-canned debug object files from someone

                          else, etc.<br>

                          >>>>>>>>>> I

                          see.<br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> Another thing is that

                          the idea behind type units has the potential

                          to help a DWARF-aware linker work faster.<br>

>>>>>>>>>>>> Currently, DWARFLinker

                          analyzes context to understand whether types

                          are the same or not.<br>

                          >>>>>>>>>>>

                          When you say "analyzes context" what do you

                          mean? Usually I'd take that to mean<br>

                          >>>>>>>>>>>

                          "looks at things outside the type itself -

                          like what namespace it's in, etc" - which,

                          yes,<br>

                          >>>>>>>>>>>

                          it should do that, but it doesn't seem very

                          expensive to do. But I guess you actually<br>

                          >>>>>>>>>>>

                          mean something about doing structural

                          equivalence in some way, looking at things

                          inside the type?<br>

                          >>>>>>>>>> I

                          think it could be useful for both cases.

                          Currently, dsymutil does only the first thing<br>

                          >>>>>>>>>> (look

                          at type name, namespace name, etc..) and does

                          not do the second thing<br>

                          >>>>>>>>>>

                          (doing structural equivalence). Analyzing type

                          names is currently quite expensive<br>

                          >>>>>>>>>> (the

                          string pool search alone takes ~10 sec out of

                          70 sec of overall time).<br>

                          >>>>>>>>>> That

                          is expensive because many things must be

                          done to work with strings:<br>

                          >>>>>>>>>> parse

                          DWARF, search and resolve relocations, compute

                          a hash for strings,<br>

                          >>>>>>>>>> put

                          data into a string pool, create a fully

                          qualified name (like

                          namespace::function::name).<br>

                          >>>>>>>>>> It

                          looks like it could be optimized and finally

                          require less time, but it still would be a

                          noticeable<br>

                          >>>>>>>>>> part

                          of the overall time.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>> If

                          dsymutil starts to check for the structural

                          equivalence, then the process would be even

                          slower.<br>

                          >>>>>>>>>> So,

                          if, instead of comparing type structure, a

                          single hash-id were checked, then this

                          process<br>

                          >>>>>>>>>> would

                          also be faster.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>> Thus

                          I think using hash-id to compare types would

                          make the current implementation faster

                          and would<br>

                          >>>>>>>>>> allow

                          DWARFLinker to handle incomplete types

                          without massive performance degradation.<br>
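                          <br>
                          To make the hash-id idea concrete, here is a minimal sketch. It assumes every type arrives with a precomputed hash-id (similar in spirit to a type unit signature); the function deduplicate_types and the sample data are hypothetical and do not reflect the DWARFLinker API:<br>
<pre>
```python
def deduplicate_types(type_units):
    """Keep the first type seen for each hash-id; count skipped duplicates."""
    by_hash = {}
    duplicates = 0
    for hash_id, description in type_units:
        if hash_id in by_hash:
            # Same hash-id: treated as the same type, with no name or
            # structural comparison needed.
            duplicates += 1
            continue
        by_hash[hash_id] = description
    return by_hash, duplicates


# Hypothetical input: "struct Foo" appears twice under one hash-id.
units = [(0xA1, "struct Foo"), (0xB2, "struct Bar"), (0xA1, "struct Foo")]
unique, dups = deduplicate_types(units)
print(len(unique), dups)
```
</pre>
                          Each comparison is a single dictionary lookup, which is the cost saving the paragraph above is aiming for; the sketch trusts the hash-id and does not guard against collisions.<br>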

                          >>>>>>>>>><br>

>>>>>>>>>>>> But the context is

                          known when types are generated. So, no need to

                          spend time analyzing it.<br>

>>>>>>>>>>>> If types could be

                          compared without analyzing context, then

                          a DWARF-aware linker would work faster.<br>

>>>>>>>>>>>> That is just an

                          idea (not for immediate implementation): if

                          types were stored in some "type table"<br>

>>>>>>>>>>>> (instead of COMDAT

                          section group) and could be accessed through

                          a hash-id (like type units)<br>

>>>>>>>>>>>> - then it would be the

                          solution requiring fewer bits to store but

                          allowing to compare types<br>

>>>>>>>>>>>> by hash-id (not

                          analyzing context).<br>

>>>>>>>>>>>> In this case, the size

                          increase would be small, and processing

                          could be faster.<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>> This is just an idea

                          and could be discussed separately from the

                          problem of integrating D74169.<br>

>>>>>>>>>>>>>> 6. -flto=thin<br>

>>>>>>>>>>>>>>      That

                          problem was described in this review <a

                            href="https://reviews.llvm.org/D54747#1503720"

                            rel="noreferrer" target="_blank"

                            moz-do-not-send="true">https://reviews.llvm.org/D54747#1503720</a>.

                          It also exists in<br>

>>>>>>>>>>>>>> current

                          DWARFLinker/dsymutil implementation. I think

                          that problem should be discussed more: it

                          could<br>

>>>>>>>>>>>>>> probably be

                          fixed by avoiding generation of such

                          incomplete declarations during ThinLTO,<br>

>>>>>>>>>>>>>> That would be

                          costly to produce extra/redundant debug info

                          in ThinLTO - actually ThinLTO could be doing<br>

>>>>>>>>>>>>>> more to reduce

                          that redundancy early on (actually removing

                          definitions from some llvm Modules if the type<br>

>>>>>>>>>>>>>> definition is

                          known to exist in another Module, etc)<br>

>>>>>>>>>>>>> I don't know if

                          it's a problem since that patch was reverted.<br>

>>>>>>>>>>>> Yes. That patch was

                          reverted, but this patch (D74169) has the same

                          problem.<br>

>>>>>>>>>>>> If D74169 were

                          applied and --gc-debuginfo used, then the structure

                          type<br>

>>>>>>>>>>>> definition would be

                          removed.<br>

>>>>>>>>>>>> DWARFLinker could

                          handle that case - "removing definitions from

                          some llvm Modules if the type<br>

>>>>>>>>>>>> definition is known to

                          exist in another Module".<br>

>>>>>>>>>>>> i.e. DWARFLinker could

                          replace the declaration with the definition.<br>

>>>>>>>>>>>> But that problem could

                          be more easily resolved when debug info is

                          generated (probably without a<br>

>>>>>>>>>>>> significant increase in

                          debug info size):<br>

>>>>>>>>>>>> Here we have:<br>

>>>>>>>>>>>>

                          DW_TAG_compile_unit(0x0000000b) - compile unit

                          containing concrete instance for function "f".<br>

>>>>>>>>>>>>

                          DW_TAG_compile_unit(0x00000073) - compile unit

                          containing abstract instance root for function

                          "f".<br>

>>>>>>>>>>>>

                          DW_TAG_compile_unit(0x000000c1) - compile unit

                          containing function "f" definition.<br>

>>>>>>>>>>>> Code for function "f"

                          was deleted. gc-debuginfo deletes compile unit

                          DW_TAG_compile_unit(0x000000c1)<br>

>>>>>>>>>>>> containing "f"

                          definition (since there is no corresponding

                          code). But it has structure "Foo" definition<br>

>>>>>>>>>>>>

                          DW_TAG_structure_type(0x0000011e) referenced

                          from DW_TAG_compile_unit(0x00000073)<br>

>>>>>>>>>>>> by declaration

                          DW_TAG_structure_type(0x000000ae). That

                          declaration is exactly the case when

                          definition<br>

>>>>>>>>>>>> was removed by thinlto

                          and replaced with declaration.<br>

>>>>>>>>>>>> Would it cost too much

                          if the type definition were not replaced with a

                          declaration for the "abstract instance root"?<br>

>>>>>>>>>>>> The number of concrete

                          instances is larger than the number of abstract

                          instance roots.<br>

>>>>>>>>>>>> Probably, it would not

                          be too costly to leave the definition in the abstract

                          instance root?<br>

                          >>>>>>>>>><br>

>>>>>>>>>>>> Alternatively, would it

                          cost too much if the type definition were not

                          replaced with a declaration when<br>

>>>>>>>>>>>> the declaration references

                          a type from an unused function? (LTO could

                          understand that the concrete function is not

                          used).<br>

                          >>>>>>>>>>> I

                          don't follow this example - could you provide

                          a small concrete test case I could reproduce?<br>

                          >>>>>>>>>> I

                          can provide a test case if necessary. But it

                          looks like this issue is finally clear, and

                          you already commented on that.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>>>

                          Oh, I guess this is happening perhaps because

                          ThinLTO can't know for sure that a standalone<br>

                          >>>>>>>>>>>

                          definition of 'f' won't be needed - so it

                          produces one in case one of the inlining

                          opportunities<br>

                          >>>>>>>>>>>

                          doesn't end up inlining. Then it turns out all

                          calls got inlined, so the external definition

                          wasn't needed.<br>

                          >>>>>>>>>>>

                          Oh, you're suggesting that these 3 CUs got

                          emitted into one object file during LTO, but

                          that DWARFLinker<br>

                          >>>>>>>>>>>

                          drops a CU without any code in it - even

                          though... So far as I know, in LTO, LLVM

                          directly references<br>

                          >>>>>>>>>>>

                          types across units if the CUs are all emitted

                          in the same object file. (and if they weren't

                          in the same<br>

                          >>>>>>>>>>>

                          object file - then the abstract_origin

                          couldn't be pointing cross-CU).<br>

                          >>>>>>>>>>> I

                          guess some basic things to say:<br>

                          >>>>>>>>>>>

                          With ThinLTO, the concrete/standalone function

                          definition is emitted in case some call sites

                          don't end up<br>

                          >>>>>>>>>>>

                          being inlined. So we know it'll be emitted

                          (but might not be needed by the actual linker)<br>

                          >>>>>>>>>>>

                          Any number of inline calls might exist - but

                          we shouldn't put the type information into

                          those, because<br>

                          >>>>>>>>>>>

                          they aren't guaranteed to emit it (if the

                          inline function gets optimized away, there

                          would be nothing to<br>

                          >>>>>>>>>>>

                          enforce the type being emitted) - and even if

                          we forced the type information to be emitted

                          into one<br>

                          >>>>>>>>>>>

                          object file that has an inline copy of the

                          function - there's no guarantee that object

                          file will get linked in either.<br>

                          >>>>>>>>>>>

                          So, no, I don't think there's much we can do

                          to keep the size of object files down, while

                          guaranteeing<br>

                          >>>>>>>>>>>

                          the type information will be emitted with the

                          usual linker semantics.<br>

                          >>>>>>>>>> Then

                          dsymutil/DWARFLinker could be changed to

                          handle that (though it would probably not be

                          very efficient).<br>

                          >>>>>>>>>> If

                          ThinLTO could determine that the function is

                          ultimately unused (and so must not contain the

                          referenced type definition),<br>

                          >>>>>>>>>> then

                          this situation could be handled more

                          effectively.<br>

                          >>>>>>>>>><br>

                          >>>>>>>>>> Thank

                          you, Alexey.<br>

                          >>>>>>>>>><br>

>>>>>>>>>>>><br>

>>>>>>>>>>>><br>

>>>>>>>>>>>>

                          _______________________________________________<br>

>>>>>>>>>>>> LLVM Developers mailing

                          list<br>

>>>>>>>>>>>> <a

                            href="mailto:llvm-dev@lists.llvm.org"

                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>

>>>>>>>>>>>> <a

                            href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"

                            rel="noreferrer" target="_blank"

                            moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>


                        </blockquote>

                      </div>

                    </div>

                  </blockquote>

                </div>

              </blockquote>

            </div>

          </div>

        </blockquote>

      </div>

    </blockquote>

  </body>

</html>