<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi Jonas, <br>
      <br>
      Thank you for the comments. Please find my answers below.<br>
    </p>
    <div class="moz-cite-prefix">On 06.08.2020 20:39, Jonas Devlieghere
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div>Hi Alexey,</div>
          <div><br>
          </div>
          <div>I should've looked at this earlier. I went through the
            thread again and I've</div>
          <div>made some comments, mostly from the dsymutil point of
            view.</div>
          <div><br>
          </div>
          <div>> Current DWARFEmitter/DWARFStreamer has an
            implementation for DWARF</div>
          <div>> generation, which does not support DWARF5(only
            debug_names table). At the</div>
          <div>> same time, there already exists code in
            CodeGen/AsmPrinter/DwarfDebug.h,</div>
          <div>> which implements most of DWARF5. It seems that
            DWARFEmitter/DWARFStreamer</div>
          <div>> should be rewritten using DwarfDebug/DwarfFile.
            Though I am not sure</div>
          <div>> whether it would be easy to re-use
            DwarfDebug/DwarfFile. It would probably</div>
          <div>> be necessary to separate some intermediate level of
            DwarfDebug/DwarfFile.</div>
          <div><br>
          </div>
          <div>These classes serve very different purposes. Last time I
            looked at them there</div>
          <div>was very little overlap in functionality. In the compiler
            we're mostly</div>
          <div>concerned with generating the DWARF, while in dsymutil we
            try to copy</div>
          <div>everything we don't need to parse, and fix up what we
            have to. I don't want</div>
          <div>to say it's not possible, but I think supporting DWARF5
            in those classes is</div>
          <div>going to be a lot less work than trying to reuse the
            CodeGen variants.</div>
        </div>
      </div>
    </blockquote>
    I agree: in its current state, writing a separate implementation
    would be less work <br>
    than reusing the CodeGen variants. The downside is that this
    approach leads to a lot of <br>
    code duplication:<br>
    <br>
    DwarfStreamer::emitUnitRangesEntries<br>
    DwarfDebug::emitDebugARanges<br>
    EmitGenDwarfAranges<br>
    DWARFYAML::emitDebugAranges<br>
    <br>
    Supporting a new standard would require rewriting or modifying all
    of these places. In an ideal world,<br>
    a single implementation of DWARF generation would allow changing
    one place and getting <br>
    the benefit everywhere else. The CodeGen classes could probably be
    rewritten, and then it would be useful<br>
    to design them for two use cases: generation from scratch and
    copying/updating <br>
    existing data. In the end, there would be a single implementation
    that could be reused in <br>
    many places. It is indeed a lot of work, though.<br>
    <br>
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div>> Measurements show that it is spent ~10 sec in</div>
          <div>> llvm::StringMapImpl::LookupBucketFor(). The problem
            is that the same</div>
          <div>> strings, again and again, are added to the string
            pool. Two attributes</div>
          <div>> having the same string value would be analyzed (hash
            calculated) and</div>
          <div>> searched inside the string pool. Even if these
            strings are already in</div>
          <div>> string table(DW_FORM_strp, DW_FORM_strx). The
            process could be optimized</div>
          <div>> for string tables. So that if some string from the
            string table were</div>
          <div>> accessed previously then, it would keep a reference
            into the string pool.</div>
          <div>> This would eliminate a lot of string pool searches.</div>
          <div><br>
          </div>
          <div>I'm not sure I fully understand the optimization, but I'd
            love to speed this</div>
          <div>up, if only for dsymutil's sake. I'd love to talk about
            this in a separate</div>
          <div>thread or offline.</div>
          <div><br>
          </div>
        </div>
      </div>
    </blockquote>
    The measurements show that a significant amount of time is spent <br>
    in llvm::StringMapImpl::LookupBucketFor(), i.e. in searching the
    string <br>
    pool. The idea of the optimization
    was to <br>
    reduce the number of string pool searches by remembering previous <br>
    results. The DW_FORM_strp and DW_FORM_strx forms do not hold the
    string itself
    <br>
    but reference a string in a separate table by offset or index.
    Currently, if there are <br>
    several DW_FORM_strp or DW_FORM_strx attributes referring to the
    same string, there is <br>
    one string pool search per duplicate <br>
    (llvm::StringMapImpl::LookupBucketFor() is called each time). If the
    position <br>
    in the pool were remembered under the offset or index of the first
    duplicate,
    <br>
    it would not be necessary to call
    llvm::StringMapImpl::LookupBucketFor() for the following ones.<br>
    <br>
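    <p>A minimal sketch of the caching idea described above, written in
      Python only for brevity. StringPool, CachingCloner and clone_strp
      are hypothetical names, not LLVM APIs; the lookups counter stands
      in for calls to llvm::StringMapImpl::LookupBucketFor().</p>
<pre>
```python
class StringPool:
    """Toy stand-in for the output string pool (hypothetical, not an LLVM API)."""

    def __init__(self):
        self.offsets = {}      # string -> offset in the output string table
        self.next_offset = 0
        self.lookups = 0       # counts string hash-table searches

    def get_offset(self, s):
        self.lookups += 1      # models one LookupBucketFor() call
        if s not in self.offsets:
            self.offsets[s] = self.next_offset
            self.next_offset += len(s.encode()) + 1   # plus NUL terminator
        return self.offsets[s]


class CachingCloner:
    """Remembers the output offset for each input string-table offset."""

    def __init__(self, pool, input_strings):
        self.pool = pool
        self.input_strings = input_strings   # input offset -> string
        self.cache = {}                      # input offset -> output offset

    def clone_strp(self, input_offset):
        # Only the first reference to an input offset searches the pool.
        if input_offset not in self.cache:
            s = self.input_strings[input_offset]
            self.cache[input_offset] = self.pool.get_offset(s)
        return self.cache[input_offset]


# Six attribute values that reference three distinct input strings.
input_strings = {0: "int", 4: "main", 9: "argc"}
pool = StringPool()
cloner = CachingCloner(pool, input_strings)
refs = [0, 4, 0, 0, 9, 4]
out = [cloner.clone_strp(r) for r in refs]
print(out, pool.lookups)
```
</pre>
    <p>In this toy model the six references cause only three pool
      searches. The cache is keyed by an integer, so no string is hashed
      on a hit, and it has to be kept per input object file because
      offsets are only meaningful within one string table.</p>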
    However, prototyping that idea did not show any worthwhile
    performance improvement. <br>
    <br>
    A small performance improvement could be achieved if the string
    pools used <br>
    llvm::hash_value(StringRef S) instead of llvm::djbHash().
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div>> Currently, all object files are analyzed
            sequentially and cloned</div>
          <div>> sequentially. Cloning is started in parallel with
            analyzing. That scheme</div>
          <div>> could be changed: analyzing and cloning could be
            done in parallel for each</div>
          <div>> object file. That requires refactoring of
            DWARFLinker and making string</div>
          <div>> pools and DeclContextTree thread-safe.</div>
          <div><br>
          </div>
          <div>I'm less familiar with the way that LLD uses the
            DWARFOptimizer but this is</div>
          <div>not possible for dsymutil as it is trying to deduplicate
            DIEs from different</div>
          <div>compile units.</div>
        </div>
      </div>
    </blockquote>
    Right, dsymutil is trying to de-duplicate DIEs from different<br>
    compile units. That probably does not rule out a multi-threaded
    implementation: <br>
    <br>
    1. DeclContextTree.getChildDeclContext() should be made thread-safe.<br>
    &nbsp;&nbsp;&nbsp; Then, even if CUs are processed in parallel, DIEs
    could still be de-duplicated<br>
    &nbsp;&nbsp;&nbsp; based on DeclContext. <br>
    2. UniquingStringPool and OffsetsStringPool should also be made
    thread-safe.<br>
    3. Since compilation units would be processed in parallel,<br>
    &nbsp;&nbsp;&nbsp; the size of a compilation unit would not be known
    until it is fully processed. <br>
    &nbsp;&nbsp;&nbsp; That means that all of the compilation unit's
    references should be patched after <br>
    &nbsp;&nbsp;&nbsp; the CU content is generated, in the same manner
    as forward references <br>
    &nbsp;&nbsp;&nbsp; are currently patched (fixupForwardReferences).<br>
    4. DWARFStreamer provides a sequential interface. Instead of a
    single output stream, <br>
    &nbsp;&nbsp;&nbsp; a separate output could be generated for each
    CU. <br>
    &nbsp;&nbsp;&nbsp; These would be glued together at the end.<br>
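    <p>Points 3 and 4 above can be sketched roughly as follows, in
      Python only for brevity. This is a toy model under stated
      assumptions: clone_unit, link and the dictionary layout of a unit
      are hypothetical, references are 4-byte little-endian offsets, and
      the thread-safe DeclContextTree and string pools from points 1 and
      2 are not modeled. It only illustrates the data flow.</p>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def clone_unit(unit):
    """Clone one compile unit into its own buffer (toy model).

    Returns the unit body and a list of pending references, each stored
    as (position_in_body, target_unit, offset_in_target). A 4-byte
    placeholder is written where the final offset belongs.
    """
    body = bytearray(unit["payload"])
    pending = []
    for target_unit, target_offset in unit["refs"]:
        pending.append((len(body), target_unit, target_offset))
        body += b"\x00\x00\x00\x00"      # placeholder, patched later
    return body, pending


def link(units, workers=4):
    # Step 1: process all units in parallel, each into a separate output.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(clone_unit, units))   # keeps input order

    # Step 2: sizes are known only now, so compute each unit's start.
    starts, total = [], 0
    for body, _ in results:
        starts.append(total)
        total += len(body)

    # Step 3: glue the outputs together, then patch the references.
    out = bytearray()
    for body, _ in results:
        out += body
    for start, (_, pending) in zip(starts, results):
        for pos, target_unit, target_offset in pending:
            absolute = starts[target_unit] + target_offset
            out[start + pos:start + pos + 4] = absolute.to_bytes(4, "little")
    return bytes(out), starts


units = [
    {"payload": b"AAAA", "refs": [(1, 2)]},     # refers to offset 2 in unit 1
    {"payload": b"BBBBBB", "refs": [(0, 1)]},   # refers to offset 1 in unit 0
]
out, starts = link(units)
print(starts, len(out))
```
</pre>
    <p>Every reference is treated like a forward reference here: nothing
      is resolved until all unit sizes are known, which is the price of
      not emitting the units sequentially.</p>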
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div>> I think improving dsymutil is a valuable thing.
            Though there are several</div>
          <div>> directions which might be considered to make it more
            robust:</div>
          <div>></div>
          <div>> 1. support of latest DWARF - DWARF5/DWARF64...</div>
          <div><br>
          </div>
          <div>Strong +1 on DWARF5. I haven't had the bandwidth yet to
            really look at this.</div>
          <div>Right now we can't find (at least some) relocations so
            we bail out. I'd need</div>
          <div>to fix that to assess the current state of things and
            figure out how much</div>
          <div>work would be needed.</div>
          <div><br>
          </div>
          <div>I don't think anything in LLVM supports generating
            DWARF64 though.</div>
          <div><br>
          </div>
          <div>> 2. implement multi-threaded execution.</div>
          <div><br>
          </div>
          <div>See my earlier comment. At least for the dsymutil case,
            the current approach</div>
          <div>is the best we can do, but I'd love to be proven wrong.
            :-)</div>
          <div><br>
          </div>
          <div>> 3. support of split DWARF.</div>
          <div>> 4. implement dsymutil for non-darwin platform.</div>
          <div><br>
          </div>
          <div>These two seem to go together. Given the work you did to
            split off the DWARF</div>
          <div>optimization part I think we're closer to this than ever.
            Thanks again for</div>
          <div>doing that.</div>
          <div><br>
          </div>
          <div>> We considered three options:</div>
          <div>></div>
          <div>> 1. add new functionality into dsymutil. So that
            dsymutil behaves</div>
          <div>> differently on a non-darwin platform and supports
            another set of</div>
          <div>> command-line options.</div>
          <div>></div>
          <div>> 2. add new functionality into llvm-objcopy.
            llvm-objcopy already supports</div>
          <div>> various binary objects formats(MachO,ELF,COFF,wasm).
            It also has several</div>
          <div>> options to work with debug-info.</div>
          <div>></div>
          <div>> 3. create new utility llvm-dwarfutil which would
            implement the above</div>
          <div>> functionality and reuse DWARFLinker(extracted from
            dsymutil) library and</div>
          <div>> new library ObjectCopy(extracted from llvm-objcopy).</div>
          <div>></div>
          <div>> So far our preference is number three. The reason
            for this is that separate</div>
          <div>> utility specifically working with debug info looks
            as good separation of</div>
          <div>> concepts. Adding another behavior to dsymutil looks
            not very good.</div>
          <div><br>
          </div>
          <div>In its current state dsymutil itself is a pretty small
            tool on top of the</div>
          <div>DWARFOptimizer/Linker. I'm curious what the benefits of
            another tool are</div>
          <div>compared to a different frontend (like objcopy) for MachO
            and ELF. It seems</div>
          <div>like that would allow for separation of concerns, while
            still being able to</div>
          <div>share common code without having to push it all the way
            up into LLVM.</div>
        </div>
      </div>
    </blockquote>
    My concern is that this tool would have different source data and
    a different set of options.<br>
    Given that handling a different kind of input data and
    a different set of options <br>
    means writing another frontend, it would probably be better not
    to make dsymutil more complex but<br>
    to create another small tool. But if extending dsymutil looks OK,
    I am OK with it. <br>
    Let's discuss this approach in the proposal thread.
    <p><br>
    </p>
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div>> Extending the already rich interface of llvm-objcopy
            looks also not very</div>
          <div>> good. Having in mind that actual implementation
            would be shared by</div>
          <div>> libraries, the separate utility, working
            specifically with debug info,</div>
          <div>> looks like the right choice. That is our current
            idea.</div>
          <div><br>
          </div>
          <div>> My personal thought would be that extending dsymutil
            should be ok as the</div>
          <div>> functionality goes well with everything else
            dsymutil does (other than not</div>
          <div>> support ELF which the dsymutil maintainers are on
            board with last I</div>
          <div>> checked). That said, I definitely think a write-up
            will be helpful. No</div>
          <div>> matter what I support extracting all of the behavior
            into libraries and</div>
          <div>> using that somewhere :)</div>
          <div><br>
          </div>
          <div>Ha, so basically what I was trying to say above.</div>
          <div><br>
          </div>
          <div>I look forward to seeing the proposal!</div>
        </div>
      </div>
    </blockquote>
    <p>Yep, I will publish it soon.<br>
    </p>
    <p>Thank you, Alexey.<br>
    </p>
    <blockquote type="cite"
cite="mid:CAJQy47cJg=Rt1Lvq9XiiwGJO4EeO+cRXaxi6R4ORo_MwEcE6mw@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr">
          <div><br>
          </div>
          <div>Cheers,</div>
          <div>Jonas</div>
          <div><br>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Aug 4, 2020 at 11:33
          PM Eric Christopher &lt;<a href="mailto:echristo@gmail.com"
            moz-do-not-send="true">echristo@gmail.com</a>&gt; wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">
            <div dir="ltr">Hi Alexey,
              <div><br>
              </div>
              <div><br>
              </div>
            </div>
            <br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Mon, Aug 3, 2020 at
                8:32 AM Alexey Lapshin &lt;<a
                  href="mailto:avl.lapshin@gmail.com" target="_blank"
                  moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                <div>
                  <p>Hi Eric, please <br>
                  </p>
                  <div>On 31.07.2020 22:02, Eric Christopher wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>Hi Alexey,</div>
                      <br>
                      <div class="gmail_quote">
                        <div dir="ltr" class="gmail_attr">On Fri, Jul
                          31, 2020 at 4:02 AM Alexey Lapshin via
                          llvm-dev &lt;<a
                            href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;
                          wrote:<br>
                        </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><br>
                          On 28.07.2020 19:28, David Blaikie wrote:<br>
                          > On Tue, Jul 28, 2020 at 8:55 AM Alexey
                          Lapshin <<a
                            href="mailto:avl.lapshin@gmail.com"
                            target="_blank" moz-do-not-send="true">avl.lapshin@gmail.com</a>>
                          wrote:<br>
                          >><br>
                          >> On 28.07.2020 10:29, David Blaikie
                          via llvm-dev wrote:<br>
                          >>> On Fri, Jun 26, 2020 at 9:28 AM
                          Alexey Lapshin<br>
                          >>> <<a
                            href="mailto:alapshin@accesssoftek.com"
                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
                          wrote:<br>
>>>>>>>>>>>> This idea goes in
                          another direction than fragmenting dwarf<br>
>>>>>>>>>>>> using elf
                          sections&tricks. It seems to me that the
                          cost of fragmenting is too high.<br>
                          >>>>>>>>>>> I
                          tend to agree - but I'm sort of leaning
                          towards trying to use object<br>
                          >>>>>>>>>>>
                          features as much as possible, then
                          implementing just enough custom<br>
                          >>>>>>>>>>>
                          handling in the linker to recoup overhead,
                          etc. (eg: add some kind of<br>
                          >>>>>>>>>>>
                          small header/brief description that makes it
                          easy for the linker to<br>
                          >>>>>>>>>>>
                          slice-and-dice - but hopefully a
                          domain-specific such header can be a<br>
                          >>>>>>>>>>>
                          bit more compact than the fully general ELF
                          form)<br>
                          >>>>>>>>>> I
                          think this indeed should be implemented and
                          evaluated.<br>
                          >>>>>>>>>> So
                          that various approaches could be compared.<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> It is not only the
                          sizes of structures describing fragments but
                          also the complexity<br>
>>>>>>>>>>>> of tools that should be
                          taught to work with fragmented DWARF.<br>
>>>>>>>>>>>> (f.e. llvm-dwarfdump
                          applied to object file should be able to read
                          fragmented DWARF,<br>
>>>>>>>>>>>> but applied to linked
                          executable it should work with non-fragmented
                          DWARF).<br>
>>>>>>>>>>>> That idea is for the
                          tool which works the same way as dsymutil ODR.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> I will shortly describe
                          the idea of making DWARF be easier processed
                          by dsymutil/DWARFLinker:<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> The idea is to have
                          only one "type table" per object file(special
                          section .debug_types_table).<br>
>>>>>>>>>>>> This "type table" would
                          contain all types.<br>
>>>>>>>>>>>> There could be a
                          special type of reference - type_offset - that
                          offset points into the type table.<br>
>>>>>>>>>>>> Basic types could
                          always be placed into the start of "type
                          table" thus, offsets to basic types<br>
>>>>>>>>>>>> most often would be 1
                          byte. There also would be a special kind of
                          reference - reference inside the type.<br>
>>>>>>>>>>>> Type units sig8 system
                          - would not be used to reference types.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> Types deduplication is
                          assumed to be done, not by linker mechanism
                          for COMDAT,<br>
>>>>>>>>>>>> but by a tool like
                          dsymutil. This tool would create resulting
                          .debug_types_table by putting there<br>
>>>>>>>>>>>> types from source
                          .debug_types_table-s. Only one copy of the
                          type would be placed into the<br>
>>>>>>>>>>>> resulting table. All
                          references pointing to the deleted copy would
                          be corrected to point<br>
>>>>>>>>>>>> to the single copy
                          inside "type table". (that is how dsymutil
                          works currently)<br>
                          >>>>>>>>>>> ^
                          that's the step that's probably a bit
                          expensive for a general-use<br>
                          >>>>>>>>>>>
                          tool - it implies parsing all the DWARF to
                          find those references and<br>
                          >>>>>>>>>>>
                          rewrite them, I think. For a high-performance
                          solution that could be<br>
                          >>>>>>>>>>>
                          run by the linker I think it'd be necessary to
                          have a solution that<br>
                          >>>>>>>>>>>
                          doesn't involve parsing all the DIEs.<br>
                          >>>>>>>>>>
                          According to the current dsymutil processing,<br>
                          >>>>>>>>>>
                          exactly this process is not the most
                          time-consuming.<br>
                          >>>>>>>>>> That
                          could be done relatively fast.<br>
                          >>>>>>>>> Fair
                          enough - though I'd still imagine any solution
                          that involves<br>
                          >>>>>>>>> parsing
                          all the DIEs still wouldn't be fast enough
                          (maybe an order of<br>
                          >>>>>>>>> magnitude
                          faster than the current solution even - but
                          that's still,<br>
                          >>>>>>>>> what, 6
                          or 7x slower than linking without the
                          feature?) for most users<br>
                          >>>>>>>>> to
                          consider it a good trade-off.<br>
                          >>>>>>>> It seems to
                          me that even the current 6x-7x slowdown could
                          be useful.<br>
                          >>>>>>>> Users who
                          already use dsymutil or llvm-dwp(assuming
                          DWARFLinker<br>
                          >>>>>>>> would be
                          taught to work with a split dwarf) tools spend
                          this time and,<br>
                          >>>>>>>> in some
                          scenarios, waste disk space by intermediate
                          files.<br>
                          >>>>>>> FWIW, dwp
                          (llvm-dwp hasn't really been optimized
                          compared to binutils<br>
                          >>>>>>> dwp) is designed
                          to be very quick - by not needing to do a lot
                          of<br>
                          >>>>>>> parsing/fixups.
                          Which, yes, means larger output files than
                          would be<br>
                          >>>>>>> possible with
                          more parsing/etc. It also doesn't take any
                          input from<br>
                          >>>>>>> the linker (so it
                          can run in parallel with the linker) - so it
                          can't<br>
                          >>>>>>> remove dead
                          subprograms. Given Google's the major (perhaps
                          only<br>
                          >>>>>>> significant?)
                          user of Split DWARF - I can say that the needs
                          don't<br>
                          >>>>>>> necessarily
                          overlap well with something that would take
                          significantly<br>
                          >>>>>>> longer to run or
                          use significantly more memory.
                          Faster/cheaper/with<br>
                          >>>>>>> somewhat bigger
                          output files is probably the right tradeoff
                          for<br>
                          >>>>>>> Google's use
                          case, at least.<br>
                          >>>>>>><br>
                          >>>>>>> I imagine Apple's
                          use for dsymutil is somewhat similar - it's
                          not used<br>
                          >>>>>>> in the iterative
                          development cycle, only in final releases -
                          well,<br>
                          >>>>>>> maybe their
                          situation is more "neutral" - not a major pain
                          point in<br>
                          >>>>>>> any case I'd
                          guess.<br>
                          >>>>>>><br>
                          >>>>>>><br>
                          >>>>>> I see. FWIW,
                          Comparison splitdwarf+dwp and DWARFLinker from
                          lld:<br>
                          >>>>>><br>
                          >>>>>> 1.
                          split-dwarf+llvm-dwp = linking time for clang
                          6 sec,<br>
                          >>>>>>       generating time
                          for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>
                          >>>>> FWIW, llvm-dwp is not
                          very well optimized (which is to say: it is
                          not<br>
                          >>>>> optimized), binutils dwp
                          might be a better comparison (& even that<br>
                          >>>>> doesn't have the
                          parallelism & some potential further
                          memory savings<br>
                          >>>>> that lld has that we
                          could take advantage of in a dwp-like tool)<br>
                          >>>>><br>
                          >>>>> What build mode was the
                          clang binary built in? Optimized or
                          unoptimized?<br>
                          >>>> right, that is unoptimized
                          build with -ffunction-sections.<br>
                          >>>><br>
                          >>>>>> 2. DWARFLinker from
                          lld = linking time for clang 72 sec,
                          clang=760M.<br>
                          >>> And this is without Split DWARF?
                          Without linker DWARF compression? -<br>
                          >>> that seems quite a bit
                          surprising, that the deduplication of DWARF<br>
                          >>> could fit into less space than
                          the wasted/reclaimed space in ranges (&<br>
                          >>> line)?<br>
                          >> that was without split dwarf, without
                          linker compression.<br>
                          >><br>
                          >>> Could you double check these
                          numbers & provide a clearer summary?<br>
                          >> sure, I would re-check it.<br>
                          >><br>
                          >>> Here's my attempt at numbers (all
                          with function-sections+gc-sections)...<br>
                          >>><br>
                          >>> Split DWARF tests didn't seem
                          meaningful - gc-debuginfo + split DWARF<br>
                          >>> seemed to drop all the debug info
                          (except gdb_index) so wasn't<br>
                          >>> working/comparison wasn't
                          meaningful for Apples to Apples, but<br>
                          >>> included it for comparing gc'd
                          non-split to non-gc'd split (disabled<br>
                          >>> gnu-pubnames/gdb-index
                          (-gsplit-dwarf -gno-gnu-pubnames) (which turns<br>
                          >>> on by default with Split DWARF
                          because gdb needs it - but a bit of an<br>
                          >>> unfair comparison without turning
                          on gnu-pubnames/gdb-index in other<br>
                          >>> build modes too, since it...
                          /shouldn't/ be necessary) which might've<br>
                          >>> been a factor in the data you
                          were looking at)<br>
                          >> that might be the case. i.e.
                          clang=997M for split dwarf(from my previous<br>
                          >> measurement) might include
                          gnu-pubnames.<br>
                          >><br>
                          >> would recheck it and if that is the
                          case then it is an unfair comparison.<br>
                          >><br>
                          >><br>
                          >> My point was that "DWARFLinker from
                          lld" takes less space than singleton<br>
                          >> split dwarf file+.dwp file.<br>
                          >><br>
                          >> for -O0 uncompressed:<br>
                          >><br>
                          >> - .dwp took 1.1G(if I built it
                          correctly), singleton clang(from your<br>
                          >> measurements) 566 MB<br>
                          >><br>
                          >>      overall 1.6G.<br>
                          > Oh, yeah, even if there are some
                          measurement issues, linked executable<br>
                          > + .dwp is going to be larger than a
                          linked executable using non-split<br>
                          > DWARF (in v5), since v5 uses all the same
                          representations as non-split<br>
                          > DWARF, and split DWARF adds the
                          indirection overhead of a split file,<br>
                          > etc.<br>
                          ><br>
                          > Even without DWARF linking, it's true
                          that split DWARF has overhead<br>
                          > (dwp+executable will be larger than
                          executable non-split).<br>
                          ><br>
                          > But maybe we've ended up down a bit of a
                          tangent in any case.<br>
                          ><br>
                          > Trying to bring this back to "should this
                          be committed to lld" seems<br>
                          > valuable, and I'm not sure what the right
                          criteria are for that.<br>
                          I think it would be useful to do "removing
                          obsolete debug info"<br>
                          in the linker. First, it would be
                          the fastest way (no need<br>
                          to copy data, create temp files, or build an address
                          map). Second,<br>
                          it would be a good separation of
                          concepts. All debug info<br>
                          processing currently done in the
                          linker (gdb_index, upcoming<br>
                          debug_names) could be moved into a separate
                          library for processing<br>
                          debug info. When gdb_index/debug_names are
                          built without<br>
                          "removing of obsolete debug info", it would
                          have the same<br>
                          performance results as it currently has.<br>
                          <br>
                          We decided to give the idea of "removing of
                          obsolete debug info"<br>
                          another try and are going to implement it as a
                          separate utility<br>
                          working with a built binary. Making it
                          multi-threaded would<br>
                          probably show better performance results, and
                          then it could<br>
                          probably be considered acceptable to use
                          from the linker.<br>
                          <br>
                        </blockquote>
                        <div><br>
                        </div>
                        <div>I'm quite interested in this direction. One
                          thought I had was to incorporate such a
                          library into dsymutil but with support for
                          ELF. If you get a proposal written up I'd love
                          to take a look and comment.</div>
                        <div><br>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <p><br>
                  </p>
                  Yes, I will share the proposal in a separate thread
                  within a week or two.<br>
                  <br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>Excellent, thanks :)</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                <div> In short: we decided to move in a slightly different
                  direction than adding this functionality <br>
                  into dsymutil. Though if there is a preference to
                  implement it as part of dsymutil, <br>
                  we are OK with doing it that way.<br>
                  <br>
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>I have a vague preference since a lot of
                functionality already exists there on one platform and
                extending that seems straightforward, however...</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                <div> In its first version, this new utility is supposed to
                  receive a built binary with debug info <br>
                  as input (with the new marking for references to
                  removed code sections -1/-2 <br>
                  -<a href="https://reviews.llvm.org/D84825"
                    target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D84825</a>)
                  and create a new binary with obsolete <br>
                  debug info removed according to the above marking. In later
                  versions, it could be extended <br>
                  with other debug info optimization tasks, for example
                  generating new index tables, optimizing debug info, <br>
                  etc.<br>
                  <br>
                  We considered three options:<br>
                  <br>
                  1. add new functionality into dsymutil. So that
                  dsymutil behaves differently <br>
                      on a non-darwin platform and supports another set
                  of command-line options.<br>
                  <br>
                  2. add new functionality into llvm-objcopy.
                  llvm-objcopy already supports various <br>
                       binary objects formats(MachO,ELF,COFF,wasm). It
                  also has several options <br>
                       to work with debug-info.<br>
                  <br>
                  3. create new utility llvm-dwarfutil which would
                  implement the above functionality <br>
                       and reuse DWARFLinker(extracted from dsymutil)
                  library and new library <br>
                       ObjectCopy(extracted from llvm-objcopy).<br>
                  <br>
                  So far our preference is number three. The reason for
                  this is that a separate <br>
                  utility specifically working with debug info looks like
                  a good separation of concepts. <br>
                  Adding another behavior to dsymutil does not look very
                  good. Extending the already <br>
                  rich interface of llvm-objcopy also does not look very
                  good. Keeping in mind that the actual <br>
                  implementation would be shared through libraries, the
                  separate utility, working specifically <br>
                  with debug info, looks like the right choice. That is
                  our current idea. <br>
                  <p>I will publish the proposal shortly to discuss it.<br>
                  </p>
                  <br>
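                  <p>The marking-based removal described above can be
                  illustrated with a minimal sketch. It is not LLVM code:
                  the record layout (name, low_pc, size), the function
                  names, and the assumption that the markers are the
                  64-bit unsigned forms of -1 and -2 are all hypothetical
                  and only show the filtering step.</p>
<pre>
```python
# Hypothetical sketch: drop debug-info entries whose start address was
# replaced by a tombstone marker. Not LLVM code; all names are illustrative.

ADDRESS_MASK = 0xFFFFFFFFFFFFFFFF  # assume 64-bit target addresses

# Assumption: the linker writes -1 or -2 (as unsigned 64-bit values) into
# references that pointed at removed code sections.
TOMBSTONES = frozenset({-1 % (ADDRESS_MASK + 1), -2 % (ADDRESS_MASK + 1)})


def is_dead(low_pc):
    """Return True when the address is one of the assumed tombstone markers."""
    return (low_pc % (ADDRESS_MASK + 1)) in TOMBSTONES


def strip_dead_entries(entries):
    """Keep only (name, low_pc, size) records that refer to live code."""
    return [entry for entry in entries if not is_dead(entry[1])]


if __name__ == "__main__":
    subprograms = [
        ("main", 0x401000, 0x40),
        ("unused_helper", 0xFFFFFFFFFFFFFFFF, 0x20),  # marked with -1
        ("dead_range", 0xFFFFFFFFFFFFFFFE, 0x10),     # marked with -2
        ("live_helper", 0x401040, 0x18),
    ]
    for name, low_pc, size in strip_dead_entries(subprograms):
        print(name, hex(low_pc), size)
```
</pre>
                  <p>A real utility would also have to fix up offsets and
                  references after dropping entries; the sketch covers
                  only the decision of which entries are obsolete.</p>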
                </div>
              </blockquote>
              <div><br>
              </div>
              <div>These are solid arguments - in particular, I agree
                with not extending llvm-objcopy :)</div>
              <div><br>
              </div>
              <div><a class="gmail_plusreply"
                  id="gmail-m_144640436407649066plusReplyChip-0"
                  href="mailto:jonas@devlieghere.com" target="_blank"
                  moz-do-not-send="true">+Jonas Devlieghere</a> and <a
                  class="gmail_plusreply"
                  id="gmail-m_144640436407649066plusReplyChip-1"
                  href="mailto:aprantl@apple.com" target="_blank"
                  moz-do-not-send="true">+Adrian Prantl</a> for dsymutil
                comments.</div>
              <div><br>
              </div>
              <div>My personal thought would be that extending dsymutil
                should be ok as the functionality goes well with
                everything else dsymutil does (other than not supporting
                ELF, which the dsymutil maintainers are on board with
                last I checked). That said, I definitely think a
                write-up will be helpful. No matter what I support
                extracting all of the behavior into libraries and using
                that somewhere :)</div>
              <div><br>
              </div>
              <div>Thanks!</div>
              <div><br>
              </div>
              <div>-eric</div>
              <div> </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                <div> Thank you, Alexey.<br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_quote">
                        <div>Thanks!</div>
                        <div><br>
                        </div>
                        <div>-eric</div>
                        <div> </div>
                        <blockquote class="gmail_quote"
                          style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                          Alexey.<br>
                          <br>
                          ><br>
                          > Ray's the best person to weigh in on
                          that. My 2c is that I think it<br>
                          > probably is worthwhile, even just as an
                          experiment, assuming it's not<br>
                          > too intrusive to lld.<br>
                          ><br>
                          >> - The "DWARFLinker from lld" 820
                          MB(from your measurements).<br>
                          >><br>
                          >><br>
                          >> So "DWARFLinker from lld" looks two
                          times better.<br>
                          >><br>
                          >><br>
                          >> Anyway, thank you for pointing out the
                          possible mistake. I will recheck<br>
                          >> it and update the results.<br>
                          >><br>
                          >><br>
                          >> Alexey.<br>
                          >><br>
                          >><br>
                          >>> * -O0: (baseline, just using
                          strip -g: 356 MB)<br>
                          >>>     * compressed: 25% smaller
                          with gc-debuginfo (481 MB / 641 MB) (407<br>
                          >>> MB split/non-gc)<br>
                          >>>     * uncompressed: 30% smaller
                          (820 MB / 1.2 GB) (566 MB split/non-gc)<br>
                          >>> * -O3: (baseline: 116 MB)<br>
                          >>>     * compressed: 16% smaller
                          (361 MB / 462 MB) (283 MB split/non-gc)<br>
                          >>>     * uncompressed: 22% smaller
                          (1022 MB / 1.2 GB) (156 MB split/non-gc)<br>
                          >>><br>
                          >>><br>
                          >>><br>
                          >>><br>
                          >>> On Fri, Jun 26, 2020 at 9:28 AM
                          Alexey Lapshin<br>
                          >>> <<a
                            href="mailto:alapshin@accesssoftek.com"
                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
                          wrote:<br>
>>>>>>>>>>>> This idea goes in
                          another direction than fragmenting dwarf<br>
>>>>>>>>>>>> using elf
                          sections&tricks. It seems to me that the
                          cost of fragmenting is too high.<br>
                          >>>>>>>>>>> I
                          tend to agree - but I'm sort of leaning
                          towards trying to use object<br>
                          >>>>>>>>>>>
                          features as much as possible, then
                          implementing just enough custom<br>
                          >>>>>>>>>>>
                          handling in the linker to recoup overhead,
                          etc. (eg: add some kind of<br>
                          >>>>>>>>>>>
                          small header/brief description that makes it
                          easy for the linker to<br>
                          >>>>>>>>>>>
                          slice-and-dice - but hopefully a
                          domain-specific such header can be a<br>
                          >>>>>>>>>>>
                          bit more compact than the fully general ELF
                          form)<br>
                          >>>>>>>>>> I
                          think this indeed should be implemented and
                          evaluated.<br>
                          >>>>>>>>>> So
                          that various approaches could be compared.<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> It is not only the
                          sizes of structures describing fragments but
                          also the complexity<br>
>>>>>>>>>>>> of tools that should be
                          taught to work with fragmented DWARF.<br>
>>>>>>>>>>>> (f.e. llvm-dwarfdump
                          applied to object file should be able to read
                          fragmented DWARF,<br>
>>>>>>>>>>>> but applied to linked
                          executable it should work with non-fragmented
                          DWARF).<br>
>>>>>>>>>>>> That idea is for the
                          tool which works the same way as dsymutil ODR.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> I will shortly describe
                          the idea of making DWARF be easier processed
                          by dsymutil/DWARFLinker:<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> The idea is to have
                          only one "type table" per object file(special
                          section .debug_types_table).<br>
>>>>>>>>>>>> This "type table" would
                          contain all types.<br>
>>>>>>>>>>>> There could be a
                          special type of reference - type_offset - that
                          offset points into the type table.<br>
>>>>>>>>>>>> Basic types could
                          always be placed into the start of "type
                          table" thus, offsets to basic types<br>
>>>>>>>>>>>> most often would be 1
                          byte. There also would be a special kind of
                          reference - reference inside the type.<br>
>>>>>>>>>>>> Type units sig8 system
                          - would not be used to reference types.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> Types deduplication is
                          assumed to be done, not by linker mechanism
                          for COMDAT,<br>
>>>>>>>>>>>> but by a tool like
                          dsymutil. This tool would create resulting
                          .debug_types_table by putting there<br>
>>>>>>>>>>>> types from source
                          .debug_types_table-s. Only one copy of the
                          type would be placed into the<br>
>>>>>>>>>>>> resulting table. All
                          references pointing to the deleted copy would
                          be corrected to point<br>
>>>>>>>>>>>> to the single copy
                          inside "type table". (that is how dsymutil
                          works currently)<br>
                          >>>>>>>>>>> ^
                          that's the step that's probably a bit
                          expensive for a general-use<br>
                          >>>>>>>>>>>
                          tool - it implies parsing all the DWARF to
                          find those references and<br>
                          >>>>>>>>>>>
                          rewrite them, I think. For a high-performance
                          solution that could be<br>
                          >>>>>>>>>>>
                          run by the linker I think it'd be necessary to
                          have a solution that<br>
                          >>>>>>>>>>>
                          doesn't involve parsing all the DIEs.<br>
                          >>>>>>>>>>
                          According to the current dsymutil processing,<br>
                          >>>>>>>>>>
                          exactly this process is not the most
                          time-consuming.<br>
                          >>>>>>>>>> That
                          could be done relatively fast.<br>
                          >>>>>>>>> Fair
                          enough - though I'd still imagine any solution
                          that involves<br>
                          >>>>>>>>> parsing
                          all the DIEs still wouldn't be fast enough
                          (maybe an order of<br>
                          >>>>>>>>> magnitude
                          faster than the current solution even - but
                          that's still,<br>
                          >>>>>>>>> what, 6
                          or 7x slower than linking without the
                          feature?) for most users<br>
                          >>>>>>>>> to
                          consider it a good trade-off.<br>
                          >>>>>>>> It seems to
                          me that even the current 6x-7x slowdown could
                          be useful.<br>
                          >>>>>>>> Users who
                          already use dsymutil or llvm-dwp(assuming
                          DWARFLinker<br>
                          >>>>>>>> would be
                          taught to work with a split dwarf) tools spend
                          this time and,<br>
                          >>>>>>>> in some
                          scenarios, waste disk space on intermediate
                          files.<br>
                          >>>>>>> FWIW, dwp
                          (llvm-dwp hasn't really been optimized
                          compared to binutils<br>
                          >>>>>>> dwp) is designed
                          to be very quick - by not needing to do a lot
                          of<br>
                          >>>>>>> parsing/fixups.
                          Which, yes, means larger output files than
                          would be<br>
                          >>>>>>> possible with
                          more parsing/etc. It also doesn't take any
                          input from<br>
                          >>>>>>> the linker (so it
                          can run in parallel with the linker) - so it
                          can't<br>
                          >>>>>>> remove dead
                          subprograms. Given Google's the major (perhaps
                          only<br>
                          >>>>>>> significant?)
                          user of Split DWARF - I can say that the needs
                          don't<br>
                          >>>>>>> necessarily
                          overlap well with something that would take
                          significantly<br>
                          >>>>>>> longer to run or
                          use significantly more memory.
                          Faster/cheaper/with<br>
                          >>>>>>> somewhat bigger
                          output files is probably the right tradeoff
                          for<br>
                          >>>>>>> Google's use
                          case, at least.<br>
                          >>>>>>><br>
                          >>>>>>> I imagine Apple's
                          use for dsymutil is somewhat similar - it's
                          not used<br>
                          >>>>>>> in the iterative
                          development cycle, only in final releases -
                          well,<br>
                          >>>>>>> maybe their
                          situation is more "neutral" - not a major pain
                          point in<br>
                          >>>>>>> any case I'd
                          guess.<br>
                          >>>>>>><br>
                          >>>>>>><br>
                          >>>>>> I see. FWIW, here is a
                          comparison of split-dwarf+dwp and DWARFLinker from
                          lld:<br>
                          >>>>>><br>
                          >>>>>> 1.
                          split-dwarf+llvm-dwp = linking time for clang
                          6 sec,<br>
                          >>>>>>       generating time
                          for .dwp 53 sec, clang=997M clang.dwp=1.1G.<br>
                          >>>>> FWIW, llvm-dwp is not
                          very well optimized (which is to say: it is
                          not<br>
                          >>>>> optimized), binutils dwp
                          might be a better comparison (& even that<br>
                          >>>>> doesn't have the
                          parallelism & some potential further
                          memory savings<br>
                          >>>>> that lld has that we
                          could take advantage of in a dwp-like tool)<br>
                          >>>>><br>
                          >>>>> What build mode was the
                          clang binary built in? Optimized or
                          unoptimized?<br>
                          >>>> right, that is unoptimized
                          build with -ffunction-sections.<br>
                          >>>><br>
                          >>>>>> 2. DWARFLinker from
                          lld = linking time for clang 72 sec,
                          clang=760M.<br>
                          >>>>> It does seem a tad
                          strange that the clang binary would be smaller<br>
                          >>>>> non-split with DWARF
                          linking than it was split. Though I could
                          imagine<br>
                          >>>>> this might be possible in
                          an optimized build (where debug_ranges<br>
                          >>>>> become relatively
                          expensive in the .o file contribution with<br>
                          >>>>> Split DWARF)<br>
                          >>>>> Could you compare the
                          section sizes between these two clang
                          binaries, perhaps?<br>
                          >>>> .debug_ranges is three times
                          bigger and .debug_line is twice as big.<br>
                          >>>><br>
                          >>>>>>>> Thus if they
                          would use this LLD feature in its current
                          state<br>
                          >>>>>>>> - they would
                          still receive benefits.<br>
                          >>>>>>>><br>
                          >>>>>>>> Speaking of
                          performance results - LLD is a multi-threaded
                          linker;<br>
                          >>>>>>>> it handles
                          sections in parallel. DWARFLinker generates
                          DWARF using<br>
                          >>>>>>>> AsmPrinter
                          which is a stream - so it can produce the resulting
                          DWARF only<br>
                          >>>>>>>> sequentially.
                          It is not surprising that the parallel
                          solution works faster.<br>
                          >>>>>>>> Making
                          DWARFLinker truly multi-threaded would
                          probably allow us<br>
                          >>>>>>>> to bring the
                          slowdown into the 2x-4x range.<br>
                          >>>>>>> *nod* that's
                          still a really expensive link - but I
                          understand that's a<br>
                          >>>>>>> suitable tradeoff
                          for your users<br>
                          >>>>>>><br>
                          >>>>>> Btw, 2x or 7x is for
                          pure linking time. Overall compilation
                          slowdown<br>
                          >>>>>> is not so
                          significant. Building the LLVM codebase shows only
                          a 20% slowdown.<br>
                          >>>>> Understood - that's still
                          quite significant to most users, I'd imagine.<br>
                          >>>> I see.<br>
                          >>>><br>
                          >>>>>>>>>>
                          Anyway, I think the dsymutil approach is still
                          valuable, and it<br>
                          >>>>>>>>>> would
                          be useful to optimize it.<br>
                          >>>>>>>>>> Do
                          you think it would be useful to make
                          dsymutil/DWARFLinker truly multi-threaded?<br>
                          >>>>>>>>>> (To
                          make dsymutil/DWARFLinker able to process each
                          object file in a separate thread)<br>
                          >>>>>>>>> Perhaps -
                          that I'd probably leave up to the folks who
                          are more<br>
                          >>>>>>>>> invested
                          in dsymutil (Adrian Prantl et al). Maybe one
                          day we'll get it<br>
                          >>>>>>>>>
                          integrated into llvm-dwp and then I'll be
                          interested in getting as<br>
                          >>>>>>>>> much
                          performance out of it as lld - so
                          multithreading and things would<br>
                          >>>>>>>>> be on the
                          books.<br>
                          >>>>>>>> I think
                          improving dsymutil is a valuable thing.<br>
                          >>>>>>>> Though there
                          are several directions which might be
                          considered<br>
                          >>>>>>>> to make it
                          more robust:<br>
                          >>>>>>>><br>
                          >>>>>>>> 1. support of
                          latest DWARF - DWARF5/DWARF64...<br>
                          >>>>>>> I expect/thought
                          some of the Apple folks had already worked on
                          DWARF5 support?<br>
                          >>>>>>> DWARF64 - that's
                          been around for a while, and just hasn't been
                          needed<br>
                          >>>>>>> by LLVM users
                          thus far, it seems (until recently - where
                          some<br>
                          >>>>>>> developers have
                          started working on that)<br>
                          >>>>>> The debug_names table is
                          already implemented, but
                          debug_rnglists,<br>
                          >>>>>> debug_loclists, and type
                          units are not implemented yet.<br>
                          >>>>> Superficially, type units
                          wouldn't be on the list of features (like<br>
                          >>>>> DWARF64 - it's optional)
                          I'd try to support in dsymutil - since their<br>
                          >>>>> size overhead is more
                          justified for a DWARF-agnostic linker that's<br>
                          >>>>> using comdat groups. With
                          a DWARF-aware linker I'd be specifically<br>
                          >>>>> hoping to avoid using
                          type units to help<br>
                          >>>>>> The thing which<br>
                          >>>>>> should probably be
                          changed is that dsymutil should not have its
                          version<br>
                          >>>>>> of code generating
                          DWARF tables. It should call the already existing<br>
                          >>>>>> DWARF5/DWARF64
                          implementations. Then dsymutil would always<br>
                          >>>>>> use the latest DWARF
                          generators.<br>
                          >>>>> Possibly - I don't know
                          what the architectural tradeoffs for that look<br>
                          >>>>> like - I'd imagine
                          DWARFLinker has sufficiently different<br>
                          >>>>> needs/tradeoffs than
                          LLVM's DWARF generation code (rewriting
                          existing<br>
                          >>>>> DIEs compared to building
                          new ones from scratch, etc) that it might be<br>
                          >>>>> hard for them to share a
                          lot of their implementation.<br>
                          >>>> It is not easy, and would
                          require some additions, but it would benefit<br>
                          >>>> in that all format
                          implementation is in one place. Thus changing
                          that place<br>
                          >>>> would reflect in other
                          places. There are at least three
                          implementations for<br>
                          >>>> .debug_ranges, .debug_aranges
                          currently...<br>
                          >>>><br>
                          >>>><br>
                          >>>>>>>> 2. implement
                          multi-threaded execution.<br>
                          >>>>>>>> 3. support of
                          split DWARF.<br>
                          >>>>>>> Maybe, though I'm
                          still not sure it'd be the right tradeoff -<br>
                          >>>>>>> especially if it
                          involved having to wait to run the .dwo merger
                          (call<br>
                          >>>>>>> it DWARF-aware
                          dwp, or dsymutil with dwp support) until after
                          the<br>
                          >>>>>>> linker ran.<br>
                          >>>>>>><br>
                          >>>>>>>> 4. implement
                          dsymutil for non-darwin platform.<br>
                          >>>>>>> That's probably,
                          essentially (3), more-or-less. Split DWARF is<br>
                          >>>>>>> somewhat of a
                          formalization of Apple's/MachO DWARF
                          distribution model<br>
                          >>>>>>> (leave DWARF
                          in files that aren't linked/use them from a
                          debugger,<br>
                          >>>>>>> but also be able
                          to merge them into some final file (dsym or
                          dwp) for<br>
                          >>>>>>> archival
                          purposes)<br>
                          >>>>>>><br>
                          >>>>>>>> All of this
                          is a massive piece of work.<br>
                          >>>>>>>> Our original
                          investment was to solve two problems:<br>
                          >>>>>>>><br>
                          >>>>>>>> 1. Overlapped
                          address ranges, which is currently close to
                          being solved. Thank you for helping with that!<br>
                          >>>>>>> Yeah, again,
                          sorry that's taken quite so long/somewhat
                          circuitous route.<br>
                          >>>>>>><br>
                          >>>>>>>> 2. Size of
                          debug info. That remains an issue, but
                          we are unsure whether we are ready to<br>
                          >>>>>>>>      invest
                          in solving all the above 1-4 problems and how
                          much the community is interested in it.<br>
                          >>>>>>> Fair, for sure -
                          I don't think you'd need to sign up to solve
                          all of<br>
                          >>>>>>> them (don't think
                          they necessarily need solving). Potentially
                          moving<br>
                          >>>>>>> the logic out
                          into a separate tool as Fangrui's considering
                          - a<br>
                          >>>>>>> post-link DWARF
                          optimizer, rather than in-linker DWARF
                          optimization.<br>
                          >>>>>>><br>
                          >>>>>>> I really don't
                          want to give you the runaround like this - but
                          multiple<br>
                          >>>>>>> times slower
                          links is something that seems pretty
                          problematic for most<br>
                          >>>>>>> users, to the
                          point of weighing the maintainability of lld
                          against the<br>
                          >>>>>>> convenience of
                          having this functionality in-linker rather
                          than in a<br>
                          >>>>>>> post-link
                          optimizer.<br>
                          >>>>>>><br>
                          >>>>>>> (I know you've
                          spoken a bit before about your users' needs -
                          but if<br>
                          >>>>>>> it's possible,
                          could you explain (again :/) why they have
                          such a<br>
                          >>>>>>> strong need for
                          smaller DWARF? While DWARF size is an ongoing
                          concern<br>
                          >>>>>>> for many users
                          (Google certainly - hence the invention of
                          Split DWARF,<br>
                          >>>>>>> use of type units
                          and compressed DWARF, etc) - usually it's in
                          rather<br>
                          >>>>>>> large programs,
                          but it sounds like you're dealing with
                          relatively<br>
                          >>>>>>> small ones
                          (otherwise the increase in link time, I'd
                          imagine, would be<br>
                          >>>>>>> prohibitive for
                          your users?)?<br>
                          >>>>>> We have many large
                          programs and keep Daily/Nightly debug builds,<br>
                          >>>>>> which takes a lot of
                          disk space. Compilation time for these
                          programs is big.<br>
                          >>>>>> The scenario is
                          "compile once" (not
                          compile-debug-compile-debug).<br>
                          >>>>>> So we think that a
                          solution (like dsymutil/DWARFLinker) would not
                          slow down<br>
                          >>>>>> the compilation time
                          of the overall build significantly (see above
                          numbers for<br>
                          >>>>>> llvm codebase) and
                          would allow us to reduce disk space required
                          to keep<br>
                          >>>>>> all of these builds.<br>
                          >>>>> Ah, OK - for archival
                          purposes. So the interactive developers
                          wouldn't<br>
                          >>>>> necessarily be using this
                          feature. Makes sense - similar to dsymutil<br>
                          >>>>> and dwp, mostly used for
                          archival purposes & you can debug straight<br>
                          >>>>> from .o/.dwos for
                          interactive/iterative development.<br>
                          >>>><br>
                          >>>>> In that case, it seems
                          more likely that a separate tool might
                          suffice.<br>
                          >>>> agreed: if the work on this
                          continues, then it makes sense to<br>
                          >>>> do it as a separate tool. Make
                          it fast enough. And if there is interest<br>
                          >>>> in it - then it would
                          probably be possible to return to the idea of
                          calling it from the linker.<br>
                          >>>><br>
                          >>>>> Also, out of curiosity -
                          have you tried just compressing the output<br>
                          >>>>> (-gz (I think that does
                          the right thing for the linker level<br>
                          >>>>> compression too,
                          otherwise -Wl,-compress-debug-sections might
                          do it))<br>
                          >>>>> or are you already doing
                          that in addition?<br>
                          >>>> sure. we use 
                          -Wl,-compress-debug-sections.<br>
                          >>>><br>
                          >>>> Thank you, Alexey.<br>
                          >>>><br>
                          >>>>>>> You mentioned
                          that the usability cost of<br>
                          >>>>>>> Split DWARF for
                          your users was too high (or high enough to
                          justify<br>
                          >>>>>>> this alternative
                          work of DWARF-aware linking)? That all seems a
                          bit<br>
                          >>>>>>> surprising to me
                          - though I understand the deployment issues of
                          Split<br>
                          >>>>>>> DWARF do present
                          some challenges to users in more heterogenous<br>
                          >>>>>>> environments than
                          Google's... still, I'd have thought there was
                          some<br>
                          >>>>>>> hope there)<br>
                          >>>>>> Our tools do not
                          support split dwarf yet, though we plan to
                          implement it.<br>
                          >>>>>> Once we have
                          support for split dwarf, it would be<br>
                          >>>>>> convenient to have
                          an easy way to share built debug binaries.
                          llvm-dwp is the<br>
                          >>>>>> answer to this.
                          DWARFLinker could probably be another answer.<br>
                          >>>>> Ah, fair enough - thanks
                          for the context!<br>
                          >>>>>>>>> One way
                          to do that would be to have a CU-local type
                          indirection table.<br>
                          >>>>>>>>> DIEs
                          reference local type numbers (like local
                          address/string numbers -<br>
                          >>>>>>>>>
                          addrx/strx/rnglistx) and that table contains
                          either sig8 (no linker<br>
                          >>>>>>>>> fixups
                          required) or the local type offsets you
                          describe - the linker<br>
                          >>>>>>>>> would
                          then only need to read this type number
                          indirection table and<br>
                          >>>>>>>>> rewrite
                          them to the final type numbers.<br>
                          >>>>>>>> Yes, that
                          could be additionally done if this process
                          would be time-consuming.<br>
                          >>>>>>>><br>
                          >>>>>>>> David, thank
                          you for all your comments and explanations.
                          They are extremely helpful.<br>
                          >>>>>>> Sure thing -
                          really appreciate your patience with all this
                          - it's... a<br>
                          >>>>>>> lot of moving
                          parts.<br>
                          >>>>>>> - Dave<br>
                          >>>>>>> Thank you,
                          Alexey.<br>
                          >>>>>>><br>
                          >>>>>>>> sig8 hash-id
                          would be used to compare types and to
                          deduplicate them.<br>
                          >>>>>>>> It would
                          speed up the current dsymutil context
                          analysis.<br>
                          >>>>>>>> Types having
                          the same hash-id could be deduplicated.<br>
                          >>>>>>>> This would
                          allow deduplicating more types
                          than the current dsymutil.<br>
                          >>>>>>>> Incomplete
                          type definitions having a similar set of
                          members are not deduplicated by dsymutil
                          currently.<br>
                          >>>>>>>> In this case
                          they would have the same hash-id.<br>
                          >>>>>>>><br>
                          >>>>>>>> This "type
                          table" would take less space than current
                          "type units" and current ODR solution.<br>
                          >>>>>>>><br>
                          >>>>>>>> The above is just
                          an idea on how to help a DWARF-aware
                          linker (based on the idea of removing obsolete
                          debug info)<br>
                          >>>>>>>> work
                          faster (if that is interesting).<br>
                          >>>>>>>><br>
                          >>>>>>>> Alexey.<br>
                          >>>>>>>><br>
                          >>>>>>>>> From:
                          llvm-dev <<a
                            href="mailto:llvm-dev-bounces@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev-bounces@lists.llvm.org</a>>
                          On Behalf Of James Henderson via llvm-dev<br>
                          >>>>>>>>> Sent:
                          Wednesday, June 3, 2020 3:48 AM<br>
                          >>>>>>>>> To: David
                          Blaikie <<a
                            href="mailto:dblaikie@gmail.com"
                            target="_blank" moz-do-not-send="true">dblaikie@gmail.com</a>><br>
                          >>>>>>>>> Cc: <a
                            href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
                          >>>>>>>>> Subject:
                          Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove
                          obsolete debug info in lld.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> It makes
                          me sad that the linker (via a library or
                          otherwise) has to be "DWARF-aware" to be able
                          to effectively handle --gc-sections, COMDATs,
                          --icf etc for debug info, without leaving
                          large blocks of data kicking around.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> The
                          patching to -1 (or equivalent) is probably a
                          good lightweight solution (though I'd love it
                          if it could be done based on section type in
                          the future rather than section name, but
                          that's probably outside the realm of DWARF),
                          as it requires only minimal understanding in
                          the linker, but anything beyond that seems to
                          be complicated logic that is mostly due to the
                          structure of DWARF. Patching to -1 does feel a
                          bit like a sticking plaster/band aid to patch
                          over the issue rather than properly solving it
                          too - there will still be debug data
                          (potentially significant amounts in
                          COMDAT-heavy objects) that the linker has to
                          write and the debugger has to somehow know how
                          to skip (even if it knows that -1 is
                          special-case due to the standard being
                          updated, it needs to get as far as the -1),
                          which is all wasted effort.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> We've
                          already seen from Alexey's prototyping, and
                          from our own experiences with the Sony
                          proprietary linker (which tried to rewrite
                          .debug_line only) that deconstructing the
                          DWARF so that it can be more optimally
                          reassembled at link time is slow going, and
                          will probably remain so however much
                          effort is put into optimising it. For a start,
                          given the current standards, it's impossible
                          to know how to deconstruct it without having
                          to parse vast amounts of DWARF, which is
                          typically going to mean a lot more parsing
                          work than the linker would normally have to
                          deal with. Additionally, much of this parsing
                          work is wasted effort, since it seems unlikely
                          in many links that large amounts of the DWARF
                          will be redundant. Having an option to opt-in
                          doesn't help much there, since it just means
                          the logic exists without most people using it,
                          due to it not being good enough, or
                          potentially they don't even know it exists.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> I don't
                          have particularly concrete suggestions as to
                          how to solve the structural problems with
                          DWARF at this point. The only thing that seems
                          obvious to me is a more "blessed" approach to
                          fragmentation of sections, similar to what I
                          tried with my prototype mentioned earlier in
                          the thread, although we'd need to figure out
                          the previously stated performance issues.
                          Other ideas might tie into this, like somehow
                          sharing the various table headers a bit like
                          CIEs in .eh_frame that could be merged by the
                          linker - each object could have separate table
                          header sections, which are referenced by the
                          individual .debug_* blocks, which in turn are
                          one per function/data piece and easily
                          discardable/merged by the linker.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> Just some
                          thoughts.<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> James<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
                          >>>>>>>>> On Tue, 2
                          Jun 2020 at 19:24, David Blaikie via llvm-dev
                          <<a href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
                          wrote:<br>
                          >>>>>>>>><br>
                          >>>>>>>>> On Tue,
                          May 19, 2020 at 7:17 AM Alexey Lapshin<br>
                          >>>>>>>>> <<a
                            href="mailto:alapshin@accesssoftek.com"
                            target="_blank" moz-do-not-send="true">alapshin@accesssoftek.com</a>>
                          wrote:<br>
                          >>>>>>>>>> Hi
                          David, please find my comments inside:<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
>>>>>>>>>>>>> Broad question: Do
                          you have any specific motivation/users/etc in
                          implementing this (if you can speak about it)?<br>
>>>>>>>>>>>>> - it might help
                          motivate the work, understand what tradeoffs
                          might be suitable for you/your users, etc.<br>
>>>>>>>>>>>> There are two general
                          requirements:<br>
>>>>>>>>>>>> 1) Remove (or clean)
                          invalid debug info.<br>
                          >>>>>>>>>>>
                          Perhaps a simpler direct solution for your
                          immediate needs might be a much narrower,<br>
                          >>>>>>>>>>>
                          and more efficient linker-DWARF-awareness
                          feature:<br>
                          >>>>>>>>>>><br>
                          >>>>>>>>>>>
                          With DWARFv5, rnglists present an opportunity
                          for a DWARF linker to rewrite the ranges<br>
                          >>>>>>>>>>>
                          without parsing the rest of the DWARF.
                          /technically/ this isn't guaranteed - rnglist
                          entries<br>
                          >>>>>>>>>>>
                          can be referenced either directly, or by
                          index. If all rnglists are referenced by
                          index, then<br>
                          >>>>>>>>>>> a
                          linker could parse only the debug_rnglists
                          section and rewrite ranges to remove any<br>
                          >>>>>>>>>>>
                          address ranges that refer to optimized-out
                          code.<br>
                          >>>>>>>>>>><br>
                          >>>>>>>>>>>
                          This would only be correct for rnglists that
                          had no direct references to them (that only
                          were<br>
                          >>>>>>>>>>>
                          referenced via the indexes) - but we could
                          either implement it with that assumption, or
                          could<br>
                          >>>>>>>>>>>
                          add an LLVM extension attribute on the CU that
                          would say "I promise I only referenced
                          rnglists<br>
                          >>>>>>>>>>>
                          via rnglistx forms/indexes"). If this
                          DWARF-aware linking would have to read the CU
                          DIE (not<br>
                          >>>>>>>>>>>
                          all the other DIEs) it /could/ also then
                          rewrite high/low_pc if the CU wasn't using
                          ranges...<br>
                          >>>>>>>>>>>
                          but that wouldn't come up in the
                          function-removal case, because then you'd have
                          ranges anyway,<br>
                          >>>>>>>>>>>
                          so no need for that.<br>
                          >>>>>>>>>>><br>
                          >>>>>>>>>>>
                          Such a DWARF-aware rnglist linking could also
                          simplify rnglists, in cases where functions<br>
                          >>>>>>>>>>>
                          ended up being laid out next to each other,
                          the linker could coalesce their ranges
                          together.<br>
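A minimal sketch of the rnglist rewriting described above, assuming the ranges have already been decoded into half-open (low, high) address pairs and that the linker can tell which ranges refer to discarded code. The names rewrite_rnglist and is_live are hypothetical, and the real DW_RLE_* entry encodings are not modeled:<br>

```python
def rewrite_rnglist(ranges, is_live):
    """Drop ranges that refer to discarded code and coalesce the rest.

    ranges  -- iterable of (low, high) half-open address pairs (assumed
               already decoded from .debug_rnglists)
    is_live -- hypothetical predicate supplied by the linker; returns
               False for ranges whose code was optimized out
    """
    # Keep only ranges the linker still considers live, ordered by address.
    live = sorted(r for r in ranges if is_live(r))

    merged = []
    for low, high in live:
        # Functions laid out next to each other (or overlapping) collapse
        # into a single range.
        if merged and merged[-1][1] >= low:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


if __name__ == "__main__":
    cu_ranges = [(0x1000, 0x1010), (0x0, 0x20), (0x1010, 0x1040), (0x2000, 0x2008)]
    # Assumption for this example only: discarded code resolved to address 0.
    print(rewrite_rnglist(cu_ranges, lambda r: r[0] != 0))
```

Only the range list is touched here; nothing else in the DWARF is parsed, which is the reason this approach is expected to add little to link time.<br>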
                          >>>>>>>>>>><br>
                          >>>>>>>>>>> I
                          imagine this could be implemented with very
                          little overhead to linking, especially
                          compared<br>
                          >>>>>>>>>>>
                          to the overhead of full DWARF-aware linking.<br>
                          >>>>>>>>>>><br>
                          >>>>>>>>>>>
                          Though none of this fixes Split DWARF, where
                          the linker doesn't get a chance to see the<br>
                          >>>>>>>>>>>
                          addresses being used - but if you only
                          want/need the CU-level ranges to be correct,
                          this<br>
                          >>>>>>>>>>>
                          might be a viable fix, and quite efficient.<br>
                          >>>>>>>>>> Yes,
                          we are thinking about that alternative. This would
                          resolve our problem of invalid debug info<br>
                          >>>>>>>>>> and
                          would work much faster. Thus, if we do not
                          get good results for D74169 then we<br>
                          >>>>>>>>>> will
                          implement it. Do you think it could be useful
                          to have this solution in upstream?<br>
                          >>>>>>>>> A pure
                          rnglist rewriting - I think it'd be OK to have
                          in upstream -<br>
                          >>>>>>>>> again,
                          cost/benefit/etc would have to be weighed. I'm
                          not sure it<br>
                          >>>>>>>>> would
                          save enough space to be particularly valuable
                          beyond the<br>
                          >>>>>>>>>
                          correctness issue - and it doesn't completely
                          solve the correctness<br>
                          >>>>>>>>> issue for
                          zero-address usage or low-address usage
                          (because you could<br>
                          >>>>>>>>> still
                          have overlapping subprograms inside a CU - so
                          if you were<br>
                          >>>>>>>>>
                          symbolizing you could use the correct rnglist
                          to filter, but then go<br>
                          >>>>>>>>> look
                          inside the CU only to find two subprograms
                          that had that address<br>
                          >>>>>>>>> & not
                          know which one was the correct one and which
                          one was the<br>
                          >>>>>>>>> discarded
                          one).<br>
                          >>>>>>>>><br>
                          >>>>>>>>> rnglist
                          rewriting might be easy enough to prototype -
                          but depends what<br>
                          >>>>>>>>> you want
                          to spend your time on, I know this whole issue
                          has been a<br>
                          >>>>>>>>> huge
                          investment of your time already - but maybe
                          this recent<br>
                          >>>>>>>>>
                          revitalization of the conversation around
                          having an explicit value in<br>
                          >>>>>>>>> the
                          linker might be sufficient to address
                          everyone's needs... *fingers<br>
                          >>>>>>>>> crossed*)<br>
                          >>>>>>>>><br>
                          >>>>>>>>><br>
>>>>>>>>>>>> 2) Optimize the DWARF
                          size.<br>
                          >>>>>>>>>>>
                          Do your users care much about this? I imagine
                          if they had significant DWARF size issues,<br>
                          >>>>>>>>>>>
                          they'd have significant link time issues and
                          the kind of cost to link time this feature has
                          would<br>
                          >>>>>>>>>>>
                          be prohibitive - but perhaps they're sharing
                          linked binaries much more often than they're<br>
                          >>>>>>>>>>>
                          actually performing linking.<br>
                          >>>>>>>>>> Yes,
                          they do. They also have significant link-time
                          issues.<br>
                          >>>>>>>>>> So
                          current performance results of D74169 are not
                          very acceptable.<br>
                          >>>>>>>>>> We
                          hope to improve it.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> The specifics which our
                          users have:<br>
>>>>>>>>>>>>    - embedded platform
                          which uses 0 as start of .text section.<br>
>>>>>>>>>>>>    - custom toolset
                          which does not support all features yet (e.g.
                          split dwarf).<br>
>>>>>>>>>>>>    - tolerant of the
                          link-time increase.<br>
>>>>>>>>>>>>    - need a useful way
                          to share debug builds.<br>
                          >>>>>>>>>>>
                          Sharing two files (executable and dwp) is
                          significantly less useful than sharing one
                          file?<br>
                          >>>>>>>>>>
                          Probably not significantly, but yes, it looks
                          less useful compared to D74169.<br>
                          >>>>>>>>>>
                          Having only two files (executable and .dwp)
                          looks significantly better than having
                          executable and multiple .dwo files.<br>
                          >>>>>>>>>>
                          Having only one file (executable) with minimal
                          size looks better than the two files with a
                          bigger size.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>> clang
                          compiled with -gsplit-dwarf takes 0.9G for
                          executable and 0.9G for .dwp.<br>
                          >>>>>>>>>> clang
                          compiled with -gc-debuginfo takes only 0.76G
                          for single executable.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> For the first point: we
                          have a problem "Overlapping address ranges
                          starting from 0"(D59553).<br>
>>>>>>>>>>>> We use custom solution,
                          but the general solution like D74169 would be
                          better here.<br>
                          >>>>>>>>>>>
                          If CU ranges are the only ones that need
                          fixing, then I think the above solution might
                          be as<br>
                          >>>>>>>>>>>
                          good/better - if more than CU ranges need
                          fixing, then I think we might want to start
                          talking about<br>
                          >>>>>>>>>>>
                          how to fix DWARF itself (split and non-split)
                          to signal certain addresses point to dead code
                          with a<br>
                          >>>>>>>>>>>
                          specific blessed value that linkers would need
                          to implement - because with Split DWARF
                          there's<br>
                          >>>>>>>>>>>
                          no way to solve the non-CU addresses at the
                          linker.<br>
                          >>>>>>>>>> I
                          think a worthwhile solution for that signal
                          value would be LowPC > HighPC.<br>
                          >>>>>>>>>> That
                          does not require additional bits in DWARF.<br>
                          >>>>>>>>>> It
                          would be natural to skip such address ranges
                          since they are explicitly marked as invalid.<br>
                          >>>>>>>>>> It
                          could be implemented in a linker very easily.
                          Probably, it would make sense to describe that<br>
                          >>>>>>>>>> usage
                          in the DWARF standard.<br>
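A small sketch of how a consumer such as a symbolizer might honor that convention, assuming LowPC and HighPC are both absolute addresses (DW_AT_high_pc may also be encoded as an offset from the low PC, which this does not model). The function find_subprogram and the tuple layout are hypothetical:<br>

```python
def find_subprogram(subprograms, address):
    """Return the name of the subprogram covering `address`, or None.

    subprograms -- iterable of (name, low_pc, high_pc) tuples, with
                   high_pc exclusive; a hypothetical, already-decoded view
                   of the DIEs in one compile unit
    """
    for name, low_pc, high_pc in subprograms:
        # LowPC > HighPC is treated as "this range was marked invalid by
        # the linker", so the entry is skipped outright.
        if low_pc > high_pc:
            continue
        if address >= low_pc and high_pc > address:
            return name
    return None


if __name__ == "__main__":
    cu = [
        ("discarded_fn", 0x1, 0x0),   # marked invalid: LowPC > HighPC
        ("kept_fn", 0x0, 0x20),       # live code at the start of .text
    ]
    print(find_subprogram(cu, 0x0))
```

With .text starting at 0, the discarded entry no longer competes with the live function for low addresses, which is the overlap problem described in this thread.<br>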
                          >>>>>>>>>><br>
                          >>>>>>>>>> As to
                          the addresses which are not seen by the
                          linker (since they are in .dwo files) - yes,<br>
                          >>>>>>>>>> they
                          need to have another solution. Could you show
                          an example of such a case, please?<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
                          >>>>>>>>>><br>
>>>>>>>>>>>>> 2. Support of type
                          units.<br>
>>>>>>>>>>>>>>    That could
                          be implemented further.<br>
>>>>>>>>>>>>> Enabling type units
                          increases object size to make it easier to
                          deduplicate at link time by a DWARF-unaware<br>
>>>>>>>>>>>>> linker. With a
                          DWARF aware linker it'd be generally desirable
                          not to have to add that object size overhead
                          to<br>
>>>>>>>>>>>>> get the linking
                          improvements.<br>
>>>>>>>>>>>> But, DWARFLinker should
                          adequately work with type units since they are
                          already implemented.<br>
                          >>>>>>>>>>>
                          Maybe - it'd be nice & all, but I don't
                          think it's an outright necessity - if someone
                          knows they're using<br>
                          >>>>>>>>>>> a
                          DWARF-aware linker, they'd probably not use
                          type units in their object files. It's
                          possible someone<br>
                          >>>>>>>>>>>
                          doesn't know for sure & maybe they have
                          pre-canned debug object files from someone
                          else, etc.<br>
                          >>>>>>>>>> I
                          see.<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> Another thing is that
                          the idea behind type units has the potential
                          to help a DWARF-aware linker work faster.<br>
>>>>>>>>>>>> Currently, DWARFLinker
                          analyzes context to understand whether types
                          are the same or not.<br>
                          >>>>>>>>>>>
                          When you say "analyzes context" what do you
                          mean? Usually I'd take that to mean<br>
                          >>>>>>>>>>>
                          "looks at things outside the type itself -
                          like what namespace it's in, etc" - which,
                          yes,<br>
                          >>>>>>>>>>>
                          it should do that, but it doesn't seem very
                          expensive to do. But I guess you actually<br>
                          >>>>>>>>>>>
                          mean something about doing structural
                          equivalence in some way, looking at things
                          inside the type?<br>
                          >>>>>>>>>> I
                          think it could be useful for both cases.
                          Currently, dsymutil does only the first thing<br>
                          >>>>>>>>>> (look
                          at type name, namespace name, etc..) and does
                          not do the second thing<br>
                          >>>>>>>>>>
                          (doing structural equivalence). Analyzing type
                          names is currently quite expensive<br>
                          >>>>>>>>>> (the
                          string pool search alone takes ~10 sec out of
                          70 sec of overall time).<br>
                          >>>>>>>>>> That
                          is expensive because many things must be
                          done to work with strings:<br>
                          >>>>>>>>>> parse
                          DWARF, search and resolve relocations, compute
                          a hash for strings,<br>
                          >>>>>>>>>> put
                          data into a string pool, create a fully
                          qualified name (like
                          namespace::function::name).<br>
                          >>>>>>>>>> It
                          looks like it could be optimized and finally
                          require less time, but it still would be a
                          noticeable<br>
                          >>>>>>>>>> part
                          of the overall time.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>> If
                          dsymutil starts to check for structural
                          equivalence, then the process would be even
                          slower.<br>
                          >>>>>>>>>> So,
                          if a single hash-id were checked instead of
                          comparing type structure, this
                          process<br>
                          >>>>>>>>>> would
                          also be faster.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>> Thus
                          I think using a hash-id to compare types would
                          make the current implementation faster
                          and would<br>
                          >>>>>>>>>> also
                          allow DWARFLinker to handle incomplete types
                          without massive performance degradation.<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> But the context is
                          known when types are generated. So, there is no
                          need to spend time analyzing it.<br>
>>>>>>>>>>>> If types could be
                          compared without analyzing context, then
                          a DWARF-aware linker would work faster.<br>
>>>>>>>>>>>> That is just an
                          idea (not for immediate implementation): if
                          types were stored in some "type table"<br>
>>>>>>>>>>>> (instead of a COMDAT
                          section group) and could be accessed through
                          a hash-id (like type units)<br>
>>>>>>>>>>>> - then it would be a
                          solution requiring fewer bits to store but
                          allowing types to be compared<br>
>>>>>>>>>>>> by hash-id (not
                          analysing context).<br>
>>>>>>>>>>>> In this case, the size
                          increase would be small, and processing
                          could be faster.<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>> This is just an idea
                          and could be discussed separately from the
                          problem of integrating D74169.<br>
>>>>>>>>>>>>>> 6. -flto=thin<br>
>>>>>>>>>>>>>>      That
                          problem was described in this review <a
                            href="https://reviews.llvm.org/D54747#1503720"
                            rel="noreferrer" target="_blank"
                            moz-do-not-send="true">https://reviews.llvm.org/D54747#1503720</a>.
                          It also exists in<br>
>>>>>>>>>>>>>> current
                          DWARFLinker/dsymutil implementation. I think
                          that problem should be discussed more: it
                          could<br>
>>>>>>>>>>>>>> probably be
                          fixed by avoiding generation of such
                          incomplete declaration during thinlto,<br>
>>>>>>>>>>>>>> That would be
                          costly to produce extra/redundant debug info
                          in ThinLTO - actually ThinLTO could be doing<br>
>>>>>>>>>>>>>> more to reduce
                          that redundancy early on (actually removing
                          definitions from some llvm Modules if the type<br>
>>>>>>>>>>>>>> definition is
                          known to exist in another Module, etc)<br>
>>>>>>>>>>>>> I don't know if
                          it's a problem since that patch was reverted.<br>
>>>>>>>>>>>> Yes. That patch was
                          reverted, but this patch (D74169) has the same
                          problem.<br>
>>>>>>>>>>>> If D74169 were
                          applied and --gc-debuginfo used, then the structure
                          type<br>
>>>>>>>>>>>> definition would be
                          removed.<br>
>>>>>>>>>>>> DWARFLinker could
                          handle that case - "removing definitions from
                          some llvm Modules if the type<br>
>>>>>>>>>>>> definition is known to
                          exist in another Module".<br>
>>>>>>>>>>>> i.e. DWARFLinker could
                          replace the declaration with the definition.<br>
>>>>>>>>>>>> But that problem could
                          be more easily resolved when debug info is
                          generated (probably without a<br>
>>>>>>>>>>>> significant increase in
                          debug info size):<br>
>>>>>>>>>>>> Here we have:<br>
>>>>>>>>>>>>
                          DW_TAG_compile_unit(0x0000000b) - compile unit
                          containing concrete instance for function "f".<br>
>>>>>>>>>>>>
                          DW_TAG_compile_unit(0x00000073) - compile unit
                          containing abstract instance root for function
                          "f".<br>
>>>>>>>>>>>>
                          DW_TAG_compile_unit(0x000000c1) - compile unit
                          containing function "f" definition.<br>
>>>>>>>>>>>> Code for function "f"
                          was deleted. gc-debuginfo deletes compile unit
                          DW_TAG_compile_unit(0x000000c1)<br>
>>>>>>>>>>>> containing "f"
                          definition (since there is no corresponding
                          code). But it has structure "Foo" definition<br>
>>>>>>>>>>>>
                          DW_TAG_structure_type(0x0000011e) referenced
                          from DW_TAG_compile_unit(0x00000073)<br>
>>>>>>>>>>>> by declaration
                          DW_TAG_structure_type(0x000000ae). That
                          declaration is exactly the case where the
                          definition<br>
>>>>>>>>>>>> was removed by ThinLTO
                          and replaced with a declaration.<br>
>>>>>>>>>>>> Would it cost too much
                          if the type definition were not replaced with a
                          declaration for the "abstract instance root"?<br>
>>>>>>>>>>>> The number of concrete
                          instances is bigger than the number of abstract
                          instance roots.<br>
>>>>>>>>>>>> Probably, it would not
                          be too costly to leave the definition in the abstract
                          instance root?<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>> Alternatively, would it
                          cost too much if the type definition were not
                          replaced with a declaration when<br>
>>>>>>>>>>>> the declaration references
                          a type from an unused function? (LTO could
                          understand that the concrete function is not
                          used).<br>
                          >>>>>>>>>>> I
                          don't follow this example - could you provide
                          a small concrete test case I could reproduce?<br>
                          >>>>>>>>>> I
                          would provide a test case if necessary. But it
                          looks like this issue is finally clear, and
                          you already commented on that.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>>>
                          Oh, I guess this is happening perhaps because
                          ThinLTO can't know for sure that a standalone<br>
                          >>>>>>>>>>>
                          definition of 'f' won't be needed - so it
                          produces one in case one of the inlining
                          opportunities<br>
                          >>>>>>>>>>>
                          doesn't end up inlining. Then it turns out all
                          calls got inlined, so the external definition
                          wasn't needed.<br>
                          >>>>>>>>>>>
                          Oh, you're suggesting that these 3 CUs got
                          emitted into one object file during LTO, but
                          that DWARFLinker<br>
                          >>>>>>>>>>>
                          drops a CU without any code in it - even
                          though... So far as I know, in LTO, LLVM
                          directly references<br>
                          >>>>>>>>>>>
                          types across units if the CUs are all emitted
                          in the same object file. (and if they weren't
                          in the same<br>
                          >>>>>>>>>>>
                          object file - then the abstract_origin
                          couldn't be pointing cross-CU).<br>
                          >>>>>>>>>>> I
                          guess some basic things to say:<br>
                          >>>>>>>>>>>
                          With ThinLTO, the concrete/standalone function
                          definition is emitted in case some call sites
                          don't end up<br>
                          >>>>>>>>>>>
                          being inlined. So we know it'll be emitted
                          (but might not be needed by the actual linker)<br>
                          >>>>>>>>>>>
                          Any number of inline calls might exist - but
                          we shouldn't put the type information into
                          those, because<br>
                          >>>>>>>>>>>
                          they aren't guaranteed to emit it (if the
                          inline function gets optimized away, there
                          would be nothing to<br>
                          >>>>>>>>>>>
                          enforce the type being emitted) - and even if
                          we forced the type information to be emitted
                          into one<br>
                          >>>>>>>>>>>
                          object file that has an inline copy of the
                          function - there's no guarantee that object
                          file will get linked in either.<br>
                          >>>>>>>>>>>
                          So, no, I don't think there's much we can do
                          to keep the size of object files down, while
                          guaranteeing<br>
                          >>>>>>>>>>>
                          the type information will be emitted with the
                          usual linker semantics.<br>
                          >>>>>>>>>> Then
                          dsymutil/DWARFLinker could be changed to
                          handle that (though it would probably not be
                          very efficient).<br>
                          >>>>>>>>>> If
                          ThinLTO could determine that the function ends
                          up unused (and therefore must not contain
                          the referenced type definition),<br>
                          >>>>>>>>>> then
                          this situation could be handled more
                          effectively.<br>
                          >>>>>>>>>><br>
                          >>>>>>>>>> Thank
                          you, Alexey.<br>
                          >>>>>>>>>><br>
>>>>>>>>>>>><br>
>>>>>>>>>>>><br>
>>>>>>>>>>>>
                          _______________________________________________<br>
>>>>>>>>>>>> LLVM Developers mailing
                          list<br>
>>>>>>>>>>>> <a
                            href="mailto:llvm-dev@lists.llvm.org"
                            target="_blank" moz-do-not-send="true">llvm-dev@lists.llvm.org</a><br>
>>>>>>>>>>>> <a
                            href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev"
                            rel="noreferrer" target="_blank"
                            moz-do-not-send="true">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
                        </blockquote>
                      </div>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </body>
</html>