<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    Hi James,<br>

    <br>

    Thank you for the comments. <br>

    <br>

    >I think we're not terribly far from that ideal, now, for ELF.

    Maybe only these three things need to be done? --<br>

    >  1. Teach lld how to emit a separated debuginfo output file

    directly, without requiring an objcopy step.<br>

    >  2. Integrate DWARFLinker into lld.<br>

    >  3. Create a new tool which takes the separated debuginfo and

    DWO/DWP files and uses DWARFLinker library <br>

    > to create a new (dwarf-linked) separated-debug file, that

    doesn't depend on DWO/DWP files.<br>

    <br>

    The three goals you've described are our long-term goals. <br>

    Indeed, the best solution would be to create valid optimized debug

    info without additional <br>

    stages and additional modifications of resulting binaries. <br>

    <br>

    There was an attempt to use DWARFLinker from lld:

    <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D74169">https://reviews.llvm.org/D74169</a><br>

    It did not receive enough support to be integrated yet. There are

    fair reasons for that:<br>

    <br>

    1. Execution time. Linking the clang binary with DWARFLinker takes

    72 sec,<br>

    while linking with lld alone takes 9 sec, so DWARFLinker makes the

    link <br>

    about 8x slower than usual.<br>

    <br>

    2. "Removing obsolete debug info" could not be switched off. Thus,

    lld could not use DWARFLinker for<br>

    other tasks (such as generating index tables: .gdb_index,

    .debug_names) without significant performance <br>

    degradation.<br>

    <br>

    3. DWARFLinker does not support split DWARF at the moment.<br>

    <br>

    None of these reasons is a blocker. I believe the implementation

    from D74169 could be integrated and <br>

    incrementally improved if there is agreement on that.<br>

    <br>

    Using DWARFLinker from llvm-dwarfutil is another way to exercise

    and improve it. <br>

    Once implemented, llvm-dwarfutil should solve the three issues

    above, and there <br>

    would probably be more reasons to integrate DWARFLinker into lld.<br>

    <br>

    Even if we had the best solution, it would still be useful to have

    a tool like llvm-dwarfutil<br>

    for cases when it is necessary to process already-built binaries.

    <br>

    <br>

    So in short, the suggested tool, llvm-dwarfutil, is a step towards

    the ideal solution. <br>

    Its benefit is that it can be used until we have created the best

    solution, or in cases <br>

    where "the best solution" is not applicable.<br>

    <br>

    Thank you, Alexey.

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 29.08.2020 00:23, James Y Knight

      wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAA2zVHogHGoDqQy84GwC5HMVkd8w=5Sns=F3HevbxrsagoaM6g@mail.gmail.com">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <div dir="ltr">If we're designing a new tool and process, it would

        be wonderful if it did not require multiple stages of copying

        and slightly modifying the binary, in order to create final

        output with separate debug info. It seems to me that the

        variants of this sort of thing which exist today are somewhat

        suboptimal.

        <div><br>

        </div>

        <div>With Mach-O and dsymutil:</div>

        <div>  1. Given a collection of object files (which contain

          debuginfo), link a binary with ld. The binary then includes

          special references to the object files that were actually used

          as part of the link.<br>

        </div>

        <div>  2. Given the linked binary, and all of the same object

          files, link the debuginfo with dsymutil.</div>

        <div>  3. Strip the references to the object file paths from the

          binary.</div>

        <div>  Finally, you have a binary without debug info, and a dsym

          debuginfo file. But it would be better if the binary created

          in step 1 didn't need to include the extraneous object-file

          path info, and that was instead emitted in a second file. Then

          we wouldn't need step 3.</div>

        <div><br>

        </div>

        <div>With "normal" ELF:</div>

        <div>  1. Given a collection of object files (which contain

          debuginfo), link a binary with ld, which includes linking all

          the debug info into the binary.<br>

        </div>

        <div>  2. Given the linked binary, objcopy --only-keep-debug to

          create a new separated debug file.</div>

        <div>  3. Given the linked binary, objcopy --strip-debug to

          create a copy of the binary without debug info.</div>

        <div>  Finally you have a binary without debug info, and a

          separate debug file. But it would be better if the linker

          could just write the debug info into a separate file in the

          first place, then we'd only have the one step. (But, downside,

          the linker needs to manage all the debug info, which can be

          excessively large.)</div>

        <div><br>

        </div>

        <div>With "split-dwarf" ELF support:</div>

        <div>  1. Given object files (which exclude <i>most</i> but not

          all of the debuginfo), link a binary. The binary will include

          that smaller set of debug info.<br>

        </div>

        <div>  2. Given the collection of dwo files corresponding to the

          object files, run the "dwp" tool to create a dwp file.</div>

        <div>  3. objcopy --only-keep-debug</div>

        <div>&nbsp; 4. objcopy --strip-debug</div>

        <div>  And then you need to keep both a debug file <i>and</i> a

          dwp file, which is weird.</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>I think, ideally, users would have the following three <i>good</i>

          options:</div>

        <div>  Easy option: store debuginfo in the object files, and

          have the linker create a pair of {binary, separated

          dwarf-optimized debuginfo} files directly from the object

          files.</div>

        <div>  More scalable option: emit (most of the) debuginfo in

          separate *.dwo files using -gsplit-dwarf, and then,</div>

        <div>    1. run the linker on the object files to create a pair

          of {binary, separated debuginfo} files. In this case the

          latter file contains the minimal debuginfo which was in the

          object files. </div>

        <div>    2. run a second tool, which reads the minimal debuginfo

          from above, and all the DWO files, and creates a full

          optimized/deduplicated debuginfo output file.</div>

        <div>  Faster developer builds: Like previous, but omit step 2

          -- running the debugger directly after step 1 can use the dwo

          files on-disk.</div>

        <div><br>

        </div>

        <div>I think we're not terribly far from that ideal, now, for

          ELF. Maybe only these three things need to be done? --</div>

        <div>  1. Teach lld how to emit a separated debuginfo output

          file directly, without requiring an objcopy step.</div>

        <div>  2. Integrate DWARFLinker into lld.</div>

        <div>  3. Create a new tool which takes the separated debuginfo

          and DWO/DWP files and uses DWARFLinker library to create a new

          (dwarf-linked) separated-debug file, that doesn't depend on

          DWO/DWP files.</div>

        <div><br>

        </div>

        <div>My hope is that the tool you're creating will be the

          implementation of #3, but I'm afraid the intent is for this

          tool to be an additional stage that non-split-dwarf users

          would need to run post-link, <i>instead of</i> integrating

          DWARFLinker into lld.</div>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Tue, Aug 25, 2020 at 10:29

          AM Alexey via llvm-dev &lt;<a

            href="mailto:llvm-dev@lists.llvm.org" target="_blank"

            moz-do-not-send="true">llvm-dev@lists.llvm.org</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote" style="margin:0px 0px 0px

          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

          <br>

             We propose llvm-dwarfutil - a dsymutil-like tool for ELF.<br>

             Any thoughts on this?<br>

             Thanks in advance, Alexey.<br>

          <br>

======================================================================<br>

          <br>

          llvm-dwarfutil (Appendix A) is a tool for processing debug

          info (DWARF)<br>

          located in built binary files to improve debug info quality,<br>

          reduce debug info size, and accelerate debug info processing.<br>

          Supported object file formats: ELF, Mach-O (Appendix B),

          COFF (Appendix C), <br>

          WASM (Appendix C).<br>

          <br>

======================================================================<br>

          <br>

          Specifically, the tool would do:<br>

          <br>

             - Remove obsolete debug info which refers to code deleted

          by the linker<br>

               doing the garbage collection (gc-sections).<br>

          <br>

             - Deduplicate debug type definitions for reducing resulting

          size of <br>

          binary.<br>

          <br>

             - Build accelerator/index tables.<br>

               = .debug_aranges, .debug_names, .gdb_index,

          .debug_pubnames, <br>

          .debug_pubtypes.<br>

          <br>

             - Strip unneeded tables.<br>

               = .debug_aranges, .debug_names, .gdb_index,

          .debug_pubnames, <br>

          .debug_pubtypes.<br>

          <br>

             - Compress or decompress debug info as requested.<br>

          <br>

          Possible feature:<br>

          <br>

             - Join split dwarf .dwo files in a single file containing

          all debug info<br>

               (convert split DWARF into monolithic DWARF).<br>

          <br>

======================================================================<br>

          <br>

          User interface:<br>

          <br>

             OVERVIEW: A tool for optimizing debug info located in the

          built binary.<br>

          <br>

             USAGE: llvm-dwarfutil [options] input output<br>

          <br>

          &nbsp;&nbsp; OPTIONS: (Appendix E)<br>

          <br>

======================================================================<br>

          <br>

          Implementation notes:<br>

          <br>

          1. Removing obsolete debug info would be done using

          DWARFLinker llvm <br>

          library.<br>

          <br>

          2. Data types deduplication would be done using DWARFLinker

          llvm library.<br>

          <br>

          3. Accelerator/index tables would be generated using

          DWARFLinker llvm <br>

          library.<br>

          <br>

          4. The interface of the DWARFLinker library would be changed so

          that<br>

          &nbsp;&nbsp;&nbsp; various stages can be switched on or off:<br>

          <br>

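          &nbsp;&nbsp; class DWARFLinker {<br>

          &nbsp;&nbsp; public:<br>

          &nbsp;&nbsp;&nbsp;&nbsp; // Each setter switches one linking stage on or off.<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);<br>

          <br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoAppleNames(bool DoAppleNames = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoAppleNamespaces(bool DoAppleNamespaces = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoAppleTypes(bool DoAppleTypes = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoObjC(bool DoObjC = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoDebugPubNames(bool DoDebugPubNames = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoDebugPubTypes(bool DoDebugPubTypes = false);<br>

          <br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoDebugNames(bool DoDebugNames = false);<br>

          &nbsp;&nbsp;&nbsp;&nbsp; void setDoGDBIndex(bool DoGDBIndex = false);<br>

          &nbsp;&nbsp; };<br>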
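          <br>
          A minimal sketch of how a client might use such switches. It assumes a<br>
          hypothetical StageSwitches struct that mirrors a few of the proposed<br>
          setters; it is not the real DWARFLinker interface. A linker that only<br>
          wants index tables would leave the removal stage off.<br>
          <pre>
```cpp
// Hypothetical stand-in for a few of the proposed switches.
// Illustration only; this is not the real DWARFLinker class.
struct StageSwitches {
  bool RemoveObsoleteInfo = false; // drop debug info for code removed by the linker
  bool DebugNames = false;         // build .debug_names
  bool GDBIndex = false;           // build .gdb_index
};

// Configuration for a client that only wants index tables generated.
inline StageSwitches indexTablesOnly() {
  StageSwitches S;
  S.DebugNames = true;
  S.GDBIndex = true;
  return S;
}

// True when the configuration asks for obsolete debug info removal.
inline bool needsObsoleteInfoRemoval(StageSwitches S) {
  return S.RemoveObsoleteInfo;
}
```
          </pre>
          With every stage off by default, a caller pays only for the stages it<br>
          enables explicitly.<br>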

          <br>

          5. Copying source file contents, stripping tables, and <br>

          compressing/decompressing tables<br>

          &nbsp;&nbsp;&nbsp; would be done by the ObjCopy LLVM library (extracted from

          llvm-objcopy):<br>

          <br>

             Error executeObjcopyOnBinary(const CopyConfig &Config,<br>

                                        object::COFFObjectFile &In,

          Buffer &Out);<br>

             Error executeObjcopyOnBinary(const CopyConfig &Config,<br>

                                        object::ELFObjectFileBase

          &In, Buffer &Out);<br>

             Error executeObjcopyOnBinary(const CopyConfig &Config,<br>

                                        object::MachOObjectFile &In,

          Buffer &Out);<br>

             Error executeObjcopyOnBinary(const CopyConfig &Config,<br>

                                        object::WasmObjectFile &In,

          Buffer &Out);<br>

          <br>

          6. Address ranges and single addresses pointing to removed

          code should <br>

          be marked<br>

          &nbsp;&nbsp;&nbsp; with a tombstone value in the input file:<br>

          <br>

              -2 for .debug_ranges and .debug_loc.<br>

              -1 for other .debug* tables.<br>
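          <br>
          A minimal sketch of how a consumer might recognize these tombstones.<br>
          The names DebugTable, maxAddress, tombstoneFor and isTombstone are<br>
          hypothetical and not part of any LLVM API; address sizes of 4 or 8<br>
          bytes are assumed. The -2 value avoids a clash: in .debug_ranges and<br>
          .debug_loc (DWARF v4 and earlier) the all-ones value already marks a<br>
          base address selection entry.<br>
          <pre>
```cpp
// Hypothetical helper names for illustration; none of this is LLVM API.
enum class DebugTable { Ranges, Loc, Other };

// Largest address representable in AddrSize bytes (4 or 8 assumed).
constexpr unsigned long long maxAddress(unsigned AddrSize) {
  return AddrSize == 8 ? 0xFFFFFFFFFFFFFFFFULL : 0xFFFFFFFFULL;
}

// Tombstone per the proposal: -2 for .debug_ranges and .debug_loc,
// -1 for every other .debug* table, truncated to the address size.
constexpr unsigned long long tombstoneFor(DebugTable T, unsigned AddrSize) {
  return (T == DebugTable::Ranges || T == DebugTable::Loc)
             ? maxAddress(AddrSize) - 1 // -2
             : maxAddress(AddrSize);    // -1
}

// True when Addr is the tombstone for the given table and address size.
constexpr bool isTombstone(unsigned long long Addr, DebugTable T,
                           unsigned AddrSize) {
  return Addr == tombstoneFor(T, AddrSize);
}
```
          </pre>
          For example, with 4-byte addresses a .debug_loc entry starting at<br>
          0xFFFFFFFE would be treated as pointing to removed code.<br>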

          <br>

          7. Prototype implementation - <a

            href="https://reviews.llvm.org/D86539" rel="noreferrer"

            target="_blank" moz-do-not-send="true">https://reviews.llvm.org/D86539</a>.<br>

          <br>

======================================================================<br>

          <br>

          Roadmap:<br>

          <br>

          1. Refactor llvm-objcopy to extract its implementation into a

          separate <br>

          library,<br>

          &nbsp;&nbsp;&nbsp; ObjCopy (in the LLVM tree).<br>

          <br>

          2. Create a command-line utility using the existing DWARFLinker

          and ObjCopy<br>

          &nbsp;&nbsp;&nbsp; implementations. The first version is supposed to work

          only with ELF <br>

          input object files.<br>

          &nbsp;&nbsp;&nbsp; It would take an input ELF file with unoptimized debug info

          and create an <br>

          output<br>

          &nbsp;&nbsp;&nbsp; ELF file with optimized debug info. That version would be

          developed outside <br>

          the LLVM tree.<br>

          <br>

          3. Make the tool able to work in multi-threaded mode.<br>

          <br>

          4. Consider including it in the LLVM tree.<br>

          <br>

          5. Support DWARF5 tables.<br>

          <br>

======================================================================<br>

          <br>

          Appendix A. Should this tool be implemented as a new tool or

          as an extension<br>

                       to dsymutil/llvm-objcopy?<br>

          <br>

          &nbsp;&nbsp;&nbsp; There already exists a tool which removes obsolete debug

          info on <br>

          Darwin: dsymutil.<br>

          &nbsp;&nbsp;&nbsp; Why create another tool instead of extending the

          existing <br>

          dsymutil/llvm-objcopy?<br>

          <br>

              The main functionality of dsymutil is located in a

          separate library <br>

          - DWARFLinker.<br>

              Thus, dsymutil utility is a command-line interface for

          DWARFLinker. <br>

          dsymutil has<br>

              another type of input/output data: it takes several object

          files and <br>

          address map<br>

              as input and creates a .dSYM bundle with linked debug info

          as <br>

          output. llvm-dwarfutil<br>

              would take a built executable as input and create an

          optimized <br>

          executable as output.<br>

              Additionally, there would be many command-line options

          specific for <br>

          only one utility.<br>

              This means that these utilities(implementing command line

          interface) <br>

          would significantly<br>

              differ. It makes sense not to put another command-line

          utility <br>

          inside existing dsymutil,<br>

          &nbsp;&nbsp;&nbsp; but to make it a separate utility. That is why

          <br>

          llvm-dwarfutil is proposed as<br>

          &nbsp;&nbsp;&nbsp; a separate tool rather than as a sub-part of

          dsymutil.<br>

          <br>

          &nbsp;&nbsp;&nbsp; Please share your preference: should llvm-dwarfutil

          be<br>

          &nbsp;&nbsp;&nbsp; a separate utility, or a variant of dsymutil compiled for

          ELF?<br>

          <br>

======================================================================<br>

          <br>

          Appendix B. The Mach-O object file format is already supported

          by dsymutil.<br>

          &nbsp;&nbsp;&nbsp; Whether llvm-dwarfutil would support Mach-O depends on

          the decision <br>

          whether it is<br>

          &nbsp;&nbsp;&nbsp; done as a subproject of dsymutil or as a separate

          utility.<br>

          <br>

======================================================================<br>

          <br>

          Appendix C. Support for the COFF and WASM object file formats

          is presented as<br>

          &nbsp;&nbsp;&nbsp;&nbsp; a possible future improvement. It would be quite easy to

          add them, <br>

          assuming<br>

          &nbsp;&nbsp;&nbsp;&nbsp; that llvm-objcopy already supports these formats. It

          would also require<br>

          &nbsp;&nbsp;&nbsp;&nbsp; supporting the DWARF6-suggested tombstone values (-1/-2).<br>

          <br>

======================================================================<br>

          <br>

          Appendix D. Documentation.<br>

          <br>

             - proposal for DWARF6 which suggested -1/-2 values for

          marking bad <br>

          addresses<br>

               <a

            href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1"

            rel="noreferrer" target="_blank" moz-do-not-send="true">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>

             - dsymutil tool <a

            href="https://llvm.org/docs/CommandGuide/dsymutil.html"

            rel="noreferrer" target="_blank" moz-do-not-send="true">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>

             - proposal "Remove obsolete debug info in lld."<br>

          <a

            href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html"

            rel="noreferrer" target="_blank" moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>

          <br>

======================================================================<br>

          <br>

          Appendix E. Possible command line options:<br>

          <br>

          DwarfUtil Options:<br>

          <br>

             --build-aranges           - generate .debug_aranges table.<br>

             --build-debug-names       - generate .debug_names table.<br>

             --build-debug-pubnames    - generate .debug_pubnames table.<br>

             --build-debug-pubtypes    - generate .debug_pubtypes table.<br>

             --build-gdb-index         - generate .gdb_index table.<br>

             --compress                - Compress debug tables.<br>

             --decompress              - Decompress debug tables.<br>

             --deduplicate-types       - Do ODR deduplication for debug

          types.<br>

             --garbage-collect         - Do garbage collecting for debug

          info.<br>

          &nbsp;&nbsp; --num-threads=&lt;n&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - Specify the maximum

          number (n) of <br>

          simultaneous threads<br>

                                         to use when optimizing input

          file.<br>

                                         Defaults to the number of cores

          on the <br>

          current machine.<br>

             --strip-all               - Strip all debug tables.<br>

          &nbsp;&nbsp; --strip=&lt;name1,name2&gt;&nbsp;&nbsp;&nbsp;&nbsp; - Strip specified debug

          info tables.<br>

             --strip-unoptimized-debug - Strip all unoptimized debug

          tables.<br>

          &nbsp;&nbsp; --tombstone=&lt;value&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; - Tombstone value used as a

          marker of <br>

          invalid address.<br>

               =bfd                    -   BFD default value<br>

               =dwarf6                 -   Dwarf v6.<br>

             --verbose                 - Enable verbose logging and

          encoding details.<br>

          <br>

          Generic Options:<br>

          <br>

             --help                    - Display available options

          (--help-hidden <br>

          for more)<br>

             --version                 - Display the version of this

          program<br>

          <br>


        </blockquote>

      </div>

    </blockquote>

  </body>

</html>