<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 01.09.2020 06:24, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAENS6EvY-pkGwnxacQbCx+G+Wcf6kEQe++Z92x9jeOZg_TBz=w@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div>On Fri, Aug 28, 2020 at 2:24 PM James Y Knight <<a
            href="mailto:jyknight@google.com" moz-do-not-send="true">jyknight@google.com</a>>
          wrote:<br>
        </div>
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">If we're designing a new tool and process, it
              would be wonderful if it did not require multiple stages
              of copying and slightly modifying the binary, in order to
              create final output with separate debug info. It seems to
              me that the variants of this sort of thing which exist
              today are somewhat suboptimal.
              <div><br>
              </div>
              <div>With Mach-O and dsymutil:</div>
              <div>  1. Given a collection of object files (which
                contain debuginfo), link a binary with ld. The binary
                then includes special references to the object files
                that were actually used as part of the link.<br>
              </div>
              <div>  2. Given the linked binary, and all of the same
                object files, link the debuginfo with dsymutil.</div>
              <div>  3. Strip the references to the object file paths
                from the binary.</div>
              <div>  Finally, you have a binary without debug info, and
                a dsym debuginfo file. But it would be better if the
                binary created in step 1 didn't need to include the
                extraneous object-file path info, and that was instead
                emitted in a second file. Then we wouldn't need step 3.</div>
              <div><br>
              </div>
              <div>With "normal" ELF:</div>
              <div>  1. Given a collection of object files (which
                contain debuginfo), link a binary with ld, which
                includes linking all the debug info into the binary.<br>
              </div>
              <div>  2. Given the linked binary,
                objcopy --only-keep-debug to create a new separated
                debug file.</div>
              <div>  3. Given the linked binary, objcopy --strip-debug
                to create a copy of the binary without debug info.</div>
              <div>  Finally you have a binary without debug info, and a
                separate debug file. But it would be better if the
                linker could just write the debug info into a separate
                file in the first place, then we'd only have the one
                step. (But, downside, the linker needs to manage all the
                debug info, which can be excessively large.)</div>
              <div><br>
              </div>
              <div>With "split-dwarf" ELF support:</div>
              <div>  1. Given object files (which exclude <i>most</i> but
                not all of the debuginfo), link a binary. The binary
                will include that smaller set of debug info.<br>
              </div>
              <div>  2. Given the collection of dwo files corresponding
                to the object files, run the "dwp" tool to create a dwp
                file.</div>
              <div>  3. objcopy --only-keep-debug</div>
              <div>  4. --strip-debug</div>
              <div>  And then you need to keep both a debug file <i>and</i>
                a dwp file, which is weird.</div>
              <div><br>
              </div>
              <div><br>
              </div>
              <div>I think, ideally, users would have the following
                three <i>good</i> options:</div>
              <div>  Easy option: store debuginfo in the object files,
                and have the linker create a pair of {binary, separated
                dwarf-optimized debuginfo} files directly from the
                object files.</div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>(as discussed by other replies - that was an early
            proposal, didn't gain a lot of traction/Eric & Ray
            weren't super convinced it was worth adding to lld at this
            stage, given the link time cost & thus the small
            expected user base)</div>
          <div> </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div>  More scalable option: emit (most of the) debuginfo
                in separate *.dwo files using -gsplit-dwarf, and then,</div>
              <div>    1. run the linker on the object files to create a
                pair of {binary, separated debuginfo} files. In this
                case the latter file contains the minimal debuginfo
                which was in the object files. </div>
            </div>
          </blockquote>
          <div><br>
            Yeah, that ^ is probably a nice feature regardless. Save
            folks an extra objcopy, etc. Usable right now for any build
            that is already running only-keep-debug/strip-debug.<br>
             </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div>    2. run a second tool, which reads the minimal
                debuginfo from above, and all the DWO files, and creates
                a full optimized/deduplicated debuginfo output file.</div>
            </div>
          </blockquote>
          <div><br>
            Fair - this then looks a lot like the MachO debug info
            distribution/linking model (with the advantage that the
            DWARF isn't in the .o files, so doesn't have to be shipped
            to the machine doing the linking), so far as I know.<br>
             </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div dir="ltr">
              <div>  Faster developer builds: Like previous, but omit
                step 2 -- running the debugger directly after step 1 can
                use the dwo files on-disk.</div>
              <div><br>
              </div>
              <div>I think we're not terribly far from that ideal, now,
                for ELF. Maybe only these three things need to be done?
                --</div>
              <div>  1. Teach lld how to emit a separated debuginfo
                output file directly, without requiring an objcopy step.</div>
              <div>  2. Integrate DWARFLinker into lld.</div>
              <div>  3. Create a new tool which takes the separated
                debuginfo and DWO/DWP files and uses DWARFLinker library
                to create a new (dwarf-linked) separated-debug file,
                that doesn't depend on DWO/DWP files.</div>
              <div><br>
              </div>
              <div>My hope is that the tool you're creating will be the
                implementation of #3, but I'm afraid the intent is
                for this tool to be an additional stage that
                non-split-dwarf users would need to run post-link, <i>instead
                  of</i> integrating DWARFLinker into lld.</div>
            </div>
          </blockquote>
          <div><br>
            Yeah, that's the direction lld folks have pushed for - a
            post-processing step, rather than link-time. Mostly due to the
            current performance of DWARF-aware linking being quite slow,
            so the idea that not many users would be willing to take
            that link-time performance hit to use the feature. (whereas
            as a post-processing step before archiving DWARF (like
            building a dwp from dwo files) it might be more
            appealing/interesting - and maybe with sufficient
            performance improvements, could then be rolled into lld as
            originally proposed)<br>
            <br>
            Curiously Alexey's needs include not wanting to use fission
            because a single debuggable binary simplifies his users'
            use case/makes it easier to distribute than two files. So
            he's probably not interested in the
            strip-debug/only-keep-debug kind of debug info distribution
            model, at least for his own users/use case. So far as I
            understand it.<br>
            <br>
            I've got mixed feelings about that - and encourage you to
            express/clarify/discuss your thoughts here, as I think the
            whole conversation could use some more voices.<br>
          </div>
        </div>
      </div>
    </blockquote>
    It is not that we are not interested in the
    strip-debug/only-keep-debug kind of debug info distribution model.<br>
    But our customers also find useful the model in which the optimized
    debug info is already placed in the binary.<br>
    It is a bit more convenient to pass a single binary to someone else
    to debug. It is also a bit more convenient to manage and keep a
    single binary with debug info for daily builds, so that possible
    problems can be evaluated quickly. Using a separate debug info file
    assumes some process for working with it (how it is stored and how
    it is distributed). Such a process makes sense when binaries are
    shared with customers. But when debug builds are shared inside an
    organization, it might be more convenient to share just a single
    file.<br>
    <br>
    Thus, it would be convenient if the tools supported both scenarios.
    <blockquote type="cite"
cite="mid:CAENS6EvY-pkGwnxacQbCx+G+Wcf6kEQe++Z92x9jeOZg_TBz=w@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
            - Dave<br>
             </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex"><br>
            <div class="gmail_quote">
              <div dir="ltr" class="gmail_attr">On Tue, Aug 25, 2020 at
                10:29 AM Alexey via llvm-dev <<a
                  href="mailto:llvm-dev@lists.llvm.org" target="_blank"
                  moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
                wrote:<br>
              </div>
              <blockquote class="gmail_quote" style="margin:0px 0px 0px
                0.8ex;border-left:1px solid
                rgb(204,204,204);padding-left:1ex">Hi,<br>
                <br>
                   We propose llvm-dwarfutil - a dsymutil-like tool for
                ELF.<br>
                   Any thoughts on this?<br>
                   Thanks in advance, Alexey.<br>
                <br>
======================================================================<br>
                <br>
                llvm-dwarfutil(Apndx A) - is a tool that is used for
                processing debug <br>
                info(DWARF)<br>
                located in built binary files to improve debug info
                quality,<br>
                reduce debug info size and accelerate debug info
                processing.<br>
                Supported object files formats: ELF, MachO(Apndx B),
                COFF(Apndx C), <br>
                WASM(Apndx C).<br>
                <br>
======================================================================<br>
                <br>
                Specifically, the tool would do:<br>
                <br>
                   - Remove obsolete debug info which refers to code
                deleted by the linker<br>
                     doing the garbage collection (gc-sections).<br>
                <br>
                   - Deduplicate debug type definitions to reduce the
                resulting size of the binary.<br>
                <br>
                   - Build accelerator/index tables.<br>
                     = .debug_aranges, .debug_names, .gdb_index,
                .debug_pubnames, <br>
                .debug_pubtypes.<br>
                <br>
                   - Strip unneeded tables.<br>
                     = .debug_aranges, .debug_names, .gdb_index,
                .debug_pubnames, <br>
                .debug_pubtypes.<br>
                <br>
                   - Compress or decompress debug info as requested.<br>
                <br>
                Possible feature:<br>
                <br>
                   - Join split dwarf .dwo files in a single file
                containing all debug info<br>
                     (convert split DWARF into monolithic DWARF).<br>
                <br>
======================================================================<br>
                <br>
                User interface:<br>
                <br>
                   OVERVIEW: A tool for optimizing debug info located in
                the built binary.<br>
                <br>
                   USAGE: llvm-dwarfutil [options] input output<br>
                <br>
                   OPTIONS: (Apndx E)<br>
                <br>
======================================================================<br>
                <br>
                Implementation notes:<br>
                <br>
                1. Removing obsolete debug info would be done using
                DWARFLinker llvm <br>
                library.<br>
                <br>
                2. Data types deduplication would be done using
                DWARFLinker llvm library.<br>
                <br>
                3. Accelerator/index tables would be generated using
                DWARFLinker llvm <br>
                library.<br>
                <br>
                4. Interface of DWARFLinker library would be changed in
                such way that it<br>
                    would be possible to switch on/off various stages:<br>
                <br>
                   class DWARFLinker {<br>
                   public:<br>
                     void setDoRemoveObsoleteInfo(bool DoRemoveObsoleteInfo = false);<br>
                <br>
                     void setDoAppleNames(bool DoAppleNames = false);<br>
                     void setDoAppleNamespaces(bool DoAppleNamespaces = false);<br>
                     void setDoAppleTypes(bool DoAppleTypes = false);<br>
                     void setDoObjC(bool DoObjC = false);<br>
                     void setDoDebugPubNames(bool DoDebugPubNames = false);<br>
                     void setDoDebugPubTypes(bool DoDebugPubTypes = false);<br>
                <br>
                     void setDoDebugNames(bool DoDebugNames = false);<br>
                     void setDoGDBIndex(bool DoGDBIndex = false);<br>
                   };<br>
                <br>
                5. Copying source file contents, stripping tables, <br>
                compressing/decompressing tables<br>
                    would be done by ObjCopy llvm library(extracted from
                llvm-objcopy):<br>
                <br>
                   Error executeObjcopyOnBinary(const CopyConfig
                &Config,<br>
                                              object::COFFObjectFile
                &In, Buffer &Out);<br>
                   Error executeObjcopyOnBinary(const CopyConfig
                &Config,<br>
                                              object::ELFObjectFileBase
                &In, Buffer &Out);<br>
                   Error executeObjcopyOnBinary(const CopyConfig
                &Config,<br>
                                              object::MachOObjectFile
                &In, Buffer &Out);<br>
                   Error executeObjcopyOnBinary(const CopyConfig
                &Config,<br>
                                              object::WasmObjectFile
                &In, Buffer &Out);<br>
                <br>
                6. Address ranges and single addresses pointing to
                removed code should <br>
                be marked<br>
                    with tombstone value in the input file:<br>
                <br>
                    -2 for .debug_ranges and .debug_loc.<br>
                    -1 for other .debug* tables.<br>
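                <br>
                As a minimal sketch of the rule above (not code from the
                prototype), the two tombstone values could be computed
                per address size as follows. The names SectionKind,
                tombstoneFor and isTombstone are hypothetical and used
                only for illustration. One likely reason for choosing -2
                in .debug_ranges and .debug_loc is that, in DWARF v4 and
                earlier, an entry whose first value is the largest
                representable address is a base address selection entry,
                so -1 would be misread there.<br>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of debug sections, split by which
// tombstone value the proposal assigns to them.
enum class SectionKind {
  RangesOrLoc, // .debug_ranges and .debug_loc
  Other        // any other .debug* table
};

// Returns the tombstone for an address that is AddrSize bytes wide.
// -1 is the all-ones address of that width; -2 is one less than that.
inline uint64_t tombstoneFor(SectionKind Kind, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  const uint64_t MaxAddr = AddrSize == 8 ? UINT64_MAX : UINT64_C(0xFFFFFFFF);
  return Kind == SectionKind::RangesOrLoc ? MaxAddr - 1 : MaxAddr;
}

// True if Addr is the marker for removed code in a section of this kind.
inline bool isTombstone(uint64_t Addr, SectionKind Kind, unsigned AddrSize) {
  return Addr == tombstoneFor(Kind, AddrSize);
}
```

                A consumer could call isTombstone() on the start
                address of each range or location entry and skip the
                entries for which it returns true.<br>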
                <br>
                7. Prototype implementation - <a
                  href="https://reviews.llvm.org/D86539"
                  rel="noreferrer" target="_blank"
                  moz-do-not-send="true">https://reviews.llvm.org/D86539</a>.<br>
                <br>
======================================================================<br>
                <br>
                Roadmap:<br>
                <br>
                1. Refactor llvm-objcopy to extract its implementation
                into a separate library,<br>
                    ObjCopy (in the LLVM tree).<br>
                <br>
                2. Create a command line utility using the existing
                DWARFLinker and ObjCopy<br>
                    implementations. The first version is supposed to work
                only with ELF input object files.<br>
                    It would take an input ELF file with unoptimized debug
                info and create an output<br>
                    ELF file with optimized debug info. That version
                would be done out of the LLVM tree.<br>
                <br>
                3. Make the tool able to work in multi-threaded mode.<br>
                <br>
                4. Consider including it in the LLVM tree.<br>
                <br>
                5. Support DWARF5 tables.<br>
                <br>
======================================================================<br>
                <br>
                Appendix A. Should this tool be implemented as a new
                tool or as an extension<br>
                             to dsymutil/llvm-objcopy?<br>
                <br>
                    There already exists a tool which removes obsolete
                debug info on <br>
                darwin - dsymutil.<br>
                    Why create another tool instead of extending the
                existing dsymutil/llvm-objcopy?<br>
                <br>
                    The main functionality of dsymutil is located in a
                separate library <br>
                - DWARFLinker.<br>
                    Thus, dsymutil utility is a command-line interface
                for DWARFLinker. <br>
                dsymutil has<br>
                    another type of input/output data: it takes several
                object files and <br>
                address map<br>
                    as input and creates a .dSYM bundle with linked
                debug info as <br>
                output. llvm-dwarfutil<br>
                    would take a built executable as input and create an
                optimized <br>
                executable as output.<br>
                    Additionally, there would be many command-line
                options specific to only one utility.<br>
                    This means that these utilities (each implementing a
                command-line interface) would differ significantly.<br>
                    It makes sense not to put another command-line utility
                inside the existing dsymutil,<br>
                    but to make it a separate utility. That is the reason
                why llvm-dwarfutil is suggested to be<br>
                    implemented not as a sub-part of dsymutil but as a
                separate tool.<br>
                <br>
                    Please share your preference: should llvm-dwarfutil
                be a<br>
                    separate utility, or a variant of dsymutil compiled
                for ELF?<br>
                <br>
======================================================================<br>
                <br>
                Appendix B. The MachO object file format is already
                supported by dsymutil.<br>
                    Depending on the decision whether llvm-dwarfutil
                would be done as a subproject<br>
                    of dsymutil or as a separate utility, MachO would or
                would not be supported.<br>
                <br>
======================================================================<br>
                <br>
                Appendix C. Support for the COFF and WASM object file
                formats is presented as a<br>
                     possible future improvement. It would be quite easy
                to add them, assuming<br>
                     that llvm-objcopy already supports these formats.
                It would also require<br>
                     supporting the DWARF6-suggested tombstone
                values (-1/-2).<br>
                <br>
======================================================================<br>
                <br>
                Appendix D. Documentation.<br>
                <br>
                   - proposal for DWARF6 which suggested -1/-2 values
                for marking bad <br>
                addresses<br>
                     <a
                  href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1"
                  rel="noreferrer" target="_blank"
                  moz-do-not-send="true">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>
                   - dsymutil tool <a
                  href="https://llvm.org/docs/CommandGuide/dsymutil.html"
                  rel="noreferrer" target="_blank"
                  moz-do-not-send="true">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>
                   - proposal "Remove obsolete debug info in lld."<br>
                <a
                  href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html"
                  rel="noreferrer" target="_blank"
                  moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>
                <br>
======================================================================<br>
                <br>
                Appendix E. Possible command line options:<br>
                <br>
                DwarfUtil Options:<br>
                <br>
                   --build-aranges           - generate .debug_aranges
                table.<br>
                   --build-debug-names       - generate .debug_names
                table.<br>
                   --build-debug-pubnames    - generate .debug_pubnames
                table.<br>
                   --build-debug-pubtypes    - generate .debug_pubtypes
                table.<br>
                   --build-gdb-index         - generate .gdb_index
                table.<br>
                   --compress                - Compress debug tables.<br>
                   --decompress              - Decompress debug tables.<br>
                   --deduplicate-types       - Do ODR deduplication for
                debug types.<br>
                   --garbage-collect         - Do garbage collecting for
                debug info.<br>
                   --num-threads=<n>         - Specify the maximum
                number (n) of <br>
                simultaneous threads<br>
                                               to use when optimizing
                input file.<br>
                                               Defaults to the number of
                cores on the <br>
                current machine.<br>
                   --strip-all               - Strip all debug tables.<br>
                   --strip=<name1,name2>     - Strip specified
                debug info tables.<br>
                   --strip-unoptimized-debug - Strip all unoptimized
                debug tables.<br>
                   --tombstone=<value>       - Tombstone value
                used as a marker of <br>
                invalid address.<br>
                     =bfd                    -   BFD default value<br>
                     =dwarf6                 -   Dwarf v6.<br>
                   --verbose                 - Enable verbose logging
                and encoding details.<br>
                <br>
                Generic Options:<br>
                <br>
                   --help                    - Display available options
                (--help-hidden <br>
                for more)<br>
                   --version                 - Display the version of
                this program<br>
                <br>
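                To illustrate how the options listed above might be
                mapped onto the on/off stage switches from
                implementation note 4, here is a small sketch. It does
                not use the real DWARFLinker class: StageOptions and
                parseStageOptions are hypothetical names, only a few of
                the proposed options are handled, and a real tool would
                more likely rely on LLVM's own option parsing.<br>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder for the stage switches. Every stage is off by
// default, matching the "= false" defaults in the proposed setters.
struct StageOptions {
  bool GarbageCollect = false;    // --garbage-collect
  bool DeduplicateTypes = false;  // --deduplicate-types
  bool BuildDebugNames = false;   // --build-debug-names
  bool BuildGdbIndex = false;     // --build-gdb-index
  std::vector<std::string> Unknown; // arguments this sketch does not handle
};

// Turns a list of command-line arguments into stage switches.
// Unrecognized arguments are collected instead of being dropped.
inline StageOptions parseStageOptions(const std::vector<std::string> &Args) {
  StageOptions Opts;
  for (const std::string &Arg : Args) {
    if (Arg == "--garbage-collect")
      Opts.GarbageCollect = true;
    else if (Arg == "--deduplicate-types")
      Opts.DeduplicateTypes = true;
    else if (Arg == "--build-debug-names")
      Opts.BuildDebugNames = true;
    else if (Arg == "--build-gdb-index")
      Opts.BuildGdbIndex = true;
    else
      Opts.Unknown.push_back(Arg);
  }
  return Opts;
}
```

                Each boolean would then be passed to the corresponding
                setter before linking, so that only the requested
                stages run.<br>
                <br>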
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>