<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 2, 2020 at 9:56 AM Alexey <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 01.09.2020 20:07, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">Fair enough - thanks for clarifying the
        differences! (I'd still lean a bit towards this being dwz-esque,
        as you say "an extension of classic dwz"</div>
    </blockquote>
    I have some doubts about "llvm-dwz", since it might confuse people who
    would expect exactly the same behavior as dwz.<br>
    But if we think of it as "an extension of classic dwz" and the
    possible confusion is not a big deal, then<br>
    I would be fine with "llvm-dwz".<br>
    <blockquote type="cite">
      <div dir="ltr"> using a bit more domain knowledge (of terminators
        and C++ odr - though I'm not sure dsymutil does rely on the ODR,
        does it? It relies on it to know that two names represent the
        same type, I suppose, but doesn't assume they're already
        identical, instead it merges their members))<br>
      </div>
    </blockquote>
    <p>If dsymutil is able to find a full definition, it removes
      all other definitions (those matched by name) and points all
      references at the definition it found. If it cannot find a
      full definition, it does nothing; i.e., if there are two
      incomplete definitions (DW_AT_declaration (true)) with the same
      name, they are not merged. That is a possible improvement
      - to teach dsymutil to merge incomplete types.<br></p></div></blockquote><div>Huh, what does it do with extra member function definitions found in later definitions? (e.g., struct x { template&lt;typename T&gt; void f(); }; - in one translation unit x::f&lt;int&gt; is instantiated, in another x::f&lt;float&gt; is instantiated - how are the two represented with dsymutil?) </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
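    To make the question above concrete, here is a minimal C++ sketch of the
    "keep one full definition per name" rule as it is described in this thread.<br>
    It is not dsymutil's implementation: TypeEntry, pickCanonicalDefinitions and
    exampleUnits are hypothetical names, and a type is modeled only as a name
    plus a set of member names.<br>
    Under this simplified model the second unit's instantiation of the member
    template would not appear on the kept definition; whether dsymutil really
    behaves that way is exactly the open question.<br>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DWARF type DIE.
struct TypeEntry {
  std::string Name;              // qualified type name, e.g. "x"
  bool IsDeclaration;            // models DW_AT_declaration(true)
  std::set<std::string> Members; // member names seen in this unit
};

// Keep the first full definition found for each name. Names that only ever
// appear as declarations get no entry, i.e. they are left unmerged.
std::map<std::string, TypeEntry>
pickCanonicalDefinitions(const std::vector<TypeEntry> &Units) {
  std::map<std::string, TypeEntry> Canonical;
  for (const TypeEntry &T : Units) {
    assert(!T.Name.empty() && "this sketch matches types by name only");
    if (T.IsDeclaration)
      continue;
    Canonical.emplace(T.Name, T); // emplace never overwrites an existing key
  }
  return Canonical;
}

// Two units that each saw a different instantiation of x::f<T>, plus a type
// "y" that is only ever declared.
std::vector<TypeEntry> exampleUnits() {
  return {{"x", false, {"f<int>"}},
          {"x", false, {"f<float>"}},
          {"y", true, {}},
          {"y", true, {}}};
}
```

    With exampleUnits() as input, the map holds a single entry for "x" whose
    members come from the first unit only, and no entry for "y".<br>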
    <p>Alexey.<br>
    </p>
    <blockquote type="cite">
      <div dir="ltr"><br>
        But I don't have super strong feelings about the naming.</div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Tue, Sep 1, 2020 at 6:36 AM
          Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div>
            <p><br>
            </p>
            <div>On 01.09.2020 06:27, David Blaikie wrote:<br>
            </div>
            <blockquote type="cite">
              <div dir="ltr">A quick note: The feature as currently
                proposed sounds like it's an exact match for 'dwz'? Is
                there any benefit to this over the existing dwz project?
                Is it different in some ways I'm not aware of? (I
                haven't actually used dwz, so I might have some mistaken
                ideas about how it should work)<br>
                <br>
                If it's going to solve the same general problem, but be
                in the llvm project instead, then maybe it should be
                called llvm-dwz.<br>
              </div>
            </blockquote>
            It looks like dwz and llvm-dwarfutil do not match exactly
            in functionality. <br>
            <br>
            dwz is a program that attempts to optimize DWARF debugging
            information <br>
            contained in ELF shared libraries and ELF executables for
            *size*.<br>
            <br>
            llvm-dwarfutil is a tool that is used for processing debug<br>
            info (DWARF) located in built binary files to improve debug
            info *quality*,<br>
            reduce debug info *size* and accelerate debug info
            *processing*.<br>
            <br>
            Things which are supposed to be done by llvm-dwarfutil and
            which are not <br>
            done by dwz: removing obsolete debug info, building indexes,
            stripping <br>
            unneeded debug sections, compressing/decompressing debug sections.<br>
            <br>
            What they have in common is that both tools reduce debug info
            size. <br>
            But they do so using different approaches:<br>
            <br>
            1. dwz reduces the size of debug info by creating partial
            compilation units <br>
                for duplicated parts, so that these partial compilation
            units can be imported <br>
                at every place where the duplicate occurred. AFAIU, that optimization
            gives the largest size saving.<br>
            <br>
               Another size-saving optimization is ODR type
            deduplication.<br>
            <br>
            2. llvm-dwarfutil reduces the size of debug info by ODR
            type deduplication, <br>
               which gives the largest size saving in the llvm-dwarfutil
            case. <br>
            <br>
               Another size-saving optimization is removing obsolete
            debug info<br>
               (which is actually not only about size but also about
            correctness).<br>
            <br>
            So, it looks like these tools are not equivalent. If we
            consider the new tool <br>
            to be an extension of classic dwz, then we could
            probably<br>
            name it llvm-dwz.<br>
            <br>
            <blockquote type="cite">
              <div dir="ltr"><br>
                Though I understand the desire for this to grow other
                functionality, like DWARF-aware dwp-ing. It might be better
                for this to busybox and provide that functionality under
                llvm-dwp instead, or, more likely I suspect, the
                existing llvm-dwp will be rewritten (probably by me) to
                use more of lld's infrastructure to be more efficient
                (its current object reading/writing logic uses
                LLVM's libObject and MCStreamer, which is a bit
                inefficient for a very content-unaware linking process)
                and then maybe that could be taught to use DwarfLinker
                as a library to optionally do DWARF-aware linking,
                depending on the user's time/space tradeoff desires,
                while still benefiting from any improvements to the underlying
                DwarfLinker library (at which point that would be shared
                between llvm-dsymutil, llvm-dwz, and llvm-dwp).</div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Tue, Aug 25, 2020
                  at 7:29 AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
                  <br>
                     We propose llvm-dwarfutil - a dsymutil-like tool
                  for ELF.<br>
                     Any thoughts on this?<br>
                     Thanks in advance, Alexey.<br>
                  <br>
======================================================================<br>
                  <br>
                  llvm-dwarfutil (Apndx A) is a tool that is used for
                  processing debug info (DWARF)<br>
                  located in built binary files to improve debug info
                  quality,<br>
                  reduce debug info size and accelerate debug info
                  processing.<br>
                  Supported object file formats: ELF, MachO (Apndx B),
                  COFF (Apndx C), WASM (Apndx C).<br>
                  <br>
======================================================================<br>
                  <br>
                  Specifically, the tool would:<br>
                  <br>
                     - Remove obsolete debug info which refers to code
                  deleted by the linker<br>
                       doing the garbage collection (gc-sections).<br>
                  <br>
                     - Deduplicate debug type definitions to reduce
                  the resulting size of the <br>
                  binary.<br>
                  <br>
                     - Build accelerator/index tables.<br>
                       = .debug_aranges, .debug_names, .gdb_index,
                  .debug_pubnames, <br>
                  .debug_pubtypes.<br>
                  <br>
                     - Strip unneeded tables.<br>
                       = .debug_aranges, .debug_names, .gdb_index,
                  .debug_pubnames, <br>
                  .debug_pubtypes.<br>
                  <br>
                     - Compress or decompress debug info as requested.<br>
                  <br>
                  Possible feature:<br>
                  <br>
                     - Join split dwarf .dwo files in a single file
                  containing all debug info<br>
                       (convert split DWARF into monolithic DWARF).<br>
                  <br>
======================================================================<br>
                  <br>
                  User interface:<br>
                  <br>
                     OVERVIEW: A tool for optimizing debug info located
                  in the built binary.<br>
                  <br>
                     USAGE: llvm-dwarfutil [options] input output<br>
                  <br>
                     OPTIONS: (Apndx E)<br>
                  <br>
======================================================================<br>
                  <br>
                  Implementation notes:<br>
                  <br>
                  1. Removing obsolete debug info would be done using
                  DWARFLinker llvm <br>
                  library.<br>
                  <br>
                  2. Data types deduplication would be done using
                  DWARFLinker llvm library.<br>
                  <br>
                  3. Accelerator/index tables would be generated using
                  DWARFLinker llvm <br>
                  library.<br>
                  <br>
                  4. The interface of the DWARFLinker library would be changed
                  so that<br>
                      various stages can be switched on or off:<br>
                  <br>
                     class DWARFLinker {<br>
                     public:<br>
                       void setDoRemoveObsoleteInfo(bool
                  DoRemoveObsoleteInfo = false);<br>
                  <br>
                       void setDoAppleNames(bool DoAppleNames = false);<br>
                       void setDoAppleNamespaces(bool DoAppleNamespaces =
                  false);<br>
                       void setDoAppleTypes(bool DoAppleTypes = false);<br>
                       void setDoObjC(bool DoObjC = false);<br>
                       void setDoDebugPubNames(bool DoDebugPubNames = false);<br>
                       void setDoDebugPubTypes(bool DoDebugPubTypes = false);<br>
                  <br>
                       void setDoDebugNames(bool DoDebugNames = false);<br>
                       void setDoGDBIndex(bool DoGDBIndex = false);<br>
                     };<br>
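                  <br>
                  As an illustration of how such per-stage switches might be used by a
                  client, here is a small self-contained sketch.<br>
                  LinkerStages, its query methods and makeExampleConfig are
                  hypothetical names invented for this example; this is not the actual
                  DWARFLinker interface, only the setter pattern from the list above.<br>

```cpp
#include <cassert>

// Hypothetical options holder that mirrors a few of the proposed setters.
// It is NOT the real DWARFLinker class.
class LinkerStages {
public:
  void setDoRemoveObsoleteInfo(bool V = false) { DoRemoveObsoleteInfo = V; }
  void setDoDebugNames(bool V = false) { DoDebugNames = V; }
  void setDoGDBIndex(bool V = false) { DoGDBIndex = V; }

  // Queries a driver could use to decide which stages to run.
  bool removesObsoleteInfo() const { return DoRemoveObsoleteInfo; }
  bool buildsDebugNames() const { return DoDebugNames; }
  bool buildsGDBIndex() const { return DoGDBIndex; }
  bool anyIndexRequested() const { return DoDebugNames || DoGDBIndex; }

private:
  // Every stage is off unless explicitly enabled.
  bool DoRemoveObsoleteInfo = false;
  bool DoDebugNames = false;
  bool DoGDBIndex = false;
};

// Example configuration: garbage-collect debug info and build only
// .debug_names, leaving .gdb_index generation switched off.
LinkerStages makeExampleConfig() {
  LinkerStages S;
  S.setDoRemoveObsoleteInfo(true);
  S.setDoDebugNames(true);
  assert(S.anyIndexRequested());
  return S;
}
```

                  Keeping every stage off by default means a caller opts in to each
                  piece of work explicitly, which matches the "= false" defaults in the
                  proposal.<br>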
                  <br>
                  5. Copying source file contents, stripping tables, and <br>
                  compressing/decompressing tables<br>
                      would be done by the ObjCopy llvm library (extracted
                  from llvm-objcopy):<br>
                  <br>
                     Error executeObjcopyOnBinary(const CopyConfig
                  &Config,<br>
                                                object::COFFObjectFile
                  &In, Buffer &Out);<br>
                     Error executeObjcopyOnBinary(const CopyConfig
                  &Config,<br>
                                               
                  object::ELFObjectFileBase &In, Buffer &Out);<br>
                     Error executeObjcopyOnBinary(const CopyConfig
                  &Config,<br>
                                                object::MachOObjectFile
                  &In, Buffer &Out);<br>
                     Error executeObjcopyOnBinary(const CopyConfig
                  &Config,<br>
                                                object::WasmObjectFile
                  &In, Buffer &Out);<br>
                  <br>
                  6. Address ranges and single addresses pointing to
                  removed code should <br>
                  be marked<br>
                      with tombstone value in the input file:<br>
                  <br>
                      -2 for .debug_ranges and .debug_loc.<br>
                      -1 for other .debug* tables.<br>
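                  <br>
                  Read at the target's address width, -1 is the all-ones address and
                  -2 is one less than that.<br>
                  The pre-DWARF5 .debug_ranges and .debug_loc formats already give the
                  all-ones value a meaning (it marks a base address selection entry),
                  which is presumably why -2 is used there.<br>
                  A minimal sketch of how a consumer might compute and recognize these
                  values follows; DebugTable, maxAddress, tombstoneFor and isTombstone
                  are hypothetical helper names, not part of any LLVM API.<br>

```cpp
#include <cassert>
#include <cstdint>

// Which kind of table an address was read from (hypothetical enum).
enum class DebugTable { RangesOrLoc, Other };

// Largest value representable in AddrSize bytes, i.e. "-1" at that width.
inline std::uint64_t maxAddress(unsigned AddrSize) {
  assert(AddrSize >= 1 && AddrSize <= 8 && "unsupported address size");
  if (AddrSize == 8)
    return UINT64_MAX; // avoid shifting a 64-bit value by 64 bits
  return (std::uint64_t{1} << (AddrSize * 8)) - 1;
}

// -2 for .debug_ranges/.debug_loc, -1 for the other .debug* tables.
inline std::uint64_t tombstoneFor(DebugTable T, unsigned AddrSize) {
  return T == DebugTable::RangesOrLoc ? maxAddress(AddrSize) - 1
                                      : maxAddress(AddrSize);
}

// True if Addr is the marker for "this address pointed to removed code".
inline bool isTombstone(std::uint64_t Addr, DebugTable T, unsigned AddrSize) {
  return Addr == tombstoneFor(T, AddrSize);
}
```

                  A tool that removes obsolete debug info could then skip any address
                  or range whose start satisfies isTombstone for the table it came from.<br>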
                  <br>
                  7. Prototype implementation - <a href="https://reviews.llvm.org/D86539" rel="noreferrer" target="_blank">https://reviews.llvm.org/D86539</a>.<br>
                  <br>
======================================================================<br>
                  <br>
                  Roadmap:<br>
                  <br>
                  1. Refactor llvm-objcopy to extract its
                  implementation into a separate <br>
                  library,<br>
                      ObjCopy (in the LLVM tree).<br>
                  <br>
                  2. Create a command-line utility using the existing
                  DWARFLinker and ObjCopy<br>
                      implementations. The first version is supposed to work
                  only with ELF <br>
                  input object files.<br>
                      It would take an input ELF file with unoptimized
                  debug info and create <br>
                  an output<br>
                      ELF file with optimized debug info. That version
                  would be done out <br>
                  of the LLVM tree.<br>
                  <br>
                  3. Make the tool able to work in multi-threaded
                  mode.<br>
                  <br>
                  4. Consider including it in the LLVM tree.<br>
                  <br>
                  5. Support DWARF5 tables.<br>
                  <br>
======================================================================<br>
                  <br>
                  Appendix A. Should this tool be implemented as a new
                  tool or as an extension<br>
                               to dsymutil/llvm-objcopy?<br>
                  <br>
                      There already exists a tool which removes obsolete
                  debug info on <br>
                  darwin - dsymutil.<br>
                      Why create another tool instead of extending the
                  existing <br>
                  dsymutil/llvm-objcopy?<br>
                  <br>
                      The main functionality of dsymutil is located in a
                  separate library, <br>
                  DWARFLinker.<br>
                      Thus, the dsymutil utility is a command-line interface
                  for DWARFLinker. <br>
                  dsymutil has<br>
                      a different kind of input/output data: it takes
                  several object files and <br>
                  an address map<br>
                      as input and creates a .dSYM bundle with linked
                  debug info as <br>
                  output. llvm-dwarfutil<br>
                      would take a built executable as input and create
                  an optimized <br>
                  executable as output.<br>
                      Additionally, there would be many command-line
                  options specific to <br>
                  only one utility.<br>
                      This means that these utilities (implementing
                  the command-line interface) <br>
                  would differ significantly.<br>
                      It makes sense not to put another
                  command-line utility <br>
                  inside the existing dsymutil,<br>
                      but to make it a separate utility. That is the
                  reason why <br>
                  llvm-dwarfutil is suggested to be<br>
                      implemented not as a sub-part of dsymutil but as a
                  separate tool.<br>
                  <br>
                      Please share your preference: should
                  llvm-dwarfutil be<br>
                      a separate utility, or a variant of dsymutil
                  compiled for ELF?<br>
                  <br>
======================================================================<br>
                  <br>
                  Appendix B. The MachO object file format is already
                  supported by dsymutil.<br>
                      Whether MachO would be supported by llvm-dwarfutil
                  depends on the decision <br>
                  whether it is done as a subproject<br>
                      of dsymutil or as a separate utility.<br>
                  <br>
======================================================================<br>
                  <br>
                  Appendix C. Support for the COFF and WASM object file
                  formats is presented as<br>
                       a possible future improvement. It would be quite
                  easy to add them, <br>
                  given<br>
                       that llvm-objcopy already supports these formats.
                  It would also require<br>
                       supporting the DWARF6-suggested tombstone
                  values (-1/-2).<br>
                  <br>
======================================================================<br>
                  <br>
                  Appendix D. Documentation.<br>
                  <br>
                     - proposal for DWARF6 which suggested -1/-2 values
                  for marking bad <br>
                  addresses<br>
                       <a href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1" rel="noreferrer" target="_blank">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>
                     - dsymutil tool <a href="https://llvm.org/docs/CommandGuide/dsymutil.html" rel="noreferrer" target="_blank">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>
                     - proposal "Remove obsolete debug info in lld."<br>
                  <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>
                  <br>
======================================================================<br>
                  <br>
                  Appendix E. Possible command line options:<br>
                  <br>
                  DwarfUtil Options:<br>
                  <br>
                     --build-aranges           - generate .debug_aranges
                  table.<br>
                     --build-debug-names       - generate .debug_names
                  table.<br>
                     --build-debug-pubnames    - generate
                  .debug_pubnames table.<br>
                     --build-debug-pubtypes    - generate
                  .debug_pubtypes table.<br>
                     --build-gdb-index         - generate .gdb_index
                  table.<br>
                     --compress                - Compress debug tables.<br>
                     --decompress              - Decompress debug
                  tables.<br>
                     --deduplicate-types       - Do ODR deduplication
                  for debug types.<br>
                     --garbage-collect         - Do garbage collecting
                  for debug info.<br>
                     --num-threads=&lt;n&gt;         - Specify the
                  maximum number (n) of <br>
                  simultaneous threads<br>
                                                 to use when optimizing
                  input file.<br>
                                                 Defaults to the number
                  of cores on the <br>
                  current machine.<br>
                     --strip-all               - Strip all debug tables.<br>
                     --strip=&lt;name1,name2&gt;     - Strip specified
                  debug info tables.<br>
                     --strip-unoptimized-debug - Strip all unoptimized
                  debug tables.<br>
                     --tombstone=&lt;value&gt;       - Tombstone value
                  used as a marker of <br>
                  invalid address.<br>
                       =bfd                    -   BFD default value<br>
                       =dwarf6                 -   Dwarf v6.<br>
                     --verbose                 - Enable verbose logging
                  and encoding details.<br>
                  <br>
                  Generic Options:<br>
                  <br>
                     --help                    - Display available
                  options (--help-hidden <br>
                  for more)<br>
                     --version                 - Display the version of
                  this program<br>
                  <br>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>