<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Sep 2, 2020 at 3:26 PM Alexey <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
  
    
  
  <div>
    <p><br>
    </p>
    <div>On 02.09.2020 21:44, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Wed, Sep 2, 2020 at 9:56
            AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p><br>
              </p>
              <div>On 01.09.2020 20:07, David Blaikie wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">Fair enough - thanks for clarifying the
                  differences! (I'd still lean a bit towards this being
                  dwz-esque, as you say "an extension of classic dwz"</div>
              </blockquote>
              I have some doubts about "llvm-dwz", since it might confuse
              people who expect exactly the same behavior.<br>
              But if we think of it as "an extension of classic dwz" and
              the possible confusion is not a big deal, then<br>
              I would be fine with "llvm-dwz".<br>
              <blockquote type="cite">
                <div dir="ltr"> using a bit more domain knowledge (of
                  terminators and C++ odr - though I'm not sure dsymutil
                  does rely on the ODR, does it? It relies on it to know
                  that two names represent the same type, I suppose, but
                  doesn't assume they're already identical, instead it
                  merges their members))<br>
                </div>
              </blockquote>
              <p>If dsymutil is able to find a full definition, it
                removes all other definitions (those matched by
                name) and redirects all references to the definition it found.
                If it cannot find a full definition, it
                does nothing: for example, two incomplete
                definitions (DW_AT_declaration (true)) with the same
                name are not merged. Teaching dsymutil to merge
                incomplete types is a possible
                improvement.<br>
              </p>
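              <p>A minimal sketch of the name-based matching described
                above, not dsymutil's actual code. TypeDie and
                resolveTypes are hypothetical names, and the sketch
                assumes matching by name only (no comparison of
                members). Each DIE is mapped to the first full
                definition with its name; groups that contain only
                declarations are left unmerged.</p>
<pre>
```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a type DIE; not an LLVM class.
struct TypeDie {
  std::string Name;   // fully qualified type name
  bool IsDeclaration; // models DW_AT_declaration(true)
};

// For each DIE, returns the index of the DIE that references should point to.
// A DIE maps to itself when no full definition with its name exists.
std::vector<std::size_t> resolveTypes(const std::vector<TypeDie> &Dies) {
  // First pass: remember the first full definition seen for each name.
  std::map<std::string, std::size_t> FirstDefinition;
  for (std::size_t I = 0; I < Dies.size(); ++I)
    if (!Dies[I].IsDeclaration)
      FirstDefinition.emplace(Dies[I].Name, I); // emplace keeps the first entry

  // Second pass: redirect every DIE to the recorded definition, if any.
  std::vector<std::size_t> Target(Dies.size());
  for (std::size_t I = 0; I < Dies.size(); ++I) {
    auto It = FirstDefinition.find(Dies[I].Name);
    // Merge targets are always full definitions, never declarations.
    assert(It == FirstDefinition.end() || !Dies[It->second].IsDeclaration);
    Target[I] = It == FirstDefinition.end() ? I : It->second;
  }
  return Target;
}
```
</pre>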
            </div>
          </blockquote>
          <div>Huh, what does it do with extra member function
            definitions found in later definitions? (eg: struct x {
            template<typename T> void f(); }; - in one translation
            unit x::f<int> is instantiated, in another
            x::f<float> is instantiated - how are the two
            represented with dsymutil?) <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>They would be treated as two types that do not match. dsymutil would
      not merge them, and thus would not use a single type
      description. There would be two separate types called "x" whose
      members mostly match but which differ in x::f<int>
      and x::f<float>. There is no de-duplication in that case.</p></div></blockquote><div>Oh, that's unfortunate. It'd be nice for C++ at least, to implement a potentially faster dsymutil mode that could get this right and not have to actually check for type equivalence, instead relying on the name of the type to determine that it must be identical.<br><br>The first instance of the type that's encountered has its fully qualified name or mangled name recorded in a map pointing to the DIE. Any future instance gets downgraded to a declaration, and /certain/ members get dropped, but other members get stuck on the declaration (same sort of DWARF you see with "struct foo { virtual void f1(); template<typename T> void f2() { } }; void test(foo& f) { f.f2<int>(); }"). Recording all the member functions of the type/static member variable types might be needed in cases where some member functions are defined in one translation unit and some defined in another - though I guess that infrastructure is already in place/that just works today.<br><br>- Dave</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p> </p>
    <p><br>
    </p>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>Alexey.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr"><br>
                  But I don't have super strong feelings about the
                  naming.</div>
                <br>
                <div class="gmail_quote">
                  <div dir="ltr" class="gmail_attr">On Tue, Sep 1, 2020
                    at 6:36 AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                    wrote:<br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
                    <div>
                      <p><br>
                      </p>
                      <div>On 01.09.2020 06:27, David Blaikie wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">A quick note: The feature as
                          currently proposed sounds like it's an exact
                          match for 'dwz'? Is there any benefit to this
                          over the existing dwz project? Is it different
                          in some ways I'm not aware of? (I haven't
                          actually used dwz, so I might have some
                          mistaken ideas about how it should work)<br>
                          <br>
                          If it's going to solve the same general
                          problem, but be in the llvm project instead,
                          then maybe it should be called llvm-dwz.<br>
                        </div>
                      </blockquote>
                      It looks like dwz and llvm-dwarfutil do not
                      match exactly in functionality. <br>
                      <br>
                      dwz is a program that attempts to optimize DWARF
                      debugging information <br>
                      contained in ELF shared libraries and ELF
                      executables for *size*.<br>
                      <br>
                      llvm-dwarfutil is a tool that is used for
                      processing debug<br>
                      info(DWARF) located in built binary files to
                      improve debug info *quality*,<br>
                      reduce debug info *size* and accelerate debug info
                      *processing*.<br>
                      <br>
                      Things that llvm-dwarfutil is supposed to do
                      and that dwz does not <br>
                      do: removing obsolete debug info,
                      building indexes, stripping <br>
                      unneeded debug sections, compressing/decompressing debug
                      sections.<br>
                      <br>
                      What the two tools have in common is that both reduce debug
                      info size. <br>
                      But they do this using different approaches:<br>
                      <br>
                      1. dwz reduces the size of debug info by creating
                      partial compilation units <br>
                          for duplicated parts, so that these partial
                      compilation units can be imported <br>
                          at every place where the duplicate occurred. AFAIU, that
                      optimization gives the largest size saving.<br>
                      <br>
                         Another size-saving optimization is ODR type
                      deduplication.<br>
                      <br>
                      2. llvm-dwarfutil reduces the size of debug info
                      by ODR type deduplication, <br>
                         which gives the largest size saving in the
                      llvm-dwarfutil case. <br>
                      <br>
                         Another size-saving optimization is removing
                      obsolete debug info<br>
                         (which is actually not only about size but
                      also about correctness).<br>
                      <br>
                      So, it looks like these tools are not equivalent. If we
                      consider the <br>
                      tool to be an extension of classic dwz, then we
                      could probably<br>
                      name it llvm-dwz.<br>
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr"><br>
                          Though I understand the desire for this to
                          grow other functionality, like DWARF-aware
                          dwp-ing. Might be better for this to busybox
                          and provide that functionality under llvm-dwp
                          instead, or more likely, I suspect, the
                          existing llvm-dwp will be rewritten (probably
                          by me) to use more of lld's infrastructure to
                          be more efficient (its current object
                          reading/writing logic is using LLVM's
                          libObject and MCStreamer, which is a bit
                          inefficient for a very content-unaware linking
                          process) and then maybe that could be taught
                          to use DwarfLinker as a library to optionally
                          do DWARF-aware linking depending on the user's
                          time/space tradeoff desires. Still benefiting
                          from any improvements to the underlying
                          DwarfLinker library (at which point that would
                          be shared between llvm-dsymutil, llvm-dwz, and
                          llvm-dwp).</div>
                        <br>
                        <div class="gmail_quote">
                          <div dir="ltr" class="gmail_attr">On Tue, Aug
                            25, 2020 at 7:29 AM Alexey <<a href="mailto:avl.lapshin@gmail.com" target="_blank">avl.lapshin@gmail.com</a>>
                            wrote:<br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
                            <br>
                               We propose llvm-dwarfutil - a
                            dsymutil-like tool for ELF.<br>
                               Any thoughts on this?<br>
                               Thanks in advance, Alexey.<br>
                            <br>
======================================================================<br>
                            <br>
                            llvm-dwarfutil(Apndx A) - is a tool that is
                            used for processing debug <br>
                            info(DWARF)<br>
                            located in built binary files to improve
                            debug info quality,<br>
                            reduce debug info size and accelerate debug
                            info processing.<br>
                            Supported object file formats: ELF,
                            MachO(Apndx B), COFF(Apndx C), <br>
                            WASM(Apndx C).<br>
                            <br>
======================================================================<br>
                            <br>
                            Specifically, the tool would do:<br>
                            <br>
                               - Remove obsolete debug info which refers
                            to code deleted by the linker<br>
                                 doing the garbage collection
                            (gc-sections).<br>
                            <br>
                               - Deduplicate debug type definitions to
                            reduce the resulting size of the <br>
                            binary.<br>
                            <br>
                               - Build accelerator/index tables.<br>
                                 = .debug_aranges, .debug_names,
                            .gdb_index, .debug_pubnames, <br>
                            .debug_pubtypes.<br>
                            <br>
                               - Strip unneeded tables.<br>
                                 = .debug_aranges, .debug_names,
                            .gdb_index, .debug_pubnames, <br>
                            .debug_pubtypes.<br>
                            <br>
                               - Compress or decompress debug info as
                            requested.<br>
                            <br>
                            Possible feature:<br>
                            <br>
                               - Join split dwarf .dwo files in a single
                            file containing all debug info<br>
                                 (convert split DWARF into monolithic
                            DWARF).<br>
                            <br>
======================================================================<br>
                            <br>
                            User interface:<br>
                            <br>
                               OVERVIEW: A tool for optimizing debug
                            info located in the built binary.<br>
                            <br>
                               USAGE: llvm-dwarfutil [options] input
                            output<br>
                            <br>
                               OPTIONS: (Apndx E)<br>
                            <br>
======================================================================<br>
                            <br>
                            Implementation notes:<br>
                            <br>
                            1. Removing obsolete debug info would be
                            done using DWARFLinker llvm <br>
                            library.<br>
                            <br>
                            2. Data types deduplication would be done
                            using DWARFLinker llvm library.<br>
                            <br>
                            3. Accelerator/index tables would be
                            generated using DWARFLinker llvm <br>
                            library.<br>
                            <br>
                            4. The interface of the DWARFLinker library would be
                            changed in such a way that it<br>
                                would be possible to switch
                            various stages on/off:<br>
                            <br>
                               class DWARFLinker {<br>
                               public:<br>
                                 void setDoRemoveObsoleteInfo ( bool
                            DoRemoveObsoleteInfo = false );<br>
                            <br>
                                 void setDoAppleNames ( bool DoAppleNames =
                            false );<br>
                                 void setDoAppleNamespaces ( bool
                            DoAppleNamespaces = false );<br>
                                 void setDoAppleTypes ( bool DoAppleTypes =
                            false );<br>
                                 void setDoObjC ( bool DoObjC = false );<br>
                                 void setDoDebugPubNames ( bool
                            DoDebugPubNames = false );<br>
                                 void setDoDebugPubTypes ( bool
                            DoDebugPubTypes = false );<br>
                            <br>
                                 void setDoDebugNames ( bool DoDebugNames =
                            false );<br>
                                 void setDoGDBIndex ( bool DoGDBIndex =
                            false );<br>
                               };<br>
                            <br>
                            5. Copying source file contents, stripping
                            tables, <br>
                            compressing/decompressing tables<br>
                                would be done by ObjCopy llvm
                            library(extracted from llvm-objcopy):<br>
                            <br>
                               Error executeObjcopyOnBinary(const
                            CopyConfig &Config,<br>
                                                         
                            object::COFFObjectFile &In, Buffer
                            &Out);<br>
                               Error executeObjcopyOnBinary(const
                            CopyConfig &Config,<br>
                                                         
                            object::ELFObjectFileBase &In, Buffer
                            &Out);<br>
                               Error executeObjcopyOnBinary(const
                            CopyConfig &Config,<br>
                                                         
                            object::MachOObjectFile &In, Buffer
                            &Out);<br>
                               Error executeObjcopyOnBinary(const
                            CopyConfig &Config,<br>
                                                         
                            object::WasmObjectFile &In, Buffer
                            &Out);<br>
                            <br>
                            6. Address ranges and single addresses
                            pointing to removed code should <br>
                            be marked<br>
                                with a tombstone value in the input file:<br>
                            <br>
                                -2 for .debug_ranges and .debug_loc.<br>
                                -1 for other .debug* tables.<br>
                            <br>
                            7. Prototype implementation - <a href="https://reviews.llvm.org/D86539" rel="noreferrer" target="_blank">https://reviews.llvm.org/D86539</a>.<br>
                            <br>
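                            A minimal sketch of the tombstone values
                            from note 6, assuming they are stored as
                            unsigned addresses truncated to the target
                            address size.<br>
                            tombstoneFor is a hypothetical helper, not an
                            existing LLVM function. As far as I understand,
                            -2 is used for .debug_ranges and .debug_loc<br>
                            because the largest address (-1) already has a
                            special meaning there (base address selection
                            entry).<br>
<pre>
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the tombstone address for a debug section.
// Assumption from the proposal: -2 for .debug_ranges and .debug_loc,
// -1 for the other .debug* tables, both truncated to AddrSize bytes.
inline std::uint64_t tombstoneFor(bool IsRangesOrLoc, unsigned AddrSize) {
  assert(AddrSize >= 1 && AddrSize <= 8 && "unsupported address size");
  // All-ones value of AddrSize bytes, i.e. -1 in that width.
  const std::uint64_t MaxAddr =
      AddrSize == 8 ? ~0ULL : ((1ULL << (AddrSize * 8)) - 1);
  return IsRangesOrLoc ? MaxAddr - 1 : MaxAddr;
}
```
</pre>
                            For a 4-byte address size this gives
                            0xFFFFFFFF for most tables and 0xFFFFFFFE for
                            .debug_ranges and .debug_loc.<br>
                            <br>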
======================================================================<br>
                            <br>
                            Roadmap:<br>
                            <br>
                            1. Refactor llvm-objcopy to extract its
                            implementation into a separate <br>
                            library<br>
                                ObjCopy (in the LLVM tree).<br>
                            <br>
                            2. Create a command line utility using
                            the existing DWARFLinker and ObjCopy<br>
                                implementations. The first version is
                            supposed to work only with ELF <br>
                            input object files.<br>
                                It would take an input ELF file with
                            unoptimized debug info and create an <br>
                            output<br>
                                ELF file with optimized debug info. That
                            version would be developed out <br>
                            of the LLVM tree.<br>
                            <br>
                            3. Make the tool able to work in
                            multi-threaded mode.<br>
                            <br>
                            4. Consider including it in the LLVM
                            tree.<br>
                            <br>
                            5. Support DWARF5 tables.<br>
                            <br>
======================================================================<br>
                            <br>
                            Appendix A. Should this tool be implemented
                            as a new tool or as an extension<br>
                                         to dsymutil/llvm-objcopy?<br>
                            <br>
                                There already exists a tool which
                            removes obsolete debug info on <br>
                            darwin - dsymutil.<br>
                                Why create another tool instead of
                            extending the existing <br>
                            dsymutil/llvm-objcopy?<br>
                            <br>
                                The main functionality of dsymutil is
                            located in a separate library <br>
                            - DWARFLinker.<br>
                                Thus, dsymutil utility is a command-line
                            interface for DWARFLinker. <br>
                            dsymutil has<br>
                                a different type of input/output data: it
                            takes several object files and an <br>
                            address map<br>
                                as input and creates a .dSYM bundle with
                            linked debug info as <br>
                            output. llvm-dwarfutil<br>
                                would take a built executable as input
                            and create an optimized <br>
                            executable as output.<br>
                                Additionally, there would be many
                            command-line options specific to <br>
                            only one utility.<br>
                                This means that these
                            utilities (implementing the command-line
                            interface) <br>
                            would differ significantly.<br>
                                It makes sense not to put
                            another command-line utility <br>
                            inside the existing dsymutil,<br>
                                but to make it a separate utility. That
                            is the reason why <br>
                            llvm-dwarfutil is suggested to be<br>
                                implemented not as a sub-part of dsymutil
                            but as a separate tool.<br>
                            <br>
                                Please share your preference: should
                            llvm-dwarfutil be<br>
                                a separate utility, or a variant of
                            dsymutil compiled for ELF?<br>
                            <br>
======================================================================<br>
                            <br>
                            Appendix B. The MachO object file format is
                            already supported by dsymutil.<br>
                                Whether MachO would be supported depends on
                            whether llvm-dwarfutil is done as a <br>
                            subproject<br>
                                of dsymutil or as a separate utility.<br>
                            <br>
======================================================================<br>
                            <br>
                            Appendix C. Support for the COFF and WASM
                            object file formats is presented as a<br>
                                 possible future improvement. It would
                            be quite easy to add them, <br>
                            assuming<br>
                                 that llvm-objcopy already supports
                            these formats. It would also require<br>
                                 supporting the DWARF6-suggested tombstone
                            values (-1/-2).<br>
                            <br>
======================================================================<br>
                            <br>
                            Appendix D. Documentation.<br>
                            <br>
                               - proposal for DWARF6 which suggested
                            -1/-2 values for marking bad <br>
                            addresses<br>
                                 <a href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1" rel="noreferrer" target="_blank">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>
                               - dsymutil tool <a href="https://llvm.org/docs/CommandGuide/dsymutil.html" rel="noreferrer" target="_blank">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>
                               - proposal "Remove obsolete debug info in
                            lld."<br>
                            <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html" rel="noreferrer" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>
                            <br>
======================================================================<br>
                            <br>
                            Appendix E. Possible command line options:<br>
                            <br>
                            DwarfUtil Options:<br>
                            <br>
                               --build-aranges           - generate
                            .debug_aranges table.<br>
                               --build-debug-names       - generate
                            .debug_names table.<br>
                               --build-debug-pubnames    - generate
                            .debug_pubnames table.<br>
                               --build-debug-pubtypes    - generate
                            .debug_pubtypes table.<br>
                               --build-gdb-index         - generate
                            .gdb_index table.<br>
                               --compress                - Compress
                            debug tables.<br>
                               --decompress              - Decompress
                            debug tables.<br>
                               --deduplicate-types       - Do ODR
                            deduplication for debug types.<br>
                               --garbage-collect         - Do garbage
                            collecting for debug info.<br>
                               --num-threads=<n>         - Specify
                            the maximum number (n) of <br>
                            simultaneous threads<br>
                                                           to use when
                            optimizing input file.<br>
                                                           Defaults to
                            the number of cores on the <br>
                            current machine.<br>
                               --strip-all               - Strip all
                            debug tables.<br>
                               --strip=<name1,name2>     - Strip
                            specified debug info tables.<br>
                               --strip-unoptimized-debug - Strip all
                            unoptimized debug tables.<br>
                               --tombstone=<value>       -
                            Tombstone value used as a marker of <br>
                            invalid address.<br>
                                 =bfd                    -   BFD default
                            value<br>
                                 =dwarf6                 -   Dwarf v6.<br>
                               --verbose                 - Enable
                            verbose logging and encoding details.<br>
                            <br>
                            Generic Options:<br>
                            <br>
                               --help                    - Display
                            available options (--help-hidden <br>
                            for more)<br>
                               --version                 - Display the
                            version of this program<br>
                            <br>
                          </blockquote>
                        </div>
                      </blockquote>
                    </div>
                  </blockquote>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </div>

</blockquote></div></div>