<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Jonas, please find my comments below...<br>
</p>
<div class="moz-cite-prefix">On 27.08.2020 02:05, Jonas Devlieghere
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hey Alexey,</div>
<div><br>
</div>
<div>I haven't had time to look at the corresponding patch
yet, but I hope to do that soon. Here are my initial
thoughts on the proposal. </div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Aug 25, 2020 at
7:29 AM Alexey <<a
href="mailto:avl.lapshin@gmail.com"
moz-do-not-send="true">avl.lapshin@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
We propose llvm-dwarfutil - a dsymutil-like tool for
ELF.<br>
Any thoughts on this?<br>
Thanks in advance, Alexey.<br>
<br>
======================================================================<br>
<br>
llvm-dwarfutil(Apndx A) - is a tool that is used for
processing debug <br>
info(DWARF)<br>
located in built binary files to improve debug info
quality,<br>
reduce debug info size and accelerate debug info
processing.<br>
Supported object files formats: ELF, MachO(Apndx B),
COFF(Apndx C), <br>
WASM(Apndx C).<br>
<br>
======================================================================<br>
<br>
Specifically, the tool would do:<br>
<br>
- Remove obsolete debug info which refers to code
deleted by the linker<br>
doing the garbage collection (gc-sections).<br>
<br>
- Deduplicate debug type definitions for reducing
resulting size of <br>
binary.<br>
<br>
- Build accelerator/index tables.<br>
= .debug_aranges, .debug_names, .gdb_index,
.debug_pubnames, <br>
.debug_pubtypes.<br>
<br>
- Strip unneeded tables.<br>
= .debug_aranges, .debug_names, .gdb_index,
.debug_pubnames, <br>
.debug_pubtypes.<br>
<br>
- Compress or decompress debug info as requested.<br>
<br>
Possible feature:<br>
<br>
- Join split dwarf .dwo files in a single file
containing all debug info<br>
(convert split DWARF into monolithic DWARF).<br>
<br>
======================================================================<br>
<br>
User interface:<br>
<br>
OVERVIEW: A tool for optimizing debug info located in
the built binary.<br>
<br>
USAGE: llvm-dwarfutil [options] input output<br>
</blockquote>
<div><br>
</div>
<div>Nit: I would make the output a separate flag with
`-o` for consistency with other similar tools. <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>Ok.<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
OPTIONS: (Apndx E)<br>
<br>
======================================================================<br>
<br>
Implementation notes:<br>
<br>
1. Removing obsolete debug info would be done using
DWARFLinker llvm <br>
library.<br>
<br>
2. Data types deduplication would be done using
DWARFLinker llvm library.<br>
<br>
3. Accelerator/index tables would be generated using
DWARFLinker llvm <br>
library.<br>
</blockquote>
<div><br>
</div>
<div>This sounds reasonable to me. I think there is value
in having all this in LLVM because LLD wants to use a
subset of this functionality. If it weren't for that I'd
probably prefer to have this isolated to just the tool. </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
4. Interface of DWARFLinker library would be changed in
such way that it<br>
would be possible to switch on/off various stages:<br>
<br>
class DWARFLinker {<br>
setDoRemoveObsoleteInfo ( bool DoRemoveObsoleteInfo
= false);<br>
<br>
setDoAppleNames ( bool DoAppleNames = false );<br>
setDoAppleNamespaces ( bool DoAppleNamespaces =
false );<br>
setDoAppleTypes ( bool DoAppleTypes = false );<br>
setDoObjC ( bool DoObjC = false );<br>
setDoDebugPubNames ( bool DoDebugPubNames = false
);<br>
setDoDebugPubTypes ( bool DoDebugPubTypes = false
);<br>
<br>
setDoDebugNames (bool DoDebugNames = false);<br>
setDoGDBIndex (bool DoGDBIndex = false);<br>
}<br>
</blockquote>
<div><br>
</div>
<div>We can discuss this in the patch, but in dsymutil we
pass LinkOption to the linker. I think that would work
great for enabling certain functionality. <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
OK, let's discuss this in the patch.<br>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
5. Copying source file contents, stripping tables, <br>
compressing/decompressing tables<br>
would be done by ObjCopy llvm library(extracted from
llvm-objcopy):<br>
<br>
Error executeObjcopyOnBinary(const CopyConfig
&Config,<br>
object::COFFObjectFile
&In, Buffer &Out);<br>
Error executeObjcopyOnBinary(const CopyConfig
&Config,<br>
object::ELFObjectFileBase
&In, Buffer &Out);<br>
Error executeObjcopyOnBinary(const CopyConfig
&Config,<br>
object::MachOObjectFile
&In, Buffer &Out);<br>
Error executeObjcopyOnBinary(const CopyConfig
&Config,<br>
object::WasmObjectFile
&In, Buffer &Out);<br>
</blockquote>
<div><br>
</div>
<div>Just to make sure I understand this correctly. The
current method names suggest that you'd be running
objcopy as an external tool, but when implemented as a
library you'd call the code in-process, right?</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>Not exactly. I suggest moving these functions into the library first and
then calling them from the dwarfutil code.</p>
<p>An example of such a call is in the prototype:<br>
</p>
<p><span class="diff-banner-path">tools/llvm-dwarfutil/</span><span
class="diff-banner-file">llvm-dwarfutil.cpp:</span></p>
<p>template <class ELFT><br>
Error writeOutputFile(const Options &Options,
ELFObjectFile<ELFT> &InputFile,<br>
DataBits &OutBits) {<br>
........<br>
objectcopy::FileBuffer FB(Config.OutputFilename);<br>
return objectcopy::elf::executeObjcopyOnBinary(Config,
InputFile, FB);<br>
}<br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
6. Address ranges and single addresses pointing to
removed code should <br>
be marked<br>
with tombstone value in the input file:<br>
<br>
-2 for .debug_ranges and .debug_loc.<br>
-1 for other .debug* tables.<br>
<br>
7. Prototype implementation - <a
href="https://reviews.llvm.org/D86539"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://reviews.llvm.org/D86539</a>.<br>
<br>
======================================================================<br>
<br>
Roadmap:<br>
<br>
1. Refactor llvm-objcopy to extract its implementation
into separate <br>
library<br>
ObjCopy(in LLVM tree).<br>
</blockquote>
<div><br>
</div>
<div>What exactly needs to be copied? In dsymutil we
create a Mach-O companion file, which is really just a
regular Mach-O with only the debug info sections in it.
I think we do copy over a few segments, but we have to
rewrite the load commands and obviously the DWARF
sections. Which part of that would be handled by the
objcopy library. It seems like this could be a first,
standalone patch. Or do you only plan to use this for
the ELF parts?</div>
</div>
</div>
</div>
</div>
</blockquote>
objcopy can replace debug info sections. So the idea is to use
objcopy functionality to copy the<br>
original file without modifications, except for replacing the debug info
sections, i.e.<br>
to specify the new sections in the objcopy config:<br>
<br>
CopyConfig.h<br>
StringMap<StringRef> NewDebugSections; <br>
<br>
Then add code for copying these sections to <span class="diff-banner-path">ELF/</span><span class="diff-banner-file">ELFObjcopy.cpp</span>:<br>
<br>
for (const auto &Sec : Config.NewDebugSections) {<br>
ArrayRef<uint8_t> DataBits((const uint8_t
*)Sec.getValue().data(),<br>
Sec.getValue().size());<br>
Section NewSection(DataBits);<br>
<br>
if (Config.CompressionType != DebugCompressionType::None)<br>
Obj.addSection<CompressedSection>(NewSection,
Config.CompressionType);<br>
else<br>
Obj.addSection<Section>(NewSection);<br>
}<br>
<br>
Finally, executeObjcopyOnBinary() can be called, <br>
and the source file will be copied with the debug info sections replaced:<br>
<br>
objectcopy::elf::executeObjcopyOnBinary(Config, InputFile, FB);<br>
<br>
As for what should be moved from llvm-objcopy into the ObjCopy
library:<br>
it is Buffer.h, CopyConfig.h and the entire ELF, MachO, WASM and COFF
directories.<br>
This is done in the prototype (the prototype copies only the ELF part).<br>
<br>
The external interface of that library would be described by:<br>
<br>
ELF/ELFObjcopy.h<br>
COFF/COFFObjcopy.h<br>
MachO/MachOObjcopy.h<br>
wasm/WasmObjcopy.h<br>
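<p>To illustrate the replacement idea without pulling in the LLVM
headers, here is a minimal standalone sketch. It is not the prototype
code: SectionData, SectionMap and copyWithNewDebugSections are
hypothetical stand-ins, and it assumes that "replacing" means
overwriting a section that has the same name, while every other
section is copied unchanged.</p>

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not llvm-objcopy types.
using SectionData = std::vector<std::uint8_t>;
using SectionMap = std::map<std::string, SectionData>;

// Copies all sections of Input unchanged, then overwrites (or adds) each
// section listed in NewDebugSections. Assumption: a replacement is matched
// to the original section by name.
SectionMap copyWithNewDebugSections(const SectionMap &Input,
                                    const SectionMap &NewDebugSections) {
  SectionMap Output = Input;
  for (const auto &Sec : NewDebugSections)
    Output[Sec.first] = Sec.second;
  return Output;
}
```

<p>In this sketch a caller would fill NewDebugSections with the
sections produced by the DWARF optimization step and write out the
returned map; sections that are not listed stay byte-for-byte
identical.</p>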
<p><br>
</p>
<p>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">2.
Create a command line utility using the existing DWARFLinker
and ObjCopy<br>
implementation. First version is supposed to work
with only ELF <br>
input object files.<br>
It would take input ELF file with unoptimized debug
info and create <br>
output<br>
ELF file with optimized debug info. That version
would be done out <br>
of the llvm tree.<br>
</blockquote>
<div><br>
</div>
<div>I would prefer doing this incrementally in-tree. It
will make reviewing these patches much easier and
hopefully allow us to identify opportunities where we
can improve both the ELF and the Mach-O variant. <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>I am fine with developing it in-tree.<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
3. Make a tool to be able to work in multi-thread mode.<br>
</blockquote>
<div><br>
</div>
<div>I'm a bit confused by what you mean here. The current
DwarfLinker already does the analysis and cloning in
parallel. As I've mentioned in the original thread, when
I implemented this, there was no way to do better if you
want to deduplicate across compilation units which is
what gives the biggest size reduction. </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
4. Consider it to be included into LLVM tree.<br>
</blockquote>
<div><br>
</div>
<div>As I said before I'd rather see this developed
incrementally in-tree. </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
5. Support DWARF5 tables.<br>
</blockquote>
<div><br>
</div>
<div>I assume you mean the line tables (and not the
accelerator tables, i.e. debug names)? <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>.debug_names is already supported in dsymutil/DWARFLinker, so there
is no need to add it.<br>
</p>
<p>I mean .debug_line/.debug_line_str, .debug_rnglists,
.debug_loclists, DW_OP_addrx.<br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
======================================================================<br>
<br>
Appendix A. Should this tool be implemented as a new
tool or as an extension<br>
to dsymutil/llvm-objcopy?<br>
<br>
There already exists a tool which removes obsolete
debug info on <br>
darwin - dsymutil.<br>
Why create another tool instead of extending the
existing <br>
dsymutil/llvm-objcopy?<br>
<br>
The main functionality of dsymutil is located in a
separate library <br>
- DWARFLinker.<br>
Thus, dsymutil utility is a command-line interface
for DWARFLinker. <br>
dsymutil has<br>
another type of input/output data: it takes several
object files and <br>
address map<br>
as input and creates a .dSYM bundle with linked
debug info as <br>
output. llvm-dwarfutil<br>
would take a built executable as input and create an
optimized <br>
executable as output.<br>
Additionally, there would be many command-line
options specific for <br>
only one utility.<br>
This means that these utilities(implementing command
line interface) <br>
would significantly<br>
differ. It makes sense not to put another
command-line utility <br>
inside existing dsymutil,<br>
but make it as a separate utility. That is the
reason why <br>
llvm-dwarfutil suggested to be<br>
implemented not as sub-part of dsymutil but as a
separate tool.<br>
<br>
Please share your preference: whether llvm-dwarfutil
should be<br>
separate utility, or a variant of dsymutil compiled
for ELF?<br>
</blockquote>
<div><br>
</div>
<div>As the majority of the code has already been hoisted
to LLVM for use in LLD, I think two separate tools are
fine. I would prefer trying to share a common interface,
I'm thinking mostly of the command line options. I'm not
saying they should be a drop-in replacement for each
other, but it'd be nice if we didn't diverge on common
functionality. <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
Agreed.<br>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
======================================================================<br>
<br>
Appendix B. The machO object file format is already
supported by dsymutil.<br>
Depending on the decision whether llvm-dwarfutil
would be done as a <br>
subproject<br>
of dsymutil or as a separate utility - machO would
be supported or not.<br>
</blockquote>
<div><br>
</div>
<div>I don't think there's any value in having the new
tool support Mach-O. Things that could be shared should
be hoisted into L</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<br>
======================================================================<br>
<br>
Appendix C. Support for the COFF and WASM object file
formats presented as<br>
possible future improvement. It would be quite easy
to add them <br>
assuming<br>
that llvm-objcopy already supports these formats.
It also would require<br>
supporting DWARF6-suggested tombstone
values(-1/-2).<br>
<br>
======================================================================<br>
<br>
Appendix D. Documentation.<br>
<br>
- proposal for DWARF6 which suggested -1/-2 values
for marking bad <br>
addresses<br>
<a
href="http://www.dwarfstd.org/ShowIssue.php?issue=200609.1"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://www.dwarfstd.org/ShowIssue.php?issue=200609.1</a><br>
- dsymutil tool <a
href="https://llvm.org/docs/CommandGuide/dsymutil.html"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://llvm.org/docs/CommandGuide/dsymutil.html</a>.<br>
- proposal "Remove obsolete debug info in lld."<br>
<a
href="http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2020-May/141468.html</a><br>
<br>
======================================================================<br>
<br>
Appendix E. Possible command line options:<br>
<br>
DwarfUtil Options:<br>
<br>
--build-aranges - generate .debug_aranges
table.<br>
--build-debug-names - generate .debug_names
table.<br>
--build-debug-pubnames - generate .debug_pubnames
table.<br>
--build-debug-pubtypes - generate .debug_pubtypes
table.<br>
--build-gdb-index - generate .gdb_index
table.<br>
--compress - Compress debug tables.<br>
--decompress - Decompress debug tables.<br>
--deduplicate-types - Do ODR deduplication for
debug types.<br>
--garbage-collect - Do garbage collecting for
debug info.<br>
</blockquote>
<div><br>
</div>
<div>This is of course up to you to decide, but as a
potential user I might be worried about making all the
functionality opt-in. For dsymutil you don't have to pass
any options most of the time. Maybe it would be nice to
have a set of defaults and the ability to -fenable or
-fdisable them? Or having something like
-debugger-tuning in clang? <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>Yes, the idea is to have defaults and to be able to switch options
on/off. <br>
</p>
<p>For the updated prototype:</p>
<p>"llvm-dwarfutil bin/test_clang_in -o bin/test_clang_out" <br>
</p>
<p>assumes --garbage-collect, --strip-unoptimized-debug and
--tombstone=bfd.</p>
<p>Additionally, these options can be explicitly switched on/off:<br>
</p>
<p>"llvm-dwarfutil --strip-unoptimized-debug=0 bin/test_clang_in -o
bin/test_clang_out" <br>
</p>
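<p>A rough standalone sketch of that behaviour follows. It is not the
prototype's option handling (the real tool would presumably use the
LLVM command-line machinery): the Options struct, parseBoolFlag and
parseOptions are hypothetical names, and the "=0"/"=1" value syntax is
taken from the example above.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical option set; the defaults mirror the ones listed above.
struct Options {
  bool GarbageCollect = true;
  bool StripUnoptimizedDebug = true;
  std::string Tombstone = "bfd";
};

// Accepts "--<Name>", "--<Name>=1" (switch on) or "--<Name>=0" (switch off).
// Returns false, leaving Value untouched, if Arg is not one of these forms.
bool parseBoolFlag(const std::string &Arg, const std::string &Name,
                   bool &Value) {
  const std::string Flag = "--" + Name;
  if (Arg == Flag || Arg == Flag + "=1") {
    Value = true;
    return true;
  }
  if (Arg == Flag + "=0") {
    Value = false;
    return true;
  }
  return false;
}

// Starts from the defaults and applies only what the user passed explicitly.
// Positional arguments and "-o" are ignored in this sketch.
Options parseOptions(const std::vector<std::string> &Args) {
  Options Opts;
  const std::string TombstonePrefix = "--tombstone=";
  for (const std::string &Arg : Args) {
    if (parseBoolFlag(Arg, "garbage-collect", Opts.GarbageCollect))
      continue;
    if (parseBoolFlag(Arg, "strip-unoptimized-debug",
                      Opts.StripUnoptimizedDebug))
      continue;
    if (Arg.compare(0, TombstonePrefix.size(), TombstonePrefix) == 0)
      Opts.Tombstone = Arg.substr(TombstonePrefix.size());
  }
  return Opts;
}
```

<p>With this sketch, running with no flags leaves every default
enabled, and passing --strip-unoptimized-debug=0 turns off only that
one step.</p>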
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
--num-threads=<n> - Specify the maximum
number (n) of <br>
simultaneous threads<br>
to use when optimizing
input file.<br>
Defaults to the number of
cores on the <br>
current machine.<br>
</blockquote>
<div><br>
</div>
<div>
<div style="color:rgb(0,0,0)">We can make `j` the
default alias for this option. It's supported by
dsymutil but we kept the long option in the help
output but I'm happy to change that. <br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>Added "j" as an alias for --num-threads.</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:CAJQy47eCQgW+VTqqrb6FVD5_uUaeL1eCFuL3Y+XXMeyxk-ExvQ@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
--strip-all - Strip all debug tables.<br>
--strip=<name1,name2> - Strip specified
debug info tables.<br>
--strip-unoptimized-debug - Strip all unoptimized
debug tables.<br>
--tombstone=<value> - Tombstone value
used as a marker of <br>
invalid address.<br>
=bfd - BFD default value<br>
=dwarf6 - Dwarf v6.<br>
--verbose - Enable verbose logging
and encoding details.<br>
<br>
Generic Options:<br>
<br>
--help - Display available options
(--help-hidden <br>
for more)<br>
--version - Display the version of
this program<br>
<br>
</blockquote>
<div><br>
</div>
<div>dsymutil also has a --verify option which runs the
DWARF verifier on the output (I'm working on a patch to
also run it on the input). It might be a nice addition
to have this too down the road. <br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<p>OK, I will add it.</p>
<p><br>
</p>
<p>Thank you for the comments!</p>
<p>Alexey.<br>
</p>
<p><br>
</p>
</body>
</html>