[llvm-dev] llvm-objcopy proposal

Thu Jun 1 18:28:40 PDT 2017

I've thought about building an llvm-objcopy for a long time, and the
approach you've outlined is the same one that I would have suggested
(analyzing a set of critical use cases, triaging them, and then
incrementally building them). In other words, this approach SGTM. I've CC'ed
a couple of other people who might have some comments (but I've talked with
them about objcopy before in one way or another and I don't get the feeling
that they would disagree with the overall approach).

A couple of specific suggestions about the more concrete code design:

IIRC, when I looked at GNU objcopy I saw why it was called objcopy: it
basically looked like it was originally a program that copied an object
file without modification. Then command line argument parsing was added and
tons of flags appeared that triggered a mess of random `if` statements that
would modify the copying process. I don't think we want to have an
implementation like that, especially since we don't have anything even
remotely similar to the "writing" side of the BFD library (libObject's
object format agnostic interface is only for reading).

1. It seems that (besides the format conversion operations) everything is
ELF. It will dramatically simplify the implementation to make it ELF-only
at first. I would even recommend against using libObject's object-format
agnostic reading implementation. One of the things we have learned while
working on LLD is that abstracting across object formats is very difficult
to get right. There are just too many subtle semantic differences that
penetrate very deep into the program. As an example, LLD/ELF (which is
ELF-only) and LLD/COFF (which is COFF-only) are each about 1/3 (or less)
the size of the previous linker design that attempted to handle all 3
formats (MachO is the third format) together (and they are actually much
more complete than the previous design was before we switched to the new
design; normalizing for the difference in features, 1/6 the size is
probably more accurate). Unless you also have the goal (I don't think you
do) of making progress towards an LLVM-based analog of the GNU BFD library as
you work on objcopy, sticking to object-format-specific code is probably
preferable. It's *a lot* easier to look at format-specific implementations
and see what can be shared than to make a mistake in the abstractions used
across object formats and then have to untangle the incorrect abstraction.
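Here is a minimal sketch of what "ELF-only at first" can look like at the front door. It assumes the whole input file is already in memory. Instead of asking a format-agnostic interface what the file is, the tool inspects the ELF identification bytes (`e_ident`) itself and refuses anything that is not ELF. `classifyElf` and `ElfKind` are hypothetical names. Inside LLVM, the four flavors would more naturally map onto instantiations of libObject's ELF-specific `ELFFile<ELFT>` template.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: the four ELF flavors, or "not ELF at all".
enum class ElfKind { NotElf, Elf32LE, Elf32BE, Elf64LE, Elf64BE };

// ELF-only front door: inspect e_ident (the first 16 bytes of the file) and
// reject anything that is not ELF, instead of routing every object format
// through one shared abstraction.
ElfKind classifyElf(const std::vector<uint8_t> &Buf) {
  constexpr std::size_t kIdentSize = 16; // EI_NIDENT in the ELF spec
  if (Buf.size() < kIdentSize)
    return ElfKind::NotElf;
  // EI_MAG0..EI_MAG3 must be 0x7f 'E' 'L' 'F'.
  if (Buf[0] != 0x7f || Buf[1] != 'E' || Buf[2] != 'L' || Buf[3] != 'F')
    return ElfKind::NotElf;
  const uint8_t Class = Buf[4]; // EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64
  const uint8_t Data = Buf[5];  // EI_DATA: 1 = ELFDATA2LSB, 2 = ELFDATA2MSB
  if (Class == 1 && Data == 1)
    return ElfKind::Elf32LE;
  if (Class == 1 && Data == 2)
    return ElfKind::Elf32BE;
  if (Class == 2 && Data == 1)
    return ElfKind::Elf64LE;
  if (Class == 2 && Data == 2)
    return ElfKind::Elf64BE;
  return ElfKind::NotElf; // ELFCLASSNONE / ELFDATANONE or garbage
}
```

Everything after this check can then be written against one concrete ELF flavor, with no virtual "object file" layer in between.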

2. I would really suggest making sure that there is a very, very clear
separation between the objcopy-compatible command line parsing and the
internals that actually do the work. In fact, it may be reasonable to have
the separation be so profound that the tool is called `llvm-objtool` (with
subcommands like `llvm-objtool formatconvert ...`) and have the
objcopy-compatible command line parsing essentially dispatch into one of
them (with such parsing being triggered by looking at argv[0]). Regardless of
whether it makes sense to go that far, it's best to err on the side of
having separate implementations even if it seems to require duplicating
some code. For example, if you have the same for loop in two different
"subcommands", it may be best to make an iterator encapsulating it (or a
helper function that takes a lambda) rather than adding a bool parameter to
the function containing that loop.
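Here is a minimal sketch of the argv[0]-triggered layering. It assumes a hypothetical `Subcommand` enum and a hypothetical `strip` subcommand alongside the `formatconvert` example. Only `translateObjcopyArgs` knows GNU objcopy spellings, and all it does is map them onto a native subcommand. Real option parsing (for example with LLVM's `cl` or `Option` libraries) is omitted. So are GNU's alternative spellings such as `--output-target=binary`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical native subcommands of an "llvm-objtool"-style driver.
enum class Subcommand { Unknown, FormatConvert, Strip };

// Base name of argv[0], e.g. "/usr/bin/llvm-objcopy" -> "llvm-objcopy".
std::string toolName(const std::string &Argv0) {
  const std::size_t Slash = Argv0.find_last_of("/\\");
  return Slash == std::string::npos ? Argv0 : Argv0.substr(Slash + 1);
}

// objcopy-compatible front end: it only translates GNU-style flags into a
// native subcommand and performs no object-file work itself.
Subcommand translateObjcopyArgs(const std::vector<std::string> &Args) {
  for (std::size_t I = 1; I < Args.size(); ++I) {
    if (Args[I] == "-O" && I + 1 < Args.size() && Args[I + 1] == "binary")
      return Subcommand::FormatConvert;
    if (Args[I] == "--strip-all" || Args[I] == "--strip-debug")
      return Subcommand::Strip;
  }
  return Subcommand::Unknown;
}

// Entry point: argv[0] decides which command-line dialect is parsed.
Subcommand dispatch(const std::vector<std::string> &Args) {
  assert(!Args.empty() && "argv[0] is always present");
  const std::string Name = toolName(Args[0]);
  if (Name == "objcopy" || Name == "llvm-objcopy")
    return translateObjcopyArgs(Args);
  // Native dialect: "llvm-objtool <subcommand> ...".
  if (Args.size() >= 2 && Args[1] == "formatconvert")
    return Subcommand::FormatConvert;
  if (Args.size() >= 2 && Args[1] == "strip")
    return Subcommand::Strip;
  return Subcommand::Unknown;
}
```

Because GNU spellings are confined to one translation function, supporting another objcopy flag never adds a parameter to the subcommands themselves.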

3. (This is just a "keep an eye out" type thing. No specific suggestion.)
As the implementation of objcopy progresses, especially if the object
writing code is incrementally factored out between shared routines (as we
try to avoid one huge writing routine taking 17 arguments controlling what
it does), we may want to look at it together with other object file writing
code in the LLVM project (LLD, llvm-dwp, MC) to see what can be unified.
llvm-dwp is probably the most similar and most likely to be able to share
code.
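Points 2 and 3 together suggest building each operation from small shared routines rather than from one writer driven by many flags. Here is a minimal sketch using a hypothetical in-memory `Object` model that holds only section names and contents. One shared `removeSections` routine takes its policy as a callable, so `--strip-debug`, `--strip-dwo`, and `--extract-dwo` (use cases 1 and 6 in the quoted proposal) each become a one-line pass. The name tests are assumed simplifications: debug sections are taken to be named `.debug*`, and split-DWARF sections to end in `.dwo`. A real ELF implementation also has to keep the null section and fix up the section header string table, the symbol table, and relocations.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified object model: just named sections.
struct Section {
  std::string Name;
  std::vector<unsigned char> Contents;
};

struct Object {
  std::vector<Section> Sections;
};

// The one shared routine. Callers pass the policy as a callable instead of
// adding yet another bool parameter for each new flag.
void removeSections(Object &Obj,
                    const std::function<bool(const Section &)> &ShouldRemove) {
  std::vector<Section> &Secs = Obj.Sections;
  Secs.erase(std::remove_if(Secs.begin(), Secs.end(), ShouldRemove),
             Secs.end());
}

static bool startsWith(const std::string &S, const std::string &Prefix) {
  return S.compare(0, Prefix.size(), Prefix) == 0;
}

static bool endsWith(const std::string &S, const std::string &Suffix) {
  return S.size() >= Suffix.size() &&
         S.compare(S.size() - Suffix.size(), Suffix.size(), Suffix) == 0;
}

// Simplified policies based on assumed section-naming conventions.
bool isDebugSection(const Section &S) { return startsWith(S.Name, ".debug"); }
bool isDwoSection(const Section &S) { return endsWith(S.Name, ".dwo"); }

// Each operation is a small, independent pass built from the shared parts.
void stripDebug(Object &Obj) { removeSections(Obj, isDebugSection); }
void stripDwo(Object &Obj) { removeSections(Obj, isDwoSection); }
void extractDwo(Object &Obj) {
  removeSections(Obj, [](const Section &S) { return !isDwoSection(S); });
}
```

Under this shape, something like `--remove-section=.debug_aranges` (use case 12) is one more lambda rather than one more argument to a shared writer.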

-- Sean Silva

On Thu, Jun 1, 2017 at 5:21 PM, Jake Ehrlich via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> LLVM already implements its own version of almost all of binutils. The
> exceptions to this rule are objcopy and strip. This is a proposal to
> implement an LLVM version of objcopy/strip to complete LLVM’s binutils.
>
> Several projects use GNU binutils only because of objcopy/strip. In fact,
> LLVM itself uses objcopy. Chromium and Fuchsia currently use objcopy as
> well. If you want to distribute your build tools, this is a problem due to
> licensing. It’s also a bit of a blemish on LLVM, because LLVM could be made
> more self-sufficient if there were an LLVM version of objcopy.
> Additionally, Chromium is one of the popular benchmarks for LLVM, so it
> would be nice if Chromium didn’t have to use binutils. Using
> [elftoolchain](https://sourceforge.net/p/elftoolchain/wiki/Home/) solves the
> licensing issue for Fuchsia, but it is ELF-specific and only solves the
> issue for Fuchsia. I propose implementing llvm-objcopy to be a minimum
> viable replacement for objcopy.
>
> I’ve gone through the sources of LLVM, Clang, Chromium, and Fuchsia to try
> to find the major use cases of objcopy. Here is a list of use cases I have
> found and which projects use them. This list includes some use cases not
> found in these 4 projects.
>
> 1. Use Case: Stripping debug information from an executable into a file
>    Who uses it: LLVM, Fuchsia, Chromium
>
>    ```sh
>    objcopy --only-keep-debug foo foo.debug
>    objcopy --strip-debug foo foo
>    ```
>
>    [Example use](https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake)
>
>    When is it useful:
>    This reduces the size of the file for distribution while keeping the
>    debug information in a separate file for later use. Anyone distributing
>    an executable in any way could benefit from this.
>
> 2. Use Case: Stripping debug information from a relocatable object into a file
>    Who uses it: None of the 4 projects considered
>
>    ```sh
>    objcopy --only-keep-debug foo.o foo.debug
>    objcopy --strip-debug foo.o foo.o
>    ```
>
>    When is it useful:
>    When distributing an SDK in the form of an archive, it would be nice to
>    strip this information. This allows debug information to be distributed
>    separately.
>
> 3. Use Case: Stripping debug information from a shared library into a file
>    Who uses it: None of the 4 projects
>
>    ```sh
>    objcopy --only-keep-debug foo.so foo.debug
>    objcopy --strip-debug foo.so foo.so
>    ```
>
>    When is it useful:
>    Same benefits as the previous case. If you want to distribute a library,
>    this option allows you to distribute a smaller binary while maintaining
>    the ability to debug.
>
> 4. Use Case: Stripping an executable
>    Who uses it: None of the 4 projects
>
>    ```sh
>    objcopy --strip-all foo foo
>    ```
>
>    When is it useful:
>    Anytime an executable is being distributed and there is no reason to keep
>    debugging information. This makes the executable smaller than simply
>    stripping debug info and doesn't produce an extra file.
>
> 5. Use Case: “Complete stripping” an executable
>    Who uses it: None of the 4 projects
>
>    ```sh
>    eu-strip --strip-sections foo
>    ```
>
>    When is it useful:
>    This is an extreme form of stripping that even strips the section headers,
>    since they are not needed for loading. This is useful in the same contexts
>    as stripping, but some tools and dynamic linkers may be confused by it.
>    This is possibly only valid on ELF, unlike general stripping, which is a
>    valid option on multiple platforms.
>
> 6. Use Case: DWARF fission
>    Who uses it: Clang, Fuchsia, Chromium
>
>    ```sh
>    objcopy --extract-dwo foo foo.debug
>    objcopy --strip-dwo foo foo
>    ```
>
>    [Example use 1](https://github.com/llvm-mirror/clang/blob/3efd04e48004628cfaffead00ecb1c206b0b6cb2/lib/Driver/ToolChains/CommonArgs.cpp)
>    [Example use 2](https://github.com/llvm-mirror/clang/blob/a0badfbffbee71c2c757d580fc852d2124dadc5a/test/Driver/split-debug.s)
>
>    When is it useful:
>    DWARF fission can be used to speed up large builds. In some cases builds
>    can be too large to be handled, and DWARF fission makes them manageable.
>    DWARF fission is useful in almost any project of sufficient size.
>
> 7. Use Case: Converting an executable to binary
>    Who uses it: Fuchsia
>
>    ```sh
>    objcopy -O binary magenta.elf magenta.bin
>    ```
>
>    [Example use](https://fuchsia.googlesource.com/magenta/+/master/make/build.mk#20)
>
>    When is it useful:
>    For kernels and embedded applications that need just the raw segments.
>
> 8. Use Case: Adding a gdb index
>    Who uses it: Chromium
>
>    ```sh
>    gdb -batch foo -ex "save gdb-index dir" -ex quit
>    objcopy --add-section .gdb_index="dir/foo.gdb-index" \
>            --set-section-flags .gdb_index=readonly foo foo
>    ```
>
>    [Example use](https://cs.chromium.org/chromium/src/build/gdb-add-index?type=cs&q=objcopy&l=71)
>
>    When is it useful:
>    Adding a gdb index reduces startup time for debugging an application. Any
>    sufficiently large program with a sufficiently large amount of debug
>    information can potentially benefit from this.
>
> 9. Use Case: Converting between formats
>    Who uses it: Fuchsia (only in Magenta GCC build)
>
>    ```sh
>    objcopy --target=pei-x86-64 magenta.elf magenta.pe
>    ```
>
>    [Example use](https://fuchsia.googlesource.com/magenta/+/master/bootloader/build.mk#97)
>
>    When is it useful:
>    This is primarily useful when you can’t directly target a needed format.
>
> 10. Use Case: Removing symbols not needed for relocation
>     Who uses it: Chromium
>
>     ```sh
>     objcopy --strip-unneeded foo foo
>     ```
>
>     [Example use](https://cs.chromium.org/chromium/src/third_party/libevdev/src/common.mk?type=cs&q=objcopy&l=397)
>
>     When is it useful:
>     This is useful when shipping an SDK or some relocatable binaries.
>
> 11. Use Case: Removing local symbols
>     Who uses it: LLVM
>
>     ```sh
>     objcopy --discard-all foo foo
>     ```
>
>     [Example use](https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake)
>     (hidden in the definition of “strip_command”, which uses strip instead of
>     objcopy and -x instead of --discard-all)
>
>     When is it useful:
>     This can be useful whenever you don’t need local symbols for debugging.
>
> 12. Use Case: Removing a specific unwanted section
>     Who uses it: LLVM
>
>     ```sh
>     objcopy --remove-section=.debug_aranges foo foo
>     ```
>
>     [Example use](https://github.com/llvm-mirror/llvm/blob/93e6e5414ded14bcbb233baaaa5567132fee9a0c/test/DebugInfo/Inputs/fission-ranges.cc)
>
>     When is it useful:
>     This is useful when you know that you have an unwanted section that isn’t
>     removed by one of the other stripping options. This can also be used to
>     remove an existing section for replacement by a new section.
>
> We would like to build this up incrementally by solving specific use cases
> as they come up. To start with, we would like to tackle the use cases
> important to us. We primarily care about fully linked executables and not
> relocatable files. I plan to implement conversion from ELF to binary first.
> After that I plan on implementing stripping for ELF executables.
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
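The quoted plan starts with ELF-to-binary conversion (use case 7, `objcopy -O binary`). In essence, this lays out loadable bytes at their load addresses relative to the lowest one. Here is a minimal sketch under simplifying assumptions. Segments arrive as already-parsed records, using a hypothetical `LoadSegment` struct that stands in for `PT_LOAD` program headers. The output starts at the lowest physical address, and gaps are zero-filled. The zero-initialized tail of a segment (`p_memsz` beyond `p_filesz`, e.g. `.bss`) is not emitted. GNU objcopy derives the layout from section load addresses rather than from program headers, so treat this as an approximation rather than a compatible implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, already-parsed view of a PT_LOAD program header plus the
// p_filesz bytes it covers in the ELF file.
struct LoadSegment {
  uint64_t PAddr;                // p_paddr: physical (load) address
  std::vector<uint8_t> FileData; // file-backed contents of the segment
};

// Place every segment that has file contents at (PAddr - lowest PAddr),
// zero-filling any gaps between segments.
std::vector<uint8_t> toRawBinary(const std::vector<LoadSegment> &Segments) {
  uint64_t Base = UINT64_MAX, End = 0;
  for (const LoadSegment &Seg : Segments) {
    if (Seg.FileData.empty())
      continue; // e.g. a segment that only holds .bss
    Base = std::min(Base, Seg.PAddr);
    End = std::max(End, Seg.PAddr + static_cast<uint64_t>(Seg.FileData.size()));
  }
  if (Base == UINT64_MAX)
    return {}; // nothing loadable
  std::vector<uint8_t> Out(End - Base, 0);
  for (const LoadSegment &Seg : Segments) {
    if (Seg.FileData.empty())
      continue;
    std::copy(Seg.FileData.begin(), Seg.FileData.end(),
              Out.begin() + static_cast<std::ptrdiff_t>(Seg.PAddr - Base));
  }
  return Out;
}
```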