[llvm-dev] llvm-objcopy proposal

Fri Jun 2 11:54:54 PDT 2017

Hello Jake,

I don’t have any experience with objcopy, but I do have some with the classic UNIX strip, especially on Darwin for Mach-O. I even prototyped an llvm-strip and got it working well enough to do the default stripping of a hello world executable and get a "bit for bit" match against the existing strip(1) tool for a Darwin Mach-O file.

As Sean points out, nothing in LLVM’s libObject code currently writes a binary. I agree with Sean that, to get this correct, it is best not to try to make a unified bit of code that writes the three formats LLVM currently cares about (ELF, COFF and Mach-O).

My experience also suggests that creating tools that write “modified binaries” from fully linked binaries is quite different from writing binaries in an assembler or linker, because you have very limited degrees of freedom when slicing and dicing a fully linked file if the result is still to be a correctly formed file. That is, you usually can’t change any addresses, and you have to update all the references from other tables in the object file to things like indexes into the symbol table, string table, etc. So while it might be good to “keep an eye out” for what could be shared, if you push too hard on that I think your design may not turn out all that clean.

That said, I do think there is value in sharing the “object file reader” code so that all the error checking can be in one place. While I’m not a big fan of libObject, it did prove workable for reading in object files in my llvm-strip prototype. But I did as Sean suggested and went with completely object-format-dependent code to write the modified linked object, a bit cleaner than what I did in the Darwin cctools open source code I wrote many decades ago. I still feel it is best to have object-format-dependent code to put the modified parts of a linked Mach-O file back together, as that is easy to get wrong and a pain to debug when one does get it wrong. My thinking was to have a bit of library code that Darwin tools like install_name_tool(1), bitcode_strip(1), etc. could share and use to reconstruct their modified fully linked binaries.

My thoughts,
Kev

> On Jun 1, 2017, at 6:28 PM, Sean Silva via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> I've thought about building an llvm-objcopy for a long time, and the approach you've outlined is the same one that I would have suggested (analyzing a set of critical use cases, triaging them, and then incrementally building them). In other words, this approach SGTM. I've CC'ed a couple of other people who might have some comments (but I've talked with them about objcopy before in one way or another and I don't get the feeling that they would disagree with the overall approach).
> 
> A couple of specific suggestions about the more concrete code design:
> 
> IIRC, when I looked at GNU objcopy I saw why it was called objcopy: it basically looked like it was originally a program that copied an object file without modification. Then command line argument parsing was added and tons of flags appeared that triggered a mess of random `if` statements that would modify the copying process. I don't think we want to have an implementation like that, especially since we don't have anything even remotely similar to the "writing" side of the BFD library (libObject's object format agnostic interface is only for reading).
> 
> 1. It seems that (besides the format conversion operations) everything is ELF. It will dramatically simplify the implementation to make it ELF-only at first. I would even recommend against using libObject's object-format-agnostic reading implementation. One of the things we have learned while working on LLD is that abstracting across object formats is very difficult to get right. There are just too many subtle semantic differences that penetrate very deep into the program. As an example, LLD/ELF (which is ELF-only) and LLD/COFF (which is COFF-only) are each about 1/3 (or less) the size of the previous linker design that attempted to handle all 3 formats (MachO is the third format) together, and they are actually much more complete than the previous design was before we switched to the new design; normalizing for the difference in features, 1/6 the size is probably more accurate. Unless you also have the goal (I don't think you do) of making progress towards an LLVM-based analog of the GNU BFD library as you work on objcopy, sticking to object-format-specific code is probably preferable. It's *a lot* easier to look at format-specific implementations and see what can be shared than to make a mistake about the abstractions used across object formats and then have to untangle the incorrect abstraction.
> 
> 2. I would really suggest making sure that there is a very, very clear separation between the objcopy-compatible command line parsing and the internals that actually do the work. In fact, it may be reasonable to make the separation so profound that the tool is called `llvm-objtool` (with subcommands like `llvm-objtool formatconvert ...`) and have the objcopy-compatible command line parsing essentially dispatch into one of them (with such parsing triggered by looking at argv[0]). Regardless of whether it makes sense to go that far, it's best to err on the side of having separate implementations even if it seems to require duplicating some code. For example, if you have the same for loop in two different "subcommands", it may be best to make an iterator encapsulating it (or a helper function that takes a lambda) rather than adding a bool parameter to the function containing that loop.
> 
> 3. (This is just a "keep an eye out" type thing. No specific suggestion.) As the implementation of objcopy progresses, especially if the object writing code is incrementally factored out into shared routines (as we try to avoid one huge writing routine taking 17 arguments controlling what it does), we may want to look at it together with other object file writing code in the LLVM project (LLD, llvm-dwp, MC) to see what can be unified. llvm-dwp is probably the most similar and most likely to be able to share code.
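To make suggestion 2 concrete, here is a minimal POSIX sh sketch of argv[0] dispatch, under clearly hypothetical assumptions: `llvm-objtool` and `formatconvert` are the names floated above, while the `strip-all` subcommand, its `--to=` flag, and all function names are made up for illustration, and each subcommand only echoes what it would do. The point is the shape: the objcopy-compatible parser does nothing but translate flags into a call to the same subcommand that `llvm-objtool` exposes directly, so each operation has one implementation and no mode flags threaded through it.

```sh
#!/bin/sh
# Hypothetical multi-call front end. The subcommands are stubs that print
# what they would do; a real tool would do the work in format-specific code.

formatconvert() { echo "formatconvert: $*"; }
strip_all()     { echo "strip-all: $*"; }

# objcopy-compatible parsing: translate objcopy flags into one subcommand call.
# Only two objcopy flags are handled in this sketch.
objcopy_compat() {
  case "$1" in
    -O)
      [ $# -ge 2 ] || { echo "-O needs a format" >&2; return 1; }
      fmt=$2
      shift 2
      formatconvert "--to=$fmt" "$@" ;;
    --strip-all)
      shift
      strip_all "$@" ;;
    *)
      echo "unsupported objcopy flag: $1" >&2
      return 1 ;;
  esac
}

# Dispatch on the name the program was invoked as (argv[0], i.e. $0).
main() {
  prog=$(basename "$1")
  shift
  case "$prog" in
    objcopy|llvm-objcopy)
      objcopy_compat "$@" ;;
    llvm-objtool)
      [ $# -ge 1 ] || { echo "usage: llvm-objtool <subcommand> ..." >&2; return 1; }
      sub=$1
      shift
      case "$sub" in
        formatconvert) formatconvert "$@" ;;
        strip-all)     strip_all "$@" ;;
        *) echo "unknown subcommand: $sub" >&2; return 1 ;;
      esac ;;
    *)
      echo "unknown tool name: $prog" >&2
      return 1 ;;
  esac
}

# In an installed script the entry point would be: main "$0" "$@"
```

For example, `main /usr/bin/llvm-objcopy -O binary magenta.elf magenta.bin` and `main llvm-objtool formatconvert --to=binary magenta.elf magenta.bin` both print `formatconvert: --to=binary magenta.elf magenta.bin`. A real tool would be C++ like the rest of LLVM; shell just keeps the dispatch logic visible.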
> 
> 
> -- Sean Silva 
> 
> On Thu, Jun 1, 2017 at 5:21 PM, Jake Ehrlich via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> LLVM already implements its own version of almost all of binutils. The
> exceptions to this rule are objcopy and strip. This is a proposal to implement
> an LLVM version of objcopy/strip to complete LLVM’s binutils.
> 
> Several projects use GNU binutils only because of objcopy/strip. In fact, LLVM
> itself uses objcopy, and Chromium and Fuchsia currently use it as well. If you
> want to distribute your build tools, this is a problem due to licensing. It’s
> also a bit of a blemish on LLVM, because LLVM could be more self-sufficient if
> there were an LLVM version of objcopy. Additionally, Chromium is one of the
> popular benchmarks for LLVM, so it would be nice if Chromium didn’t have to use
> binutils. Using
> [elftoolchain](https://sourceforge.net/p/elftoolchain/wiki/Home/)
> solves the licensing issue for Fuchsia, but it is ELF-specific and only solves
> the issue for Fuchsia. I propose implementing llvm-objcopy as a minimum viable
> replacement for objcopy.
> 
> I’ve gone through the sources of LLVM, Clang, Chromium, and Fuchsia to try to
> find the major use cases of objcopy. Here is a list of the use cases I found
> and which projects use them. The list also includes some use cases not found
> in these 4 projects.
> 
> 1. Use Case: Stripping debug information from an executable into a separate file  
>    Who uses it: LLVM, Fuchsia, Chromium
> 
>    ```sh
>    objcopy --only-keep-debug foo foo.debug
>    objcopy --strip-debug foo foo
>    ```
> 
>    [Example use](https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake)
>    When it is useful:  
>    This reduces the size of the file for distribution while keeping the debug
>    information in a separate file for later use. Anyone distributing an
>    executable in any way could benefit from this.
> 
> 2. Use Case: Stripping debug information from a relocatable object into a separate file  
>    Who uses it: None of the 4 projects considered
> 
>    ```sh
>    objcopy --only-keep-debug foo.o foo.debug
>    objcopy --strip-debug foo.o foo.o
>    ```
> 
>    When it is useful:  
>    When distributing an SDK in the form of an archive, it would be nice to strip
>    this information so that the debug information can be distributed separately.
> 
> 3. Use Case: Stripping debug information from a shared library into a separate file  
>    Who uses it: None of the 4 projects
> 
>    ```sh
>    objcopy --only-keep-debug foo.so foo.debug
>    objcopy --strip-debug foo.so foo.so
>    ```
> 
>    When is it useful:  
>    Same benefits as the previous case. If you want to distribute a library, this
>    option allows you to distribute a smaller binary while maintaining the ability
>    to debug.
> 
> 4. Use Case: Stripping an executable  
>    Who uses it: None of the 4 projects
> 
>    ```sh
>    objcopy --strip-all foo foo
>    ```
> 
>    When is it useful:  
>    Anytime an executable is being distributed and there is no reason to keep
>    debugging information. This makes the executable smaller than simply
>    stripping debug info and doesn't produce an extra file.
> 
> 5. Use Case: “Complete stripping” an executable  
>    Who uses it: None of the 4 projects
> 
>    ```sh
>    eu-strip --strip-sections foo
>    ```
> 
>    When is it useful:  
>    This is an extreme form of stripping that even strips the section headers,
>    since they are not needed for loading. This is useful in the same contexts as
>    stripping, but some tools and dynamic linkers may be confused by it. This is
>    possibly only valid on ELF, unlike general stripping, which is a valid option
>    on multiple platforms.
> 
> 6. Use Case: DWARF fission  
>    Who uses it: Clang, Fuchsia, Chromium
> 
>    ```sh
>    objcopy --extract-dwo foo foo.debug
>    objcopy --strip-dwo foo foo
>    ```
> 
>    [Example use 1](https://github.com/llvm-mirror/clang/blob/3efd04e48004628cfaffead00ecb1c206b0b6cb2/lib/Driver/ToolChains/CommonArgs.cpp)
>    [Example use 2](https://github.com/llvm-mirror/clang/blob/a0badfbffbee71c2c757d580fc852d2124dadc5a/test/Driver/split-debug.s)
> 
>    When is it useful:  
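>    DWARF fission can be used to speed up large builds, because most of the debug
>    information is kept in separate .dwo files that the linker never has to
>    process. In some cases builds can be too large to handle otherwise, and DWARF
>    fission makes them manageable. DWARF fission is useful in almost any project
>    of sufficient size.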
> 
> 7. Use Case: Converting an executable to binary  
>    Who uses it: Fuchsia
> 
>    ```sh
>    objcopy -O binary magenta.elf magenta.bin
>    ```
> 
>    [Example use](https://fuchsia.googlesource.com/magenta/+/master/make/build.mk#20)
> 
>    When is it useful:  
>    For kernels and embedded applications that need just the raw segments.
> 
> 8. Use Case: Adding a gdb index  
>    Who uses it: Chromium
> 
>    ```sh
>    gdb -batch foo -ex "save gdb-index dir" -ex quit
>    objcopy --add-section .gdb_index="dir/foo.gdb-index" \
>            --set-section-flags .gdb_index=readonly foo foo
>    ```
> 
>    [Example use](https://cs.chromium.org/chromium/src/build/gdb-add-index?type=cs&q=objcopy&l=71)
> 
>    When is it useful:  
>    Adding a gdb index reduces startup time for debugging an application. Any
>    sufficiently large program with a sufficiently large amount of debug
>    information can potentially benefit from this.
> 
> 9. Use Case: Converting between formats  
>    Who uses it: Fuchsia (only in Magenta GCC build)
> 
>    ```sh
>    objcopy --target=pei-x86-64 magenta.elf magenta.pe
>    ```
> 
>    [Example use](https://fuchsia.googlesource.com/magenta/+/master/bootloader/build.mk#97)
> 
>    When is it useful:  
>    This is primarily useful when you can’t directly target a needed format.
> 
> 10. Use Case: Removing symbols not needed for relocation  
>     Who uses it: Chromium
> 
>     ```sh
>     objcopy --strip-unneeded foo foo
>     ```
> 
>     [Example use](https://cs.chromium.org/chromium/src/third_party/libevdev/src/common.mk?type=cs&q=objcopy&l=397)
> 
>     When is it useful:  
>     This is useful when shipping an SDK or some relocatable binaries.
> 
> 11. Use Case: Removing local symbols  
>     Who uses it: LLVM
> 
>     ```sh
>     objcopy --discard-all foo foo
>     ```
> 
>     [Example use](https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake)
>     (hidden in the definition of `strip_command`, which uses `strip -x` instead
>     of `objcopy --discard-all`)
> 
>     When is it useful:  
>     This can be useful whenever you don’t need local symbols for debugging.
> 
> 12. Use Case: Removing a specific unwanted section  
>     Who uses it: LLVM
> 
>     ```sh
>     objcopy --remove-section=.debug_aranges foo foo
>     ```
> 
>     [Example use](https://github.com/llvm-mirror/llvm/blob/93e6e5414ded14bcbb233baaaa5567132fee9a0c/test/DebugInfo/Inputs/fission-ranges.cc)
> 
>     When is it useful:  
>     This is useful when you know that you have an unwanted section that isn’t
>     removed by one of the other stripping options. This can also be used to
>     remove an existing section for replacement by a new section.
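Use cases 1 to 3 share the same two-step recipe. A common third step, not listed above, is `objcopy --add-gnu-debuglink`, which records the debug file's name and a CRC of its contents in a `.gnu_debuglink` section of the stripped binary so that debuggers such as gdb can find it. Below is a minimal sh sketch of all three steps; the `split_debug` function and the `OBJCOPY` override are hypothetical conveniences, the latter so that an llvm-objcopy could later be dropped in for GNU objcopy.

```sh
#!/bin/sh
# Hypothetical helper: split debug info out of a linked executable or
# shared library. OBJCOPY defaults to GNU objcopy.
# Usage: split_debug BINARY [DEBUG_FILE]
split_debug() {
  bin=$1
  dbg=${2:-$1.debug}
  objcopy_cmd=${OBJCOPY:-objcopy}

  # 1. Copy only the debug sections into the separate debug file.
  "$objcopy_cmd" --only-keep-debug "$bin" "$dbg" &&
  # 2. Strip debug sections from the binary in place (no output file given).
  "$objcopy_cmd" --strip-debug "$bin" &&
  # 3. Record the debug file's name and CRC in a .gnu_debuglink section.
  "$objcopy_cmd" --add-gnu-debuglink="$dbg" "$bin"
}
```

Running it as `OBJCOPY=echo split_debug foo` is a dry run that just prints the three objcopy argument lists.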
> 
> We would like to build this up incrementally by solving specific use cases
> as they come up. To start with, we would like to tackle the use cases
> important to us. We primarily care about fully linked executables, not
> relocatable files. I plan to implement conversion from ELF to binary first,
> and after that, stripping for ELF executables.
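Since ELF-to-binary conversion comes first, it helps to keep in mind what `objcopy -O binary` emits: the GNU binutils documentation describes it as a memory dump that starts at the load address of the lowest section copied, with gaps between sections filled (zeros by default, or the `--gap-fill` byte). Sections without file contents, such as `.bss`, are not written. A minimal sh sketch of the resulting image size, assuming you already have one `LMA SIZE` pair per line (written with a `0x` prefix or in decimal) for the sections that would be copied; `raw_image_size` is a hypothetical helper, not part of any tool.

```sh
#!/bin/sh
# Hypothetical helper: size of the raw image "objcopy -O binary" would
# produce, given one "LMA SIZE" pair per line on stdin for the sections
# with contents that would be copied.
raw_image_size() {
  lo=
  hi=
  while read -r addr size; do
    [ -n "$addr" ] || continue            # skip blank lines
    start=$(( $addr ))
    end=$(( $addr + $size ))
    if [ -z "$lo" ] || [ "$start" -lt "$lo" ]; then lo=$start; fi
    if [ -z "$hi" ] || [ "$end" -gt "$hi" ]; then hi=$end; fi
  done
  # The dump spans from the lowest start to the highest end; gaps are filled.
  if [ -z "$lo" ]; then echo 0; else echo $(( hi - lo )); fi
}
```

With sections at 0x1000 (0x200 bytes), 0x1200 (0x10 bytes) and 0x2000 (0x8 bytes), the first two alone would span 0x210 bytes, but the third pulls the image out to 4104 bytes (0x1008), mostly gap fill. A section placed at a distant address can therefore make the raw image enormous.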
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
