<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hello Jake,<div class=""><br class=""></div><div class="">I don’t have any experience with objcopy, but I have some with the classic UNIX strip, especially on darwin for Mach-O.  I even prototyped an llvm-strip and got it working well enough to do the default stripping on a hello world executable, producing a "bit for bit" match against the existing strip(1) tool for a darwin Mach-O file.</div><div class=""><br class=""></div><div class="">As Sean points out, there is currently nothing in llvm’s libObject code that writes a binary.  I agree with Sean that, to get this correct, it is best not to try to make a unified bit of code that writes the three formats llvm currently cares about (ELF, COFF and Mach-O).</div><div class=""><br class=""></div><div class="">Also, my experience suggests that creating tools that write “modified binaries” from fully linked binaries is quite different from writing binaries from an assembler or linker, as you have very limited degrees of freedom when slicing and dicing a fully linked file if you still want a correctly formed file.  That is, you usually can’t change any addresses, and you have to update all the references from other tables in the object file to things like indexes into the symbol table, string table, etc. So while it might be good to "keep an eye out" for what could be shared, if you push too hard on that I think your design may not turn out all that clean.</div><div class=""><br class=""></div><div class="">That said, I do think there is value in sharing the "object file reader" code so that all the error checking can be in one place.  While I’m not a big fan of libObject, it did prove workable for reading in object files in my prototype of llvm-strip.  
But I did as Sean suggested and went with a totally object format dependent bit of code to write a modified linked object.  I did this a bit cleaner than what I did with the darwin cctools open source code I wrote many decades ago.  But I feel it is best to have an object format dependent bit of code to put back together the modified parts of a linked Mach-O file, as that is easy to get wrong and a pain to debug when one does get it wrong.  My thinking was to have a bit of library code that the darwin tools like install_name_tool(1), bitcode_strip(1), etc. could share and use to reconstruct their modified fully linked binaries.</div><div class=""><br class=""></div><div class="">My thoughts,</div><div class="">Kev</div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On Jun 1, 2017, at 6:28 PM, Sean Silva via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">I've thought about building an llvm-objcopy for a long time and the approach you've outlined is the same one that I would have suggested (analyzing a set of critical use cases, triaging them, and then incrementally building them). In other words, this approach SGTM. I've CC'ed a couple other people who might have some comments (but I've talked with them about objcopy before in one way or another and I don't get the feeling that they would disagree with the overall approach).<div class=""><br class=""></div><div class="">A couple specific suggestions about the more concrete code design.</div><div class=""><br class=""></div><div class="">IIRC, when I looked at GNU objcopy I saw why it was called objcopy: it basically looked like it was originally a program that copied an object file without modification. 
Then command line argument parsing was added and tons of flags appeared that triggered a mess of random `if` statements that would modify the copying process. I don't think we want to have an implementation like that, especially since we don't have anything even remotely similar to the "writing" side of the BFD library (libObject's object format agnostic interface is only for reading).</div><div class=""><br class=""></div><div class="">1. It seems that (besides the format conversion operations) everything is ELF. It will dramatically simplify the implementation to make it ELF-only at first. I would even recommend against using libObject's object-format agnostic reading implementation. One of the things we have learned while working on LLD is that abstracting across object formats is very difficult to get right. There are just too many subtle semantic differences that penetrate very deep into the program. As an example, LLD/ELF (which is ELF-only) and LLD/COFF (which is COFF-only) are each about 1/3 (or less) the size of the previous linker design that attempted to handle all 3 formats (MachO is the third format) together (and they are actually much more complete than the previous design was before we switched to the new design; normalizing for the difference in features, 1/6 the size is probably more accurate). Unless you also have as a goal (I don't think you do) to make progress towards an LLVM-based analog of the GNU BFD library as you work on objcopy, sticking to object-format specific code is probably preferable. It's *a lot* easier to look at format-specific implementations and see what can be shared than to make a mistake about the abstractions used across object formats and then have to untangle the incorrect abstraction.</div><div class=""><br class=""></div><div class="">2. I would really suggest making sure that there is a very, very clear separation between the objcopy-compatible command line parsing and the internals that actually do the work. 
In fact, it may be reasonable to have the separation be so profound that the tool is called `llvm-objtool` (with subcommands like `llvm-objtool formatconvert ...`) and have the objcopy-compatible command line parsing essentially dispatch into one of them (with such parsing being triggered by looking at argv[0]). Regardless of whether it makes sense to go that far, it's best to err on the side of having separate implementations even if it seems to require duplicating some code. For example, if you have the same for loop in two different "subcommands", it may be best to make an iterator encapsulating it (or a helper function that takes a lambda) rather than adding a bool parameter to the function containing that loop.</div><div class=""><br class=""></div><div class="">3. (This is just a "keep an eye out" type thing. No specific suggestion.) As the implementation of objcopy progresses, especially if the object writing code is incrementally factored out into shared routines (as we try to avoid one huge writing routine taking 17 arguments controlling what it does), we may want to look at it together with other object file writing code in the LLVM project (LLD, llvm-dwp, MC) to see what can be unified. llvm-dwp is probably the most similar and most likely to be able to share code.</div><div class=""><div class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class="">-- Sean Silva </div></div></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Thu, Jun 1, 2017 at 5:21 PM, Jake Ehrlich via llvm-dev <span dir="ltr" class=""><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class=""><div class="">LLVM already implements its own version of almost all of binutils. The</div><div class="">exceptions to this rule are objcopy and strip. 
This is a proposal to implement</div><div class="">an llvm version of objcopy/strip to complete llvm’s binutils.</div><div class=""><br class=""></div><div class="">Several projects only use GNU binutils because of objcopy/strip. LLVM itself</div><div class="">uses objcopy in fact. Chromium and Fuchsia currently use objcopy as well. If you</div><div class="">want to distribute your build tools this is a problem due to licensing. It’s</div><div class="">also a bit of a blemish on LLVM because LLVM could be made more self sufficient</div><div class="">if there was an llvm version of objcopy. Additionally Chromium is one of the</div><div class="">popular benchmarks for LLVM so it would be nice if Chromium didn’t have to use</div><div class="">binutils. Using</div><div class="">[elftoolchain](<a href="https://sourceforge.net/p/elftoolchain/wiki/Home/" target="_blank" class="">https://<wbr class="">sourceforge.net/p/<wbr class="">elftoolchain/wiki/Home/</a>)</div><div class="">solves the licensing issue for Fuchsia but is ELF-specific and only solves the</div><div class="">issue for Fuchsia. I propose implementing llvm-objcopy to be a minimum viable</div><div class="">replacement for objcopy.</div><div class=""><br class=""></div><div class="">I’ve gone through the sources of LLVM, Clang, Chromium, and Fuchsia to try and</div><div class="">find the major use cases of objcopy. Here is a list of use cases I have found</div><div class="">and which projects use them. This list includes some use cases not found in</div><div class="">these 4 projects.</div><div class=""><br class=""></div><div class="">1. 
Use Case: Stripping debug information of an executable to a file  </div><div class="">   Who uses it: LLVM, Fuchsia, Chromium</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --only-keep-debug foo foo.debug</div><div class="">   objcopy --strip-debug foo foo</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   [Example use](<a href="https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake" target="_blank" class="">https://github.com/llvm-<wbr class="">mirror/llvm/blob/<wbr class="">cd789d8cfe12aa374e66eafc748f4f<wbr class="">c06e149ca7/cmake/modules/<wbr class="">AddLLVM.cmake</a>)</div><div class="">   When it is useful:  </div><div class="">   This reduces the size of the file for distribution while maintaining the debug</div><div class="">   information in a file for later use. Anyone distributing an executable in</div><div class="">   any way could benefit from this.</div><div class=""><br class=""></div><div class="">2. Use Case: Stripping debug information of a relocatable object to a file  </div><div class="">   Who uses it: None of the 4 projects considered</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --only-keep-debug foo.o foo.debug</div><div class="">   objcopy --strip-debug foo.o foo.o</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   When it is useful:  </div><div class="">   In distribution of an SDK in the form of an archive it would be nice to strip</div><div class="">   this information. This allows debug information to be distributed separately.</div><div class=""><br class=""></div><div class="">3. 
Use Case: Stripping debug information of a shared library to a file  </div><div class="">   Who uses it: None of the 4 projects</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --only-keep-debug foo.so foo.debug</div><div class="">   objcopy --strip-debug foo.so foo.so</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   Same benefits as the previous case. If you want to distribute a library this</div><div class="">   option allows you to distribute a smaller binary while maintaining the ability</div><div class="">   to debug.</div><div class=""><br class=""></div><div class="">4. Use Case: Stripping an executable  </div><div class="">   Who uses it: None of the 4 projects</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --strip-all foo foo</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   Anytime an executable is being distributed and there is no reason to keep</div><div class="">   debugging information. This makes the executable smaller than simply</div><div class="">   stripping debug info and doesn't produce an extra file.</div><div class=""><br class=""></div><div class="">5. Use Case: “Complete stripping” an executable  </div><div class="">   Who uses it: None of the 4 projects</div><div class="">   ```sh</div><div class="">   eu-strip --strip-sections foo</div><div class="">   ```</div><div class="">   When is it useful:  </div><div class="">   This is an extreme form of stripping that even strips the section headers</div><div class="">   since they are not needed for loading. 
This is useful in the same contexts as</div><div class="">   stripping but some tools and dynamic linkers may be confused by it. This is</div><div class="">   possibly only valid on ELF unlike general stripping which is a valid option on</div><div class="">   multiple platforms.</div><div class=""><br class=""></div><div class="">6. Use Case: DWARF fission  </div><div class="">   Who uses it: Clang, Fuchsia, Chromium</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --extract-dwo foo foo.debug</div><div class="">   objcopy --strip-dwo foo foo</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   [Example use  1](<a href="https://github.com/llvm-mirror/clang/blob/3efd04e48004628cfaffead00ecb1c206b0b6cb2/lib/Driver/ToolChains/CommonArgs.cpp" target="_blank" class="">https://github.com/llvm-<wbr class="">mirror/clang/blob/<wbr class="">3efd04e48004628cfaffead00ecb1c<wbr class="">206b0b6cb2/lib/Driver/<wbr class="">ToolChains/CommonArgs.cpp</a>)</div><div class="">   [Example use 2](<a href="https://github.com/llvm-mirror/clang/blob/a0badfbffbee71c2c757d580fc852d2124dadc5a/test/Driver/split-debug.s" target="_blank" class="">https://github.com/llvm-<wbr class="">mirror/clang/blob/<wbr class="">a0badfbffbee71c2c757d580fc852d<wbr class="">2124dadc5a/test/Driver/split-<wbr class="">debug.s</a>)</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   DWARF fission can be used to speed up large builds. In some cases builds can</div><div class="">   be too large to be handled and DWARF fission makes this manageable. DWARF</div><div class="">   fission is useful in almost any project of sufficient size.</div><div class=""><br class=""></div><div class="">7. 
Use Case: Converting an executable to binary  </div><div class="">   Who uses it: Fuchsia</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy -O binary magenta.elf magenta.bin</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   [Example use](<a href="https://fuchsia.googlesource.com/magenta/+/master/make/build.mk#20" target="_blank" class="">https://fuchsia.<wbr class="">googlesource.com/magenta/+/<wbr class="">master/make/build.mk#20</a>)</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   For kernels and embedded applications that need just the raw segments.</div><div class=""><br class=""></div><div class="">8. Use Case: Adding a gdb index  </div><div class="">   Who uses it: Chromium</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   gdb -batch foo -ex "save gdb-index dir" -ex quit</div><div class="">   objcopy --add-section .gdb_index="dir/foo.gdb-index" \</div><div class="">           --set-section-flags .gdb_index=readonly foo foo</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   [Example use](<a href="https://cs.chromium.org/chromium/src/build/gdb-add-index?type=cs&q=objcopy&l=71" target="_blank" class="">https://cs.chromium.org/<wbr class="">chromium/src/build/gdb-add-<wbr class="">index?type=cs&q=objcopy&l=71</a>)</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   Adding a gdb index reduces startup time for debugging an application. Any</div><div class="">   sufficiently large program with a sufficiently large amount of debug</div><div class="">   information can potentially benefit from this.</div><div class=""><br class=""></div><div class="">9. 
Use Case: Converting between formats  </div><div class="">   Who uses it: Fuchsia (only in Magenta GCC build)</div><div class=""><br class=""></div><div class="">   ```sh</div><div class="">   objcopy --target=pei-x86-64 magenta.elf magenta.pe</div><div class="">   ```</div><div class=""><br class=""></div><div class="">   [Example use](<a href="https://fuchsia.googlesource.com/magenta/+/master/bootloader/build.mk#97" target="_blank" class="">https://fuchsia.<wbr class="">googlesource.com/magenta/+/<wbr class="">master/bootloader/build.mk#97</a>)</div><div class=""><br class=""></div><div class="">   When is it useful:  </div><div class="">   This is primarily useful when you can’t directly target a needed format.</div><div class=""><br class=""></div><div class="">10. Use Case: Removing symbols not needed for relocation  </div><div class="">    Who uses it: Chromium</div><div class=""><br class=""></div><div class="">    ```sh</div><div class="">    objcopy --strip-unneeded foo foo</div><div class="">    ```</div><div class=""><br class=""></div><div class="">    [Example use](<a href="https://cs.chromium.org/chromium/src/third_party/libevdev/src/common.mk?type=cs&q=objcopy&l=397" target="_blank" class="">https://cs.chromium.org/<wbr class="">chromium/src/third_party/<wbr class="">libevdev/src/common.mk?type=<wbr class="">cs&q=objcopy&l=397</a>)</div><div class=""><br class=""></div><div class="">    When is it useful:  </div><div class="">    This is useful when shipping an SDK or some relocatable binaries.</div><div class=""><br class=""></div><div class="">11. 
Use Case: Removing local symbols  </div><div class="">    Who uses it: LLVM</div><div class=""><br class=""></div><div class="">    ```sh</div><div class="">    objcopy --discard-all foo foo</div><div class="">    ```</div><div class=""><br class=""></div><div class="">    [Example use](<a href="https://github.com/llvm-mirror/llvm/blob/cd789d8cfe12aa374e66eafc748f4fc06e149ca7/cmake/modules/AddLLVM.cmake" target="_blank" class="">https://github.com/llvm-<wbr class="">mirror/llvm/blob/<wbr class="">cd789d8cfe12aa374e66eafc748f4f<wbr class="">c06e149ca7/cmake/modules/<wbr class="">AddLLVM.cmake</a>)</div><div class="">    (hidden in definition of “strip_command” using strip instead of objcopy and</div><div class="">    using -x instead of --discard-all)</div><div class=""><br class=""></div><div class="">    When is it useful:  </div><div class="">    Anytime you don’t need locals for debugging this can be useful.</div><div class=""><br class=""></div><div class="">12. Use Case: Removing a specific unwanted section  </div><div class="">    Who uses it: LLVM</div><div class=""><br class=""></div><div class="">    ```sh</div><div class="">    objcopy --remove-section=.debug_<wbr class="">aranges foo foo</div><div class="">    ```</div><div class=""><br class=""></div><div class="">    [Example use](<a href="https://github.com/llvm-mirror/llvm/blob/93e6e5414ded14bcbb233baaaa5567132fee9a0c/test/DebugInfo/Inputs/fission-ranges.cc" target="_blank" class="">https://github.com/llvm-<wbr class="">mirror/llvm/blob/<wbr class="">93e6e5414ded14bcbb233baaaa5567<wbr class="">132fee9a0c/test/DebugInfo/<wbr class="">Inputs/fission-ranges.cc</a>)</div><div class=""><br class=""></div><div class="">    When is it useful:</div><div class="">    This is useful when you know that you have an unwanted section that isn’t</div><div class="">    removed by one of the other stripping options. 
This can also be used to</div><div class="">    remove an existing section for replacement by a new section.</div><div class=""><br class=""></div><div class="">We would like to build this up incrementally by solving specific use cases</div><div class="">as they come up. To start with we would like to tackle the use cases</div><div class="">important to us. We primarily care about fully linked executables and not</div><div class="">relocatable files. I plan to implement conversion from ELF to binary first.</div><div class="">After that I plan on implementing stripping for ELF executables.</div><div class=""><br class=""></div></div>
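<div class="">As a rough illustration of the ELF-to-binary conversion named above as the first planned step, here is a minimal Python sketch; it is not part of the original proposal. It assumes a 64-bit little-endian ELF and approximates the output by copying the file contents of each PT_LOAD segment to its physical address, relative to the lowest one. GNU objcopy itself works from section contents and load addresses, so its output can differ. The function name elf64_to_binary and the helper u are hypothetical.</div>
<pre class="">
```python
PT_LOAD = 1  # program header type for loadable segments


def u(data, offset, size):
    """Read an unsigned little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


def elf64_to_binary(elf):
    """Return the PT_LOAD segments of `elf` laid out by physical address.

    Assumption: a simplified model of `objcopy -O binary`; only the bytes
    present in the file (p_filesz) are copied, and gaps are zero-filled.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    e_phoff = u(elf, 0x20, 8)      # file offset of the program header table
    e_phentsize = u(elf, 0x36, 2)  # size of one program header entry
    e_phnum = u(elf, 0x38, 2)      # number of program header entries

    segments = []
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        p_type = u(elf, ph, 4)
        p_offset = u(elf, ph + 0x08, 8)
        p_paddr = u(elf, ph + 0x18, 8)
        p_filesz = u(elf, ph + 0x20, 8)
        if p_type == PT_LOAD and p_filesz != 0:
            segments.append((p_paddr, elf[p_offset:p_offset + p_filesz]))
    if not segments:
        return b""

    base = min(addr for addr, _ in segments)
    end = max(addr + len(body) for addr, body in segments)
    image = bytearray(end - base)  # gaps between segments stay zero-filled
    for addr, body in segments:
        start = addr - base
        image[start:start + len(body)] = body
    return bytes(image)
```
</pre>
<div class="">Under these assumptions, reading magenta.elf as bytes, passing it to elf64_to_binary, and writing the result to magenta.bin would roughly correspond to use case 7. Keeping this logic ELF-only is in line with the format-specific approach suggested elsewhere in this thread.</div>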

<br class="">______________________________<wbr class="">_________________<br class="">

LLVM Developers mailing list<br class="">

<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank" class="">http://lists.llvm.org/cgi-bin/<wbr class="">mailman/listinfo/llvm-dev</a><br class="">

<br class=""></blockquote></div><br class=""></div>

</div></blockquote></div><br class=""></div></body></html>