[llvm-dev] [RFC] LLVM Busybox Proposal
Fangrui Song via llvm-dev
llvm-dev at lists.llvm.org
Mon Jun 21 11:17:24 PDT 2021
On 2021-06-21, Leonard Chan via llvm-dev wrote:
>When building LLVM tools, including Clang and lld, it's currently possible
>to use either static or shared linking for LLVM libraries. The latter can
>significantly reduce the size of the toolchain since we aren't duplicating
>the same code in every binary, but the dynamic relocations can affect
>performance. The former doesn't affect performance but significantly
>increases the size of our toolchain.
The dynamic relocation claim is not true.
A thin executable linked against a libLLVM-13git.so built with just
-Bsymbolic is almost identical to a mostly statically linked PIE in this
respect.
I added -Bsymbolic-functions to libLLVM.so and libclang-cpp.so, which
captures most of the -Bsymbolic benefits.
The shared object approach *can be* inferior to static linking plus
-Wl,--gc-sections because with libLLVM.so and libclang-cpp.so we are
making many, many APIs dynamic, and that inhibits the --gc-sections
benefits. However, if clang and lld are shipped together with
llvm-objdump/llvm-readobj/llvm-objcopy/..., I expect the non-GCable
code due to shared objects to be significantly smaller.
I am wary of adding yet another mechanism.
>We would like to implement support for a third approach, which we call,
>for lack of a better term, the "busybox" feature, where everything is
>compiled into a single binary which then dispatches into the appropriate
>tool depending on the first command. This approach can significantly
>reduce the size by deduplicating all of the shared code without affecting
>performance.
>In terms of implementation, the build would produce a single binary called
>`llvm` and the first command would identify the tool. For example, instead
>of invoking `llvm-nm` you'd invoke `llvm nm`. Ideally we would also support
>creating an `llvm-nm` symlink which redirects to `llvm` for backwards
>compatibility.
>This functionality would ideally be implemented as an option in the CMake
>build that toolchain vendors can opt into.
>The implementation would have to replace the `main` function of each tool
>with a regular entrypoint function which is registered into a tool registry.
>This could be wrapped in a macro for convenience. When the "busybox"
>feature is disabled, the macro would expand to a `main` function as before
>and redirect to the entrypoint function. When the "busybox" feature is
>enabled, it would register the entrypoint function into the registry, which
>would be responsible for the dispatching based on the tool name. Ideally,
>toolchain maintainers would also be able to control which tools they could
>add to the "busybox" binary via CMake build options, so toolchains will
>only include the tools they use.
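A minimal sketch of the macro-plus-registry idea quoted above, in busybox mode only. Every name here (`LLVM_TOOL_ENTRY`, `ToolRegistrar`, `toolRegistry`, `busyboxMain`, `SKETCH_STANDALONE_TOOLS`) is hypothetical and not existing LLVM API, and the two "tools" are stubs that just return distinct exit codes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical names throughout; none of this is existing LLVM API.
using ToolEntry = int (*)(int argc, char **argv);

// Function-local static, so registrars that run during static
// initialization always find a constructed map.
static std::map<std::string, ToolEntry> &toolRegistry() {
  static std::map<std::string, ToolEntry> Registry;
  return Registry;
}

struct ToolRegistrar {
  ToolRegistrar(const char *Name, ToolEntry Fn) {
    assert(Fn && "tool entrypoint must not be null");
    toolRegistry()[Name] = Fn;
  }
};

#ifdef SKETCH_STANDALONE_TOOLS
// Busybox disabled: a real main that redirects to the entrypoint.
// (Only valid with one tool per binary.)
#define LLVM_TOOL_ENTRY(Ident, Name)                                      \
  static int Ident##_main(int argc, char **argv);                         \
  int main(int argc, char **argv) { return Ident##_main(argc, argv); }    \
  static int Ident##_main(int argc, char **argv)
#else
// Busybox enabled: declare the entrypoint, register it, then define it.
#define LLVM_TOOL_ENTRY(Ident, Name)                                      \
  static int Ident##_main(int argc, char **argv);                         \
  static ToolRegistrar Ident##_registrar(Name, Ident##_main);             \
  static int Ident##_main(int argc, char **argv)
#endif

// Stub tools; the return values only make the dispatch observable.
LLVM_TOOL_ENTRY(nm, "nm") { (void)argc; (void)argv; return 10; }
LLVM_TOOL_ENTRY(objdump, "objdump") { (void)argc; (void)argv; return 20; }

// "/usr/bin/llvm-nm" -> "nm": drop directories and an "llvm-" prefix.
static std::string toolNameFromPath(const std::string &Path) {
  std::string Name = Path.substr(Path.find_last_of("/\\") + 1);
  const std::string Prefix = "llvm-";
  if (Name.compare(0, Prefix.size(), Prefix) == 0)
    Name.erase(0, Prefix.size());
  return Name;
}

// Dispatcher that the single `llvm` binary's main would forward to.
int busyboxMain(int argc, char **argv) {
  if (argc < 1)
    return 1;
  const auto &Reg = toolRegistry();
  // Case 1: invoked through a symlink such as `llvm-nm`.
  auto It = Reg.find(toolNameFromPath(argv[0]));
  if (It != Reg.end())
    return It->second(argc, argv);
  // Case 2: invoked as `llvm nm ...`; shift so the tool sees its own argv[0].
  if (argc >= 2) {
    It = Reg.find(argv[1]);
    if (It != Reg.end())
      return It->second(argc - 1, argv + 1);
  }
  return 1; // unknown tool
}
```

In this sketch the busybox binary's `main` would simply `return busyboxMain(argc, argv);`. The name matching is deliberately naive: it does not handle `.exe` suffixes or versioned names, which is the symlink problem raised later in the proposal.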
>One implementation detail we think will be an issue is merging arguments in
>individual tools that use `cl::opt`. `cl::opt` works by maintaining a
>global state of flags, but we aren’t confident of what the resulting
>behavior will be when merging them together in the dispatching `main`. What
>we would like to avoid is having flags used by one specific tool available
>on other tools. To address this issue, we would like to migrate all tools
>to use `OptTable`, which doesn't have this issue and is the general
>direction most tools have already been moving in.
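To make the concern about global flag state concrete, here is a toy model. It is NOT the real `cl::opt` or `OptTable` API; `GlobalFlag`, `globalFlags`, `globalAccepts` and `ToolOptions` are hypothetical stand-ins. It only illustrates why flags that register themselves globally become visible to every tool linked into one binary, while a per-tool table does not have that property.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only: this is NOT the real cl::opt or OptTable API.
static std::set<std::string> &globalFlags() {
  static std::set<std::string> Flags;
  return Flags;
}

// Mimics a global option object that registers itself during static
// initialization, before any tool has been selected.
struct GlobalFlag {
  explicit GlobalFlag(const char *Name) {
    const bool Inserted = globalFlags().insert(Name).second;
    assert(Inserted && "two tools defined the same global flag");
    (void)Inserted;
  }
};

// Flags that two different tools might define at file scope.
static GlobalFlag NmDemangle("demangle");
static GlobalFlag ObjdumpDisassemble("disassemble");

// Global lookup: every linked-in flag is accepted, whichever tool runs.
static bool globalAccepts(const std::string &Flag) {
  return globalFlags().count(Flag) != 0;
}

// Table-style alternative: each tool owns its own list of flags.
struct ToolOptions {
  std::set<std::string> Flags;
  bool accepts(const std::string &Flag) const {
    return Flags.count(Flag) != 0;
  }
};

static const ToolOptions NmOptions{{"demangle"}};
static const ToolOptions ObjdumpOptions{{"disassemble"}};
```

With both tools in one binary, `globalAccepts("disassemble")` is true even when the user asked for `nm`, whereas `NmOptions.accepts("disassemble")` is false. The assertion in `GlobalFlag` also hints at a second hazard in this model: two tools defining a flag with the same name.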
>A second issue would be resolving symlinks. For example, llvm-objcopy will
>check argv and behave as llvm-strip (i.e. use the right flags +
>configuration) if it is called via a symlink that “looks like” a strip
>tool, but for all other cases it will run under the default objcopy mode.
>The “looks like” function is usually an `Is` function copied in multiple
>tools that is essentially a substring check, so symlinks like `llvm-strip`,
>`strip.exe`, and `gnu-llvm-strip-10` all result in using the strip “mode”
>while all other names use the objcopy mode. To replicate the same behavior,
>we will need to take great care in making sure symlinks to the busybox tool
>dispatch correctly to the appropriate llvm tool, which might mean exposing
>and merging these `Is` functions.
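A standalone sketch of what such a shared "looks like" check could be. The names `toolStem` and `isTool` are hypothetical, and this is not the code from any LLVM tool. On top of the plain substring check described above, it assumes a trailing-boundary test so that a name like `stripper` does not select strip mode.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cased file name without directories or a trailing ".exe".
static std::string toolStem(const std::string &Argv0) {
  std::string Stem = Argv0.substr(Argv0.find_last_of("/\\") + 1);
  std::transform(Stem.begin(), Stem.end(), Stem.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  const std::string Exe = ".exe";
  if (Stem.size() >= Exe.size() &&
      Stem.compare(Stem.size() - Exe.size(), Exe.size(), Exe) == 0)
    Stem.erase(Stem.size() - Exe.size());
  return Stem;
}

// True if the last occurrence of Tool in the stem either ends the stem or
// is followed by a non-alphanumeric character (e.g. the '-' in "strip-10").
static bool isTool(const std::string &Argv0, const std::string &Tool) {
  assert(!Tool.empty() && "tool name must not be empty");
  const std::string Stem = toolStem(Argv0);
  const std::size_t I = Stem.rfind(Tool);
  if (I == std::string::npos)
    return false;
  const std::size_t End = I + Tool.size();
  return End == Stem.size() ||
         !std::isalnum(static_cast<unsigned char>(Stem[End]));
}
```

A busybox dispatcher could call one such helper per registered tool name instead of each tool carrying its own copy, though the order in which names are tried would still need care when one tool name can appear inside another.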
>Some open questions:
>- People's initial thoughts/opinions?
>- Are there existing tools in LLVM that already do this?
>- Other implementation details/global states that we would also need to
crunchgen. As you said, argv checking code needs to be taken care of.
We should make sure these executables' main files do not have colliding symbols.
I have cleaned up a lot of files.