[llvm-dev] [RFC] LLVM Busybox Proposal

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 21 11:17:24 PDT 2021


On 2021-06-21, Leonard Chan via llvm-dev wrote:
>Hello all,
>
>When building LLVM tools, including Clang and lld, it's currently possible
>to use either static or shared linking for LLVM libraries. The latter can
>significantly reduce the size of the toolchain since we aren't duplicating
>the same code in every binary, but the dynamic relocations can affect
>performance. The former doesn't affect performance but significantly
>increases the size of our toolchain.

The dynamic relocation claim is not true.

A thin executable linked against a -Bsymbolic libLLVM-13git.so performs
almost identically to a mostly statically linked PIE.

I added -Bsymbolic-functions to libLLVM.so and libclang-cpp.so, which
captures most of the benefits of -Bsymbolic.

The shared object approach *can be* inferior to static linking plus
-Wl,--gc-sections because with libLLVM.so and libclang-cpp.so we are
exporting a great many APIs dynamically, and that inhibits the benefits
of --gc-sections. However, if clang and lld are shipped together with
llvm-objdump/llvm-readobj/llvm-objcopy/..., I expect the non-GCable
code due to shared objects will be significantly smaller.

I am wary of adding yet another mechanism.

>We would like to implement support for a third approach which we call,
>for lack of a better term, the "busybox" feature, where everything is compiled
>into a single binary which then dispatches into an appropriate tool
>depending on the first command. This approach can significantly reduce the
>size by deduplicating all of the shared code without affecting the
>performance.
>
>In terms of implementation, the build would produce a single binary called
>`llvm` and the first command would identify the tool. For example, instead
>of invoking `llvm-nm` you'd invoke `llvm nm`. Ideally we would also support
>creation of `llvm-nm` symlink which redirects to `llvm` for backwards
>compatibility.
>This functionality would ideally be implemented as an option in the CMake
>build that toolchain vendors can opt into.
>
>The implementation would have to replace `main` function of each tool with
>an entrypoint regular function which is registered into a tool registry.
>This could be wrapped in a macro for convenience. When the "busybox"
>feature is disabled, the macro would expand to a `main` function as before
>and redirect to the entrypoint function. When the "busybox" feature is
>enabled, it would register the entrypoint function into the registry, which
>would be responsible for the dispatching based on the tool name. Ideally,
>toolchain maintainers would also be able to control which tools they could
>add to the "busybox" binary via CMake build options, so toolchains will
>only include the tools they use.
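
For illustration, the macro and registry described above could look
roughly like the following. This is a minimal sketch, not existing LLVM
code: `TOOL_MAIN`, `ToolRegistration`, `toolRegistry`, `busyboxMain`,
`SKETCH_BUSYBOX`, and the two stand-in tools are all hypothetical names,
and the stand-in tools return distinct exit codes only so the dispatch is
observable.

```cpp
#include <cassert>
#include <map>
#include <string>

// All names below are hypothetical; none of this is existing LLVM API.
#define SKETCH_BUSYBOX 1

using ToolEntry = int (*)(int Argc, char **Argv);

// Function-local static so registration from global constructors does not
// depend on static initialization order.
static std::map<std::string, ToolEntry> &toolRegistry() {
  static std::map<std::string, ToolEntry> Registry;
  return Registry;
}

struct ToolRegistration {
  ToolRegistration(const char *Name, ToolEntry Entry) {
    assert(Entry && "tool entry point must not be null");
    toolRegistry()[Name] = Entry;
  }
};

#if SKETCH_BUSYBOX
// Busybox mode: register the entry point under the tool name.
#define TOOL_MAIN(Name, Fn) static ToolRegistration Fn##Registration(Name, Fn);
#else
// Standalone mode: expand to a regular main that forwards to the entry point.
#define TOOL_MAIN(Name, Fn)                                                    \
  int main(int Argc, char **Argv) { return Fn(Argc, Argv); }
#endif

// Stand-in tools; real tools would do their work here.
static int nmMain(int, char **) { return 10; }
static int objcopyMain(int, char **) { return 20; }
TOOL_MAIN("nm", nmMain)
TOOL_MAIN("objcopy", objcopyMain)

// Dispatcher for `llvm <tool> [args...]`. A real `main` would forward here.
int busyboxMain(int Argc, char **Argv) {
  if (Argc < 2)
    return 1; // no tool name given
  auto It = toolRegistry().find(Argv[1]);
  if (It == toolRegistry().end())
    return 1; // unknown tool
  // Shift the arguments so the tool sees its own name as Argv[0].
  return It->second(Argc - 1, Argv + 1);
}
```

With this shape, `llvm nm foo.o` would reach `nmMain` with `nm` as its
`Argv[0]`, so existing argument handling inside each tool would not need
to know about the extra leading word.
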
>
>One implementation detail we think will be an issue is merging arguments in
>individual tools that use `cl::opt`. `cl::opt` works by maintaining a
>global state of flags, but we aren’t confident of what the resulting
>behavior will be when merging them together in the dispatching `main`. What
>we would like to avoid is having flags used by one specific tool available
>on other tools. To address this issue, we would like to migrate all tools
>to use `OptTable`, which doesn't have this issue and is the general
>direction most tools have already been moving in.
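
The concern about global flag state can be illustrated without LLVM at
all. The sketch below is not `cl::opt`; `GlobalFlag`, `globalFlags`, and
`parseFlag` are hypothetical stand-ins that only mimic the pattern of
options registering themselves in one process-wide table from global
constructors. Once two tools are linked into one binary, the parser
accepts every registered flag no matter which tool was requested.

```cpp
#include <cassert>
#include <map>
#include <string>

struct GlobalFlag;

// One process-wide table of flags, shared by everything linked in.
static std::map<std::string, GlobalFlag *> &globalFlags() {
  static std::map<std::string, GlobalFlag *> Flags;
  return Flags;
}

// Hypothetical stand-in for a self-registering option; not cl::opt itself.
struct GlobalFlag {
  std::string Name;
  bool Value = false;
  explicit GlobalFlag(const char *N) : Name(N) {
    bool Inserted = globalFlags().emplace(Name, this).second;
    // The sketch treats a duplicate flag name as a programming error.
    assert(Inserted && "two tools defined the same flag name");
    (void)Inserted;
  }
};

// Flags that two formerly separate tools would each define at file scope.
static GlobalFlag NmDemangle("demangle");       // would belong to "nm"
static GlobalFlag ObjcopyStripAll("strip-all"); // would belong to "objcopy"

// The parser has no notion of which tool is running, so it accepts any
// flag that any linked-in tool registered.
bool parseFlag(const std::string &Arg) {
  auto It = globalFlags().find(Arg);
  if (It == globalFlags().end())
    return false;
  It->second->Value = true;
  return true;
}
```

In this sketch, a process started as the `nm` tool would still accept
`strip-all`, which is the kind of flag leakage the proposal wants to
avoid.
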
>
>A second issue would be resolving symlinks. For example, llvm-objcopy will
>check argv[0] and behave as llvm-strip (i.e. use the right flags +
>configuration) if it is called via a symlink that “looks like” a strip
>tool, but for all other cases it will run under the default objcopy mode.
>The “looks like” function is usually an `Is` function copied in multiple
>tools that is essentially a substring check: so symlinks like `llvm-strip`,
>`strip.exe`, and `gnu-llvm-strip-10` all result in using the strip “mode”
>while all other names use the objcopy mode. To replicate the same behavior,
>we will need to take great care in making sure symlinks to the busybox tool
>dispatch correctly to the appropriate llvm tool, which might mean exposing
>and merging these `Is` functions.
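
A minimal sketch of such a “looks like” check follows. It is a
simplified, self-contained illustration and not the code in any LLVM
tool: `stemOf`, `looksLike`, and `selectMode` are hypothetical names,
and the exact matching rule (case-insensitive, last occurrence, followed
by end of name or a non-alphanumeric character) is an assumption chosen
to reproduce the examples in the paragraph above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Reduce argv[0] to a lowercase file name without directory or ".exe".
static std::string stemOf(const std::string &Argv0) {
  size_t Slash = Argv0.find_last_of("/\\");
  std::string Stem =
      Slash == std::string::npos ? Argv0 : Argv0.substr(Slash + 1);
  for (char &C : Stem)
    C = static_cast<char>(std::tolower(static_cast<unsigned char>(C)));
  const std::string Exe = ".exe";
  if (Stem.size() >= Exe.size() &&
      Stem.compare(Stem.size() - Exe.size(), Exe.size(), Exe) == 0)
    Stem.erase(Stem.size() - Exe.size());
  return Stem;
}

// True if the name contains Tool and the match is not followed by a letter
// or digit, so "strip-10" matches "strip" but "stripped" does not.
bool looksLike(const std::string &Argv0, const std::string &Tool) {
  assert(!Tool.empty() && "tool name must not be empty");
  std::string Stem = stemOf(Argv0);
  size_t I = Stem.rfind(Tool);
  if (I == std::string::npos)
    return false;
  size_t End = I + Tool.size();
  return End == Stem.size() ||
         !std::isalnum(static_cast<unsigned char>(Stem[End]));
}

// Strip mode for strip-like names, objcopy mode for everything else.
const char *selectMode(const std::string &Argv0) {
  return looksLike(Argv0, "strip") ? "strip" : "objcopy";
}
```

A busybox dispatcher would need to run an equivalent check on the
symlink name before choosing an entry point, which is why the individual
per-tool checks might have to be exposed and merged.
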
>
>Some open questions:
>- People's initial thoughts/opinions?
>- Are there existing tools in LLVM that already do this?
>- Other implementation details/global states that we would also need to
>account for?

crunchgen is an existing tool that takes this approach. As you said, the
argv[0] checking code needs to be taken care of. We should also make sure
the main files of these executables do not define colliding symbols.
I have already cleaned up a lot of files.

>- Leonard
