<div dir="ltr"><div>From our perspective as a toolchain vendor, even if using shared libraries could get us closer to static linking in terms of performance, we'd still prefer static linking for the ease of distribution. Dealing with a single statically linked executable is much easier than dealing with multiple shared libraries. This is especially important in distributed compilation environments like Goma.</div><div><br></div><div>When comparing performance between static and dynamic linking, I'd also recommend doing a comparison between binaries built with PGO+LTO. Plain -O3 leaves a lot of performance on the table and as far as I'm aware, most toolchain vendors use PGO+LTO.</div><div><br></div><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 22, 2021 at 5:00 PM Fangrui Song via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2021-06-22, Leonard Chan via llvm-dev wrote:<br>

>Small update: I have a WIP prototype of the tool at<br>

><a href="https://reviews.llvm.org/D104686" rel="noreferrer" target="_blank">https://reviews.llvm.org/D104686</a>. The prototype only includes llvm-objcopy<br>

>and llvm-objdump packed together, but we're seeing size benefits from<br>

>busyboxing those two compared against having two separate tools. (More<br>

>details in the prototype's description.) I don't plan on landing this as-is<br>

>anytime soon and there are still some things I'd like to improve/change and<br>

>get feedback on.<br>

><br>

>To answer some replies:<br>

><br>

>- Ideally, we could start off with an incremental approach and not package<br>

>large tools like clang/lld off the bat. The llvm-* tools seem like a good<br>

>place to start since they're generally a bunch of relatively small binaries<br>

>that all share a subset of functions in libLLVM, but don't necessarily use<br>

>all of libLLVM, so statically linking them together (with --gc-sections)<br>

>can help dedup a lot of shared components vs having separate statically<br>

>compiled tools. In my measurements, the busybox tool containing<br>

>llvm-objcopy+objdump is negligibly larger than llvm-objdump on its own (a<br>

>couple KB difference) indicating a lot of shared code between objdump and<br>

>objcopy.<br>

><br>

>- Will Dietz's multiplexing tool looks like a good place to start from. The<br>

>only concern I can see though is mostly the amount of work needed to update<br>

>it to LLVM 13.<br>

><br>

>- We don't have plans for Windows support now, but it's not off the table.<br>

>(Been mostly focusing on *nix for now.) Depending on overall traction for<br>

>this idea, we could approach incrementally and add support for different<br>

>platforms over time.<br>

<br>

-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on -DLLVM_TARGETS_TO_BUILD=X86 (custom1)<br>

vs<br>

-DLLVM_TARGETS_TO_BUILD=X86 (custom2)<br>

<br>

<br>

# This is the lower bound for any multiplexing approach. clang is the largest executable.<br>

% stat -c %s /tmp/out/custom2/bin/clang-13<br>

102900408<br>

<br>

I have built clang, lld and a bunch of ELF binary utilities.<br>

<br>

% stat -c %s /tmp/out/custom1/lib/libLLVM-13git.so /tmp/out/custom1/lib/libclang-cpp.so.13git /tmp/out/custom1/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'<br>

138896544<br>

<br>

% stat -c %s /tmp/out/custom2/bin/{clang-13,lld,llvm-{ar,cov,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}} | awk '{s+=$1}END{print s}'<br>

209054440<br>

<br>

<br>

The -DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on build is doing a really good job.<br>

<br>

A multiplexing approach can squeeze some bytes from 138896544 toward 102900408,<br>

but how much can it do?<br>
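To put numbers on that question, here is a small sketch of the arithmetic using only the three byte counts reported above. The constant and function names are made up for illustration. The headroom figure is an upper bound: it assumes a multiplexed binary could shrink all the way down to the size of the static clang-13 alone, which it cannot actually reach.

```cpp
#include <cstdint>

// Sizes in bytes, copied from the `stat` output above.
constexpr std::int64_t kStaticClangOnly = 102900408; // custom2: clang-13 alone
constexpr std::int64_t kDylibTotal = 138896544;      // custom1: dylibs + all tools
constexpr std::int64_t kStaticTotal = 209054440;     // custom2: all tools, static

// Bytes saved by the dylib build compared to separate static tools.
constexpr std::int64_t dylibSavings() { return kStaticTotal - kDylibTotal; }

// Upper bound on what multiplexing could still save over the dylib build.
constexpr std::int64_t multiplexHeadroom() {
  return kDylibTotal - kStaticClangOnly;
}

// Expresses `Part` as a percentage of `Whole`.
constexpr double percentOf(std::int64_t Part, std::int64_t Whole) {
  return 100.0 * static_cast<double>(Part) / static_cast<double>(Whole);
}
```

So the dylib build already saves about 70 MB over the separate static tools, and at most about 36 MB (roughly 26% of the dylib total) is left for a multiplexing approach to recover.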

<br>

<br>

>- I'm starting to think the `cl::opt` to `OptTable` issue might be<br>

>orthogonal to the busybox implementation. The tool essentially dispatches<br>

>to different "main" functions in different tools, but as long as we don't<br>

>do anything within busybox after exiting that tool's main, then the global<br>

>state issues we weren't sure of with `cl::opt` might not be of any concern<br>

>now. It may be an issue down the line if, let's say, the tool flags moved<br>

>from being "owned" by the tools themselves to instead being "owned" by<br>

>busybox, and then we'd have to merge similarly-named flags together. In<br>

>that case, migrating these tools to use `OptTable` may be necessary since<br>

>(I think) `OptTable` should handle this. This may be a tedious task, but<br>

>this is just to say that busybox won't need to be immediately blocked on it.<br>

<br>

Such improvement is useful even if we don't do multiplexing.<br>

I switched llvm-symbolizer. thakis switched llvm-objdump.<br>

I can look at some binary utilities.<br>

<br>

>- I haven't seen any issues with colliding symbols when linking (although<br>

>I've only merged two tools for now). I suspect that with small-ish llvm-*<br>

>tools, the bulk of their code is shared from libLLVM, and they have their<br>

>own distinct logic built on top of it, which could mean a low chance of<br>

>conflicting internal ABIs.<br>

><br>

>On Mon, Jun 21, 2021 at 10:54 AM Leonard Chan <<a href="mailto:leonardchan@google.com" target="_blank">leonardchan@google.com</a>><br>

>wrote:<br>

><br>

>> Hello all,<br>

>><br>

>> When building LLVM tools, including Clang and lld, it's currently possible<br>

>> to use either static or shared linking for LLVM libraries. The latter can<br>

>> significantly reduce the size of the toolchain since we aren't duplicating<br>

>> the same code in every binary, but the dynamic relocations can affect<br>

>> performance. The former doesn't affect performance but significantly<br>

>> increases the size of our toolchain.<br>

>><br>

>> We would like to implement support for a third approach which we call,<br>

>> for lack of a better term, the "busybox" feature, where everything is compiled<br>

>> into a single binary which then dispatches into an appropriate tool<br>

>> depending on the first command. This approach can significantly reduce the<br>

>> size by deduplicating all of the shared code without affecting the<br>

>> performance.<br>

>><br>

>> In terms of implementation, the build would produce a single binary called<br>

>> `llvm` and the first command would identify the tool. For example, instead<br>

>> of invoking `llvm-nm` you'd invoke `llvm nm`. Ideally we would also support<br>

>> creation of `llvm-nm` symlink which redirects to `llvm` for backwards<br>

>> compatibility.<br>

>> This functionality would ideally be implemented as an option in the CMake<br>

>> build that toolchain vendors can opt into.<br>
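A minimal sketch of how the tool name might be resolved under this scheme, assuming a plain `llvm-` prefix convention for symlinks. `Invocation` and `resolveInvocation` are hypothetical names, not existing LLVM API, and the sketch ignores Windows path separators, `.exe` suffixes and versioned symlink names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: which tool to run and the arguments to hand it.
struct Invocation {
  std::string Tool;              // e.g. "nm"; empty if no tool was named
  std::vector<std::string> Args; // arguments passed on to the tool
};

// Supports both `llvm nm a.o` and an `llvm-nm a.o` symlink to the same binary.
Invocation resolveInvocation(const std::vector<std::string> &Argv) {
  Invocation Result;
  if (Argv.empty())
    return Result;

  // Reduce argv[0] to its final path component.
  std::string Name = Argv[0];
  std::size_t Slash = Name.find_last_of('/');
  if (Slash != std::string::npos)
    Name = Name.substr(Slash + 1);

  const std::string Prefix = "llvm-";
  if (Name.size() > Prefix.size() &&
      Name.compare(0, Prefix.size(), Prefix) == 0) {
    // Invoked through a symlink such as `llvm-nm`: the tool is in argv[0].
    Result.Tool = Name.substr(Prefix.size());
    Result.Args.assign(Argv.begin() + 1, Argv.end());
  } else if (Argv.size() > 1) {
    // Invoked as `llvm nm ...`: the first argument names the tool.
    Result.Tool = Argv[1];
    Result.Args.assign(Argv.begin() + 2, Argv.end());
  }
  return Result;
}
```

With this, `resolveInvocation({"llvm", "nm", "a.o"})` and `resolveInvocation({"/usr/bin/llvm-nm", "a.o"})` both yield the tool `nm` with the single argument `a.o`.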

>><br>

>> The implementation would have to replace the `main` function of each tool with<br>

>> a regular entrypoint function which is registered into a tool registry.<br>

>> This could be wrapped in a macro for convenience. When the "busybox"<br>

>> feature is disabled, the macro would expand to a `main` function as before<br>

>> and redirect to the entrypoint function. When the "busybox" feature is<br>

>> enabled, it would register the entrypoint function into the registry, which<br>

>> would be responsible for the dispatching based on the tool name. Ideally,<br>

>> toolchain maintainers would also be able to control which tools they could<br>

>> add to the "busybox" binary via CMake build options, so toolchains will<br>

>> only include the tools they use.<br>
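A rough sketch of the macro and registry idea described above. Every name here (`LLVM_BUSYBOX_SKETCH`, `LLVM_TOOL_ENTRYPOINT`, `ToolRegistrar`, `toolRegistry`, `dispatchTool`) is invented for illustration and is not existing LLVM API. The two toy "tools" only return distinct exit codes, and the feature switch is hard-coded where a real build would presumably set it from a CMake option.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: in a real build this would come from the build system.
#define LLVM_BUSYBOX_SKETCH 1

using ToolMain = int (*)(int argc, char **argv);

// A function-local static sidesteps static-initialization-order problems
// between the registry and the registrar objects below.
std::map<std::string, ToolMain> &toolRegistry() {
  static std::map<std::string, ToolMain> Registry;
  return Registry;
}

// Registers one tool entrypoint at static-initialization time.
struct ToolRegistrar {
  ToolRegistrar(const char *Name, ToolMain Fn) {
    bool Inserted = toolRegistry().emplace(Name, Fn).second;
    assert(Inserted && "tool registered twice");
    (void)Inserted;
  }
};

#if LLVM_BUSYBOX_SKETCH
// Busybox mode: record the entrypoint in the registry.
#define LLVM_TOOL_ENTRYPOINT(NAME, FN)                                         \
  static ToolRegistrar FN##_registrar(NAME, FN);
#else
// Standalone mode: expand to an ordinary `main` that forwards to FN.
#define LLVM_TOOL_ENTRYPOINT(NAME, FN)                                         \
  int main(int argc, char **argv) { return FN(argc, argv); }
#endif

// Toy entrypoints standing in for real tools.
static int nmMain(int, char **) { return 10; }
static int objcopyMain(int, char **) { return 20; }
LLVM_TOOL_ENTRYPOINT("nm", nmMain)
LLVM_TOOL_ENTRYPOINT("objcopy", objcopyMain)

// Looks up a tool by name and runs it; returns 1 for an unknown tool.
int dispatchTool(const std::string &Name, int argc, char **argv) {
  auto It = toolRegistry().find(Name);
  if (It == toolRegistry().end())
    return 1;
  return It->second(argc, argv);
}
```

Here each tool lives in the same file for brevity. In practice each tool would invoke the macro in its own source file, and only the tools selected at configure time would be linked into the dispatching binary.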

>><br>

>> One implementation detail we think will be an issue is merging arguments<br>

>> in individual tools that use `cl::opt`. `cl::opt` works by maintaining a<br>

>> global state of flags, but we aren’t confident of what the resulting<br>

>> behavior will be when merging them together in the dispatching `main`. What<br>

>> we would like to avoid is having flags used by one specific tool available<br>

>> on other tools. To address this issue, we would like to migrate all tools<br>

>> to use `OptTable` which doesn't have this issue and has been the general<br>

>> direction most tools have already been moving in.<br>

>><br>

>> A second issue would be resolving symlinks. For example, llvm-objcopy will<br>

>> check argv[0] and behave as llvm-strip (i.e. use the right flags +<br>

>> configuration) if it is called via a symlink that “looks like” a strip<br>

>> tool, but for all other cases it will run under the default objcopy mode.<br>

>> The “looks like” function is usually an `Is` function copied in multiple<br>

>> tools that is essentially a substring check: so symlinks like `llvm-strip`,<br>

>> `strip.exe`, and `gnu-llvm-strip-10` all result in using the strip “mode”<br>

>> while all other names use the objcopy mode. To replicate the same behavior,<br>

>> we will need to take great care in making sure symlinks to the busybox tool<br>

>> dispatch correctly to the appropriate llvm tool, which might mean exposing<br>

>> and merging these `Is` functions.<br>
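As an illustration of the “looks like” check, here is a standalone approximation written against the standard library rather than LLVM's own string utilities. `looksLikeTool` is a hypothetical name and the exact rules in the real tools may differ: this version looks for the last case-insensitive occurrence of the tool name in the program's file name and accepts it only if it is followed by the end of the name or by a non-alphanumeric character.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lowercases an ASCII string so the comparison is case-insensitive.
static std::string toLower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Returns true if the program name "looks like" the given tool, e.g.
// `llvm-strip`, `strip.exe` and `gnu-llvm-strip-10` all look like "strip".
bool looksLikeTool(const std::string &ProgramName, const std::string &Tool) {
  std::string Name = toLower(ProgramName);

  // Only the file name matters, not the directories leading to it.
  std::size_t Slash = Name.find_last_of("/\\");
  if (Slash != std::string::npos)
    Name.erase(0, Slash + 1);

  std::size_t Pos = Name.rfind(toLower(Tool));
  if (Pos == std::string::npos)
    return false;

  // Reject matches that continue into a longer word, such as "stripper".
  std::size_t End = Pos + Tool.size();
  return End == Name.size() ||
         !std::isalnum(static_cast<unsigned char>(Name[End]));
}
```

A busybox dispatcher would need one shared copy of a check like this, applied in a fixed order, so that a symlink such as `gnu-llvm-strip-10` selects the strip mode while `llvm-objcopy` keeps the default objcopy mode.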

>><br>

>> Some open questions:<br>

>> - People's initial thoughts/opinions?<br>

>> - Are there existing tools in LLVM that already do this?<br>

>> - Other implementation details/global states that we would also need to<br>

>> account for?<br>

>><br>

>> - Leonard<br>

>><br>

<br>

>_______________________________________________<br>

>LLVM Developers mailing list<br>

><a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br>


</blockquote></div></div>