[llvm-dev] [cfe-dev] RFC: Linker feature for automatically partitioning a program into multiple binaries

Tue Feb 26 18:32:34 PST 2019

On Tue, Feb 26, 2019 at 5:35 PM Peter Collingbourne via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi folks,
>
> I'd like to propose adding a feature to ELF lld for automatically
> partitioning a program into multiple binaries. (This will also involve
> adding a feature to clang, so I've cc'd cfe-dev as well.)
>
> == Problem statement ==
>
> Embedded devices such as cell phones, especially lower end devices, are
> typically highly resource constrained. Users of cell phone applications
> must pay a cost (in terms of download size as well as storage space) for
> all features that the application implements, even for features that are
> only used by a minority of users. Therefore, there is a desire to split
> applications into multiple pieces that can be downloaded independently, so
> that the majority of users only pay the cost of the commonly used features.
> This can technically be achieved using traditional ELF dynamic linking: the
> main part of the program can be compiled as an executable or DSO that
> exports symbols that are then imported by a separate DSO containing the
> part of the program implementing the optional feature. However, this itself
> imposes costs:
> - Each exported symbol by itself imposes additional binary size costs, as
> it requires the name of the symbol and a dynamic symbol table entry to be
> stored in both the exporting and importing DSO, and on the importing side a
> dynamic relocation, a GOT entry and likely a PLT entry must be present.
> These additional costs go some way towards defeating the purpose of
> splitting the program into pieces in the first place, and can also impact
> program startup and overall performance because of the additional
> indirections.
> - It can result in more code needing to appear in the main part of the
> program than necessary. For example, imagine that both the feature and the
> main program make use of a common (statically linked) library, but they
> call different subsets of the functions in that library. With traditional
> ELF linking we are forced to either link and export the entire library from
> the main program (even the functions unused by either part of the program)
> or carefully maintain a list of functions that are used by the other parts
> of the program.
> - Since the linker does not see the whole program at once and links each
> piece independently, a number of link-time optimizations and features stop
> working, such as LTO across partition boundaries, whole-program
> devirtualization and non-cross-DSO control flow integrity (control flow
> integrity has a cross-DSO mode, but that also imposes binary size costs
> because a significant amount of metadata needs to appear in each DSO).
> There are ways around at least the first point. For example, the program
> could arrange to use a custom mechanism for binding references between the
> main program and the feature code, such as a table of entry points.
> However, this can impose maintenance costs (for example, the binding
> mechanism can be intrusive in the source code and typically has to be
> maintained manually), and it still does not address the last point.
>
> == Proposed solution ==
>
> I propose to extend lld so that it can perform the required partitioning
> automatically, given a set of entry points for each part of the program.
> The end product of linking will be a main program (which can be either an
> executable or a DSO) combined with a set of DSOs that must be loaded at
> fixed addresses relative to the base address of the main program. These
> binaries will all share a virtual address space so that they can refer to
> one another directly using PC-relative references or RELATIVE dynamic
> relocations as if they were all statically linked together in the first
> place, rather than via the GOT (or custom GOT-equivalent).
>
> The way that it will work is that we can extend the graph reachability
> algorithm currently implemented by the linker for --gc-sections. The entry
> points for each partition are marked up with a string naming the partition,
> either at the source level with an attribute on the function or global
> variable, or by passing a flag to the compiler (this string becomes the
> partition's soname). These symbols will act as the GC roots for the
> partition and will be exported from its dynsym. Assuming that there is a
> single partition, let's call this set of symbols S2, while all other GC
> roots (e.g. non-marked-up exported symbols, sections in .init_array) we
> call S1. Any sections reachable from S1 are allocated to the main
> partition, while sections reachable only from S2 but not from S1 are
> allocated to S2's partition. We can extend this idea to multiple loadable
> partitions by defining S3, S4 and so on, but any sections reachable from
> multiple loadable partitions are allocated to the main partition even if
> they aren’t reachable from the main partition.
>
> When assigning input sections to output sections, we take into account, in
> addition to the name of the input section, the partition that the input
> section is assigned to. The SHF_ALLOC output sections are first sorted by
> partition, and then by the usual sorting rules. As usual, non-SHF_ALLOC
> sections appear last and are not sorted by partition. In the end we are
> left with a collection of output sections that might look like this:
>
> Main partition:
> 0x0000 ELF header, phdrs
> 0x1000 .rodata
> 0x2000 .dynsym
> 0x3000 .text
>
> Loadable partition 1:
> 0x4000 ELF header, phdrs
> 0x5000 .rodata
> 0x6000 .dynsym
> 0x7000 .text
>
> Loadable partition 2:
> 0x8000 ELF header, phdrs
> 0x9000 .rodata
> 0xa000 .dynsym
> 0xb000 .text
>
> Non-SHF_ALLOC sections from all partitions:
> .comment
> .debug_info
> (etc.)
>
> Now linking proceeds mostly as usual, and we’re effectively left with a
> single .so that contains all of the partitions concatenated together. This
> isn’t very useful on its own and is likely to confuse tools (e.g. due to
> the presence of multiple .dynsyms); we can add a feature to llvm-objcopy
> that will extract the individual partitions from the output file
> essentially by taking a slice of the combined .so file. These slices can
> also be fed to tools such as debuggers provided that the non-SHF_ALLOC
> sections are left in place.
>
> The envisaged usage of this feature is as follows:
> $ clang -ffunction-sections -fdata-sections -c main.c # compile the main
> program
> $ clang -ffunction-sections -fdata-sections
> -fsymbol-partition=libfeature.so -c feature.c # compile the feature
> $ clang main.o feature.o -fuse-ld=lld -shared -o libcombined.so
> -Wl,-soname,libmain.so -Wl,--gc-sections
> $ llvm-objcopy libcombined.so libmain.so --extract-partition=libmain.so
> $ llvm-objcopy libcombined.so libfeature.so
> --extract-partition=libfeature.so
>
> On Android, the loadable partitions can be loaded with the
> android_dlopen_ext
> <https://developer.android.com/ndk/reference/group/libdl> function
> passing ANDROID_DLEXT_RESERVED_ADDRESS to force it to be loaded at the
> correct address relative to the main partition. Other platforms that wish
> to support this feature will likely either need to add a similar feature to
> their dynamic loader or (in order to support loading the partitions with a
> regular dlopen) define a custom dynamic tag that will cause the dynamic
> loader to first load the main partition and then the loadable partition at
> the correct relative address.
>
> == In more detail ==
>
> Each loadable partition will require its own sections to support the
> dynamic loader and unwinder (namely: .ARM.exidx, .dynamic, .dynstr,
> .dynsym, .eh_frame_hdr, .gnu.hash, .gnu.version, .gnu.version_r, .hash,
> .interp, .rela.dyn, .relr.dyn), but will be able to share a GOT and PLT
> with the main partition. This means that all addresses associated with
> symbols will continue to be fixed.
>
> In order to cause the dynamic loader to reserve address space for the
> loadable partitions so that they can be loaded at the correct address
> later, a PT_LOAD segment is added to the main partition that allocates a
> page of bss at the address one byte past the end of the last address in the
> last partition. In the Android dynamic loader at least, this is enough to
> cause the required space to be reserved. Other platforms would need to
> ensure that their dynamic loader implements similar behaviour.
>
> I haven't thought about how this feature will interact with linker
> scripts. At least to start with we will likely need to forbid using this
> feature together with the PHDRS or SECTIONS linker script directives.
>
> Some sections will need to be present in each partition (e.g. .interp and
> .note sections). Probably the most straightforward way to do this will be
> to cause the linker to create a clone of these sections for each partition.
>
> == Other use cases ==
>
> An example of another use case for this feature could be an operating
> system API which is exposed across multiple DSOs. Typically these DSOs will
> be implemented using private APIs that are not exposed to the application.
> This feature would allow you to create a common DSO that contains the
> shared code implementing the private APIs (i.e. the main partition),
> together with individual DSOs (i.e. the loadable partitions) that use the
> private APIs and expose the public ones, but without actually exposing the
> private APIs in the dynamic symbol table or paying the binary size cost of
> doing so.
>
> == Prototype ==
>
> A prototype/proof of concept of this feature has been implemented here:
> https://github.com/pcc/llvm-project/tree/lld-module-symbols
> There is a test app in the test-progs/app directory that demonstrates the
> feature on Android with a simple hello world app (based on
> https://www.hanshq.net/command-line-android.html ). I have successfully
> tested debugging the loadable partition with gdb (e.g. setting breakpoints
> and printing globals), but getting unwinding working will need a bit more
> work.
>
> Note that the feature as exposed by the prototype is different from what
> I'm proposing here, e.g. it uses a linker flag to specify which symbols go
> in which partitions. I think the best place to specify this information is
> at either the source level or the compiler flag level, so that is what I
> intend to implement.
>
> Thanks,
> --
> --
> Peter
>
>
This is an interesting idea.  What's the behaivor for calling an unloaded
function?  Would it be hard to make that force a load?  An interesting way
to do it would be to map the other partitions without execute privs and
load the partition on a fault, although this would require restricting the
alignment of partitions to the page size.

- Michael Spencer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190226/690e628d/attachment.html>