[cfe-dev] RFC: Linker feature for automatically partitioning a program into multiple binaries

Wed Feb 27 16:23:57 PST 2019

On Wed, Feb 27, 2019 at 3:42 PM Eli Friedman <efriedma at quicinc.com> wrote:

> Comments inline
>
>
>
> *From:* Peter Collingbourne <peter at pcc.me.uk>
> *Sent:* Tuesday, February 26, 2019 7:48 PM
> *To:* Eli Friedman <efriedma at quicinc.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org; George
> Rimar <grimar at accesssoftek.com>
> *Subject:* [EXT] Re: [cfe-dev] RFC: Linker feature for automatically
> partitioning a program into multiple binaries
>
>
>
> On Tue, Feb 26, 2019 at 6:41 PM Eli Friedman <efriedma at quicinc.com> wrote:
>
> This seems like a very complicated approach… do you have some numbers to
> give some idea how much of an improvement we’re talking about here over a
> more conventional solution involving shared libraries?  Or have you not
> gotten that far?
>
>
>
> I can talk to my internal customer to see what kind of overhead they were
> seeing. But I do know that at the start of the project they did evaluate
> using regular dynamic linking for the feature partitions, and that was
> quickly rejected in favour of other approaches due to the code size and
> maintenance overhead. And with control flow integrity the binary size of
> the cross-DSO metadata dwarfed the binary size savings that they were
> hoping to gain by splitting their program in two.
>
>
>
> Furthermore, there are things that you simply cannot do with a more
> conventional approach, such as optimizations relying on whole-program
> information (like whole-program devirtualization, which helps significantly
> in my customer's program).
>
>
>
> Okay.
>
> What’s the tradeoff involved in the specific sections you chose to split?
> It seems like it would be possible to, for example, split the GOT, or avoid
> splitting the relocation/EH/etc. sections.  Some variation would require
> different runtime support, I guess.
>
>
>
> We could certainly consider having multiple GOTs which are allocated to
> partitions in the same way as sections are. This might be useful if for
> example one of the partitions references a DSO that is unused by the main
> program and we need to avoid having the main program depend on the DSO. But
> I consider this an optimization over the proposed approach and not
> something that would be strictly required for correctness. I chose to omit
> this for now for the sake of simplicity and because my customer does not
> require it for now.
>
>
>
> I think we need to split the dynamic relocation section because otherwise
> the dynamic loader will try to relocate the unreadable memory of the other
> partitions and cause a SIGSEGV. Similarly, we need to split the EH sections
> because unwinders will generally expect to be able to find the unwind info
> for a function by enumerating PT_LOADs to map an address onto a DSO and
> then using that DSO's PT_ARM_EXIDX/PT_GNU_EH_FRAME to find the unwind info.
> See for example what libunwind does:
>
>
> https://github.com/llvm/llvm-project/blob/e739ac0e255597d818c907223034ddf3bc18a593/libunwind/src/AddressSpace.hpp#L523
>
>
>
> As you point out, the latter part could vary based on the runtime, but I
> don't see a strong reason to do it another way.
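
To make the unwinder lookup described above concrete, here is a minimal
Python sketch. It is a model only, not libunwind's actual code: the
LoadedModule class, the addresses, and the "eh_frame_hdr" strings are all
hypothetical stand-ins. It shows why each partition needs its own unwind
info: the unwinder first maps the address to a module via its PT_LOAD ranges
and then consults only that module's unwind table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LoadedModule:
    """One loaded ELF image as an unwinder sees it (hypothetical model)."""
    name: str
    load_segments: List[Tuple[int, int]]  # (start, end) per PT_LOAD, end exclusive
    unwind_info: str  # stands in for PT_GNU_EH_FRAME / PT_ARM_EXIDX


def find_unwind_info(modules: List[LoadedModule], pc: int) -> Optional[str]:
    """Return the unwind info of the module whose PT_LOAD covers pc, if any."""
    for module in modules:
        for start, end in module.load_segments:
            if start <= pc < end:
                return module.unwind_info
    return None


# The main partition and a feature partition each carry their own unwind info.
modules = [
    LoadedModule("main", [(0x1000, 0x5000)], "main.eh_frame_hdr"),
    LoadedModule("feature", [(0x5000, 0x8000)], "feature.eh_frame_hdr"),
]
```

If the feature partition's unwind entries lived only in the main partition's
table, a lookup for an address inside the feature partition would land on a
module that has no entry for it.
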
>
>
>
> I could imagine a different approach where the main executable contains
> everything except some non-relocatable read-only sections, and you just
> write a small “loader” which just mmaps the raw text/rodata sections into
> the right spot when they’re necessary. But that makes sense.
>

That might work too, I suppose, and it might be worth considering as an
alternative model if the system loader cannot be changed. But you wouldn't
be able to dlsym the partitions. You could parse the dynsym yourself, but
that costs binary size and wouldn't be compatible with other programs that
expect to be able to use the system loader's dlsym. Or I suppose you could
have dynsym just in the main partition, but that also costs binary size and
seems more error-prone. And if the system loader is involved, you'd also
need a proper ELF header and unwinding phdr. At that point you might as well
also take the additional binary size gains from moving the relocatable
sections into the partitions.

>
> It looks like this doesn’t include a proposal for the corresponding LLVM
> IR extension?  I think it might be sort of complicated to define correctly…
> specifically, in terms of what it means to “use” a function or global from
> a different partition (so the program doesn’t try to speculatively access
> something which isn’t loaded).  This could come up even without LTO if you
> have C++ inline functions, since all functions with weak linkage have to be
> in the first partition.  (At least, I think they do, unless you invent a
> new kind of “partition” visibility for this.)
>
>
>
> The idea here is that for code to "use" a function or global is exactly
> the same thing as having a relocation pointing to it. This is the same
> principle that is used to implement --gc-sections.
>
>
>
> So for a program to end up accessing a section (speculatively or
> otherwise) there needs to be a chain of relocations referring to it from
> the entry points. That would force the section into either the main
> partition or the same partition as the referent.
>
>
>
> Another way to think about it is: when I load the main partition into
> memory, I have loaded all code that is reachable from the main partition's
> entry points. Now I dynamically load a feature partition. I've now loaded
> all code that is reachable from the combination of the main partition and
> the feature partition's entry points. That's pretty much the same thing as
> having first loaded a conventional ELF DSO linked with --gc-sections with
> just the main partition's entry points, and then replacing it with a second
> DSO linked with --gc-sections with the main partition + feature partition's
> entry points, except that none of the addresses in the main partition
> happen to have changed. So if --gc-sections works, this should work too.
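
The reachability argument above can be sketched in a few lines of Python.
This is a toy model rather than the linker's implementation: section names
and the relocs mapping are hypothetical, and sending a section that is
reachable from more than one feature partition to the main partition is an
assumption of this sketch.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable(relocs: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Sections reachable from roots by following relocations (--gc-sections style)."""
    seen = set(roots)
    work = deque(seen)
    while work:
        section = work.popleft()
        for target in relocs.get(section, []):
            if target not in seen:
                seen.add(target)
                work.append(target)
    return seen


def assign_partitions(
    relocs: Dict[str, List[str]],
    main_roots: List[str],
    feature_roots: Dict[str, List[str]],
) -> Dict[str, str]:
    """Map each live section to a partition; unreachable sections are dropped."""
    main = reachable(relocs, main_roots)
    owners: Dict[str, Set[str]] = {}
    for name, roots in feature_roots.items():
        for section in reachable(relocs, roots) - main:
            owners.setdefault(section, set()).add(name)
    assignment = {section: "main" for section in main}
    for section, names in owners.items():
        # Assumption: sections shared by several feature partitions go to main.
        assignment[section] = next(iter(names)) if len(names) == 1 else "main"
    return assignment


relocs = {
    "main_entry": ["util"],
    "feat_entry": ["feat_helper", "util"],
    "feat_helper": [],
    "util": [],
    "dead": [],
}
result = assign_partitions(relocs, ["main_entry"], {"feature": ["feat_entry"]})
```

Here "util" stays in the main partition because the main entry point reaches
it, "feat_helper" follows its only referent into the feature partition, and
"dead" is garbage-collected.
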
>
>
>
> You might be wondering: what happens if I directly reference one of the
> feature partition's entry points from the main partition? Well, something
> interesting will happen. The feature partition's dynamic symbol table will
> contain an entry for the entry point, but the entry's address will point
> inside the main partition. This should work out just fine because the main
> partition is guaranteed to be loaded if the feature partition is also
> loaded. (Which is the same reason why direct pc-relative references from
> the feature partition to the main partition will also work.)
>
>
>
> I don't think any significant IR extensions are necessary here, except
> perhaps for the part involving attaching the -fsymbol-partition names to
> globals, but I think that part is mostly trivial and it would probably end
> up looking like the custom section name field.
>
>
>
> I'm not sure I understand how weak linkage is impacted here. With this
> approach, nothing special happens inside the linker until we start handling
> --gc-sections, and by that time weak/strong resolution has already
> happened. In ELF, dynamic loaders do not care about symbol bindings (except
> for weak undefined symbols), so we get the same result whether the symbols
> are weak or not.
>
>
>
> Oh, that model is simpler than what I was thinking.  I was expecting that
> you were partitioning the code based on certain marked entry points,
> regardless of how those entry points were actually used.  But if code in
> the main partition can’t directly refer to code in any other partition, how
> do you actually call code in other partitions?  dlsym?
>

Yes, dlsym is the intended usage model.
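
As a rough illustration of that usage model, here is a minimal sketch using
Python's ctypes, which wraps dlopen and dlsym on POSIX systems. The helper
name lookup_entry_point is hypothetical. Since no real partition file is
available here, the demonstration looks up libc's strlen in the
already-loaded program as a stand-in for a feature partition's entry point.

```python
import ctypes


def lookup_entry_point(library_path, symbol_name):
    """dlopen() the library, then dlsym() the named symbol.

    On POSIX systems ctypes.CDLL calls dlopen, and attribute lookup on the
    returned handle calls dlsym. Passing None opens the running program.
    """
    handle = ctypes.CDLL(library_path)
    # Raises AttributeError if the symbol is not found.
    return getattr(handle, symbol_name)


# Stand-in demonstration (assumes a POSIX system with a dynamically linked libc).
strlen = lookup_entry_point(None, "strlen")
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t
```

With a real feature partition, the call would instead pass the partition's
file path and the name of one of its entry points, both of which depend on
how the program was linked.
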

Thanks,
-- 
Peter