[llvm-dev] [cfe-dev] RFC: Linker feature for automatically partitioning a program into multiple binaries

Wed Feb 27 15:41:57 PST 2019

Comments inline

From: Peter Collingbourne <peter at pcc.me.uk>
Sent: Tuesday, February 26, 2019 7:48 PM
To: Eli Friedman <efriedma at quicinc.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org; George Rimar <grimar at accesssoftek.com>
Subject: [EXT] Re: [cfe-dev] RFC: Linker feature for automatically partitioning a program into multiple binaries

On Tue, Feb 26, 2019 at 6:41 PM Eli Friedman <efriedma at quicinc.com<mailto:efriedma at quicinc.com>> wrote:
This seems like a very complicated approach… do you have some numbers to give some idea how much of an improvement we’re talking about here over a more conventional solution involving shared libraries?  Or have you not gotten that far?

I can talk to my internal customer to see what kind of overhead they were seeing. But I do know that at the start of the project they did evaluate using regular dynamic linking for the feature partitions, and that was quickly rejected in favour of other approaches due to the code size and maintenance overhead. And with control flow integrity the binary size of the cross-DSO metadata dwarfed the binary size savings that they were hoping to gain by splitting their program in two.

Furthermore, there are things that you simply cannot do with a more conventional approach, such as optimizations relying on whole-program information (like whole-program devirtualization, which helps significantly in my customer's program).

Okay.
What’s the tradeoff involved in the specific sections you chose to split?  It seems like it would be possible to, for example, split the GOT, or avoid splitting the relocation/EH/etc. sections.  Some variation would require different runtime support, I guess.

We could certainly consider having multiple GOTs which are allocated to partitions in the same way as sections are. This might be useful if for example one of the partitions references a DSO that is unused by the main program and we need to avoid having the main program depend on the DSO. But I consider this an optimization over the proposed approach and not something that would be strictly required for correctness. I chose to omit this for now for the sake of simplicity and because my customer does not require it for now.

I think we need to split the dynamic relocation section because otherwise the dynamic loader will try to relocate the unreadable memory of the other partitions and cause a SIGSEGV. Similarly, we need to split the EH sections because unwinders will generally expect to be able to find the unwind info for a function by enumerating PT_LOADs to map an address onto a DSO and then using that DSO's PT_ARM_EXIDX/PT_GNU_EH_FRAME to find the unwind info. See for example what libunwind does:
https://github.com/llvm/llvm-project/blob/e739ac0e255597d818c907223034ddf3bc18a593/libunwind/src/AddressSpace.hpp#L523

As you point out, the latter part could vary based on the runtime, but I don't see a strong reason to do it another way.

I could imagine a different approach where the main executable contains everything except some non-relocatable read-only sections, and you just write a small “loader” which just mmaps the raw text/rodata sections into the right spot when they’re necessary. But that makes sense.

It looks like this doesn’t include a proposal for the corresponding LLVM IR extension?  I think it might be sort of complicated to define correctly… specifically, in terms of what it means to “use” a function or global from a different partition (so the program doesn’t try to speculatively access something which isn’t loaded).  This could come up even without LTO if you have C++ inline functions, since all functions with weak linkage have to be in the first partition.  (At least, I think they do, unless you invent a new kind of “partition” visibility for this.)

The idea here is that for code to "use" a function or global is exactly the same thing as having a relocation pointing to it. This is the same principle that is used to implement --gc-sections.

So for a program to end up accessing a section (speculatively or otherwise) there needs to be a chain of relocations referring to it from the entry points. That would force the section into either the main partition or the same partition as the referent.

Another way to think about it is: when I load the main partition into memory, I have loaded all code that is reachable from the main partition's entry points. Now I dynamically load a feature partition. I've now loaded all code that is reachable from the combination of the main partition and the feature partition's entry points. That's pretty much the same thing as having first loaded a conventional ELF DSO linked with --gc-sections with just the main partition's entry points, and then replacing it with a second DSO linked with --gc-sections with the main partition + feature partition's entry points, except that none of the addresses in the main partition happen to have changed. So if --gc-sections works, this should work too.

You might be wondering: what happens if I directly reference one of the feature partition's entry points from the main partition? Well, something interesting will happen. The feature partition's dynamic symbol table will contain an entry for the entry point, but the entry's address will point inside the main partition. This should work out just fine because the main partition is guaranteed to be loaded if the feature partition is also loaded. (Which is the same reason why direct pc-relative references from the feature partition to the main partition will also work.)

I don't think any significant IR extensions are necessary here, except perhaps for the part involving attaching the -fsymbol-partition names to globals, but I think that part is mostly trivial and it would probably end up looking like the custom section name field.

I'm not sure I understand how weak linkage is impacted here. With this nothing special happens inside the linker until we start handling --gc-sections, and by that time weak/strong resolution has already happened. In ELF, dynamic loaders do not care about symbol bindings (except for weak undefined symbols), so we get the same result whether the symbols are weak or not.

Oh, that model is simpler than what I was thinking.  I was expecting that you were partitioning the code based on certain marked entry points, regardless of how those entry points were actually used.  But if code in the main partition can’t directly refer to code in any other partition, how do you actually call code in other partitions?  dlsym?

-Eli

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190227/e5f4c757/attachment.html>