<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019 at 3:42 PM Eli Friedman &lt;<a href="mailto:efriedma@quicinc.com" target="_blank">efriedma@quicinc.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div lang="EN-US">

<div class="m_8183048889993952929gmail-m_9174425107466191899WordSection1">

<p class="MsoNormal">Comments inline<u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal" style="margin-left:0.5in"><b>From:</b> Peter Collingbourne &lt;<a href="mailto:peter@pcc.me.uk" target="_blank">peter@pcc.me.uk</a>&gt;

<br>

<b>Sent:</b> Tuesday, February 26, 2019 7:48 PM<br>

<b>To:</b> Eli Friedman &lt;<a href="mailto:efriedma@quicinc.com" target="_blank">efriedma@quicinc.com</a>&gt;<br>

<b>Cc:</b> llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;; <a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>; George Rimar &lt;<a href="mailto:grimar@accesssoftek.com" target="_blank">grimar@accesssoftek.com</a>&gt;<br>

<b>Subject:</b> [EXT] Re: [cfe-dev] RFC: Linker feature for automatically partitioning a program into multiple binaries<u></u><u></u></p>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

<div>

<div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">On Tue, Feb 26, 2019 at 6:41 PM Eli Friedman &lt;<a href="mailto:efriedma@quicinc.com" target="_blank">efriedma@quicinc.com</a>&gt; wrote:</p>

</div>

<div>

<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">

<div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">

This seems like a very complicated approach… do you have some numbers to give some idea how much of an improvement we’re talking about here over a more conventional solution involving shared libraries?  Or have you not gotten that far?<u></u><u></u></p>

</div>

</div>

</blockquote>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">I can talk to my internal customer to see what kind of overhead they were seeing. But I do know that at the start of the project they did evaluate using regular dynamic linking for the feature partitions, and that

 was quickly rejected in favour of other approaches due to the code size and maintenance overhead. And with control flow integrity the binary size of the cross-DSO metadata dwarfed the binary size savings that they were hoping to gain by splitting their program

 in two.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">Furthermore, there are things that you simply cannot do with a more conventional approach, such as optimizations relying on whole-program information (like whole-program devirtualization, which helps significantly

 in my customer's program).<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Okay.<u></u><u></u></p>

</div>

<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">

<div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">

What’s the tradeoff involved in the specific sections you chose to split?  It seems like it would be possible to, for example, split the GOT, or avoid splitting the relocation/EH/etc. sections.  Some variation would require different runtime support, I guess.<u></u><u></u></p>

</div>

</div>

</blockquote>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">We could certainly consider having multiple GOTs which are allocated to partitions in the same way as sections are. This might be useful if for example one of the partitions references a DSO that is unused by the

 main program and we need to avoid having the main program depend on the DSO. But I consider this an optimization over the proposed approach and not something that would be strictly required for correctness. I chose to omit this for now for the sake of simplicity

 and because my customer does not require it for now.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">I think we need to split the dynamic relocation section because otherwise the dynamic loader will try to relocate the unreadable memory of the other partitions and cause a SIGSEGV. Similarly, we need to split the

 EH sections because unwinders will generally expect to be able to find the unwind info for a function by enumerating PT_LOADs to map an address onto a DSO and then using that DSO's PT_ARM_EXIDX/PT_GNU_EH_FRAME to find the unwind info. See for example what

 libunwind does:<u></u><u></u></p>

</div>

<div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><a href="https://github.com/llvm/llvm-project/blob/e739ac0e255597d818c907223034ddf3bc18a593/libunwind/src/AddressSpace.hpp#L523" target="_blank">https://github.com/llvm/llvm-project/blob/e739ac0e255597d818c907223034ddf3bc18a593/libunwind/src/AddressSpace.hpp#L523</a><u></u><u></u></p>

</div>

</div>
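<div>
<p class="MsoNormal" style="margin-left:0.5in">As a rough illustration of the lookup described above, here is a minimal Python model, not libunwind's actual code. The names LoadedObject, load_segments, eh_frame_hdr and find_unwind_info are hypothetical stand-ins for a loaded DSO, its PT_LOAD ranges, the table named by PT_GNU_EH_FRAME, and the lookup itself. It shows why each partition needs its own unwind info: the unwinder only consults the table of whichever object's PT_LOAD contains the address.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class LoadedObject:
    """Hypothetical model of one loaded ELF object (a DSO or a partition)."""
    name: str
    load_segments: list  # (start, end) address ranges, standing in for PT_LOAD
    eh_frame_hdr: str    # stand-in for the table named by PT_GNU_EH_FRAME


def find_unwind_info(objects, pc):
    """Map pc onto the object whose PT_LOAD contains it, then return that
    object's unwind table. Returns None if no object covers pc."""
    for obj in objects:
        for start, end in obj.load_segments:
            if pc in range(start, end):
                return obj.eh_frame_hdr
    return None
```
</pre>
<p class="MsoNormal" style="margin-left:0.5in">In this model, an address inside a feature partition's segments resolves to that partition's table, so unwind entries for its functions would not be found if they were left in the main partition's EH sections.</p>
</div>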

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">As you point out, the latter part could vary based on the runtime, but I don't see a strong reason to do it another way.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">I could imagine a different approach where the main executable contains everything except some non-relocatable read-only sections, and you just write a small “loader” which just mmaps the raw text/rodata sections into the right spot when

 they’re necessary. But that makes sense.</p></div></div></div></div></div></div></blockquote><div><br></div><div>That might work too, I suppose, and it might be worth considering as an alternative model if the system loader cannot be changed. But you wouldn't be able to dlsym the partitions. You could parse the dynsym yourself, but that costs binary size and wouldn't be compatible with other programs that expect to be able to use the system loader's dlsym. Or you could keep the dynsym only in the main partition, but that also costs binary size and seems more error-prone. And if the system loader is involved you'd also need a proper ELF header and unwinding phdr, at which point you might as well also move the relocatable sections and take the additional binary size gains.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div lang="EN-US"><div class="m_8183048889993952929gmail-m_9174425107466191899WordSection1"><div><div><div><div><p class="MsoNormal"><u></u><u></u></p>

<p class="MsoNormal"><u></u> <u></u></p>

</div>

<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">

<div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">

It looks like this doesn’t include a proposal for the corresponding LLVM IR extension?  I think it might be sort of complicated to define correctly… specifically, in terms of what it means to “use” a function or global from a different partition (so the program

 doesn’t try to speculatively access something which isn’t loaded).  This could come up even without LTO if you have C++ inline functions, since all functions with weak linkage have to be in the first partition.  (At least, I think they do, unless you invent

 a new kind of “partition” visibility for this.)<u></u><u></u></p>

</div>

</div>

</blockquote>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">The idea here is that for code to "use" a function or global is exactly the same thing as having a relocation pointing to it. This is the same principle that is used to implement --gc-sections.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">So for a program to end up accessing a section (speculatively or otherwise) there needs to be a chain of relocations referring to it from the entry points. That would force the section into either the main partition

 or the same partition as the referent.<u></u><u></u></p>

</div>
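<div>
<p class="MsoNormal" style="margin-left:0.5in">The reachability rule above can be sketched in a few lines of Python. This is only a toy model, not the linker's implementation: the section names, the relocations map and the helper names are hypothetical, and the rule that a section reachable from more than one feature partition falls back to the main partition is an assumption made for the sketch.</p>
<pre>
```python
from collections import deque

MAIN = "main"


def reachable(entry_points, relocations):
    """Return every section reachable from entry_points by following
    relocations (a dict: section -> sections it has relocations against)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        section = queue.popleft()
        for target in relocations.get(section, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def assign_partitions(entry_points, relocations):
    """entry_points maps a partition name to its list of entry sections.
    Sections reachable from no entry point are dropped, as with --gc-sections."""
    # Anything reachable from the main partition's entry points stays in main.
    assignment = {s: MAIN for s in reachable(entry_points.get(MAIN, []), relocations)}
    # Record which feature partitions reach each remaining section.
    owners = {}
    for partition, entries in entry_points.items():
        if partition == MAIN:
            continue
        for s in reachable(entries, relocations):
            if s not in assignment:
                owners.setdefault(s, set()).add(partition)
    # Assumption: a section reached by several partitions goes to main.
    for s, parts in owners.items():
        assignment[s] = next(iter(parts)) if len(parts) == 1 else MAIN
    return assignment
```
</pre>
<p class="MsoNormal" style="margin-left:0.5in">For example, with entry points {"main": ["main_fn"], "feature": ["feature_entry"]}, a helper referenced only from feature_entry lands in "feature", while a utility referenced from both main_fn and feature_entry stays in "main".</p>
</div>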

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">Another way to think about it is: when I load the main partition into memory, I have loaded all code that is reachable from the main partition's entry points. Now I dynamically load a feature partition. I've now

 loaded all code that is reachable from the combination of the main partition and the feature partition's entry points. That's pretty much the same thing as having first loaded a conventional ELF DSO linked with --gc-sections with just the main partition's

 entry points, and then replacing it with a second DSO linked with --gc-sections with the main partition + feature partition's entry points, except that none of the addresses in the main partition happen to have changed. So if --gc-sections works, this should

 work too.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">You might be wondering: what happens if I directly reference one of the feature partition's entry points from the main partition? Well, something interesting will happen. The feature partition's dynamic symbol table

 will contain an entry for the entry point, but the entry's address will point inside the main partition. This should work out just fine because the main partition is guaranteed to be loaded if the feature partition is also loaded. (Which is the same reason

 why direct pc-relative references from the feature partition to the main partition will also work.)<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">I don't think any significant IR extensions are necessary here, except perhaps for the part involving attaching the -fsymbol-partition names to globals, but I think that part is mostly trivial and it would probably

 end up looking like the custom section name field.<u></u><u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in"><u></u> <u></u></p>

</div>

<div>

<p class="MsoNormal" style="margin-left:0.5in">I'm not sure I understand how weak linkage is impacted here. With this approach, nothing special happens inside the linker until we start handling --gc-sections, and by that time weak/strong resolution has already happened.

 In ELF, dynamic loaders do not care about symbol bindings (except for weak undefined symbols), so we get the same result whether the symbols are weak or not.</p>

<p class="MsoNormal"><u></u> <u></u></p>

<p class="MsoNormal">Oh, that model is simpler than what I was thinking.  I was expecting that you were partitioning the code based on certain marked entry points, regardless of how those entry points were actually used.  But if code in the main partition can’t

 directly refer to code in any other partition, how do you actually call code in other partitions?  dlsym?</p></div></div></div></div></div></div></blockquote><div><br></div><div>Yes, dlsym is the intended usage model.</div></div><div><br></div><div>Thanks,</div>-- <br><div dir="ltr" class="m_8183048889993952929gmail_signature"><div dir="ltr">-- <div>Peter</div></div></div></div>