<div dir="ltr"><div class="gmail_extra"><div><div class="gmail_signature">On Mon, Jun 12, 2017 at 1:03 PM, Stephen Crane <span dir="ltr"><<a href="mailto:sjc@immunant.com" target="_blank">sjc@immunant.com</a>></span> wrote:<br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I don't have performance measurements for the new LTO version of<br>

pagerando yet. I'll definitely measure performance thoroughly once<br>

the current prototype is finished, before moving forward, and will<br>

post results when I have them.<br>

<br>

I'm definitely curious about your work and its performance impact.<br>

Were you randomizing the layout of functions during linking by<br>

reordering function sections? Or did just enabling -ffunction-sections<br>

tank performance?<br>

<br>

Thanks,<br>

Stephen<br></blockquote><div><br></div><div>-ffunction-sections plus randomization of text section order in the linker was a huge performance hit. It may well be different with only randomizing 4k groupings of sections instead.</div><div><br></div><div>- Michael Spencer<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<div class="gmail-HOEnZb"><div class="gmail-h5"><br>

On Sat, Jun 10, 2017 at 8:39 PM, Davide Italiano <<a href="mailto:davide@freebsd.org">davide@freebsd.org</a>> wrote:<br>

> On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <<a href="mailto:davide@freebsd.org">davide@freebsd.org</a>> wrote:<br>

>> On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev<br>

>> <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br>

>>> This RFC describes pagerando, an improvement upon ASLR for shared<br>

>>> libraries. We're planning to submit this work for upstreaming and<br>

>>> would appreciate feedback before we get to the patch submission stage.<br>

>>><br>

>>> Pagerando randomizes the location of individual memory pages (ASLR<br>

>>> only randomizes the library base address). This increases security<br>

>>> against code-reuse attacks (such as ROP) by tolerating pointer leaks.<br>

>>> Pagerando splits libraries into page-aligned bins at compile time. At<br>

>>> load time, each bin is mapped to a random address. The code in each<br>

>>> bin is immutable and thus shared between processes.<br>

>>><br>

>>> To implement pagerando, the compiler and linker need to build shared<br>

>>> libraries with text segments split into page-aligned (and ideally<br>

>>> page-sized) bins. All inter-bin references are indirected through a<br>

>>> table initialized by the dynamic loader that holds the absolute<br>

>>> address of each bin. At load time the loader randomly chooses an<br>

>>> address for each bin and maps the bin pages from disk into memory.<br>

>>><br>

>>> We're focusing on ARM and AArch64 initially, although there is nothing<br>

>>> particularly target specific that precludes support for other LLVM<br>

>>> backends.<br>

>>><br>

>>> ## Design Goals<br>

>>><br>

>>> 1. Improve security over ASLR. The randomization granularity<br>

>>> determines how much information a single code pointer leaks. A pointer<br>

>>> to a page reveals less about the location of other code than a pointer<br>

>>> into a contiguous library would.<br>

>>> 2. Avoid randomizing files on disk. Modern operating systems provide<br>

>>> verified boot techniques to detect tampering with files. Randomizing<br>

>>> the on-disk layout of system libraries would interfere with the<br>

>>> trusted boot process. Randomizing libraries at compile or link time<br>

>>> would also needlessly complicate deployment and provisioning.<br>

>>> 3. Preserve code page sharing. The OS reduces memory usage by mapping<br>

>>> shared file pages to the same physical memory in each process and<br>

>>> locates these pages at different virtual addresses with ASLR. To<br>

>>> preserve sharing of code pages, we cannot modify the contents of<br>

>>> file-mapped pages at load time and are restricted to changing their<br>

>>> ordering and placement in the virtual address space.<br>

>>> 4. Backwards compatibility. Randomized code must interoperate<br>

>>> transparently with existing, unmodified executables and shared<br>

>>> libraries. Calls into randomized code must work as-is according to the<br>

>>> normal ABI.<br>

>>> 5. Compatibility with other mitigations. Enabling randomization must<br>

>>> not preclude deploying other mitigations such as control-flow<br>

>>> integrity as well.<br>

>>><br>

>>> ## Pagerando Design<br>

>>><br>

>>> Pagerando requires a platform-specific extension to the dynamic<br>

>>> loading ABI for compatible libraries to opt in to. In order to<br>

>>> decouple the address of each code bin (segment) from that of other<br>

>>> bins and global data, we must disallow relative addressing between<br>

>>> different bin segments as well as between legacy segments and bin<br>

>>> segments.<br>

>>><br>

>>> To prepare a library for pagerando, the compiler must first allocate<br>

>>> functions into page-aligned bins corresponding to segments in the<br>

>>> final ELF file. Since these bins will be independently positioned, the<br>

>>> compiler must redirect all inter-bin references through an indirection<br>

>>> table – the Page Offset Table (POT) – which stores the virtual address<br>

>>> of each bin in the library. Indices of POT entries and bin offsets are<br>

>>> statically determined at link time so code will not require any<br>

>>> dynamic relocations to reference functions in another bin or globals<br>

>>> outside of bins. We reserve a register in pagerando-compatible code to<br>

>>> hold the address of the POT. This register is initialized on entry to<br>

>>> the shared library. At load time the dynamic loader maps code bins at<br>

>>> independent, random addresses and updates the dynamic relocations in<br>

>>> the POT.<br>
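
A minimal sketch of the POT indirection described above, written as a Python simulation rather than real loader code. The names load_bins and resolve, the 4 KiB page size, and the example POT index and offset are assumptions made for illustration and are not part of the RFC:<br>

```python
import random

PAGE_SIZE = 4096  # assumed page size, in bytes


def load_bins(num_bins, rng, num_slots=1 << 20):
    """Pick a distinct random page-aligned base address for each bin.

    Returns the Page Offset Table (POT): POT[i] is the base address of bin i.
    """
    slots = rng.sample(range(1, num_slots), num_bins)  # distinct page slots
    return [slot * PAGE_SIZE for slot in slots]


def resolve(pot, pot_index, bin_offset):
    """Compute a target address from a link-time (POT index, bin offset) pair."""
    return pot[pot_index] + bin_offset


# Link-time constants: a function placed in bin 2 at offset 0x40 (made-up values).
FOO = (2, 0x40)

pot_a = load_bins(4, random.Random(1))  # layout chosen for one process
pot_b = load_bins(4, random.Random(2))  # layout chosen for another process

addr_a = resolve(pot_a, *FOO)
addr_b = resolve(pot_b, *FOO)
```

The (POT index, bin offset) pair is the same in both simulated processes, so the code bytes that reference it never change; only the POT contents differ, which is what keeps the bin pages shareable.<br>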

>>><br>

>>> Reserving a register to hold the POT address changes the internal ABI<br>

>>> calling convention and requires that the POT register be correctly<br>

>>> initialized when entering a library from external code. To initialize<br>

>>> the register, the compiler emits entry wrappers which save the old<br>

>>> contents of the POT register if necessary, initialize the POT<br>

>>> register, and call the target function. Each externally visible<br>

>>> function (conservatively including all address taken functions) needs<br>

>>> an entry wrapper which replaces the function for all external uses.<br>

>>><br>

>>> To optimally pack functions into bins and avoid new static<br>

>>> relocations, we propose using (traditional) LTO. With new static<br>

>>> relocations (i.e. linker cooperation), LTO would not be necessary, but<br>

>>> it is still desirable for more efficient bin packing.<br>

>>><br>

>>> The design of pagerando is based on the mitigations proposed by Backes<br>

>>> and Nürnberger [1], with improvements for compatibility and<br>

>>> deployability. The present design is a refinement of our first<br>

>>> pagerando prototype [2].<br>

>>><br>

>>> ## LLVM Changes<br>

>>><br>

>>> To implement pagerando, we propose the following LLVM changes:<br>

>>><br>

>>> New module pass to create entry wrapper functions. This pass will<br>

>>> create entry wrappers as described above and replace exported function<br>

>>> names and all address taken uses with the wrapper. This pass will only<br>

>>> be run when pagerando is enabled.<br>

>>><br>

>>> Instruction Lowering. Pagerando-compatible code must access all global<br>

>>> values (including functions) through the POT since PC-relative memory<br>

>>> addressing is not allowed between a bin and another segment. We<br>

>>> propose that when pagerando is enabled, all global variable accesses<br>

>>> from functions marked as pagerando-compatible must be lowered into<br>

>>> GOT-relative accesses and added to the GOT address loaded from the POT<br>

>>> (currently stored in the first POT entry). Lowering of direct function<br>

>>> calls targeting pagerando-compatible code is slightly more complicated<br>

>>> because we need to determine the POT index of the bin containing the<br>

>>> target function if the target is not in the same bin. However, we<br>

>>> can't properly allocate functions to bins before they are lowered and<br>

>>> an approximate size is available. Therefore, during lowering we should<br>

>>> assume that all function calls must be made indirectly through the POT<br>

>>> with the computation of the POT index and bin offset of the target<br>

>>> function postponed until assembly printing.<br>

>>><br>

>>> New machine module LTO pass to allocate functions into bins. This pass<br>

>>> relies on targets implementing TargetInstrInfo::<wbr>getInstSizeInBytes<br>

>>> (MachineInstr) so that it knows (approximately) how large the final<br>

>>> function code will be. Functions can also be packed in such a way that<br>

>>> the number of inter-bin calls is minimized by taking the function<br>

>>> call graph and/or execution profiles into account while packing. This<br>

>>> pass only needs to run when pagerando is enabled.<br>
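
One way the bin allocation step might look, sketched as a first-fit decreasing heuristic in Python. This is not the pass from the RFC: pack_functions, the function names and the size estimates are hypothetical, and the sketch ignores the call graph and profile data mentioned above:<br>

```python
PAGE_SIZE = 4096  # assumed bin capacity, in bytes


def pack_functions(sizes, capacity=PAGE_SIZE):
    """Assign functions to bins with first-fit decreasing.

    sizes maps a function name to its estimated code size in bytes.
    Returns a list of bins, each a list of function names.
    A function larger than the capacity gets a bin of its own.
    """
    bins = []  # function names per bin
    free = []  # remaining bytes per bin
    # Largest functions first; ties broken by name for a stable layout.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(name)
                free[i] -= size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([name])
            free.append(max(capacity - size, 0))
    return bins


estimated = {"parse": 3000, "emit": 2500, "hash": 1500, "init": 1000, "log": 96}
layout = pack_functions(estimated)
```

With these made-up sizes, the five functions fit into two bins. A real pass would likely also weigh how often functions call each other, so that frequent callers and callees share a bin.<br>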

>>><br>

>>> Code Emission. After functions are assigned to bins, we create an<br>

>>> individual MCSection for each bin. These MCSections will map to<br>

>>> independent segments during linking. The AsmPrinter is responsible for<br>

>>> emitting the POT entries during code emission. We cannot easily<br>

>>> represent the POT as a standard IR object because it needs to contain<br>

>>> bin (MCSection) addresses. The AsmPrinter instead can query the<br>

>>> MCContext for the list of bin symbols and emit these symbols directly<br>

>>> into a global POT array.<br>

>>><br>

>>> Gold Plugin Interface. If using LTO to build the module, LLVM can<br>

>>> generate the complete POT for the module and instrument all references<br>

>>> that need to use the POT. However, we must still ensure that bin<br>

>>> sections are each placed into an independent segment so that the<br>

>>> dynamic loader can map each bin separately. The gold plugin interface<br>

>>> currently provides support to assign sections to unique output<br>

>>> segments. However, it does not yet provide plugins an opportunity to<br>

>>> call this interface for new, plugin-created input files. Gold requires<br>

>>> that the plugin provide the file handle of the input section to assign<br>

>>> a section to a unique segment. We will need to upstream a small patch<br>

>>> for gold that provides a new callback to the LTO plugin when gold<br>

>>> receives a new, plugin-generated input file. This would allow the<br>

>>> plugin to obtain the new file’s handle and map its sections to unique<br>

>>> segments. The linker must mark pagerando bin segments in such a way<br>

>>> that the dynamic loader knows that it can randomize each bin segment<br>

>>> independently. We propose a new ELF segment flag PF_RAND_ADDR that can<br>

>>> communicate this for each compatible segment. The compiler and/or<br>

>>> linker must add this flag to compatible segments for the loader to<br>

>>> recognize and randomize the relevant segments.<br>
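
A small sketch of how a loader could tell bin segments apart from legacy segments using such a flag. The PF_X, PF_W and PF_R values are the standard ELF ones; the PF_RAND_ADDR value below is purely hypothetical (the RFC does not assign one) and is simply chosen inside the OS-specific PF_MASKOS range. The segment names are also made up:<br>

```python
PF_X = 0x1  # standard ELF segment permission flags
PF_W = 0x2
PF_R = 0x4
PF_MASKOS = 0x0FF00000     # standard mask for OS-specific segment flags
PF_RAND_ADDR = 0x00100000  # hypothetical value; the RFC does not assign one


def randomizable(segments):
    """Return the names of segments a loader may place independently."""
    return [name for name, flags in segments if flags & PF_RAND_ADDR]


# (name, p_flags) pairs standing in for program header entries.
segments = [
    ("bin0", PF_R | PF_X | PF_RAND_ADDR),
    ("bin1", PF_R | PF_X | PF_RAND_ADDR),
    ("legacy_text", PF_R | PF_X),
    ("data", PF_R | PF_W),
]

to_randomize = randomizable(segments)
```

A loader that does not know the flag would ignore it and map every segment at its usual offset from the library base.<br>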

>>><br>

>>> ## Target-Specific Details<br>

>>><br>

>>> We will initially support pagerando for ARM and AArch64, so several<br>

>>> details are worth considering on those targets. For ARM, r9 is a<br>

>>> platform-specific register that can be used as the static base<br>

>>> register, a use similar in many ways to pagerando's POT register. When<br>

>>> not specified by the platform, r9 is a callee-saved general-purpose<br>

>>> register. Thus, using r9 as the POT register will be backwards<br>

>>> compatible when calling out of pagerando code into either legacy code<br>

>>> or a different module; the callee will preserve r9 for use after<br>

>>> returning to pagerando code. In AArch64, r18 is designated as a<br>

>>> platform-specific register; however, it is not specified as<br>

>>> callee-saved when not reserved by the target platform. Thus, to<br>

>>> interoperate with unmodified legacy AArch64 software, we would need to<br>

>>> save r18 in pagerando code before calling into any external code. When<br>

>>> using LTO, the compiler will see the entire module and therefore be<br>

>>> able to identify calls into external vs internal code. Without LTO, it<br>

>>> will likely be more efficient to use a callee-saved register to avoid<br>

>>> the need to save the POT register before each call. We will experiment<br>

>>> with both caller- and callee-saved registers to determine which is<br>

>>> most efficient.<br>

>>><br>

>>><br>

>>> [1] M. Backes and S. Nürnberger. Oxymoron - making fine-grained memory<br>

>>> randomization practical by allowing code sharing. In USENIX Security<br>

>>> Symposium, 2014. <a href="https://www.usenix.org/node/184466" rel="noreferrer" target="_blank">https://www.usenix.org/node/<wbr>184466</a><br>

>>><br>

>>> [2] S. Crane, A. Homescu, and P. Larsen. Code randomization: Haven’t<br>

>>> we solved this problem yet? In IEEE Cybersecurity Development<br>

>>> Conference (SecDev), 2016.<br>

>>> <a href="http://www.ics.uci.edu/~perl/sd16_pagerando.pdf" rel="noreferrer" target="_blank">http://www.ics.uci.edu/~perl/<wbr>sd16_pagerando.pdf</a><br>

>>> ______________________________<wbr>_________________<br>

>>> LLVM Developers mailing list<br>

>>> <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

>>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

>><br>

>> Out of curiosity, did you measure the performance impact on the<br>

>> generated executable? We tried something akin to your proposal in the<br>

>> past (i.e. randomizing the layout of ELF sections) and it turned out to<br>

>> be a sledgehammer for performance: in some cases, i.e. when<br>

>> -ffunction-sections/-fdata-<wbr>sections was specified, the runtime<br>

>> performance of the executable dropped by > 10% [cc:ing Michael as he<br>

>> did the measurements].<br>

>><br>

><br>

> To clarify, I read your paper and I see that some benchmarks show<br>

> substantial degradation (6.5%), but in your "future work" section you<br>

> describe techniques to mitigate the drop, and I wonder if you ever got<br>

> to implement them and take new measurements.<br>

><br>

> Thanks,<br>

><br>

> --<br>

> Davide<br>

><br>

> "There are no solved problems; there are only problems that are more<br>

> or less solved" -- Henri Poincare<br>

</div></div></blockquote></div><br></div></div>