[llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Mon Jun 12 14:13:47 PDT 2017

On Mon, Jun 12, 2017 at 1:03 PM, Stephen Crane <sjc at immunant.com> wrote:

> I don't have performance measurements for the new LTO version of
> pagerando yet. I'll definitely be thoroughly measuring performance
> once the current prototype is finished before moving forward, and will
> post results when I have them.
>
> I'm definitely curious about your work and its performance impact.
> Were you randomizing the layout of functions during linking by
> reordering function sections? Or did just enabling -ffunction-sections
> tank performance?
>
> Thanks,
> Stephen
>

-ffunction-sections plus randomization of text section order in the linker
was a huge performance hit. The result may well be different when
randomizing only 4 KiB groupings of sections.

- Michael Spencer

>
> On Sat, Jun 10, 2017 at 8:39 PM, Davide Italiano <davide at freebsd.org> wrote:
> > On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <davide at freebsd.org> wrote:
> >> On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev
> >> <llvm-dev at lists.llvm.org> wrote:
> >>> This RFC describes pagerando, an improvement upon ASLR for shared
> >>> libraries. We're planning to submit this work for upstreaming and
> >>> would appreciate feedback before we get to the patch submission stage.
> >>>
> >>> Pagerando randomizes the location of individual memory pages (ASLR
> >>> only randomizes the library base address). This increases security
> >>> against code-reuse attacks (such as ROP) by tolerating pointer leaks.
> >>> Pagerando splits libraries into page-aligned bins at compile time. At
> >>> load time, each bin is mapped to a random address. The code in each
> >>> bin is immutable and thus shared between processes.
> >>>
> >>> To implement pagerando, the compiler and linker need to build shared
> >>> libraries with text segments split into page-aligned (and ideally
> >>> page-sized) bins. All inter-bin references are indirected through a
> >>> table initialized by the dynamic loader that holds the absolute
> >>> address of each bin. At load time the loader randomly chooses an
> >>> address for each bin and maps the bin pages from disk into memory.
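
A minimal sketch of the load-time placement step described above, assuming a 4 KiB page and a toy address space measured in pages. The names `place_bins`, `PAGE_SIZE`, and `addr_space_pages` are hypothetical and are not part of the RFC or of any real dynamic loader; a real loader would map the bins rather than pick numbers.

```python
import random

PAGE_SIZE = 4096  # assumed page size for this sketch


def place_bins(bin_sizes, addr_space_pages, rng):
    """Pick a random, page-aligned, non-overlapping base address per bin.

    bin_sizes: size of each bin in bytes (rounded up to whole pages here).
    addr_space_pages: size of the toy address space, in pages.
    Assumes the address space is much larger than the bins, so the
    retry loop below finds a free slot quickly.
    """
    placed = []  # (start_page, page_count) of bins already placed
    bases = []
    for size in bin_sizes:
        pages = -(-size // PAGE_SIZE)  # ceiling division
        while True:
            start = rng.randrange(0, addr_space_pages - pages + 1)
            # Accept the slot only if it overlaps no earlier bin.
            if all(start + pages <= s or s + n <= start for s, n in placed):
                break
        placed.append((start, pages))
        bases.append(start * PAGE_SIZE)
    return bases
```

Each returned base would become one entry of the table the loader initializes; the bin contents themselves are never modified, only their placement changes.
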
> >>>
> >>> We're focusing on ARM and AArch64 initially, although there is nothing
> >>> particularly target specific that precludes support for other LLVM
> >>> backends.
> >>>
> >>> ## Design Goals
> >>>
> >>> 1. Improve security over ASLR. The randomization granularity
> >>> determines how much information a single code pointer leaks. A pointer
> >>> to a page reveals less about the location of other code than a pointer
> >>> into a contiguous library would.
> >>> 2. Avoid randomizing files on disk. Modern operating systems provide
> >>> verified boot techniques to detect tampering with files. Randomizing
> >>> the on-disk layout of system libraries would interfere with the
> >>> trusted boot process. Randomizing libraries at compile or link time
> >>> would also needlessly complicate deployment and provisioning.
> >>> 3. Preserve code page sharing. The OS reduces memory usage by mapping
> >>> shared file pages to the same physical memory in each process and
> >>> locates these pages at different virtual addresses with ASLR. To
> >>> preserve sharing of code pages, we cannot modify the contents of
> >>> file-mapped pages at load time and are restricted to changing their
> >>> ordering and placement in the virtual address space.
> >>> 4. Backwards compatibility. Randomized code must interoperate
> >>> transparently with existing, unmodified executables and shared
> >>> libraries. Calls into randomized code must work as-is according to the
> >>> normal ABI.
> >>> 5. Compatibility with other mitigations. Enabling randomization must
> >>> not preclude deploying other mitigations such as control-flow
> >>> integrity as well.
> >>>
> >>> ## Pagerando Design
> >>>
> >>> Pagerando requires a platform-specific extension to the dynamic
> >>> loading ABI that compatible libraries opt into. In order to
> >>> decouple the address of each code bin (segment) from that of other
> >>> bins and global data, we must disallow relative addressing between
> >>> different bin segments as well as between legacy segments and bin
> >>> segments.
> >>>
> >>> To prepare a library for pagerando, the compiler must first allocate
> >>> functions into page-aligned bins corresponding to segments in the
> >>> final ELF file. Since these bins will be independently positioned, the
> >>> compiler must redirect all inter-bin references through an indirection
> >>> table – the Page Offset Table (POT) – which stores the virtual address
> >>> of each bin in the library. Indices of POT entries and bin offsets are
> >>> statically determined at link time so code will not require any
> >>> dynamic relocations to reference functions in another bin or globals
> >>> outside of bins. We reserve a register in pagerando-compatible code to
> >>> hold the address of the POT. This register is initialized on entry to
> >>> the shared library. At load time the dynamic loader maps code bins at
> >>> independent, random addresses and updates the dynamic relocations in
> >>> the POT.
> >>>
> >>> Reserving a register to hold the POT address changes the internal ABI
> >>> calling convention and requires that the POT register be correctly
> >>> initialized when entering a library from external code. To initialize
> >>> the register, the compiler emits entry wrappers which save the old
> >>> contents of the POT register if necessary, initialize the POT
> >>> register, and call the target function. Each externally visible
> >>> function (conservatively including all address taken functions) needs
> >>> an entry wrapper which replaces the function for all external uses.
> >>>
> >>> To optimally pack functions into bins and avoid new static
> >>> relocations, we propose using (traditional) LTO. With new static
> >>> relocations (i.e. linker cooperation), LTO would not be necessary, but
> >>> it is still desirable for more efficient bin packing.
> >>>
> >>> The design of pagerando is based on the mitigations proposed by Backes
> >>> and Nürnberger [1], with improvements for compatibility and
> >>> deployability. The present design is a refinement of our first
> >>> pagerando prototype [2].
> >>>
> >>> ## LLVM Changes
> >>>
> >>> To implement pagerando, we propose the following LLVM changes:
> >>>
> >>> New module pass to create entry wrapper functions. This pass will
> >>> create entry wrappers as described above and replace exported function
> >>> names and all address taken uses with the wrapper. This pass will only
> >>> be run when pagerando is enabled.
> >>>
> >>> Instruction Lowering. Pagerando-compatible code must access all global
> >>> values (including functions) through the POT since PC-relative memory
> >>> addressing is not allowed between a bin and another segment. We
> >>> propose that when pagerando is enabled, all global variable accesses
> >>> from functions marked as pagerando-compatible must be lowered into
> >>> GOT-relative accesses and added to the GOT address loaded from the POT
> >>> (currently stored in the first POT entry). Lowering of direct function
> >>> calls targeting pagerando-compatible code is slightly more complicated
> >>> because we need to determine the POT index of the bin containing the
> >>> target function if the target is not in the same bin. However, we
> >>> can't properly allocate functions to bins before they are lowered and
> >>> an approximate size is available. Therefore, during lowering we should
> >>> assume that all function calls must be made indirectly through the POT
> >>> with the computation of the POT index and bin offset of the target
> >>> function postponed until assembly printing.
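
The POT indirection described above can be modeled in a few lines. This is only a sketch under stated assumptions: `build_pot`, `call_target`, and `global_address` are hypothetical helper names, the addresses are made up, and the layout (GOT address in entry 0, bin bases after it) follows the "currently stored in the first POT entry" remark rather than any finalized ABI.

```python
def build_pot(got_address, bin_bases):
    """Model the POT as a list: entry 0 is the GOT address, then one base per bin."""
    return [got_address] + list(bin_bases)


def call_target(pot, pot_index, bin_offset):
    """Resolve an inter-bin call: bin base from the POT plus a static offset."""
    return pot[pot_index] + bin_offset


def global_address(pot, got_relative_offset):
    """Resolve a global access: GOT address from the POT plus a GOT-relative offset."""
    return pot[0] + got_relative_offset


# Two different load-time layouts of the same library (illustrative addresses).
pot_a = build_pot(0x7000_0000, [0x1234_5000, 0x0042_1000])
pot_b = build_pot(0x5000_0000, [0x0900_0000, 0x3310_2000])

# The same static (index, offset) pair resolves correctly under both layouts,
# so the code in the bins needs no dynamic relocations of its own.
assert call_target(pot_a, 2, 0x40) == 0x0042_1040
assert call_target(pot_b, 2, 0x40) == 0x3310_2040
assert global_address(pot_a, 0x18) == 0x7000_0018
```

Only the table contents differ between the two layouts; the index and offset baked into the code stay the same, which is what keeps the code pages shareable.
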
> >>>
> >>> New machine module LTO pass to allocate functions into bins. This pass
> >>> relies on targets implementing
> >>> TargetInstrInfo::getInstSizeInBytes(MachineInstr) so that it knows
> >>> (approximately) how large the final function code will be. Functions
> >>> can also be packed so that the number of inter-bin calls is minimized
> >>> by taking the function call graph and/or execution profiles into
> >>> account while packing. This pass only needs to run when pagerando is
> >>> enabled.
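
One simple way to picture the bin allocation is a first-fit-decreasing packing over estimated function sizes. This is an illustrative stand-in, not the algorithm the RFC proposes: `pack_functions` and `BIN_SIZE` are hypothetical names, the 4096-byte bin size is an assumption, and call-graph or profile information is ignored here.

```python
BIN_SIZE = 4096  # assumed bin size (one page) for this sketch


def pack_functions(func_sizes):
    """Pack functions into bins using first-fit decreasing.

    func_sizes: dict mapping function name -> estimated size in bytes.
    Returns a list of bins, each a list of function names.
    A function larger than BIN_SIZE gets a bin of its own.
    """
    bins = []  # each entry: [free_bytes, [function names]]
    # Largest functions first; ties broken by name for a stable result.
    ordered = sorted(func_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        for b in bins:
            if size <= b[0]:
                b[0] -= size
                b[1].append(name)
                break
        else:
            # No existing bin has room: open a new one.
            bins.append([max(BIN_SIZE - size, 0), [name]])
    return [names for _, names in bins]
```

For example, `pack_functions({"a": 3000, "b": 2000, "c": 1000, "d": 500})` yields `[["a", "c"], ["b", "d"]]`. A packing that also weighs the call graph would trade some of this density for fewer inter-bin calls.
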
> >>>
> >>> Code Emission. After functions are assigned to bins, we create an
> >>> individual MCSection for each bin. These MCSections will map to
> >>> independent segments during linking. The AsmPrinter is responsible for
> >>> emitting the POT entries during code emission. We cannot easily
> >>> represent the POT as a standard IR object because it needs to contain
> >>> bin (MCSection) addresses. The AsmPrinter instead can query the
> >>> MCContext for the list of bin symbols and emit these symbols directly
> >>> into a global POT array.
> >>>
> >>> Gold Plugin Interface. If using LTO to build the module, LLVM can
> >>> generate the complete POT for the module and instrument all references
> >>> that need to use the POT. However, we must still ensure that bin
> >>> sections are each placed into an independent segment so that the
> >>> dynamic loader can map each bin separately. The gold plugin interface
> >>> currently provides support to assign sections to unique output
> >>> segments. However, it does not yet provide plugins an opportunity to
> >>> call this interface for new, plugin-created input files. Gold requires
> >>> that the plugin provide the file handle of the input section to assign
> >>> a section to a unique segment. We will need to upstream a small patch
> >>> for gold that provides a new callback to the LTO plugin when gold
> >>> receives a new, plugin-generated input file. This would allow the
> >>> plugin to obtain the new file’s handle and map its sections to unique
> >>> segments. The linker must mark pagerando bin segments in such a way
> >>> that the dynamic loader knows that it can randomize each bin segment
> >>> independently. We propose a new ELF segment flag PF_RAND_ADDR that can
> >>> communicate this for each compatible segment. The compiler and/or
> >>> linker must add this flag to compatible segments for the loader to
> >>> recognize and randomize the relevant segments.
> >>>
> >>> ## Target-Specific Details
> >>>
> >>> We will initially support pagerando for ARM and AArch64, so several
> >>> details are worth considering on those targets. For ARM, the r9
> >>> register is a platform-specific register that can be used as the
> >>> static base register, a role similar in many ways to pagerando's POT
> >>> register. When not specified by the platform, r9 is a callee-saved
> >>> general-purpose register. Thus, using r9 as the POT register will be
> >>> backwards compatible when calling out of pagerando code into either
> >>> legacy code or a different module; the callee will preserve r9 for
> >>> use after returning to pagerando code. In AArch64, r18 is designated
> >>> as a platform-specific register; however, it is not specified as
> >>> callee-saved when not reserved by the target platform. Thus, to
> >>> interoperate with unmodified legacy AArch64 software, we would need to
> >>> save r18 in pagerando code before calling into any external code. When
> >>> using LTO, the compiler will see the entire module and therefore be
> >>> able to identify calls into external vs internal code. Without LTO, it
> >>> will likely be more efficient to use a callee-saved register to avoid
> >>> the need to save the POT register before each call. We will experiment
> >>> with both caller- and callee-saved registers to determine which is
> >>> most efficient.
> >>>
> >>>
> >>> [1] M. Backes and S. Nürnberger. Oxymoron - making fine-grained memory
> >>> randomization practical by allowing code sharing. In USENIX Security
> >>> Symposium, 2014. https://www.usenix.org/node/184466
> >>>
> >>> [2] S. Crane, A. Homescu, and P. Larsen. Code randomization: Haven’t
> >>> we solved this problem yet? In IEEE Cybersecurity Development
> >>> Conference (SecDev), 2016.
> >>> http://www.ics.uci.edu/~perl/sd16_pagerando.pdf
> >>
> >> Out of curiosity, did you measure the impact on the performance of
> >> the generated executable? We tried something akin to your proposal
> >> in the past (i.e. randomizing ELF section layout) and it turned out
> >> to be a sledgehammer for performance (in some cases, i.e. when
> >> -ffunction-sections/-fdata-sections was specified, the performance of
> >> the executable at runtime dropped by > 10% [cc:ing Michael as he did
> >> the measurements]).
> >>
> >
> > To clarify, I read your paper and see that some benchmarks show
> > substantial degradation (6.5%), but in your "future work" section you
> > describe techniques to mitigate the drop, and I wonder whether you ever
> > implemented them and took new measurements.
> >
> > Thanks,
> >
> > --
> > Davide
> >
> > "There are no solved problems; there are only problems that are more
> > or less solved" -- Henri Poincare
>