[llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Davide Italiano via llvm-dev llvm-dev at lists.llvm.org
Sat Jun 10 20:39:36 PDT 2017


On Sat, Jun 10, 2017 at 4:09 PM, Davide Italiano <davide at freebsd.org> wrote:
> On Tue, Jun 6, 2017 at 10:55 AM, Stephen Crane via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> This RFC describes pagerando, an improvement upon ASLR for shared
>> libraries. We're planning to submit this work for upstreaming and
>> would appreciate feedback before we get to the patch submission stage.
>>
>> Pagerando randomizes the location of individual memory pages (ASLR
>> only randomizes the library base address). This increases security
>> against code-reuse attacks (such as ROP) by tolerating pointer leaks.
>> Pagerando splits libraries into page-aligned bins at compile time. At
>> load time, each bin is mapped to a random address. The code in each
>> bin is immutable and thus shared between processes.
>>
>> To implement pagerando, the compiler and linker need to build shared
>> libraries with text segments split into page-aligned (and ideally
>> page-sized) bins. All inter-bin references are indirected through a
>> table initialized by the dynamic loader that holds the absolute
>> address of each bin. At load time the loader randomly chooses an
>> address for each bin and maps the bin pages from disk into memory.
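
A minimal Python sketch of the load-time step described above, to make the mechanism concrete. Everything in it is an assumption for illustration and not part of the RFC: the 4 KiB page size, the restriction to bins of at most one page, the size of the address region, and the name `place_bins`. It only shows the idea of choosing an independent, page-aligned base for each bin and recording the bases in a table.

```python
import random

PAGE_SIZE = 4096  # assumed page size, not specified by the RFC


def place_bins(bin_sizes, region_pages=1 << 20, rng=None):
    """Pick a distinct random page-aligned base for each bin.

    bin_sizes: size in bytes of each bin (this sketch assumes one page max).
    Returns a list in which entry i is the base address chosen for bin i,
    playing the role of the table the loader initializes.
    """
    rng = rng or random.Random()
    if any(size > PAGE_SIZE for size in bin_sizes):
        raise ValueError("this sketch assumes bins of at most one page")
    # Sampling distinct page numbers guarantees that no two bins overlap.
    pages = rng.sample(range(1, region_pages), len(bin_sizes))
    return [page * PAGE_SIZE for page in pages]


# Three hypothetical bins; a seeded generator keeps the example repeatable.
pot = place_bins([4096, 3000, 4096], rng=random.Random(0))
```

A real loader would also have to avoid existing mappings and map the file pages at the chosen addresses; the sketch leaves that out.
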
>>
>> We're focusing on ARM and AArch64 initially, although there is nothing
>> particularly target specific that precludes support for other LLVM
>> backends.
>>
>> ## Design Goals
>>
>> 1. Improve security over ASLR. The randomization granularity
>> determines how much information a single code pointer leaks. A pointer
>> to a page reveals less about the location of other code than a pointer
>> into a contiguous library would.
>> 2. Avoid randomizing files on disk. Modern operating systems provide
>> verified boot techniques to detect tampering with files. Randomizing
>> the on-disk layout of system libraries would interfere with the
>> trusted boot process. Randomizing libraries at compile or link time
>> would also needlessly complicate deployment and provisioning.
>> 3. Preserve code page sharing. The OS reduces memory usage by mapping
>> shared file pages to the same physical memory in each process and
>> locates these pages at different virtual addresses with ASLR. To
>> preserve sharing of code pages, we cannot modify the contents of
>> file-mapped pages at load time and are restricted to changing their
>> ordering and placement in the virtual address space.
>> 4. Backwards compatibility. Randomized code must interoperate
>> transparently with existing, unmodified executables and shared
>> libraries. Calls into randomized code must work as-is according to the
>> normal ABI.
>> 5. Compatibility with other mitigations. Enabling randomization must
>> not preclude deploying other mitigations such as control-flow
>> integrity as well.
>>
>> ## Pagerando Design
>>
>> Pagerando requires a platform-specific extension to the dynamic
>> loading ABI that compatible libraries opt in to. In order to
>> decouple the address of each code bin (segment) from that of other
>> bins and global data, we must disallow relative addressing between
>> different bin segments as well as between legacy segments and bin
>> segments.
>>
>> To prepare a library for pagerando, the compiler must first allocate
>> functions into page-aligned bins corresponding to segments in the
>> final ELF file. Since these bins will be independently positioned, the
>> compiler must redirect all inter-bin references through an indirection
>> table – the Page Offset Table (POT) – which stores the virtual address
>> of each bin in the library. Indices of POT entries and bin offsets are
>> statically determined at link time so code will not require any
>> dynamic relocations to reference functions in another bin or globals
>> outside of bins. We reserve a register in pagerando-compatible code to
>> hold the address of the POT. This register is initialized on entry to
>> the shared library. At load time the dynamic loader maps code bins at
>> independent, random addresses and updates the dynamic relocations in
>> the POT.
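
To illustrate why no dynamic relocations are needed in the code itself, here is a small Python sketch of a reference through the POT. It assumes a simplified POT that holds only bin base addresses; the `PotRef` type, the `resolve` helper, and all addresses are hypothetical. The point is that the (index, offset) pair is fixed at link time, and only the table contents change between loads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PotRef:
    """A link-time constant: POT index of the target bin plus offset in it."""
    pot_index: int
    offset: int


def resolve(pot, ref):
    """Run-time address = base of the bin (read from the POT) + offset."""
    return pot[ref.pot_index] + ref.offset


# Hypothetical reference to a function at offset 0x120 in bin 2.
callee = PotRef(pot_index=2, offset=0x120)

# Two different load-time layouts of the same, unmodified library.
layout_a = [0x7000_0000, 0x4ABC_D000, 0x1234_5000]
layout_b = [0x2222_2000, 0x7FFF_1000, 0x5555_0000]

addr_a = resolve(layout_a, callee)  # 0x12345120
addr_b = resolve(layout_b, callee)  # 0x55550120
```

The same `callee` constant resolves correctly under both layouts, which is what lets the code pages stay identical, and therefore shared, across processes.
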
>>
>> Reserving a register to hold the POT address changes the internal ABI
>> calling convention and requires that the POT register be correctly
>> initialized when entering a library from external code. To initialize
>> the register, the compiler emits entry wrappers which save the old
>> contents of the POT register if necessary, initialize the POT
>> register, and call the target function. Each externally visible
>> function (conservatively including all address-taken functions) needs
>> an entry wrapper which replaces the function for all external uses.
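
The wrapper behaviour described above can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the `Registers` class stands in for the reserved POT register, `make_entry_wrapper` is a hypothetical name, and restoring the saved value on return is my reading of "save the old contents ... if necessary" and is not spelled out in the RFC.

```python
class Registers:
    """Toy register file with a single reserved POT register."""

    def __init__(self):
        self.pot = None


def make_entry_wrapper(target, regs, pot_address):
    """Return a wrapper that sets up the POT register around `target`."""

    def wrapper(*args):
        saved = regs.pot         # save the external caller's value
        regs.pot = pot_address   # initialize the POT register
        try:
            return target(*args)  # call the real (internal) function
        finally:
            regs.pot = saved     # restore before returning to the caller

    return wrapper


regs = Registers()
regs.pot = 0xDEAD  # whatever the external caller happened to keep there


def internal_add(a, b):
    # Inside the library the POT register must already be valid.
    assert regs.pot == 0x1000
    return a + b


exported_add = make_entry_wrapper(internal_add, regs, pot_address=0x1000)
result = exported_add(2, 3)
```

External callers only ever see `exported_add`, so they follow the normal ABI and never need to know about the reserved register.
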
>>
>> To optimally pack functions into bins and avoid new static
>> relocations, we propose using (traditional) LTO. With new static
>> relocations (i.e. linker cooperation), LTO would not be necessary, but
>> it is still desirable for more efficient bin packing.
>>
>> The design of pagerando is based on the mitigations proposed by Backes
>> and Nürnberger [1], with improvements for compatibility and
>> deployability. The present design is a refinement of our first
>> pagerando prototype [2].
>>
>> ## LLVM Changes
>>
>> To implement pagerando, we propose the following LLVM changes:
>>
>> New module pass to create entry wrapper functions. This pass will
>> create entry wrappers as described above and replace exported function
>> names and all address-taken uses with the wrapper. This pass will only
>> be run when pagerando is enabled.
>>
>> Instruction Lowering. Pagerando-compatible code must access all global
>> values (including functions) through the POT since PC-relative memory
>> addressing is not allowed between a bin and another segment. We
>> propose that when pagerando is enabled, all global variable accesses
>> from functions marked as pagerando-compatible must be lowered into
>> GOT-relative accesses and added to the GOT address loaded from the POT
>> (currently stored in the first POT entry). Lowering of direct function
>> calls targeting pagerando-compatible code is slightly more complicated
>> because we need to determine the POT index of the bin containing the
>> target function if the target is not in the same bin. However, we
>> can't properly allocate functions to bins until they are lowered and
>> an approximate size is available. Therefore, during lowering we should
>> assume that all function calls must be made indirectly through the POT
>> with the computation of the POT index and bin offset of the target
>> function postponed until assembly printing.
>>
>> New machine module LTO pass to allocate functions into bins. This pass
>> relies on targets implementing TargetInstrInfo::getInstSizeInBytes
>> (MachineInstr) so that it knows (approximately) how large the final
>> function code will be. Functions can also be packed in such a way that
>> the number of inter-bin calls is minimized by taking the function
>> call graph and/or execution profiles into account while packing. This
>> pass only needs to run when pagerando is enabled.
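
The RFC does not say which packing algorithm the pass uses, so the following Python sketch swaps in plain first-fit-decreasing as one possible baseline. The function names, sizes, the 4 KiB bin size, and the choice to give an oversized function a bin of its own are all assumptions; the sketch also ignores the call graph and profile data mentioned above.

```python
PAGE_SIZE = 4096  # assumed bin size


def pack_functions(sizes, bin_size=PAGE_SIZE):
    """First-fit-decreasing packing of functions into bins.

    sizes: dict mapping function name -> estimated size in bytes.
    Returns a list of bins, each a list of function names.
    """
    bins = []  # each entry is [free_bytes, [function names]]
    by_size = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > bin_size:
            # Assumption: an oversized function gets a bin of its own.
            bins.append([0, [name]])
            continue
        for entry in bins:
            if entry[0] >= size:      # first bin with enough room
                entry[0] -= size
                entry[1].append(name)
                break
        else:
            bins.append([bin_size - size, [name]])  # open a new bin
    return [names for _, names in bins]


# Hypothetical size estimates, e.g. summed per-instruction sizes.
estimates = {"a": 3000, "b": 2500, "c": 1500, "d": 1000, "e": 500}
layout = pack_functions(estimates)
# layout == [["a", "d"], ["b", "c"], ["e"]]
```

Because the sizes are only approximate, a real pass would presumably need some slack per bin; that margin is not modelled here.
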
>>
>> Code Emission. After functions are assigned to bins, we create an
>> individual MCSection for each bin. These MCSections will map to
>> independent segments during linking. The AsmPrinter is responsible for
>> emitting the POT entries during code emission. We cannot easily
>> represent the POT as a standard IR object because it needs to contain
>> bin (MCSection) addresses. The AsmPrinter instead can query the
>> MCContext for the list of bin symbols and emit these symbols directly
>> into a global POT array.
>>
>> Gold Plugin Interface. If using LTO to build the module, LLVM can
>> generate the complete POT for the module and instrument all references
>> that need to use the POT. However, we must still ensure that bin
>> sections are each placed into an independent segment so that the
>> dynamic loader can map each bin separately. The gold plugin interface
>> currently provides support to assign sections to unique output
>> segments. However, it does not yet provide plugins an opportunity to
>> call this interface for new, plugin-created input files. Gold requires
>> that the plugin provide the file handle of the input section to assign
>> a section to a unique segment. We will need to upstream a small patch
>> for gold that provides a new callback to the LTO plugin when gold
>> receives a new, plugin-generated input file. This would allow the
>> plugin to obtain the new file’s handle and map its sections to unique
>> segments. The linker must mark pagerando bin segments in such a way
>> that the dynamic loader knows that it can randomize each bin segment
>> independently. We propose a new ELF segment flag PF_RAND_ADDR that can
>> communicate this for each compatible segment. The compiler and/or
>> linker must add this flag to compatible segments for the loader to
>> recognize and randomize the relevant segments.
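
As a sketch of how a loader might use the proposed flag: the RFC names `PF_RAND_ADDR` but does not assign it a value, so the bit below is purely hypothetical (picked from the OS-specific `PF_MASKOS` range of `p_flags`). `PF_X`, `PF_W` and `PF_R` are the standard ELF values; the segment names and the helper are made up for the example.

```python
# Standard ELF program-header permission flags.
PF_X = 0x1
PF_W = 0x2
PF_R = 0x4

# Hypothetical value: the RFC proposes the flag but gives no number.
PF_RAND_ADDR = 0x0010_0000


def randomizable_segments(segments):
    """Return names of segments the loader may place independently.

    segments: list of (name, p_flags) pairs.
    """
    return [name for name, flags in segments if flags & PF_RAND_ADDR]


segments = [
    ("legacy_text", PF_R | PF_X),
    ("bin0", PF_R | PF_X | PF_RAND_ADDR),
    ("bin1", PF_R | PF_X | PF_RAND_ADDR),
    ("data", PF_R | PF_W),
]
to_randomize = randomizable_segments(segments)
# to_randomize == ["bin0", "bin1"]
```

Segments without the flag would be mapped relative to the library base as usual, which keeps unmodified libraries working with a pagerando-aware loader.
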
>>
>> ## Target-Specific Details
>>
>> We will initially support pagerando for ARM and AArch64, so several
>> details are worth considering on those targets. For ARM, the
>> r9 register is a platform-specific register that can be used as the
>> static base register, a use that is similar in many ways to the POT
>> register in pagerando. When
>> not specified by the platform, r9 is a callee-saved general-purpose
>> register. Thus, using r9 as the POT register will be backwards
>> compatible when calling out of pagerando code into either legacy code
>> or a different module; the callee will preserve r9 for use after
>> returning to pagerando code. In AArch64, r18 is designated as a
>> platform-specific register; however, it is not specified as
>> callee-saved when not reserved by the target platform. Thus, to
>> interoperate with unmodified legacy AArch64 software, we would need to
>> save r18 in pagerando code before calling into any external code. When
>> using LTO, the compiler will see the entire module and therefore be
>> able to identify calls into external vs internal code. Without LTO, it
>> will likely be more efficient to use a callee-saved register to avoid
>> the need to save the POT register before each call. We will experiment
>> with both caller- and callee-saved registers to determine which is
>> most efficient.
>>
>>
>> [1] M. Backes and S. Nürnberger. Oxymoron - making fine-grained memory
>> randomization practical by allowing code sharing. In USENIX Security
>> Symposium, 2014. https://www.usenix.org/node/184466
>>
>> [2] S. Crane, A. Homescu, and P. Larsen. Code randomization: Haven’t
>> we solved this problem yet? In IEEE Cybersecurity Development
>> Conference (SecDev), 2016.
>> http://www.ics.uci.edu/~perl/sd16_pagerando.pdf
>
> Out of curiosity, did you measure the performance impact on the
> generated executable? We tried something akin to your proposal in the
> past (i.e. randomizing ELF section layout) and it turned out to be a
> sledgehammer for performance (in some cases, i.e. when
> -ffunction-sections/-fdata-sections was specified, the performance of
> the resulting executable dropped by > 10% [cc:ing Michael as he did the
> measurements]).
>

To clarify, I read your paper and I see that some benchmarks show
substantial degradation (6.5%), but in your "future work" section you
describe techniques to mitigate the drop, and I wonder if you ever got
to implement them and take new measurements.

Thanks,

-- 
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare

