[llvm-dev] [RFC] Pagerando: Page-granularity code randomization

Stephen Crane via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 10 19:37:50 PDT 2018


Thanks for the feedback, Kostya! Sorry I haven’t replied sooner, I’m
currently on vacation but wanted to write up a general summary of the
project status before getting into the specifics you brought up.

On Wed, Oct 3, 2018 at 4:12 PM Kostya Serebryany <kcc at google.com> wrote:
> * Huge complexity. This is not just the compiler, but also the rest of the toolchain and run-times (linkers, debuggers, unwinders, symbolizers).

Agreed, pagerando adds complexity to the toolchain. Changes will be
required to debuggers, unwinders, etc., but from our experience, these
tools require fairly small changes to work with pagerando. E.g., the
DSBT (Data Segment Base Table) ABI uses a table similar to the POT to
store the dynamic address of each segment, and gdb already supports
this ABI with a target-specific handler for DSOs. Supporting an ABI
like this has minimal impact on the rest of the debugger.

The decision to take on this toolchain complexity will rest with the
platform deploying pagerando, and depends on how amenable that
platform is to a pervasive change in how shared libraries are laid
out and loaded.

> I'd like to hear from some offensive security experts here their comparison of PageRando vs CFI-like schemes (that are much cheaper, and are already available)

Pagerando is an improvement over ASLR; it is certainly not intended as
a replacement for CFI. Rather, pagerando complements CFI as defense in
depth by making it harder to reliably exploit both unconstrained
branches (legacy code without CFI) and weakly-constrained branches
(e.g., CFI-checked indirect branches that must still permit many valid
targets).

> * Spilling the POT register may reveal the secret and make all this pointless. If we want to mix instrumented and non-instrumented code (and still have the protection)
> we'll need to at least recompile all the non-instrumented code with x18 reserved so that we don't need to spill it. That's the same problem we have with the ShadowCallStack though.

Agreed. We’ve bounced around a few ways to mitigate this leakage, but
we don’t have a great solution yet. It’s not trivial to exploit this
weakness, but it is a concern in mixed code. We intend to focus, at
least initially, on privileged processes where the only non-pagerando
code is the main binary, which can reserve the necessary register
(e.g., by building it with Clang's -ffixed-x18 on AArch64). We
must still preserve compatibility with other heterogeneous processes,
and allowing the callee-saved register to spill is the simplest way to
do this. Ideally, we would integrate ShadowCallStack and pagerando
register usage so we only need a single register rather than two for
the combination. Any solution we use for one would benefit the other.

> * 20-30% code size overhead is a no-go for the majority of large apps (speaking from my Chrome experience) and thus this will remain a niche tool.

Pagerando is only applicable to system-wide shared libraries. These
are mostly small so I’m not as concerned about code size overhead as I
would be for large binaries like Chrome. However, this is still a
valid overhead concern. We’ve been working to reduce it by lowering
the number of entry wrappers needed. I suspect we can shave off a bit
more code size by optimizing the inter-bin calls, but I’m not counting
on that.

After initially enabling pagerando for the subset of system libraries
used in privileged processes, we plan to expand that set to larger
libraries as we constrain external APIs to reduce the number of entry
wrappers. For the limited subset, we have a significantly smaller
impact on disk and memory usage.

> * 3%-6% CPU overhead is also a concern for this kind of benefit, and I'm afraid that the overhead grows super-linearly with the binary size (more cache lines are touched by POT)

We think it is better to let users make decisions on a case-by-case
basis since the averages hide substantial variance. On most Android
workloads, we see no runtime performance overheads and we have made
progress on the outlier cases as noted earlier in this thread.

If POT-induced cache pressure is indeed a problem on particular
workloads, we can bound the size of the POT by increasing the bin
granularity (e.g. 8K pages vs. 4K pages). In fact, for the unified POT
optimization I touched on in the summary, we bound the POT for each
library to a single 4K page for simplicity, which requires a handful
of large libraries to use larger bin sizes.

> * adding so many new indirect calls in the post-Spectre world sounds scary, and so far I haven't seen any evaluation from Spectre experts on this thread.

I’m not a Spectre expert either, but I think that randomizing the
entire victim process’s address space may make branch target injection
attacks more difficult, since they require a known target address to
train the predictor. Additionally, many of those
indirect calls are at randomized addresses, which makes the attack
more difficult since the BTB uses these addresses in its lookup
algorithm.

Moreover, it appears that Spectre-V2 is being addressed through OS and
firmware updates. ARM is committed to making its future Cortex
processors resilient at the hardware level (Cortex-A76 is already
resilient to variants 2 and 3). The Linux kernel now has support for
invalidating the ARM and AArch64 BTB on context switch, which also
mitigates variant 2.


Overall, I agree with the concerns you raise. At the same time, I
think that the cost/benefit decision must be made on a case-by-case
basis according to the users’ operational constraints. I hope that we
can have a shot at maturing pagerando in-tree, leading to eventual
deployment.

Thanks,
stephen
