[llvm-dev] RFC: Using link-time optimization to eliminate retpolines

Tue Jan 23 16:44:42 PST 2018

The proposed mitigation for CVE-2017-5715 (Spectre variant 2, “branch
target injection”) is to send all indirect branches through an instruction
sequence known as a retpoline. Because the purpose of a retpoline is to
prevent attacker-controlled speculation, we also lose the benefits of
benign speculation, which can lead to a measurable loss of performance.

We can regain some of those benefits if we know that the set of possible
branch targets is fixed (this is sometimes known to be the case when using
whole-program devirtualization or CFI -- see
https://clang.llvm.org/docs/LTOVisibility.html). In that case, we can
construct a so-called “branch funnel” that selects one of the possible
targets by performing a binary search on an address associated with the
indirect branch, and then branches directly to the selected target. For
virtual calls, that address is the address of the vtable; for indirect
calls via a function pointer, it is the function pointer itself. As a result,
the processor is free to speculatively execute the virtual call, but it can
only speculatively branch to addresses of valid implementations of the
virtual function, as opposed to arbitrary addresses.
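The selection step can be modelled in ordinary C++ as a binary search over a sorted table of keys. This is only a sketch of the idea: the names FunnelEntry, Target and selectTarget are hypothetical, and a real branch funnel is emitted as a chain of compares and direct branches rather than a loop over a table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a call target: any function with this signature.
using Target = int (*)();

// One funnel entry: the address that identifies the target (for a virtual
// call, the vtable address) and the function to branch to.
struct FunnelEntry {
    std::uintptr_t key;
    Target target;
};

// Binary search over entries sorted by ascending key. Returns nullptr when
// the key matches no entry; a real funnel has no such case because the set
// of possible keys is known to be closed.
inline Target selectTarget(const std::vector<FunnelEntry>& sorted,
                           std::uintptr_t key) {
    std::size_t lo = 0;
    std::size_t hi = sorted.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (key < sorted[mid].key) {
            hi = mid;                  // continue in the lower half
        } else if (key > sorted[mid].key) {
            lo = mid + 1;              // continue in the upper half
        } else {
            return sorted[mid].target; // exact match: the only valid target
        }
    }
    return nullptr;
}
```

The number of comparisons grows logarithmically with the number of targets, and every reachable outcome is one of the known targets.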

For example, suppose that we have the following class hierarchy, which is
known to be closed:

struct Base { virtual void f() = 0; };
struct A : Base { virtual void f(); };
struct B : Base { virtual void f(); };
struct C : Base { virtual void f(); };

We can lay out the vtables for the derived classes in the order A, B, C,
and produce an instruction sequence that directs execution to one of the
targets A::f, B::f and C::f depending on the vtable address. In x86_64
assembly, a branch funnel would look like this:

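lea B::vtable+16(%rip), %r11   # r11 = address point of B's vtable
cmp %r11, %r10                 # compare the object's vtable address (r10)
jb A::f                        # below B's vtable: the object is an A
je B::f                        # equal: the object is a B
jmp C::f                       # otherwise: the object is a C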

A caller performs a virtual call by loading the vtable address into
register r10, setting up the other registers for the virtual call and
directly calling the branch funnel as if it were a regular function.
Because the branch funnel enforces control flow integrity by itself, we can
also avoid emitting CFI checks at call sites that use branch funnels when
CFI is enabled.
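As a rough source-level analogue of the assembly above, the three-way funnel can be written in C++ with a single pivot comparison. Everything here is a stand-in chosen for illustration (VTable, Object, kVTables, implA/implB/implC and funnelF are hypothetical names); the array is used only to guarantee the address order A, B, C that the compiler would otherwise arrange through its layout mechanisms.

```cpp
#include <string>

// Stand-in for a vtable. Placing all three in one array guarantees the
// address order A < B < C that the funnel's comparison relies on.
struct VTable { const char* className; };
static const VTable kVTables[3] = {{"A"}, {"B"}, {"C"}};

// Stand-in for an object: only its vtable pointer matters here.
struct Object { const VTable* vptr; };

// Stand-ins for the three implementations of f().
static std::string implA(Object&) { return "A::f"; }
static std::string implB(Object&) { return "B::f"; }
static std::string implC(Object&) { return "C::f"; }

// Model of the funnel: one comparison against B's vtable address, then a
// direct call to exactly one of the three known implementations.
static std::string funnelF(Object& obj) {
    const VTable* pivot = &kVTables[1];            // lea B::vtable..., %r11
    if (obj.vptr < pivot) return implA(obj);       // jb  A::f
    if (obj.vptr == pivot) return implB(obj);      // je  B::f
    return implC(obj);                             // jmp C::f
}
```

A caller would invoke funnelF(obj) wherever it previously made the virtual call. Because no path leads anywhere other than the three implementations, no separate CFI check is needed in this model.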

To control the layout of vtables and function pointers, we can extend
existing mechanisms for controlling layout that are used to implement CFI
(see https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html) so that
they are also used whenever a branch funnel needs to be created.

The compiler will only use branch funnels when both the retpoline
mitigation (-mretpoline) and whole-program devirtualization
(-fwhole-program-vtables) are enabled. The former is required on the
assumption that, in general, a regular indirect call is less expensive
than a branch funnel; the latter provides the necessary guarantee that
the type hierarchy is closed. Even when retpolines are enabled, there is
still a cost associated with executing a branch funnel that needs to be
balanced against the cost of a regular CFI check and retpoline, so branch
funnels are only used when there are at most 10 targets (this number has
not been tuned yet). Because the implementation uses some of the same
mechanisms that are used to implement CFI and whole-program
devirtualization, it requires LTO (it is compatible with both full LTO and
ThinLTO).
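The conditions above amount to a small predicate. The sketch below is not taken from the implementation; BuildOptions, kMaxFunnelTargets and shouldUseBranchFunnel are hypothetical names, and the threshold of 10 is the untuned value quoted above.

```cpp
#include <cstddef>

// Hypothetical summary of the two relevant compiler settings.
struct BuildOptions {
    bool retpoline;            // -mretpoline
    bool wholeProgramVTables;  // -fwhole-program-vtables
};

// Untuned limit on the number of targets, as stated in the text.
constexpr std::size_t kMaxFunnelTargets = 10;

// A call site qualifies only if both features are enabled and the closed
// set of possible targets is small enough.
constexpr bool shouldUseBranchFunnel(const BuildOptions& opts,
                                     std::size_t numTargets) {
    return opts.retpoline && opts.wholeProgramVTables &&
           numTargets <= kMaxFunnelTargets;
}
```

Call sites that fail the predicate would keep the regular CFI check and retpoline.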

To measure the performance impact of branch funnels, I ran a selection of
Chrome benchmark suites on Chrome binaries built with CFI, CFI + retpoline
and CFI + retpoline + branch funnels (BF), and measured the median impact
over all benchmarks in each suite. The numbers are presented below. I
should preface them by saying that these are largely microbenchmarks, so
the impact of retpoline on its own is unlikely to be characteristic of real
workloads. The comparison to focus on is retpoline + branch funnels against
retpoline alone: across the suites there is a median 5.7% regression with
branch funnels, as compared to a median 8% regression with retpoline alone
(both relative to CFI).

Benchmark suite       CFI + retpoline impact   CFI + retpoline + BF impact
                      (relative to CFI)        (relative to CFI)

blink_perf.bindings   0.9% improvement         9.8% improvement
blink_perf.dom        20.4% regression         17.5% regression
blink_perf.layout     17.4% regression         14.3% regression
blink_perf.parser     3.8% regression          5.7% regression
blink_perf.svg        8.0% regression          5.4% regression
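The two headline medians can be checked against the per-suite figures above. The sketch assumes that the headline numbers are medians taken across the five suites, and it encodes regressions as positive percentages and improvements as negative ones; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Median of a non-empty, odd-sized sample. The vector is taken by value
// because std::nth_element reorders its input.
inline double medianOdd(std::vector<double> v) {
    auto mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}

// Per-suite impact relative to CFI, in the order bindings, dom, layout,
// parser, svg. Positive = regression, negative = improvement.
inline std::vector<double> retpolineImpact() {
    return {-0.9, 20.4, 17.4, 3.8, 8.0};
}
inline std::vector<double> funnelImpact() {
    return {-9.8, 17.5, 14.3, 5.7, 5.4};
}
```

Under those assumptions, medianOdd(retpolineImpact()) yields 8.0 and medianOdd(funnelImpact()) yields 5.7, matching the figures quoted in the text.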

Future work

Implementation of branch funnels for architectures other than x86_64.

Implementation of branch funnels for indirect calls via a function pointer
(currently only implemented for virtual calls). This will probably require
an implementation of whole-program “devirtualization” for indirect calls.

Use profile data to order the comparisons in the branch funnel by
frequency, to minimise the number of comparisons required for frequent
virtual calls.
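To illustrate the profile-guided idea, the sketch below orders targets by call count and estimates the average number of comparisons for a funnel that tests targets one after another. It is a simplified model rather than a proposed design: ProfiledTarget, orderByFrequency and averageComparisons are hypothetical names, and a linear chain is assumed in place of the binary search used today.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical profile record: a target and how often it was called.
struct ProfiledTarget {
    std::string name;
    std::uint64_t calls;
};

// Most frequently called targets first; ties keep their original order.
inline std::vector<ProfiledTarget> orderByFrequency(
        std::vector<ProfiledTarget> targets) {
    std::stable_sort(targets.begin(), targets.end(),
                     [](const ProfiledTarget& a, const ProfiledTarget& b) {
                         return a.calls > b.calls;
                     });
    return targets;
}

// Average comparisons per call when targets are tested in the given order.
// The i-th target (0-based) is reached after i + 1 comparisons, except the
// last one, which is reached by falling through after n - 1 comparisons.
inline double averageComparisons(const std::vector<ProfiledTarget>& order) {
    const std::size_t n = order.size();
    std::uint64_t total = 0;
    std::uint64_t weighted = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t cost = std::min(i + 1, n - 1);
        total += order[i].calls;
        weighted += order[i].calls * cost;
    }
    return total == 0 ? 0.0
                      : static_cast<double>(weighted) /
                            static_cast<double>(total);
}
```

With a skewed profile, the frequency-ordered chain needs fewer comparisons on average than the layout order; whether that outweighs the binary search for larger target sets would need measurement.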

Thanks,
-- 
Peter