[llvm-dev] RFC: Using link-time optimization to eliminate retpolines

Fri Jan 26 09:33:11 PST 2018

Wouldn't a branch funnel open the door to a type 1 attack?

E.g. if the code looks like this, then a branch funnel basically turns into
a standard type 1 pattern AFAICT:

struct Base {
    virtual int f(long) = 0;
};

struct A : Base {
    int f(long x) override {
        return 0;
    }
};

struct B : Base {
    int f(long x) override {
        // As in listing 1 in https://spectreattack.com/spectre.pdf
        return array2[array1[x] * 256];
    }
};
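
To make the concern concrete, here is a minimal, self-contained sketch of what a funnel over A and B amounts to. Everything beyond the classes above is an assumption for illustration: the array sizes, the name funnel_f, and the use of dynamic_cast as a portable stand-in for the compare on the vtable address. The point is only that the funnel replaces the indirect call with a conditional branch, and a conditional branch guarding a dependent load is the shape of the variant 1 gadget.

```cpp
#include <cassert>
#include <cstdint>

// Assumed declarations: the message above leaves array1/array2 undeclared.
static std::uint8_t array1[16];
static std::uint8_t array2[256 * 256];

struct Base {
    virtual ~Base() = default;
    virtual int f(long) = 0;
};

struct A : Base {
    int f(long) override { return 0; }
};

struct B : Base {
    int f(long x) override { return array2[array1[x] * 256]; }
};

// Hypothetical model of a two-target funnel. The dynamic_cast stands in
// for the compare on the vtable address; the qualified calls A::f and B::f
// are direct (non-virtual) calls, like the funnel's direct jumps.
int funnel_f(Base* obj, long x) {
    if (auto* a = dynamic_cast<A*>(obj)) {
        return a->A::f(x);
    }
    // If the branch above is mispredicted for an A object, B::f may run
    // speculatively with an x that was never meant to index array1.
    return static_cast<B*>(obj)->B::f(x);
}
```

Architecturally, funnel_f(&a, x) never touches array1 for any x; the question is only what happens on the mispredicted path.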

-- Sean Silva

On Tue, Jan 23, 2018 at 4:44 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> The proposed mitigation for Spectre variant 2 (CVE-2017-5715, “branch
> target injection”) is to send all indirect branches through an instruction
> sequence known as a retpoline. Because the purpose of a retpoline is to
> prevent attacker-controlled speculation, we also end up losing the benefits
> of benign speculation, which can lead to a measurable loss of performance.
>
> We can regain some of those benefits if we know that the set of possible
> branch targets is fixed (this is sometimes known to be the case when using
> whole-program devirtualization or CFI -- see
> https://clang.llvm.org/docs/LTOVisibility.html). In that case, we can
> construct a so-called “branch funnel” that selects one of the possible
> targets by performing a binary search on an address associated with the
> indirect branch, eventually branching directly to the selected target. For
> virtual calls, this address is the address of the vtable; for indirect
> calls via a function pointer, it is the function pointer itself. As a
> result, the processor is free to speculatively execute the virtual call,
> but it can only speculatively branch to addresses of valid implementations
> of the virtual function, as opposed to arbitrary addresses.
>
> For example, suppose that we have the following class hierarchy, which is
> known to be closed:
>
> struct Base { virtual void f() = 0; };
> struct A : Base { virtual void f(); };
> struct B : Base { virtual void f(); };
> struct C : Base { virtual void f(); };
>
> We can lay out the vtables for the derived classes in the order A, B, C,
> and produce an instruction sequence that directs execution to one of the
> targets A::f, B::f and C::f depending on the vtable address. In x86_64
> assembly, a branch funnel would look like this:
>
> lea B::vtable+16(%rip), %r11
> cmp %r11, %r10
> jb A::f
> je B::f
> jmp C::f
>
> A caller performs a virtual call by loading the vtable address into
> register r10, setting up the other registers for the virtual call and
> directly calling the branch funnel as if it were a regular function.
> Because the branch funnel enforces control flow integrity by itself, we can
> also avoid emitting CFI checks at call sites that use branch funnels when
> CFI is enabled.
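
As a rough illustration of the dispatch described above, here is a C++ model of the three-target funnel. It is a sketch under stated assumptions, not what LLVM emits: VTable, kVTables, branch_funnel and the A_f/B_f/C_f stand-ins are hypothetical names, and the array merely models vtables laid out contiguously in the order A, B, C so that address order matches class order.

```cpp
#include <cassert>

// Hypothetical stand-in for a vtable; only its address matters here.
struct VTable {
    const char* class_name;
};

// Models the controlled layout: vtables for A, B, C in address order.
static const VTable kVTables[3] = {{"A"}, {"B"}, {"C"}};

// Stand-ins for A::f, B::f and C::f; each returns a tag so the selected
// target can be observed.
char A_f() { return 'A'; }
char B_f() { return 'B'; }
char C_f() { return 'C'; }

// Model of the funnel: compare the incoming vtable address against the
// address of B's vtable, then make a direct call to the selected target.
char branch_funnel(const VTable* vt) {
    // The closed-hierarchy assumption: vt is one of the known vtables.
    assert(vt >= &kVTables[0] && vt <= &kVTables[2]);
    const VTable* pivot = &kVTables[1];
    if (vt < pivot) return A_f();   // corresponds to: jb A::f
    if (vt == pivot) return B_f();  // corresponds to: je B::f
    return C_f();                   // corresponds to: jmp C::f
}
```

Calling branch_funnel(&kVTables[2]) reaches C_f through two compares and a direct call, with no indirect branch on the path.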
>
> To control the layout of vtables and function pointers, we can extend
> existing mechanisms for controlling layout that are used to implement CFI
> (see https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html) so that
> they are also used whenever a branch funnel needs to be created.
>
> The compiler will only use branch funnels when both the retpoline
> mitigation (-mretpoline) and whole-program devirtualization
> (-fwhole-program-vtables) features are enabled (the former is on the
> assumption that in general a regular indirect call will be less expensive
> than a branch funnel, and the latter provides the necessary guarantee that
> the type hierarchy is closed). Even when retpolines are enabled, there is
> still a cost associated with executing a branch funnel that needs to be
> balanced against the cost of a regular CFI check and retpoline, so branch
> funnels are only used when there are <=10 targets (this number has not been
> tuned yet). Because the implementation uses some of the same mechanisms
> that are used to implement CFI and whole-program devirtualization, it
> requires LTO (it is compatible with both full LTO and ThinLTO).
>
> To measure the performance impact of branch funnels, I ran a selection of
> Chrome benchmark suites on Chrome binaries built with CFI, CFI + retpoline
> and CFI + retpoline + branch funnels, and measured the median impact over
> all benchmarks in each suite. The numbers are presented below. I should
> preface these numbers by saying that these are largely microbenchmarks, so
> the impact of retpoline on its own is unlikely to be characteristic of real
> workloads. The numbers to focus on should be the impact of retpoline +
> branch funnels relative to the impact of retpoline, where there is a median
> 5.7% regression as compared to the median 8% regression associated with
> retpoline.
>
> Benchmark suite     | CFI + retpoline impact | CFI + retpoline + BF impact
>                     | (relative to CFI)      | (relative to CFI)
> --------------------+------------------------+----------------------------
> blink_perf.bindings | 0.9% improvement       | 9.8% improvement
> blink_perf.dom      | 20.4% regression       | 17.5% regression
> blink_perf.layout   | 17.4% regression       | 14.3% regression
> blink_perf.parser   | 3.8% regression        | 5.7% regression
> blink_perf.svg      | 8.0% regression        | 5.4% regression
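
The two headline medians can be checked against the per-suite numbers with a few lines of C++. This is only a sanity-check sketch: the names median, kRetpoline and kRetpolineBF are made up here, and the sign convention (regressions positive, improvements negative) is an assumption chosen for the check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty list; takes the vector by value so the caller's
// copy is left unsorted.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Per-suite impact relative to CFI, in percent, in the order bindings,
// dom, layout, parser, svg. Regressions are positive, improvements negative.
const std::vector<double> kRetpoline   = {-0.9, 20.4, 17.4, 3.8, 8.0};
const std::vector<double> kRetpolineBF = {-9.8, 17.5, 14.3, 5.7, 5.4};
```

With these inputs, median(kRetpoline) is the blink_perf.svg figure and median(kRetpolineBF) is the blink_perf.parser figure, which matches the 8% and 5.7% quoted in the text.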
>
> Future work
>
> Implementation of branch funnels for architectures other than x86_64.
>
> Implementation of branch funnels for indirect calls via a function pointer
> (currently only implemented for virtual calls). This will probably require
> an implementation of whole-program “devirtualization” for indirect calls.
>
> Use profile data to order the comparisons in the branch funnel by
> frequency, to minimise the number of comparisons required for frequent
> virtual calls.
>
> Thanks,
> --
> Peter