[PATCH] D41723: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..

Fri Jan 5 01:18:02 PST 2018

(Sorry I missed your comments while i was uploading some updates and then
ran off Rafael! But seems you and Rui have already handled most of this...)

On Thu, Jan 4, 2018 at 9:54 PM Rafael Avila de Espindola via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Chandler Carruth via Phabricator via llvm-commits
> <llvm-commits at lists.llvm.org> writes:
>
> > The goal is simple: avoid generating code which contains an indirect
> > branch that could have its prediction poisoned by an attacker. In many
> > cases, the compiler can simply use directed conditional branches and
> > a small search tree. LLVM already has support for lowering switches in
> > this way and the first step of this patch is to disable jump-table
> > lowering of switches and introduce a pass to rewrite explicit indirectbr
> > sequences into a switch over integers.
>
> Amusingly that is what clang would do before r86256.
>

Indeed. =] Just wind back the clock... :: sigh ::

>
> > There is one other important source of indirect branches in x86 ELF
> > binaries: the PLT. These patches also include support for LLD to
> > generate PLT entries that perform a retpoline-style indirection.
>
> I see that we now generate similar code for -fno-plt. We should have a
> test for that (nonlazybind at the llvm level).
>

? The LLVM code generation for PLT-calls shouldn't be impacted by retpoline
at all? It should just be the contents of the PLT section itself? Maybe I'm
just misunderstanding...

If you're wondering whether LLVM will correctly retpoline a -fno-plt call
that it directly emits, I believe it uses the same instruction patterns as
any other indirect call and so should be covered already.

> > The only other indirect branches remaining that we are aware of are from
> > precompiled runtimes (such as crt0.o and similar). The ones we have
> > found are not really attackable, and so we have not focused on them
> > here, but eventually these runtimes should also be replicated for
> > retpoline-ed configurations for completeness.
> >
> > For kernels or other freestanding or fully static executables, the
> > compiler switch `-mretpoline` is sufficient to fully mitigate this
> > particular attack. For dynamic executables, you must compile *all*
> > libraries with `-mretpoline` and additionally link the dynamic
> > executable and all shared libraries with LLD and pass `-z retpolineplt`
> > (or use similar functionality from some other linker). We strongly
> > recommend also using `-z now` as non-lazy binding allows the
> > retpoline-mitigated PLT to be substantially smaller.
>
> Since we have to rebuild everything and lazy binding is more expensive,
> -fno-plt should be the best mitigation, no?
>

I think both -fno-plt and -z now are useful ways to minimize the impact
here. But there is still some tradeoff between them: with -fno-plt we would
see a small space impact on each call site. If there are enough call sites
this space overhead may be a significant issue.

My general sense is that the *hot* calls through the PLT you'll end up
moving out of a separate DSO entirely. If you call `memcpy` inside of
libc`s DSO on a retpolined executable, whether with PLT and `-z now` or
with `-fno-plt`, you're going to have a Very Bad Time w.r.t. to
performance. Crazy bad. Same for other similarly hot calls.

Once you've dealt with this somehow, my suspicion is that the code size
savings of the PLT are probably worth it and the direct jump preceding the
PLT stub will stop having any measurable perf hit (as the calls will be
not-quite-so-hot, and the cost of retpoline dwarfs this regardless of
where).

> > +Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef
> Name) {
> > +  LLVMContext &Ctx = M.getContext();
> > +  auto Type = FunctionType::get(Type::getVoidTy(Ctx), false);
> > +  Function *F =
> > +      Function::Create(Type, GlobalValue::LinkOnceODRLinkage, Name, &M);
> > +  F->setVisibility(GlobalValue::HiddenVisibility);
>
> We should probably put this in a comdat so the linker keeps only one.
>

Is LinkOnceODR not sufficient? We don't need a comdat *group* here. I
thought you didn't need explicit comdats unless you're grouping multiple
globals....

>
> > Index: llvm/lib/CodeGen/IndirectBrExpandPass.cpp
> > ===================================================================
> > --- /dev/null
> > +++ llvm/lib/CodeGen/IndirectBrExpandPass.cpp
> > @@ -0,0 +1,187 @@
> > +//===- IndirectBrExpandPass.cpp - Expand indirectbr to switch
> -------------===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> >
> +//===----------------------------------------------------------------------===//
> > +/// \file
> > +///
> > +/// Implements an expansion pass to turn `indirectbr` instructions in
> the IR
> > +/// into `switch` instructions. This works by enumerating the basic
> blocks in
> > +/// a dense range of integers, replacing each `blockaddr` constant with
> the
> > +/// corresponding integer constant, and then building a switch that
> maps from
> > +/// the integers to the actual blocks.
> > +///
> > +/// While this is generically useful if a target is unable to codegen
> > +/// `indirectbr` natively, it is primarily useful when there is some
> desire to
> > +/// get the builtin non-jump-table lowering of a switch even when the
> input
> > +/// source contained an explicit indirect branch construct.
>
> This pass converts each indirectbr to a switch. One feature we had in
> the old clang expansion of computed gotos is that each function would
> have a single switch. This would avoid code explosion in direct threaded
> interpreters where after each opcode there is a jump that can go any
> other opcode.
>

I'll see if this is easy to implement, but if it's going to take some time
I might suggest we defer this to a follow-up patch.

I think Rui got the rest of your comments, but give a shout if I've missed
anything and thanks for review!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180105/fb89204a/attachment.html>