[PATCH] D41723: Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre..

Thu Jan 4 21:39:42 PST 2018

On Fri, Jan 5, 2018 at 11:54 AM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

> Chandler Carruth via Phabricator via llvm-commits
> <llvm-commits at lists.llvm.org> writes:
>
> > The goal is simple: avoid generating code which contains an indirect
> > branch that could have its prediction poisoned by an attacker. In many
> > cases, the compiler can simply use directed conditional branches and
> > a small search tree. LLVM already has support for lowering switches in
> > this way and the first step of this patch is to disable jump-table
> > lowering of switches and introduce a pass to rewrite explicit indirectbr
> > sequences into a switch over integers.
>
> Amusingly that is what clang would do before r86256.
>
> > There is one other important source of indirect branches in x86 ELF
> > binaries: the PLT. These patches also include support for LLD to
> > generate PLT entries that perform a retpoline-style indirection.
>
> I see that we now generate similar code for -fno-plt. We should have a
> test for that (nonlazybind at the llvm level).
>
> > The only other indirect branches remaining that we are aware of are from
> > precompiled runtimes (such as crt0.o and similar). The ones we have
> > found are not really attackable, and so we have not focused on them
> > here, but eventually these runtimes should also be replicated for
> > retpoline-ed configurations for completeness.
> >
> > For kernels or other freestanding or fully static executables, the
> > compiler switch `-mretpoline` is sufficient to fully mitigate this
> > particular attack. For dynamic executables, you must compile *all*
> > libraries with `-mretpoline` and additionally link the dynamic
> > executable and all shared libraries with LLD and pass `-z retpolineplt`
> > (or use similar functionality from some other linker). We strongly
> > recommend also using `-z now` as non-lazy binding allows the
> > retpoline-mitigated PLT to be substantially smaller.
>
> Since we have to rebuild everything and lazy binding is more expensive,
> -fno-plt should be the best mitigation, no?
>
> > +Function *X86RetpolineThunks::createThunkFunction(Module &M, StringRef
> Name) {
> > +  LLVMContext &Ctx = M.getContext();
> > +  auto Type = FunctionType::get(Type::getVoidTy(Ctx), false);
> > +  Function *F =
> > +      Function::Create(Type, GlobalValue::LinkOnceODRLinkage, Name,
> &M);
> > +  F->setVisibility(GlobalValue::HiddenVisibility);
>
> We should probably put this in a comdat so the linker keeps only one.
>
> > Index: llvm/lib/CodeGen/IndirectBrExpandPass.cpp
> > ===================================================================
> > --- /dev/null
> > +++ llvm/lib/CodeGen/IndirectBrExpandPass.cpp
> > @@ -0,0 +1,187 @@
> > +//===- IndirectBrExpandPass.cpp - Expand indirectbr to switch
> -------------===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +//===------------------------------------------------------
> ----------------===//
> > +/// \file
> > +///
> > +/// Implements an expansion pass to turn `indirectbr` instructions in
> the IR
> > +/// into `switch` instructions. This works by enumerating the basic
> blocks in
> > +/// a dense range of integers, replacing each `blockaddr` constant with
> the
> > +/// corresponding integer constant, and then building a switch that
> maps from
> > +/// the integers to the actual blocks.
> > +///
> > +/// While this is generically useful if a target is unable to codegen
> > +/// `indirectbr` natively, it is primarily useful when there is some
> desire to
> > +/// get the builtin non-jump-table lowering of a switch even when the
> input
> > +/// source contained an explicit indirect branch construct.
>
> This pass converts each indirectbr to a switch. One feature we had in
> the old clang expansion of computed gotos is that each function would
> have a single switch. This would avoid code explosion in direct threaded
> interpreters where after each opcode there is a jump that can go any
> other opcode.
>
> > Index: lld/test/ELF/x86-64-retpoline.s
> > ===================================================================
> > --- /dev/null
> > +++ lld/test/ELF/x86-64-retpoline.s
> > @@ -0,0 +1,51 @@
> > +// REQUIRES: x86
> > +// RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t1.o
> > +// RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux
> %p/Inputs/shared.s -o %t2.o
> > +// RUN: ld.lld -shared %t2.o -o %t2.so
> > +
> > +// RUN: ld.lld -shared %t1.o %t2.so -o %t.exe -z retpolineplt
> > +// RUN: llvm-objdump -d %t.exe | FileCheck %s
>
> Please add a CHECK line before the following ones to show what section
> is being disassembled.
>
> You also want to check the address of the .got.plt to show what some of
> the magic numbers in the CHECK are.
>
> The same issue is present in other tests.
>

Will do.

>
> > +template <class ELFT>
> > +void Retpoline<ELFT>::writeGotPlt(uint8_t *Buf, const Symbol &S) const
> {
> > +  write32le(Buf, S.getPltVA() + 21);
>
> Add a comment about where the 21 is from.
>

It's now 16, and I think the comment isn't needed here because it would be
a copy-n-paste of the previous comment for X86<ELFT>::writeGotPlt.

> +template <class ELFT> void Retpoline<ELFT>::writePltHeader(uint8_t *Buf)
> const {
> > +  const uint8_t Insn[] = {
> > +      0xff, 0x35, 0, 0, 0, 0,       //   pushq GOTPLT+8(%rip)
> > +      0x4c, 0x8b, 0x1d, 0, 0, 0, 0, //   mov GOTPLT+16(%rip), %r11
> > +      0xe8, 0x0e, 0x00, 0x00, 0x00, //   callq next
> > +      0xf3, 0x90,                   // loop: pause
> > +      0xeb, 0xfc,                   //   jmp loop
> > +      0x0f, 0x1f, 0x44, 0x00, 0x00, //   nop
> > +      0x0f, 0x1f, 0x44, 0x00, 0x00, //   nop; .align 16
>
> The nops are never executed, right? Could we use traps?
>

Will use int3 instead of nop.

> > +template <class ELFT>
> > +void Retpoline<ELFT>::writePlt(uint8_t *Buf, uint64_t GotPltEntryAddr,
> > +                               uint64_t PltEntryAddr, int32_t Index,
> > +                               unsigned RelOff) const {
> > +  const uint8_t Insn[] = {
> > +      0x4c, 0x8b, 0x1d, 0, 0, 0, 0, //   mov foo at GOTPLT(%rip), %r11
> > +      0xe8, 0x04, 0x00, 0x00, 0x00, //   callq next
> > +      0xf3, 0x90,                   // loop: pause
> > +      0xeb, 0xfc,                   //   jmp loop; .align 16
> > +      0x4c, 0x89, 0x1c, 0x24,       // next: mov %r11, (%rsp)
> > +      0xc3,                         //   ret
>
> This code sequence (mov + ret) exists in the header. Can't we jump there
> and reduce the size a bit?
>
> > +void RetpolinePic::writeGotPlt(uint8_t *Buf, const Symbol &S) const {
> > +  write32le(Buf, S.getPltVA() + 16);
>
> Add a comment saying why 16.
>
> Cheers,
> Rafael
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180105/617762de/attachment.html>