[llvm-dev] retpoline mitigation and 6.0

Fri Feb 9 00:45:59 PST 2018

On Fri, Feb 9, 2018 at 12:26 AM David Woodhouse <dwmw2 at infradead.org> wrote:

>
>
> On Fri, 2018-02-09 at 02:21 +0000, David Woodhouse wrote:
> > On Fri, 2018-02-09 at 01:18 +0000, David Woodhouse wrote:
> > >
> > >
> > > For now I'm just going to attempt to work around it like this in the
> > > kernel, so I can concentrate on the retpoline bits:
> > >  http://david.woodhou.se/clang-percpu-hack.patch
> >
> > 32-bit doesn't boot. Built without CONFIG_RETPOLINE and with Clang 5.0
> > (and the above patch) it does. I'm rebuilding a Release build of
> > llvm/clang so that experimental kernel builds hopefully take less than
> > an hour, and will prod further in the morning.
>
> What is the intended ABI of __x86_indirect_thunk, which I have been
> calling the "ret-equivalent" retpoline? I see this happening
> (I ♥ 'qemu -d in_asm')...
>
> ----------------
> IN:
> 0xc136feea:  89 d8                    movl     %ebx, %eax
> 0xc136feec:  89 f2                    movl     %esi, %edx
> 0xc136feee:  8b 75 f0                 movl     -0x10(%ebp), %esi
> 0xc136fef1:  89 f1                    movl     %esi, %ecx
> 0xc136fef3:  ff 75 e0                 pushl    -0x20(%ebp)
> 0xc136fef6:  e8 c5 f3 58 00           calll    0xc18ff2c0 # __x86_indirect_thunk
>
> ----------------
> IN:
> 0xc18ff2c0:  c3                       retl     # Early boot, so it hasn't been turned into a proper retpoline yet
>
> ----------------
> IN:
> 0xc136fefb:  8d 34 7e                 leal     (%esi, %edi, 2), %esi
>
>
> (gdb) list *0xc136fef6
> 0xc136fef6 is in sort (lib/sort.c:87).
> 82                              if (c < n - size &&
> 83                                              cmp_func(base + c, base + c + size) < 0)
> 84                                      c += size;
> 85                              if (cmp_func(base + r, base + c) >= 0)
> 86                                      break;
> 87                              swap_func(base + r, base + c, size);
> 88                      }
> 89              }
> 90
> 91              /* sort */
>
> You're pushing the target (-0x20(%ebp)) onto the stack and then
> *calling* __x86_indirect_thunk. So it looks like you're expecting
> __x86_indirect_thunk to do something like
>
>   call *4(%esp)
>   ret
>
> ... except that final 'ret' still leaves the target address on the
> stack, so there would also need to be a complicated dance, without
> using any registers, to pop that too.
>
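A small Python model of the stack shows why the early-boot trace above falls straight through. It assumes Clang's call-site convention (`pushl target; calll thunk`) and a thunk body that is only `retl`, as in the trace. The call-site and return addresses come from the qemu output; `TARGET` is a hypothetical placeholder, because the trace never shows the value loaded from -0x20(%ebp).

```python
# Minimal model of a 32-bit x86 stack as a Python list; the end of the
# list is the top of the stack.
CALL_SITE = 0xC136FEF6        # calll __x86_indirect_thunk (5-byte encoding)
RETURN_ADDR = CALL_SITE + 5   # 0xc136fefb, the instruction after the call
TARGET = 0xDEADBEEF           # hypothetical indirect-call target


def run_plain_ret_thunk():
    """Clang's push+call call site followed by a thunk that is only `retl`."""
    stack = []
    stack.append(TARGET)        # pushl -0x20(%ebp)
    stack.append(RETURN_ADDR)   # calll thunk pushes its return address
    eip = stack.pop()           # retl pops whatever is on top
    return eip, stack


eip, stack = run_plain_ret_thunk()
print(hex(eip))                  # 0xc136fefb: the target was never called
print([hex(x) for x in stack])   # the pushed target is left on the stack
```

The `retl` consumes the return address that `calll` just pushed, so execution resumes at 0xc136fefb (the `leal` in the trace) without ever calling the target, and four bytes of stack are leaked.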

Yeah, we expect the thunk to do a complicated dance that reorders the stack
so that the correct return address ends up in the correct place.

You can see the sequence in the comments here:
https://github.com/llvm-project/llvm-project-20170507/blob/master/llvm/lib/Target/X86/X86RetpolineThunks.cpp#L179-L194
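The sketch below models one register-free sequence that performs such a reordering, so that the final `retl` architecturally lands on the target with the caller's return address beneath it. It is an illustration under stated assumptions, not a copy of the thunk in X86RetpolineThunks.cpp: the instruction sequence, the `Stack32` class, and the `TARGET` and `SPEC_TRAP` addresses are hypothetical. It relies on two documented x86 rules: `pushl disp(%esp)` computes its source address before %esp is decremented, and `popl disp(%esp)` computes its destination address after %esp is incremented.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack that grows downward in memory."""

    def __init__(self, esp=0x1000):
        self.mem = {}
        self.esp = esp

    def load(self, disp):
        # Read disp(%esp) using the current %esp.
        return self.mem[self.esp + disp]

    def push(self, value):
        # pushl: decrement %esp by 4, then store.
        self.esp -= 4
        self.mem[self.esp] = value

    def pop_to(self, disp):
        # popl disp(%esp): read the top, increment %esp, and only then
        # compute the destination address disp(%esp).
        value = self.mem[self.esp]
        self.esp += 4
        self.mem[self.esp + disp] = value

    def ret(self):
        # retl: pop the new instruction pointer.
        value = self.mem[self.esp]
        self.esp += 4
        return value


RETURN_ADDR = 0xC136FEFB   # where the caller expects the target to return
TARGET = 0xDEADBEEF        # hypothetical indirect-call target
SPEC_TRAP = 0xC18FF2C5     # hypothetical address of the pause/lfence loop

s = Stack32()
s.push(TARGET)             # call site: pushl target
s.push(RETURN_ADDR)        # call site: calll thunk
s.push(SPEC_TRAP)          # thunk: calll inner (RSB now predicts SPEC_TRAP)
s.esp += 4                 # inner: addl $4, %esp (discard SPEC_TRAP)
s.push(s.load(4))          # pushl 4(%esp): copy TARGET (address read first)
s.push(s.load(4))          # pushl 4(%esp): copy RETURN_ADDR
s.pop_to(8)                # popl 8(%esp): RETURN_ADDR over the old TARGET slot
s.pop_to(0)                # popl (%esp): TARGET over the old RETURN_ADDR slot
eip = s.ret()              # retl: predicted SPEC_TRAP, architecturally TARGET
print(hex(eip), hex(s.load(0)))
```

At the target's entry, (%esp) holds the caller's return address in the slot where the target address was originally pushed, so when the target returns, %esp is back to its value before `pushl target`, just as after a plain `call *target`.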

>
> I expected the emitted code for a *call* using the thunk to look more
> like
>
>    jmp 2f
> 1: pushl -0x20(%ebp)        # cmp_func
>    jmp __x86_indirect_thunk # jmp, not call
> 2: call 1b                  # set up address for cmp_func to return to
>

Yeah, the specific goal was to minimize the code-size footprint at the call
site, even though it means a few more instructions in the thunk. Our pattern
also slightly reduces the number of dynamic branches taken, at the cost of
the push/pop churn.
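A back-of-the-envelope count makes the trade-off concrete. The byte sizes assume the usual encodings: a short `jmp rel8` is 2 bytes, `pushl -0x20(%ebp)` is 3 bytes (ff 75 e0, as in the trace), and `jmp rel32` and `call rel32` are 5 bytes each (`call` has no rel8 form). It also assumes that both thunks take exactly two branches before reaching the target (`calll inner`, then the final `retl`); the real thunk bodies may differ, and the push/pop churn inside the thunk is not counted.

```python
# Call-site instructions in layout order: (text, encoded size, taken branch?).
LLVM_CALL_SITE = [
    ("pushl -0x20(%ebp)", 3, False),
    ("calll thunk", 5, True),
]
PROPOSED_CALL_SITE = [
    ("jmp 2f", 2, True),              # rel8 is enough to skip 8 bytes
    ("1: pushl -0x20(%ebp)", 3, False),
    ("jmp thunk", 5, True),
    ("2: call 1b", 5, True),          # executed second, still a taken branch
]
# Assumed taken branches inside either thunk before the target runs:
# `calll inner` and the final `retl` into the target.
THUNK_BRANCHES = 2


def cost(call_site):
    """Return (call-site bytes, taken branches until the target runs)."""
    size = sum(nbytes for _, nbytes, _ in call_site)
    branches = sum(1 for _, _, taken in call_site if taken) + THUNK_BRANCHES
    return size, branches


print("push+call:   ", cost(LLVM_CALL_SITE))       # (8, 3)
print("jmp/call/jmp:", cost(PROPOSED_CALL_SITE))   # (15, 5)
```

Under these assumptions the push+call form saves 7 bytes and 2 taken branches per indirect call; the target's own `ret` back to the caller is the same in both schemes and is left out.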

There was briefly a discussion of a different instruction sequence to
minimize the push/pop churn, but it didn't end up happening.

Anyway, it appears that this is the first case where my suspicions were
borne out: we have somewhat different, but reasonable, ABIs for some of the
thunks.

How should we name them to distinguish things?