[PATCH] D49655: [x86/SLH] Negative result (not planned for submission): Introduce an alternative way of embedding and extracting the predicate state in the stack pointer.

Sun Jul 22 20:53:21 PDT 2018

chandlerc created this revision.
chandlerc added reviewers: echristo, craig.topper.
Herald added subscribers: hiraditya, mcrosier, sanjoy.

Previously, we were only placing the predicate state in the high bits
of the stack pointer so that pushes and pops would not disturb it.
Instead, switch to or-ing the entire predicate state over the stack
pointer and extract it by checking whether the stack pointer is near `-1`.
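
To make the two encodings concrete, here is a minimal C++ model of them on
plain 64-bit integers. It is a sketch, not the code in the patch: the
function names, the shift amount of 47 and the `window` parameter are
assumptions chosen for illustration, and the predicate state is modeled as
all-zeros (not poisoned) or all-ones (poisoned).

```cpp
#include <cassert>
#include <cstdint>

// Predicate state as modeled here: 0 on the correct path, all-ones once
// misspeculation has been detected.
constexpr uint64_t kStateOk = 0;
constexpr uint64_t kStatePoisoned = ~uint64_t{0};

// Assumed shift for the prior technique: only the high bits of the stack
// pointer are touched, so pushes and pops in the low bits do not disturb it.
constexpr unsigned kHighBitShift = 47;

// Prior technique: merge the state into the high bits of the stack pointer.
inline uint64_t mergeHighBits(uint64_t sp, uint64_t state) {
  assert(state == kStateOk || state == kStatePoisoned);
  return sp | (state << kHighBitShift);
}

// Prior technique: recover the state by smearing the top bit over the word.
inline uint64_t extractHighBits(uint64_t sp) {
  return uint64_t{0} - (sp >> 63);
}

// New technique: or the entire state over the stack pointer, so a poisoned
// stack pointer becomes exactly -1.
inline uint64_t mergeWhole(uint64_t sp, uint64_t state) {
  assert(state == kStateOk || state == kStatePoisoned);
  return sp | state;
}

// New technique (one possible reading of "near -1"): poisoned if the stack
// pointer is -1 or has wrapped at most `window` bytes past it.
inline uint64_t extractNearMinusOne(uint64_t sp, uint64_t window) {
  // sp + 1 maps -1 to 0, so "shortly after -1" becomes a small unsigned value.
  return (sp + 1 < window) ? kStatePoisoned : kStateOk;
}
```

With a typical user-space stack pointer such as `0x00007ffd12345678`, both
merges leave the value untouched when the state is zero, and
`mergeWhole(sp, kStatePoisoned)` collapses it to all-ones.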

This is a confounding tradeoff to make. I'm really not sure this is
worth pursuing.

The main advantage of this new approach is that we use one fewer
instruction in the `ret` path. This is one fewer instruction in the
critical path of the `ret` as well so it does seem reasonably valuable.

However, it has a bunch of downsides that make me unsure this would
ever be useful:

First, for this to continue to mitigate variant 1.1 (BCBS + return), it
is essential that the OS actually *unmap* the memory at address `0` (the
low page). It is *not sufficient* to change this memory's PTEs to
protect the memory in some way. Protections are not enforced in time
(see GPZv3/Meltdown and variant 1.2 which abuse this fact). It is
essential that the address from which the return address would be loaded
(and which an attacker would need to store over to accomplish a variant
1.1 attack) be *unmapped* when misspeculating. This means that even as
the stack pointer is incremented by successive speculative returns, and
walks from `-1` up through the first page of memory, it cannot land on
a virtual address with a page table entry at all. With KPTI, this is
generally true on Linux and BSDs for large magnitude negative addresses
such as those formed with the prior technique. Making the new technique
secure would likely require OS changes.
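
The walk described above can be sketched as follows. This is an
illustrative model only: `returnLoadAddresses` and `inLowPage` are
hypothetical helpers, 4 KiB pages are assumed, and each `ret` is modeled as
reading from the current stack pointer and then adding 8 with unsigned
wraparound.

```cpp
#include <cstdint>
#include <vector>

// Assumption: 4 KiB pages, so the "low page" is [0, 4096).
constexpr uint64_t kPageSize = 4096;

// Addresses that a chain of `count` speculative returns would load their
// return addresses from, starting with stack pointer `sp`.
inline std::vector<uint64_t> returnLoadAddresses(uint64_t sp, unsigned count) {
  std::vector<uint64_t> addrs;
  addrs.reserve(count);
  for (unsigned i = 0; i < count; ++i) {
    addrs.push_back(sp); // `ret` loads from the current stack pointer...
    sp += 8;             // ...then pops it; unsigned arithmetic wraps past -1.
  }
  return addrs;
}

// True if `addr` falls inside the first page of the address space.
inline bool inLowPage(uint64_t addr) { return addr < kPageSize; }
```

Starting from a poisoned stack pointer of `-1`, the first load is at the
very top of the address space and the following ones are at 7, 15, 23 and so
on, which is why every one of those low-page addresses would have to be
unmapped rather than merely protected.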

Second, we have to have a reliable way to extract the predicate state
back from the stack pointer. This is a bit trickier with this approach,
and in fact the current version of the patch doesn't really work... The
current version of the patch assumes that poisoned stack pointers are
always in the range [0, 4096), and that valid stack pointers are never
in that range. This is conservatively correct (valid stack pointers aren't
in that range on commonly deployed ABIs), but can miss important cases.
For example, a return immediately followed by several pushes or calls
will end up with a stack pointer that is a small magnitude negative
number, which will then be incorrectly assumed to be 'not poisoned'.
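
The missed case is easy to reproduce with plain integer arithmetic. The
sketch below is not the patch's code: `looksPoisoned`, `afterRet` and
`afterPush` are hypothetical helpers that model the [0, 4096) check and the
effect of an 8-byte pop or push on the stack pointer.

```cpp
#include <cstdint>

// A fully poisoned stack pointer under the new scheme is all-ones (-1).
constexpr uint64_t kPoisonedSP = ~uint64_t{0};
constexpr uint64_t kLowPageEnd = 4096;

// The check described above: treat the stack pointer as poisoned only if
// it lies in [0, 4096).
inline bool looksPoisoned(uint64_t sp) { return sp < kLowPageEnd; }

// Model of the stack pointer adjustment done by `ret` and by `push`/`call`.
inline uint64_t afterRet(uint64_t sp) { return sp + 8; }
inline uint64_t afterPush(uint64_t sp) { return sp - 8; }
```

One return takes `-1` to `7`, which the check catches. Two pushes after that
take it to `-9`, which lies outside [0, 4096), so
`looksPoisoned(afterPush(afterPush(afterRet(kPoisonedSP))))` is false even
though the state was poisoned.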

Third, the instruction sequence for #2 is often longer than with the old
pattern. The question is -- does the shorter instruction sequence for
the `ret` path adequately pay for this cost? Very unclear.

I could simply conduct measurements to evaluate the third challenge, but
the first two challenges seem like real obstacles to pursuing this path.

So I'm just mailing this out as FYI and a negative result. I've
regenerated the tests so you can see the precise instruction shifts with
my vague implementation of #2, but it would need a better, real
implementation to go anywhere.

I don't plan to pursue this further or commit this patch unless someone
comes up with compelling ways to address the first two issues. If so,
then I can work on benchmarking the results.

Repository:
  rL LLVM

https://reviews.llvm.org/D49655

Files:
  llvm/lib/Target/X86/X86SpeculativeLoadHardening.cpp
  llvm/test/CodeGen/X86/speculative-load-hardening-gather.ll
  llvm/test/CodeGen/X86/speculative-load-hardening.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D49655.156720.patch
Type: text/x-patch
Size: 71891 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180723/37468a85/attachment.bin>