[PATCH] D41761: Introduce llvm.nospeculateload intrinsic

Fri Jan 26 18:41:52 PST 2018

gromer added a comment.

In https://reviews.llvm.org/D41761#989706, @efriedma wrote:

> > Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what speculationsafeload does?
>
> Prohibit what, exactly? According to current LangRef rules, it's legal to introduce a dead load to an arbitrary pointer (even if the compiler can't prove it's dereferenceable).

I'm basically just asking if we should have some form of assurance stronger than "most transforms don't understand this intrinsic well enough to violate the invariants it relies on". I have no idea what form that assurance would take, since I don't know how LangRef handles such matters.

> I guess I'll describe the uninitialized pointer problem in a little more detail.  The idea is that you have code roughly like this:
> 
>   // Code in a function; g is a global int.
>   bool b = f1();
>   int* p;
>   if (b)
>       p = &g;
>   g = 10;
>   if (b)
>       f4(user_array[*p]);
> 
> 
> If b is true, we load user_array[10]. If b is false but the second branch is mispredicted, we speculatively load `user_array[*p]`, where p is uninitialized (i.e. user-controlled, if you're unlucky). You now have a variant-1 attack.

Hmm. If I understand correctly, the `ProtectFromSpeculation()` API I mentioned earlier could guard against this by including `p` in the variadic list of clobbers.
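For concreteness, here's a rough C sketch of the kind of effect I have in mind. It uses a swapped-in software technique (branchless pointer masking, similar in spirit to speculative load hardening) rather than the actual `ProtectFromSpeculation()` proposal: the pointer is re-derived from the guard condition right before the load, so a mispredicted guard yields a null pointer instead of an attacker-controlled one. `protect_ptr` and `guarded_index_load` are hypothetical names, and the sketch assumes the CPU doesn't predict the value of `b`. In plain C an optimizer is free to notice that `b` is true inside the `if` and fold the mask away, which is exactly why this needs compiler support rather than source-level tricks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an LLVM or libc API: returns p when cond is
   true and a null pointer otherwise, computed with an integer mask
   instead of a branch so the result carries a data dependency on cond. */
static const int *protect_ptr(bool cond, const int *p) {
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)cond; /* all ones or zero */
    return (const int *)((uintptr_t)p & mask);
}

/* Hypothetical rewrite of the quoted example's final load. Callers must
   pass a valid p whenever b is true. When b is false, p is never
   dereferenced architecturally; pass NULL rather than an uninitialized
   pointer, since reading an uninitialized variable can itself be
   undefined behavior in C. */
static uint8_t guarded_index_load(bool b, const int *p,
                                  const uint8_t *user_array) {
    if (b) {
        /* Assuming the mask survives compilation and b's value is not
           predicted: if this branch is mispredicted, safe is null, so the
           speculative load reads address 0, not an attacker-chosen one. */
        const int *safe = protect_ptr(b, p);
        return user_array[*safe];
    }
    return 0;
}
```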

> There are a lot of potential variants of this. For example, instead of user_array, we have another speculatively-uninitialized pointer. Or the load which leaks the data to the user could actually be a speculationsafeload, intended to stop a different variant-1 attack. Or the code could be spread over multiple functions. Or the if statement might not be an if statement (there are lots of ways to get a conditional branch in assembly). Or "p" might be a pointer to a constant pool, so the load-from-undef isn't written in the source code at all.

How can two variant-1 attacks be "different" enough that a `speculationsafeload` would protect against one but not the other, when both exploit the same load operation? The only possibility I can come up with is that the load has additional safety requirements besides the `speculationsafeload` bounds check. If so, that might argue for an API that lets you express arbitrary predicates (like `ProtectFromSpeculation`), rather than just upper/lower bounds checks.
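To illustrate the subsumption argument, here's a minimal C sketch that models only the architectural result of each style of API (load if a condition holds, else return a fallback), not any speculation barrier. `predicated_load`, `bounded_load`, and `permitted_load` are hypothetical; the bounds are indices rather than pointers for simplicity, and none of this reflects the actual signature of `speculationsafeload` or `ProtectFromSpeculation`. The point is that a bounds-checked load is just a predicated load whose predicate happens to be a range check, while a requirement like a per-entry permission bit fits the predicate form but not the bounds-only form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate-based load: returns *p only when pred is true.
   Models the architectural result only; it is NOT a speculation barrier. */
static int predicated_load(const int *p, bool pred, int fallback) {
    return pred ? *p : fallback;
}

/* Hypothetical bounds-only load in the spirit of speculationsafeload:
   reads base[idx] only if lower <= idx < upper. It is simply a
   predicated load whose predicate is the range check. */
static int bounded_load(const int *base, size_t idx, size_t lower,
                        size_t upper, int fallback) {
    assert(lower <= upper);
    bool in_range = lower <= idx && idx < upper;
    /* Avoid forming an out-of-range pointer when the check fails. */
    return predicated_load(in_range ? base + idx : base, in_range, fallback);
}

/* A safety requirement that is not a bounds check: the entry must also be
   marked readable in a separate permission table. A bounds-only API can't
   express this, but a predicate-based one can. */
static int permitted_load(const int *table, const bool *readable,
                          size_t idx, size_t len, int fallback) {
    bool ok = idx < len && readable[idx];
    return predicated_load(ok ? table + idx : table, ok, fallback);
}
```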

I don't see how the code being spread over multiple functions matters: all that matters is the load and the branch (or nested branches) that actually guards it, not any earlier branches on the same condition. As for the if-statement not being an if-statement, that's true to the extent that in principle it could be a switch or a loop, but it has to be some kind of conditional control flow that's explicit in user code. This attack is only a platform-level issue because the attacker is exploiting features of the platform to observe the effects of a load that the application-level logic says cannot happen. If the application logic doesn't explicitly prevent any of the loads the attacker is exploiting, that's a plain old application-level vulnerability that LLVM neither can nor should fix.
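As an example of a guard that isn't an if-statement, here's a hypothetical C sketch where the branch guarding the load is the loop condition. It uses index masking (again a swapped-in technique, not either proposed API) to re-derive the index from that same condition, so a mispredicted loop exit reads `arr[0]` rather than `arr[n]`. `mask_index` and `sum_prefix` are illustrative names; the sketch assumes `arr` has at least one element, and as before an optimizer can fold `i < n` to true inside the loop body and delete the mask, so real protection needs compiler support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns idx when in_bounds is 1 and 0 when it is 0,
   using a mask instead of a branch. in_bounds must be exactly 0 or 1. */
static size_t mask_index(size_t idx, size_t in_bounds) {
    size_t mask = (size_t)0 - in_bounds; /* all ones or zero */
    return idx & mask;
}

/* The only branch that guards arr[i] is the loop condition i < n. If the
   loop-exit branch is mispredicted, i can speculatively equal n; masking
   the index with that same condition redirects the load to arr[0]. */
static long sum_prefix(const int *arr, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        size_t safe_i = mask_index(i, (size_t)(i < n));
        sum += arr[safe_i];
    }
    return sum;
}
```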

So it seems to me that all the variations you identify are either application-level vulnerabilities, or can be straightforwardly blocked by something like `ProtectFromSpeculation` (I'm not quite sure about `speculationsafeload`).

https://reviews.llvm.org/D41761