[PATCH] D41761: Introduce llvm.nospeculateload intrinsic

Fri Jan 26 13:19:30 PST 2018

gromer added a comment.

(I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)

The proposed API for `nospeculateload` seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a `failval` sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside `nospeculateload`, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually //is// inefficient, that will create pressure for optimization passes to "peek inside" `nospeculateload` in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.

Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:

  int64_t SecureBoundedOffset(int64_t offset, int64_t bound);

At the abstract machine level, this function just returns `offset`, and has the precondition that 0 <= `offset` < `bound`, but it also ensures that for speculative executions, `offset` will always appear to be within those bounds. There are also variants for `uint64_t`, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.

  template <typename T, typename... ZeroArgs>
  bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);

This function returns whether `pointer` is between `begin` and `end`, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat `pointer` and `args...` as zero (all `ZeroArgs` must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on `pointer` or `args...`; I'm not sure if that's true of `nospeculateload` or `SecureBoundedOffset`.

Notice that by not handling the load itself, these APIs avoid the `ptr`/`cmpptr` awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas `nospeculateload` can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.

Is there any way we could implement an interface like those efficiently on ARM?

In https://reviews.llvm.org/D41761#980755, @efriedma wrote:

> I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).

Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what `speculationsafeload` does?

> More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable.  For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value.  Without an IR/MIR model for speculation, we have no way to prevent this from happening.

When you say "code which doesn't appear to be vulnerable at the source-code level", do you mean "code that is explicitly protected by `speculationsafeload`", or "code that doesn't appear to need `speculationsafeload` in the first place"? If the former, could you give a more concrete example?

It seems to me that we ought to be able to specify this intrinsic without having an explicit model of CPU speculation in the IR, because at the IR level we can just constrain the types of transformations that can be performed on this intrinsic. That way we only need to talk about speculation when we're specifying how this intrinsic is used to generate code for a specific platform that happens to feature CPU-level branch speculation. Very tentatively, I think the specific constraints on transformations that are needed here are that the intrinsic has unmodeled side-effects (so it can't be eliminated or executed speculatively), and that it should be treated as writing to all memory (or only to `pointer` and `args..` in the case of an API like `IsPointerInRange`).

https://reviews.llvm.org/D41761