[PATCH] D41761: Introduce llvm.nospeculateload intrinsic

Fri Feb 2 14:55:39 PST 2018

chandlerc added a comment.

In https://reviews.llvm.org/D41761#991887, @kristof.beyls wrote:

> In https://reviews.llvm.org/D41761#989477, @gromer wrote:
>
> > (I hope you can bear with me while I come up to speed; I understand there's some time pressure here, so I'm erring on the side of speaking sooner)
> >
> > The proposed API for `nospeculateload` seems like it could be problematic for programmers, because it gives them no way to tell whether the bounds check passed, unless they are able to identify a `failval` sentinel that can never be the result of a successful load (and yet is safe to use as the speculative result of the load). Thus, it seems likely that in a lot of cases, the bounds check will be duplicated: once inside `nospeculateload`, and once in user code. That will make code using this API "feel" inefficient, which will tend to discourage programmers from using it. Furthermore, if it actually //is// inefficient, that will create pressure for optimization passes to "peek inside" `nospeculateload` in order to eliminate the duplication, and that seems like a can of worms we really don't want to open.
> >
> > Another way of putting it is that we probably want this API to be as primitive as possible, because the more logic we put inside the intrinsic, the greater the risk that some parts of it will be unsuited for some users. Consequently, we've been experimenting with APIs that are concerned solely with the bounds check, rather than with the subsequent load:
> >
> >   int64_t SecureBoundedOffset(int64_t offset, int64_t bound);
> >
> >
> > At the abstract machine level, this function just returns `offset`, and has the precondition that 0 <= `offset` < `bound`, but it also ensures that for speculative executions, `offset` will always appear to be within those bounds. There are also variants for `uint64_t`, and variants that take the base-2 log of the bound, for greater efficiency when the bound is a power of two.
> >
> >   template <typename T, typename... ZeroArgs>
> >   bool IsPointerInRange(T*& pointer, T* begin, T* end, ZeroArgs... args);
> >
> >
> > This function returns whether `pointer` is between `begin` and `end`, and also guarantees that if the function returns false, then any speculative execution which assumes it to be true will treat `pointer` and `args...` as zero (all `ZeroArgs` must be integers or pointers). Notice that this API allows the optimizer to hoist loads past the branch, so long as the loads don't depend on `pointer` or `args...`; I'm not sure if that's true of `nospeculateload` or `SecureBoundedOffset`.
> >
> > Notice that by not handling the load itself, these APIs avoid the `ptr`/`cmpptr` awkwardness, as well as the need for the user to designate a sentinel value. One tradeoff is that whereas `nospeculateload` can be thought of as a conditional load, plus some "security magic", these APIs are harder to understand without explicitly thinking about speculative execution. However, I'm not sure that's much of a problem in practice-- if you don't want to think about speculative execution, you probably shouldn't be using this API in the first place.
> >
> > Is there any way we could implement an interface like those efficiently on ARM?
> >
> > In https://reviews.llvm.org/D41761#980755, @efriedma wrote:
> >
> > > I don't think it's likely the compiler would intentionally introduce a load using the same pointer as the operand to a speculationsafeload; given most transforms don't understand what speculationsafeload does, the compiler has no reason to introduce code like that (even if it isn't technically prohibited by LangRef).
> >
> >
> > Do we need to explicitly prohibit it in LangRef so that future transformations don't start understanding too much about what `speculationsafeload` does?
> >
> > > More practically, I'm worried about the possibility that code which doesn't appear to be vulnerable at the source-code level will get lowered to assembly which is vulnerable.  For example, the compiler could produce code where the CPU speculates a load from an uninitialized pointer value.  Without an IR/MIR model for speculation, we have no way to prevent this from happening.
> >
> > When you say "code which doesn't appear to be vulnerable at the source-code level", do you mean "code that is explicitly protected by `speculationsafeload`", or "code that doesn't appear to need `speculationsafeload` in the first place"? If the former, could you give a more concrete example?
> >
> > It seems to me that we ought to be able to specify this intrinsic without having an explicit model of CPU speculation in the IR, because at the IR level we can just constrain the types of transformations that can be performed on this intrinsic. That way we only need to talk about speculation when we're specifying how this intrinsic is used to generate code for a specific platform that happens to feature CPU-level branch speculation. Very tentatively, I think the specific constraints on transformations that are needed here are that the intrinsic has unmodeled side-effects (so it can't be eliminated or executed speculatively), and that it should be treated as writing to all memory (or only to `pointer` and `args..` in the case of an API like `IsPointerInRange`).
>
>
> Thanks very much for sharing this, Geoff!
>  I have a few immediate questions/thoughts, and wonder what you and others here think about them:
>
> 1. Lowering to various instruction sets.
>
>   I think one of the questions to ask is whether the APIs here can be lowered well to different instruction sets. I believe that may be possible for Arm, but I'm still looking into it. It would be useful for us to have a few examples of how these APIs are envisioned to be used in practice, to make sure we understand the proposal well enough. E.g. maybe a few examples in the same spirit as under “More complex cases” at https://developer.arm.com/support/security-update/compiler-support-for-mitigations? Do you happen to have a suggestion of how this intrinsic would best be lowered on some instruction sets other than Arm? There was quite a bit of debate about the ability to efficiently lower the intrinsic in our proposal on the gcc mailing list, e.g. see https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01136.html.

Focusing on ProtectFromSpeculation API style, it can be lowered *very* efficiently on x86 at least:

    xor  %zero_reg, %zero_reg
    ...
    cmpq %a, 42
    jb below
    ja above
  equal:
    cmovne %zero_reg, %arg0
    cmovne %zero_reg, %arg1
    ...
    cmovne %zero_reg, %argN
    ...

  below:
    cmovnb %zero_reg, %arg0
    cmovnb %zero_reg, %arg1
    ...
    cmovnb %zero_reg, %argN
    ...

  above:
    cmovna %zero_reg, %arg0
    cmovna %zero_reg, %arg1
    ...
    cmovna %zero_reg, %argN
    ...

Especially in the common case of 1 or 2 arguments needing to be zeroed this ends up being quite nice code generation using the existing structure of the conditional branch.

> 
> 
> 2. Is the API available in C too?
> 
>   IIUC, the SecureBoundOffset intrinsic is usable from both C and C++, but the IsPointerInRange intrinsic can only be used from C++? Do you have ideas around bringing similar functionality to C?

Yeah, this is part of why I suggest the much more generic `ProtectFromSpeculation` API which I think is easily applicable in C. The C version might use pointers or whatever, but this kind of API doesn't fundamentally require any interesting lang

> 
> 
> 3. For the variant with a general predicate (`bool ProtectFromSpeculation(bool predicate, ZeroArgs&... args);`); do you have ideas about how to make sure that the optimizers in the compiler don’t simplify the predicate too much? For example:
> 
>   ``` if (a>7) { x = a; if (ProtectFromSpeculation(a>5, x) { ... = v[x]; ... } }
> 
>   ``` how to prevent this from being optimized to: ``` if (a>7) { x = a; if (ProtectFromSpeculation(true, x) { ... = v[x]; ... } } ``` which leads to no longer giving protection.

No matter what, this will require deep compiler support to implement. Even without the example you give, these construct fundamentally violate the rules the optimizer uses: they are by definition no-ops for execution of the program!

This means we will have to work to build up specific an dedicated infrastructure in the compiler to model these as having special semantics. That exact infrastructure can provide whatever optimization barriers are necessary to get the desired behavior. For example, the code generation I suggest above for x86 cannot be implemented in LLVM using its IR alone (I've actually tried). We'll have to model this both in the IR and even in the code generator specially in order to produce the kind of specific pattern that is necessary.

But there is also the question of what burden do we want to place on the user of these intrinsics vs. what performance hit we're willing to accept due to optimization barriers. I could imagine two approaches here:

1. It is the programmers responsibility to correctly protect any predicates that their application is sensitive to. As a consequence, if the `a>5` predicate is sensitive for the application, so must the `a>7` predicate be, and it is the programmers responsibility to protect both of them. This allows the implementation to have the minimal set of optimization barriers, but may make it difficult for programmers to use correctly.

2. The predicate provided to these APIs is truly special and is forced to be a *dynamic* predicate. That is, we require the compiler to emit the predicate as if no preconditions existed. There are ways to model this in LLVM and I assume any compiler. As a trivial (but obviously bad) example: all references to variables within the predicate could be lowered by rinsing that SSA value through an opaque construct like inline asm.

There is clearly a "programmer ease / security" vs. "better optimization" tradeoff between the two. If one isn't *clearly* the correct choice in all cases, we could even expose both behind separate APIs that try to make it clear the extent of protections provided.

Does that make sense?

https://reviews.llvm.org/D41761