[LLVMdev] Proposal for Poison Semantics

David Majnemer david.majnemer at gmail.com
Tue Feb 3 09:36:23 PST 2015


On Tue, Feb 3, 2015 at 3:15 AM, Nuno Lopes <nuno.lopes at ist.utl.pt> wrote:

> Hi,
>
>
>
> Thanks David for putting up this proposal together!
>
> I like the idea of having poison values behave more like undef (i.e., per
> bit, with run-time behavior).
>
> One of the problems this proposal solves is speculation of 'a && b' into
> 'a & b'. Currently this is illegal (despite sometimes simplifycfg doing it
> anyway).
>
> It also fixes bugs like http://llvm.org/PR20997
>
>
>
> The proposal doesn't say anything about branching on a poison value. I
> assume this should stay as the current interpretation -- that such branches
> should be undefined behavior (since we cannot branch to multiple places at
> the same time -- even if they would compute the same values; that's already
> too hard for the compiler to analyze).
>

The RFC intended to make branching on poison values OK.  If branching on
poison wasn't OK, then we couldn't go from select to -> br/phi.


>
>
> There's another caveat: it *does* seem to fix the problem described by Dan
> in http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046152.html
>
> However, it introduces a potential performance penalty: we won't be able
> to speculate instructions with undefined behavior whose input may be poison.
>
>
>
> For example, take the following code:
>
> loop:
>
>   %add = add nsw %x, %y
>
>   %div = udiv %z, %add
>
>   … use %div only in the case that %add does not overflow and is non-zero
>
>
>
> We can move the %add outside of the loop, but we cannot move the division.
> With the reason being that if %add overflows, then %add is poison and
> therefore it can take any value (in particular, it can be 0), triggering
> undef behavior in %div.  Therefore, we cannot freely move %div, unless we
> can prove that %add will never be 0 nor poison.  This sounds hard for the
> compiler to do, and I guess we'll have some regressions (e.g., LICM has to
> be more conservative). Nevertheless, I'm all for fixing poison once and for
> all!
>

Believe it or not, I already fixed this bug (PR21412). :)


>
>
> BTW, would it help if I produced a version of Alive that implements the
> semantics being proposed?  (with no performance guarantees for this
> prototype).  The cool thing is that then we can run it through our database
> of 300+ InstCombine optimizations and see which ones would have to be
> removed/fixed.
>

I think such a thing would be great.  However, there is a problem that the
RFC wasn't aware of when it was written:

consider:
%S = select %A, %B, undef

without us knowing anything about %A or %B, we will replace all uses of %S
with %B.  This transform would be considered wrong with the RFC in mind.

If this transform was valid, there could not be any value or value-like
property in LLVM with semantics more powerful than undef.  This makes me
think that what LLVM *actually* implements is not poison or something like
it.

On the flip side, we could say that this transform is nonsense but I'd
rather not pessimize LLVM like that.


>
>
> Nuno
>
>
>
>
>
> *From:* David Majnemer
> *Sent:* 28 January 2015 02:50
> *To:* llvmdev at cs.uiuc.edu
> *Cc:* Sanjoy Das; Dan Gohman; John Regehr; Nuno Lopes
> *Subject:* RFC: Proposal for Poison Semantics
>
>
>
> Hello,
>
>
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
>
>
> --
>
> David
>
>
>
>
>
> # LLVM Poison Semantics
>
>
>
> Poison is an LLVM concept which exists solely to enable further
> optimization of LLVM IR. The exact behavior of poison has been, to say the
> least, confusing for users, researchers and engineers working with LLVM.
>
>
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
>
>
> ## A Quick Introduction to Poison
>
>
>
> Let's start with a concrete motivating example in C:
>
> ```
>
> int isSumGreater(int a, int b) {
>
>   return a + b > a;
>
> }
>
> ```
>
>
>
> The C specification permits us to optimize the comparison in
> `isSumGreater` to `b > 0` because signed overflow results in undefined
> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>
>
>
> ```
>
> define i32 @isSumGreater(i32 %a, i32 %b) {
>
> entry:
>
>   %add = add i32 %a, %b
>
>   %cmp = icmp sgt i32 %add, %a
>
>   %conv = zext i1 %cmp to i32
>
>   ret i32 %conv
>
> }
>
> ```
>
>
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
>
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
> for "no signed wrap", `nuw` is short for "no unsigned wrap".
>
>
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
>
> LLVM can take this into account when it is optimizing the `%cmp` and
> replace it with: `icmp sgt i32 %b, 0`.
>
>
>
> ## Differences Between LLVM and C/C++
>
>
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
>
>
> Perhaps chief among them is that evaluating an expression in C++ or C
> which results performs an overflow is undefined behavior. In LLVM,
> executing an instruction which is marked `nsw` but which violates signed
> overflow results in poison. Values which have no relationship with poisoned
> values are not effected by them.
>
>
>
> Let us take the following C program into consideration:
>
> ```
>
> int calculateImportantResult(int a, int b) {
>
>   int result = 0;
>
>   if (a) {
>
>     result = a + b;
>
>   }
>
>   return result;
>
> }
>
> ```
>
>
>
> A straightforward lowering to LLVM IR could be:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   br i1 %tobool, label %if.then, label %if.end
>
>
>
> if.then:
>
>   %add = add nsw i32 %a, %b
>
>   br label %if.end
>
>
>
> if.end:
>
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
>
> ```
>
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>
> entry:
>
>   %tobool = icmp ne i32 %a, 0
>
>   %add = add nsw i32 %a, %b
>
>   %result = select i1 %tobool, i32 0, i32 %add
>
>   ret i32 %result
>
> }
>
> ```
>
>
>
> In the original code, the calculation of `%add` was control dependent.
>
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
>
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
>
>
> # Computation Involving Poison Values
>
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
>
>
> Examples of this rule which will result in poison:
>
> ```
>
>   %add = add i32 %x, %always_poison
>
>   %sub = sub i32 %x, %always_poison
>
>   %xor = xor i32 %x, %always_poison
>
>   %mul = mul i32 %always_poison, 1
>
> ```
>
>
>
> Examples of this rule which do not result in poison:
>
> ```
>
>   %or  = or  i32 %always_poison, 2
>
>   %and = and i32 %always_poison, 2
>
>   %mul = mul i32 %always_poison, 0
>
> ```
>
>
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.  In this respect, poison is not different from `undef`.
>
>
>
> The following example is only poison if `%cond` is false:
>
> ```
>
>   %sel = select i1 %cond, i32 2, %always_poison
>
> ```
>
>
>
> ### Is it safe to have poison as a `call` argument?
>
>
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
>
>
> LLVM cannot forbid poison from entering `call` arguments without
> prohibiting an optimization pass from outlining code.
>
>
>
> ### Is it safe to store poison to memory?
>
>
>
> `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior. A subsequent load instruction like `%load = load i32* %mem` will
> result in `%load` being a poison value.
>
>
>
> ### Is it safe to load or store a poison memory location?
>
>
>
> No.  Poison works just like `undef` in this respect.
>
>
>
> ### Does comparing a poison value result in poison?
>
>
>
> It depends.  If the comparison couldn't solely be determined by looking at
> the other operand, the result is poison.
>
>
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
>
> However, `icmp i32 ne %always_poison, 7` is poison.
>
>
>
> ### What if the condition operand in a `select` is poison?
>
>
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel`
> is either `true` or `false`.  Because, `%sel` depends on `%always_poison`
> it too is poison.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/9a8703ad/attachment.html>


More information about the llvm-dev mailing list