[LLVMdev] RFC: Proposal for Poison Semantics

Tue Jan 27 19:23:12 PST 2015

Hi David,

I spent some time thinking about poison semantics this way, but here
is where I always get stuck:

Consider the IR fragment

  %x = zext i32 %maybe_poison to i64
  %y = lshr i64 %x 32
  %ptr = gep %global, %y
  store 42 to %ptr

If %maybe_poison is poison, then is %y poison?  For all i32 values of
%maybe_poison, %y is i64 0, so in some sense you can determine the
value %y without looking at %x.  Given that, the store in the above
fragment is not undefined behavior even if %maybe_poison is poison.
However, this means if "%maybe_poison" is "add nuw %m, %n" it cannot
be optimized to "add nuw (zext %m) (zext %n)" since that will change
program behavior in an otherwise well-defined program.

One way out of this is to say that zext and sext of a value that is
dependent on poison is poison.  We'll have to do something similar for
load shearing.

The above "problem" can also be seen in

> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by
> looking at the other operand, the result is poison.

This means we cannot do the C-style optimization "int a = ...; a <
(a + 1)" ==> "true".  In the case "a + 1" overflows, a is
INT_SIGNED_MAX.  But if a is INT_SIGNED_MAX, "a < X" is always false,
for all values of X.  So "a < a + 1" is defined; and it is true for "a
!= INT_SIGNED_MAX" but false for "a == INT_SIGNED_MAX".  Hence the
expression cannot be folded to true.

I think the reason why poison is hard to formalize is that it
fundamentally tries to give an N bit value behavior that cannot be
justified by _any_ N bit value.  It "breaks" llvm's type system.

-- Sanjoy

On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer
<david.majnemer at gmail.com> wrote:
> Hello,
>
> What follows is my attempt to describe how poison works.  Let me know what
> you think.
>
> --
> David
>
>
> # LLVM Poison Semantics
>
> Poison is an LLVM concept which exists solely to enable further optimization
> of LLVM IR. The exact behavior of poison has been, to say the least,
> confusing for users, researchers and engineers working with LLVM.
>
> This document hopes to clear up some of the confusion of poison and
> hopefully explain *why* it has its semantics.
>
> ## A Quick Introduction to Poison
>
> Let's start with a concrete motivating example in C:
> ```
> int isSumGreater(int a, int b) {
>   return a + b > a;
> }
> ```
>
> The C specification permits us to optimize the comparison in `isSumGreater`
> to `b > 0` because signed overflow results in undefined behavior.  A
> reasonable translation of `isSumGreater` to LLVM IR could be:
>
> ```
> define i32 @isSumGreater(i32 %a, i32 %b) {
> entry:
>   %add = add i32 %a, %b
>   %cmp = icmp sgt i32 %add, %a
>   %conv = zext i1 %cmp to i32
>   ret i32 %conv
> }
> ```
>
> However, LLVM cannot determine that `%cmp` should not consider cases where
> `%add` resulted in signed overflow.  We need a way to communicate this
> information to LLVM.
>
> This is where the `nsw` and `nuw` flags come into play.  `nsw` is short for
> "no signed wrap", `nuw` is short for "no unsigned wrap".
>
> With these, we can come up with a new formulation of `%add`: `add i32 nsw
> %a, %b`.
> LLVM can take this into account when it is optimizing the `%cmp` and replace
> it with: `icmp sgt i32 %b, 0`.
>
> ## Differences Between LLVM and C/C++
>
> There are some interesting differences between what C++ and C specify and
> how LLVM behaves with respect to performing an operationg which is not
> permitted to overflow.
>
> Perhaps chief among them is that evaluating an expression in C++ or C which
> results performs an overflow is undefined behavior. In LLVM, executing an
> instruction which is marked `nsw` but which violates signed overflow results
> in poison. Values which have no relationship with poisoned values are not
> effected by them.
>
> Let us take the following C program into consideration:
> ```
> int calculateImportantResult(int a, int b) {
>   int result = 0;
>   if (a) {
>     result = a + b;
>   }
>   return result;
> }
> ```
>
> A straightforward lowering to LLVM IR could be:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   br i1 %tobool, label %if.then, label %if.end
>
> if.then:
>   %add = add nsw i32 %a, %b
>   br label %if.end
>
> if.end:
>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>   ret i32 %result
> }
> ```
>
> Moving `%add` to the `%entry` block would be preferable and would allow
> further optimizations:
> ```
> define i32 @calculateImportantResult(i32 %a, i32 %b) {
> entry:
>   %tobool = icmp ne i32 %a, 0
>   %add = add nsw i32 %a, %b
>   %result = select i1 %tobool, i32 0, i32 %add
>   ret i32 %result
> }
> ```
>
> In the original code, the calculation of `%add` was control dependent.
> Now, `%add` might result in signed overflow in violation of the `nsw` flag.
> Despite this, the program should behave as it did before because the
> poisoned value is masked-out by the select. The next section will dive into
> this in greater detail.
>
> # Computation Involving Poison Values
> Poison in a computation results in poison if the result cannot be
> constrained by its non-poison operands.
>
> Examples of this rule which will result in poison:
> ```
>   %add = add i32 %x, %always_poison
>   %sub = sub i32 %x, %always_poison
>   %xor = xor i32 %x, %always_poison
>   %mul = mul i32 %always_poison, 1
> ```
>
> Examples of this rule which do not result in poison:
> ```
>   %or  = or  i32 %always_poison, 2
>   %and = and i32 %always_poison, 2
>   %mul = mul i32 %always_poison, 0
> ```
>
> In fact, it would be reasonable to optimize `%or` to `2` and `%and` to `0`.
> In this respect, poison is not different from `undef`.
>
> The following example is only poison if `%cond` is false:
> ```
>   %sel = select i1 %cond, i32 2, %always_poison
> ```
>
> ### Is it safe to have poison as a `call` argument?
>
> A `call` instruction may or may not result in poison depending on exactly
> how the callee  uses the supplied arguments, it is not necessarily the case
> that `call i32 @someFunction(i32 %always_poison)` results in poison.
>
> LLVM cannot forbid poison from entering `call` arguments without prohibiting
> an optimization pass from outlining code.
>
> ### Is it safe to store poison to memory?
>
> `store i32 %always_poison, i32* %mem` does not result in undefined behavior.
> A subsequent load instruction like `%load = load i32* %mem` will result in
> `%load` being a poison value.
>
> ### Is it safe to load or store a poison memory location?
>
> No.  Poison works just like `undef` in this respect.
>
> ### Does comparing a poison value result in poison?
>
> It depends.  If the comparison couldn't solely be determined by looking at
> the other operand, the result is poison.
>
> For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> poison.
> However, `icmp i32 ne %always_poison, 7` is poison.
>
> ### What if the condition operand in a `select` is poison?
>
> In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel` is
> either `true` or `false`.  Because, `%sel` depends on `%always_poison` it
> too is poison.