[LLVMdev] RFC: Proposal for Poison Semantics

David Majnemer david.majnemer at gmail.com
Tue Jan 27 19:44:24 PST 2015


On Tue, Jan 27, 2015 at 7:23 PM, Sanjoy Das <sanjoy at playingwithpointers.com>
wrote:

> Hi David,
>
> I spent some time thinking about poison semantics this way, but here
> is where I always get stuck:
>
> Consider the IR fragment
>
>   %x = zext i32 %maybe_poison to i64
>   %y = lshr i64 %x 32
>   %ptr = gep %global, %y
>   store 42 to %ptr
>
> If %maybe_poison is poison, then is %y poison?  For all i32 values of
> %maybe_poison, %y is i64 0, so in some sense you can determine the
> value %y without looking at %x.


I agree, this makes sense.


> Given that, the store in the above
> fragment is not undefined behavior even if %maybe_poison is poison.
> However, this means if "%maybe_poison" is "add nuw %m, %n" it cannot
> be optimized to "add nuw (zext %m) (zext %n)" since that will change
> program behavior in an otherwise well-defined program.
>

Hmm, I'm not so sure this is right.

Are we talking about transforming:
%add = add nuw i32 %x, %y
%res = zext i32 %add to i64

to:
%z1 = zext i32 %x to i64
%z2 = zext i32 %y to i64
%res = add nuw i64 %z1, %z2

?

This is OK because performing a zext by that many bits cannot result in a
NUW violation.


>
> One way out of this is to say that zext and sext of a value that is
> dependent on poison is poison.  We'll have to do something similar for
> load shearing.
>

sext must be dependent on the underlying value because it splats the sign
bit.


>
> The above "problem" can also be seen in
>
> > ### Does comparing a poison value result in poison?
> >
> > It depends.  If the comparison couldn't solely be determined by
> > looking at the other operand, the result is poison.
>
> This means we cannot do the C-style optimization "int a = ...; a <
> (a + 1)" ==> "true".  In the case "a + 1" overflows, a is
> INT_SIGNED_MAX.  But if a is INT_SIGNED_MAX, "a < X" is always false,
> for all values of X.  So "a < a + 1" is defined; and it is true for "a
> != INT_SIGNED_MAX" but false for "a == INT_SIGNED_MAX".  Hence the
> expression cannot be folded to true.
>

This part sounds tricky, I'm not immediately sure what to do.


>
>
> I think the reason why poison is hard to formalize is that it
> fundamentally tries to give an N bit value behavior that cannot be
> justified by _any_ N bit value.  It "breaks" llvm's type system.
>
> -- Sanjoy
>
>
> On Tue, Jan 27, 2015 at 6:50 PM, David Majnemer
> <david.majnemer at gmail.com> wrote:
> > Hello,
> >
> > What follows is my attempt to describe how poison works.  Let me know
> what
> > you think.
> >
> > --
> > David
> >
> >
> > # LLVM Poison Semantics
> >
> > Poison is an LLVM concept which exists solely to enable further
> optimization
> > of LLVM IR. The exact behavior of poison has been, to say the least,
> > confusing for users, researchers and engineers working with LLVM.
> >
> > This document hopes to clear up some of the confusion of poison and
> > hopefully explain *why* it has its semantics.
> >
> > ## A Quick Introduction to Poison
> >
> > Let's start with a concrete motivating example in C:
> > ```
> > int isSumGreater(int a, int b) {
> >   return a + b > a;
> > }
> > ```
> >
> > The C specification permits us to optimize the comparison in
> `isSumGreater`
> > to `b > 0` because signed overflow results in undefined behavior.  A
> > reasonable translation of `isSumGreater` to LLVM IR could be:
> >
> > ```
> > define i32 @isSumGreater(i32 %a, i32 %b) {
> > entry:
> >   %add = add i32 %a, %b
> >   %cmp = icmp sgt i32 %add, %a
> >   %conv = zext i1 %cmp to i32
> >   ret i32 %conv
> > }
> > ```
> >
> > However, LLVM cannot determine that `%cmp` should not consider cases
> where
> > `%add` resulted in signed overflow.  We need a way to communicate this
> > information to LLVM.
> >
> > This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
> for
> > "no signed wrap", `nuw` is short for "no unsigned wrap".
> >
> > With these, we can come up with a new formulation of `%add`: `add i32 nsw
> > %a, %b`.
> > LLVM can take this into account when it is optimizing the `%cmp` and
> replace
> > it with: `icmp sgt i32 %b, 0`.
> >
> > ## Differences Between LLVM and C/C++
> >
> > There are some interesting differences between what C++ and C specify and
> > how LLVM behaves with respect to performing an operationg which is not
> > permitted to overflow.
> >
> > Perhaps chief among them is that evaluating an expression in C++ or C
> which
> > results performs an overflow is undefined behavior. In LLVM, executing an
> > instruction which is marked `nsw` but which violates signed overflow
> results
> > in poison. Values which have no relationship with poisoned values are not
> > effected by them.
> >
> > Let us take the following C program into consideration:
> > ```
> > int calculateImportantResult(int a, int b) {
> >   int result = 0;
> >   if (a) {
> >     result = a + b;
> >   }
> >   return result;
> > }
> > ```
> >
> > A straightforward lowering to LLVM IR could be:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   br i1 %tobool, label %if.then, label %if.end
> >
> > if.then:
> >   %add = add nsw i32 %a, %b
> >   br label %if.end
> >
> > if.end:
> >   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
> >   ret i32 %result
> > }
> > ```
> >
> > Moving `%add` to the `%entry` block would be preferable and would allow
> > further optimizations:
> > ```
> > define i32 @calculateImportantResult(i32 %a, i32 %b) {
> > entry:
> >   %tobool = icmp ne i32 %a, 0
> >   %add = add nsw i32 %a, %b
> >   %result = select i1 %tobool, i32 0, i32 %add
> >   ret i32 %result
> > }
> > ```
> >
> > In the original code, the calculation of `%add` was control dependent.
> > Now, `%add` might result in signed overflow in violation of the `nsw`
> flag.
> > Despite this, the program should behave as it did before because the
> > poisoned value is masked-out by the select. The next section will dive
> into
> > this in greater detail.
> >
> > # Computation Involving Poison Values
> > Poison in a computation results in poison if the result cannot be
> > constrained by its non-poison operands.
> >
> > Examples of this rule which will result in poison:
> > ```
> >   %add = add i32 %x, %always_poison
> >   %sub = sub i32 %x, %always_poison
> >   %xor = xor i32 %x, %always_poison
> >   %mul = mul i32 %always_poison, 1
> > ```
> >
> > Examples of this rule which do not result in poison:
> > ```
> >   %or  = or  i32 %always_poison, 2
> >   %and = and i32 %always_poison, 2
> >   %mul = mul i32 %always_poison, 0
> > ```
> >
> > In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
> `0`.
> > In this respect, poison is not different from `undef`.
> >
> > The following example is only poison if `%cond` is false:
> > ```
> >   %sel = select i1 %cond, i32 2, %always_poison
> > ```
> >
> > ### Is it safe to have poison as a `call` argument?
> >
> > A `call` instruction may or may not result in poison depending on exactly
> > how the callee  uses the supplied arguments, it is not necessarily the
> case
> > that `call i32 @someFunction(i32 %always_poison)` results in poison.
> >
> > LLVM cannot forbid poison from entering `call` arguments without
> prohibiting
> > an optimization pass from outlining code.
> >
> > ### Is it safe to store poison to memory?
> >
> > `store i32 %always_poison, i32* %mem` does not result in undefined
> behavior.
> > A subsequent load instruction like `%load = load i32* %mem` will result
> in
> > `%load` being a poison value.
> >
> > ### Is it safe to load or store a poison memory location?
> >
> > No.  Poison works just like `undef` in this respect.
> >
> > ### Does comparing a poison value result in poison?
> >
> > It depends.  If the comparison couldn't solely be determined by looking
> at
> > the other operand, the result is poison.
> >
> > For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
> > poison.
> > However, `icmp i32 ne %always_poison, 7` is poison.
> >
> > ### What if the condition operand in a `select` is poison?
> >
> > In the example `%sel = select i1 %always_poison, i1 true, false`, `%sel`
> is
> > either `true` or `false`.  Because, `%sel` depends on `%always_poison` it
> > too is poison.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/106edad0/attachment.html>


More information about the llvm-dev mailing list