[LLVMdev] RFC: Proposal for Poison Semantics

Sanjoy Das sanjoy at playingwithpointers.com
Wed Jan 28 21:33:05 PST 2015


If we follow the RFC, the following program has UB if X is poison but is
well defined if X is undef:

%loc = select i1 X, %global0, %global1
store 42 to %loc

(Assuming both global0 and global1 can be legally stored to and are
different locations.)

-- Sanjoy

Sent from a mobile device, please excuse typos.
On Jan 28, 2015 9:02 PM, "Philip Reames" <listmail at philipreames.com> wrote:

>  On 01/28/2015 07:02 AM, Sean Silva wrote:
>
> Could you maybe provide an example where replacing `%always_poison` with
> `undef` will change the meaning? At least for me, the thing that I'm most
> unclear about is how poison differs from undef.
>
> I will second this request for much the same reason.
>
>
>  -- Sean Silva
>
> On Wed, Jan 28, 2015 at 2:50 AM, David Majnemer <david.majnemer at gmail.com>
> wrote:
>
>> Hello,
>>
>>  What follows is my attempt to describe how poison works.  Let me know
>> what you think.
>>
>>  --
>> David
>>
>>
>>  # LLVM Poison Semantics
>>
>>  Poison is an LLVM concept which exists solely to enable further
>> optimization of LLVM IR. The exact behavior of poison has been, to say the
>> least, confusing for users, researchers and engineers working with LLVM.
>>
>>  This document hopes to clear up some of the confusion of poison and
>> hopefully explain *why* it has its semantics.
>>
>>  ## A Quick Introduction to Poison
>>
>>  Let's start with a concrete motivating example in C:
>> ```
>> int isSumGreater(int a, int b) {
>>   return a + b > a;
>> }
>> ```
>>
>>  The C specification permits us to optimize the comparison in
>> `isSumGreater` to `b > 0` because signed overflow results in undefined
>> behavior.  A reasonable translation of `isSumGreater` to LLVM IR could be:
>>
>>  ```
>> define i32 @isSumGreater(i32 %a, i32 %b) {
>> entry:
>>   %add = add i32 %a, %b
>>   %cmp = icmp sgt i32 %add, %a
>>   %conv = zext i1 %cmp to i32
>>   ret i32 %conv
>> }
>> ```
>>
>>  However, LLVM cannot determine that `%cmp` should not consider cases
>> where `%add` resulted in signed overflow.  We need a way to communicate
>> this information to LLVM.
>>
>>  This is where the `nsw` and `nuw` flags come into play.  `nsw` is short
>> for "no signed wrap", `nuw` is short for "no unsigned wrap".
>>
>>  With these, we can come up with a new formulation of `%add`: `add i32
>> nsw %a, %b`.
>> LLVM can take this into account when it is optimizing the `%cmp` and
>> replace it with: `icmp sgt i32 %b, 0`.
>>
>>  ## Differences Between LLVM and C/C++
>>
>>  There are some interesting differences between what C++ and C specify
>> and how LLVM behaves with respect to performing an operationg which is not
>> permitted to overflow.
>>
>>  Perhaps chief among them is that evaluating an expression in C++ or C
>> which results performs an overflow is undefined behavior. In LLVM,
>> executing an instruction which is marked `nsw` but which violates signed
>> overflow results in poison. Values which have no relationship with poisoned
>> values are not effected by them.
>>
>>  Let us take the following C program into consideration:
>> ```
>> int calculateImportantResult(int a, int b) {
>>   int result = 0;
>>   if (a) {
>>     result = a + b;
>>   }
>>   return result;
>> }
>> ```
>>
>>  A straightforward lowering to LLVM IR could be:
>> ```
>> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>> entry:
>>   %tobool = icmp ne i32 %a, 0
>>   br i1 %tobool, label %if.then, label %if.end
>>
>>  if.then:
>>   %add = add nsw i32 %a, %b
>>   br label %if.end
>>
>>  if.end:
>>   %result = phi i32 [ %add, %if.then ], [ 0, %entry ]
>>   ret i32 %result
>> }
>> ```
>>
>>  Moving `%add` to the `%entry` block would be preferable and would allow
>> further optimizations:
>> ```
>> define i32 @calculateImportantResult(i32 %a, i32 %b) {
>> entry:
>>   %tobool = icmp ne i32 %a, 0
>>   %add = add nsw i32 %a, %b
>>   %result = select i1 %tobool, i32 0, i32 %add
>>   ret i32 %result
>> }
>> ```
>>
>>  In the original code, the calculation of `%add` was control dependent.
>> Now, `%add` might result in signed overflow in violation of the `nsw`
>> flag.
>> Despite this, the program should behave as it did before because the
>> poisoned value is masked-out by the select. The next section will dive into
>> this in greater detail.
>>
>>  # Computation Involving Poison Values
>> Poison in a computation results in poison if the result cannot be
>> constrained by its non-poison operands.
>>
>>  Examples of this rule which will result in poison:
>> ```
>>   %add = add i32 %x, %always_poison
>>   %sub = sub i32 %x, %always_poison
>>   %xor = xor i32 %x, %always_poison
>>   %mul = mul i32 %always_poison, 1
>> ```
>>
>>  Examples of this rule which do not result in poison:
>> ```
>>   %or  = or  i32 %always_poison, 2
>>   %and = and i32 %always_poison, 2
>>   %mul = mul i32 %always_poison, 0
>> ```
>>
>>  In fact, it would be reasonable to optimize `%or` to `2` and `%and` to
>> `0`.  In this respect, poison is not different from `undef`.
>>
>>  The following example is only poison if `%cond` is false:
>> ```
>>   %sel = select i1 %cond, i32 2, %always_poison
>> ```
>>
>>  ### Is it safe to have poison as a `call` argument?
>>
>>  A `call` instruction may or may not result in poison depending on
>> exactly how the callee  uses the supplied arguments, it is not necessarily
>> the case that `call i32 @someFunction(i32 %always_poison)` results in
>> poison.
>>
>>  LLVM cannot forbid poison from entering `call` arguments without
>> prohibiting an optimization pass from outlining code.
>>
>>  ### Is it safe to store poison to memory?
>>
>>  `store i32 %always_poison, i32* %mem` does not result in undefined
>> behavior. A subsequent load instruction like `%load = load i32* %mem` will
>> result in `%load` being a poison value.
>>
>>  ### Is it safe to load or store a poison memory location?
>>
>>  No.  Poison works just like `undef` in this respect.
>>
>>  ### Does comparing a poison value result in poison?
>>
>>  It depends.  If the comparison couldn't solely be determined by looking
>> at the other operand, the result is poison.
>>
>>  For example, `icmp i32 ule %always_poison, 4294967295` is `true`, not
>> poison.
>> However, `icmp i32 ne %always_poison, 7` is poison.
>>
>>  ### What if the condition operand in a `select` is poison?
>>
>>  In the example `%sel = select i1 %always_poison, i1 true, false`,
>> `%sel` is either `true` or `false`.  Because, `%sel` depends on
>> `%always_poison` it too is poison.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
>
> _______________________________________________
> LLVM Developers mailing listLLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/fb366fc5/attachment.html>


More information about the llvm-dev mailing list