[LLVMdev] RFC: Proposal for Poison Semantics

Sun Feb 1 03:20:36 PST 2015

Let's consider Sanjoy's earlier example,

We start with the following program:

%x = add nuw i32 %m, %n
%y = zext i32 %x to i64
%s = lshr i64 %y, 32

If %m and %n are 2**32 - 1, then using your model: %x is entirely poison,
%y's top 32-bits would be zero and bottom 32-bits are poison and %s would
be enitrely zero.

This would mean that we couldn't transform the program into:

%m.wide = zext i32 %m to i64
%n.wide = zext i32 %n to i64
%z = add nuw i64 %m.wide, %n.wide
%s = lshr i64 %z, 32

Because we would have: %m.wide and %n.wide are 2**32 - 1, %z is 2**33 - 2,
%s is 1

That being said, we don't perform this transform today and I don't see why
we would want to.  I'd happily give up this flexibility if it meant that we
could keep everything else.

On Sun, Feb 1, 2015 at 2:44 AM, Bruce Hoult <bruce at hoult.org> wrote:

> I don't know how things work at the moment, but it seems to me that you
> can do lots of sensible things, and avoid lots of silly things, if you keep
> track of four possible values for each bit:
>
> - undef (the default)
> - poison
> - known to be 0
> - known to be 1
>
> This makes both David's and Chandler's examples work nicely if you assume:
>
> - ZEXT makes all the new bits known 0
> - SEXT makes all the new bits the same as the high bit
> - AND clears unknown and poison bits to known 0 if the other input is
> known 0
> - OR sets unknown and poison bits to known 1 if the other input is known 1
>
> Also things such as ZEXTing a poison i32 to i64 and then right shifting by
> 32 will result in all known 0 bits.
>
>
> On Sun, Feb 1, 2015 at 11:19 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>>
>> On Sun, Feb 1, 2015 at 1:57 AM, David Majnemer <david.majnemer at gmail.com>
>> wrote:
>>
>>>
>>> On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das <
>>> sanjoy at playingwithpointers.com> wrote:
>>>
>>>> > Ah, yes.  You are right, we cannot always assume that %y would be
>>>> zero in
>>>> > the second case.
>>>> > This wouldn't be the first time we've lost information that we could
>>>> use to
>>>> > optimize a program by transforming it.
>>>> >
>>>> > Do you think this result would be problematic?  It seems consistent
>>>> with the
>>>> > RFC and LLVM's current behavior.
>>>> >
>>>>
>>>> The problem is not that we're losing information, the problem is that
>>>> we're changing the behavior of a well-defined program.
>>>>
>>>> I'll try to put the whole argument in one place:
>>>>
>>>> We start with
>>>>
>>>>   %x = add nuw i32 %m, %n
>>>>   %y = zext i32 %x to i64
>>>>   %s = lshr i64 %y, 32
>>>>   %addr = gep %some_global, %s
>>>>   store i32 42, i32* %addr
>>>>
>>>> In the above program, for all values of %x, %s is 0.  This means the
>>>> program is well-defined when %x is poison (since you don't need to
>>>> look at %x to determine the value of %addr, in the same sense as you
>>>> don't need to look at X to determine the value of "and X, 0"); and it
>>>> stores 42 to &(%some_global)[0].  Specifically, the above program is
>>>> well defined for "%m = %n = 2^32-1".
>>>>
>>>> Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw
>>>> (zext X) (zext Y)" then we get
>>>>
>>>>   %m.wide = zext i32 %m to i64
>>>>   %n.wide = zext i32 %n to i64
>>>>   %z = add nuw i64 %m.wide, %n.wide
>>>>   %s = lshr i64 %y, 32
>>>>   %addr = gep %some_global, %s
>>>>   store i32 42, i32* %addr
>>>>
>>>> The new program does *not* have the same behavior as the old program
>>>> for "%m = %n = 2^32-1".  We have changed the behavior of a
>>>> well-defined program by doing the "zext (add nuw X Y)" => "add nuw
>>>> (zext X) (zext Y)" transform.
>>>>
>>>
>>> After some pondering and combing through LLVM's implementation, I think
>>> we must conclude that zexting a value with any poison bits creates poison
>>> in every new bit.
>>>
>>> Considering the following program:
>>>
>>> %zext = zext i32 %x to i64
>>> %icmp = icmp i64 %zext, i64 1
>>>
>>> we'd like to transform this to:
>>>
>>> %icmp = icmp i32 %x, i32 1
>>>
>>> Is it reasonable to say that '%icmp' in the before case is not poison
>>> but '%icmp' in the after case is poison?  LLVM assumes it can remove casts
>>> with impunity, I think this is a useful property to maintain.
>>>
>>
>> FWIW, I agree with your statement.
>>
>> Here is the line of reasoning that I find troubling.
>>
>> If we accept the above, we have a surprising result (using small
>> bit-width integers to make it easier to read)
>>
>> %zext = zext i1 %x to i2
>> %and = and i2 %zext, 1
>>
>> We cannot replace %and with %zext because the %and might be removing
>> poison.
>>
>> Perhaps this restriction is OK though. I just find it somewhat troubling.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150201/9200055a/attachment.html>