[LLVMdev] RFC: Proposal for Poison Semantics

Sanjoy Das sanjoy at playingwithpointers.com
Thu Jan 29 22:14:45 PST 2015


Here's an idea for a slightly unusual framework for poison semantics:
we do it in two steps --


1. for every bit in the program, we define a second "shadow bit",
is-poison.  We define the semantics of LLVM IR using this is-poison
relation.  So, for instance, we could say if there is a bit 'b'in
address 'a' such that if is-poison['b'], then "store X to 'a'" is
undefined behavior.

2. we prove that there is no need to track is-poison for a
well-defined program.  IOW, any program whose evaluation will actually
require us to do the book-keeping for is-poison will eventually run
into undefined behavior.

I think such an approach will explicitly address the issue that we're
using an N bit value to track 2^N+1 possibilities.  And this is also a
sanity check -- since the is-poison relation is clearly made up and
does not *really exist*, if we cannot prove (2) then we did something
wrong in specifying step (1).

-- Sanjoy


On Thu, Jan 29, 2015 at 10:05 PM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:
> On Thu, Jan 29, 2015 at 10:01 PM, Matthias Braun <matze at braunis.de> wrote:
>> But
>> (Poison > INT_MAX) <=> poison
>> contradicts
>> (X > INT_MAX) <=> false
>>
>> and I don't think you want to abandon the second rule just because x might be poison.
>
> Maybe we could define poison in such a way that it is safe to pretend
> it "is" false, as per our convenience.  In this sense, could be
> defined to be very similar to undef.
>
> -- Sanjoy
>
>>
>> - Matthias
>>
>>> On Jan 29, 2015, at 9:43 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>
>>> One way around this is to say that there are some special
>>> instructions, icmp, sext and zext which produce a value solely
>>> composed of poison bits if any of their input bits is poison.  So
>>> `<poison> icmp X` is poison for any value of X, including INT_MAX.
>>> This is one way poison could be fundamentally different from undef.
>>>
>>> -- Sanjoy
>>>
>>>> On Thu, Jan 29, 2015 at 8:05 PM, Matthias Braun <matze at braunis.de> wrote:
>>>> Having though about this some more I think optimizing
>>>>
>>>> (x+1 > x) <=> true
>>>>
>>>> and at the same time modeling undefined behavior as a posion value is impossible. This is because we also want the following rule:
>>>>
>>>> (x > INT_MAX) <=> false
>>>>
>>>> Now if poison is a value, then the second replacement tells us (poison > INT_MAX) == false which contradicts the first rule.
>>>>
>>>>
>>>>
>>>> The only way out of this while still allowing (x+1>x)<=>true I can see at the moment is defining that add nsw does produce actual undefined behavior allowing us to “freely rewrite" the following > to true in the UB cases. Of course having add produce actual undefined behaviour greatly limits us in the way we can actually move the instruction around without changing program semantics.
>>>>
>>>> The only way allowing moving instructions and having add produce real UB I can see is that as soon as we start moving instructions around (specifically moving the add to a place it does not dominate or moving any other instruction over the add) we change the add to an instruction that does not produce real UB anymore; something like add swu = “add signed wrap gives undef”...
>>>>
>>>> - Matthias
>>>>
>>>>>> On Jan 29, 2015, at 11:29 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>>>>
>>>>>> I've been discussing a model with David that might steer poison back towards
>>>>>> something that simply supports algebraic simplification. If we have a math
>>>>>> operation that cannot wrap, then it notionally produces as many bits of
>>>>>> undef as the operation could possibly produce. For example, "add nsw i8" can
>>>>>> produce an i9 undef, and "mul nsw i8" can produce an undefined 16 bit
>>>>>> bitpattern. This is strong enough to do things like "a + 1 > a  -->  true",
>>>>>> because on overflow of "a + 1" we can choose an poison value of "MAX_INT +
>>>>>> 1", even though that is not a valid i8 bit pattern.
>>>>>>
>>>>>> So, a short version would be that poison is like undef, except you get to
>>>>>> include the overflow bits of the computation in your undef value.
>>>>>
>>>>> I  suspect it will be hard to justify reg2mem is meaning preserving in
>>>>> this scheme.  But if this can be made to work, then I agree that this
>>>>> is the right thing to do -- an i32 poison effectively has the
>>>>> semantics of a wider type, and it makes sense to be explicit in that.
>>>>>
>>>>> -- Sanjoy
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list