[LLVMdev] RFC: Proposal for Poison Semantics

Thu Jan 29 22:59:53 PST 2015

This is actually how i tend to think of poison -- it is undef, plus a
shadow bit indicating the bits are poison. I think the important this is
defining how the various operations propagate poison, why, and why not.

On Thu, Jan 29, 2015 at 10:14 PM, Sanjoy Das <sanjoy at playingwithpointers.com
> wrote:

> Here's an idea for a slightly unusual framework for poison semantics:
> we do it in two steps --
>
>
> 1. for every bit in the program, we define a second "shadow bit",
> is-poison.  We define the semantics of LLVM IR using this is-poison
> relation.  So, for instance, we could say if there is a bit 'b'in
> address 'a' such that if is-poison['b'], then "store X to 'a'" is
> undefined behavior.
>
> 2. we prove that there is no need to track is-poison for a
> well-defined program.  IOW, any program whose evaluation will actually
> require us to do the book-keeping for is-poison will eventually run
> into undefined behavior.
>
> I think such an approach will explicitly address the issue that we're
> using an N bit value to track 2^N+1 possibilities.  And this is also a
> sanity check -- since the is-poison relation is clearly made up and
> does not *really exist*, if we cannot prove (2) then we did something
> wrong in specifying step (1).
>
> -- Sanjoy
>
>
> On Thu, Jan 29, 2015 at 10:05 PM, Sanjoy Das
> <sanjoy at playingwithpointers.com> wrote:
> > On Thu, Jan 29, 2015 at 10:01 PM, Matthias Braun <matze at braunis.de>
> wrote:
> >> But
> >> (Poison > INT_MAX) <=> poison
> >> contradicts
> >> (X > INT_MAX) <=> false
> >>
> >> and I don't think you want to abandon the second rule just because x
> might be poison.
> >
> > Maybe we could define poison in such a way that it is safe to pretend
> > it "is" false, as per our convenience.  In this sense, could be
> > defined to be very similar to undef.
> >
> > -- Sanjoy
> >
> >>
> >> - Matthias
> >>
> >>> On Jan 29, 2015, at 9:43 PM, Sanjoy Das <
> sanjoy at playingwithpointers.com> wrote:
> >>>
> >>> One way around this is to say that there are some special
> >>> instructions, icmp, sext and zext which produce a value solely
> >>> composed of poison bits if any of their input bits is poison.  So
> >>> `<poison> icmp X` is poison for any value of X, including INT_MAX.
> >>> This is one way poison could be fundamentally different from undef.
> >>>
> >>> -- Sanjoy
> >>>
> >>>> On Thu, Jan 29, 2015 at 8:05 PM, Matthias Braun <matze at braunis.de>
> wrote:
> >>>> Having though about this some more I think optimizing
> >>>>
> >>>> (x+1 > x) <=> true
> >>>>
> >>>> and at the same time modeling undefined behavior as a posion value is
> impossible. This is because we also want the following rule:
> >>>>
> >>>> (x > INT_MAX) <=> false
> >>>>
> >>>> Now if poison is a value, then the second replacement tells us
> (poison > INT_MAX) == false which contradicts the first rule.
> >>>>
> >>>>
> >>>>
> >>>> The only way out of this while still allowing (x+1>x)<=>true I can
> see at the moment is defining that add nsw does produce actual undefined
> behavior allowing us to “freely rewrite" the following > to true in the UB
> cases. Of course having add produce actual undefined behaviour greatly
> limits us in the way we can actually move the instruction around without
> changing program semantics.
> >>>>
> >>>> The only way allowing moving instructions and having add produce real
> UB I can see is that as soon as we start moving instructions around
> (specifically moving the add to a place it does not dominate or moving any
> other instruction over the add) we change the add to an instruction that
> does not produce real UB anymore; something like add swu = “add signed wrap
> gives undef”...
> >>>>
> >>>> - Matthias
> >>>>
> >>>>>> On Jan 29, 2015, at 11:29 AM, Sanjoy Das <
> sanjoy at playingwithpointers.com> wrote:
> >>>>>>
> >>>>>> I've been discussing a model with David that might steer poison
> back towards
> >>>>>> something that simply supports algebraic simplification. If we have
> a math
> >>>>>> operation that cannot wrap, then it notionally produces as many
> bits of
> >>>>>> undef as the operation could possibly produce. For example, "add
> nsw i8" can
> >>>>>> produce an i9 undef, and "mul nsw i8" can produce an undefined 16
> bit
> >>>>>> bitpattern. This is strong enough to do things like "a + 1 > a
> -->  true",
> >>>>>> because on overflow of "a + 1" we can choose an poison value of
> "MAX_INT +
> >>>>>> 1", even though that is not a valid i8 bit pattern.
> >>>>>>
> >>>>>> So, a short version would be that poison is like undef, except you
> get to
> >>>>>> include the overflow bits of the computation in your undef value.
> >>>>>
> >>>>> I  suspect it will be hard to justify reg2mem is meaning preserving
> in
> >>>>> this scheme.  But if this can be made to work, then I agree that this
> >>>>> is the right thing to do -- an i32 poison effectively has the
> >>>>> semantics of a wider type, and it makes sense to be explicit in that.
> >>>>>
> >>>>> -- Sanjoy
> >>>>> _______________________________________________
> >>>>> LLVM Developers mailing list
> >>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/3363f529/attachment.html>