[LLVMdev] RFC: Proposal to Remove Poison

David Majnemer david.majnemer at gmail.com
Sun Feb 8 14:31:02 PST 2015


On Sun, Feb 8, 2015 at 1:57 PM, Owen Anderson <resistor at mac.com> wrote:

> Hi David,
>
> I have serious doubts about this approach.  This is essentially how undef
> was originally implemented in LLVM, back in the mists of time.  It led to a
> lot of bugs that Dan (and others) fixed in the 2008-2010 timeframe,
> culminating in the modern definition of undef, and poison as a
> complementary concept.  Unfortunately, I don’t have pointers to those bugs,
> but I imagine some codebase archeology could turn them up.
>

Having different notions of undef and poison are problematic as well.

This proposal is motivated from the observation that, in practice, we
transform:
define i1 @f(i32 %x, i32 %y, i1 %C) {
  %add = add nsw i32 %x, %y
  %sel = select i1 %C, i32 %add, i32 undef
  %cmp = icmp sgt i32 %sel, %x
  ret i1 %cmp
}

into:
define i1 @f(i32 %x, i32 %y, i1 %C) {
  %cmp = icmp sgt i32 %y, 0
  ret i1 %cmp
}

If undef isn't as strong as poison, this transformation is not valid.

Why is this the case? Consider that when %C is false, %sel will be undef.
This leaves us with:
  %cmp = icmp sgt i32 undef, %x

%x might be INT_MAX which means that %cmp must be optimized to false.


> —Owen
>
> > On Feb 8, 2015, at 2:23 AM, David Majnemer <david.majnemer at gmail.com>
> wrote:
> >
> > Hello,
> >
> > I'd like to offer an alternative solution to the "poison problem":
> remove it.
> >
> > What follows is rather informal.  I'd happily write up a nicer document
> if this RFC stands up to scrutiny.
> >
> > The idea was born from two observations:
> > - undef was introduced to model a load of uninitialized memory, a form
> of undefined behavior.
> > - poison was introduced to model integer overflow, another form of
> undefined behavior.
> >
> > I find it awkward that one form of uninitialized behavior is more or
> less powerful than another.
> >
> > We rely on the following properties for undef:
> > - undef is permitted to be an arbitrary bit-pattern.
> > - Instructions may or may not be value dependent on their undef operands.
> >
> > We rely on the following properties for poison:
> > - Instructions which have poison operands are capable of providing
> results which are not consistent with a two's complement operand.
> >
> > Here is a case where we believed distinguishing between poison and undef
> is essential:
> > %add = add nsw i32 %a, %b
> > %cmp = icmp sgt i32 %add, %a
> >
> > If %a is INT_MAX and %b is 1, then %add results in overflow.
> > If overflow results in undef, then %cmp would be equivalent to:
> > %cmp = icmp sgt i32 undef, INT_MIN
> >
> > There is no bit-pattern we could substitute for undef which would make
> %cmp true, this means that we cannot optimize %cmp to 'icmp sgt i32 %b, 0'.
> >
> > Poison was created to allow us to produce whatever value we'd like for
> %cmp if %add resulted in overflow.
> >
> > What I would like to do is to remove poison and replace it with undef
> while still optimizing comparison instructions.  The quick-and-dirty way of
> stating this is: an icmp with an undef operand produces undef regardless of
> its other operand.
> >
> > I believe this would allow us to exploit and optimize overflow in
> arithmetic flowing into comparison operations.
> >
> > Could this be problematic?
> >
> > Whether we intended to or not, LLVM's implementation of undef's
> semantics already allows us to say that a memory location is not equivalent
> to any possible bit-pattern.
> >
> > Consider the following program:
> > define i1 @f() {
> >   %mem = alloca i1
> >   %load = load i1* %mem
> >   %cmp = icmp eq i1 %load, false
> >   ret i1 %cmp
> > }
> >
> > Because it is possible for %load's undef to be either true or false,
> %cmp must be equivalent to undef as well.  This is completely consistent
> with the above rules.
> >
> > The following program is a little more interesting:
> > define i1 @g() {
> >   %mem = alloca i1
> >   %load = load i1* %mem
> >   %cmp0 = icmp eq i1 %load, 0
> >   %cmp1 = icmp eq i1 %load, 1
> >   %and = xor i1 %cmp0, %cmp1
> >   ret i1 %and
> > }
> >
> > The intent of this program is to compare a memory location against all
> possible values the location might have.
> >
> > If we ran LLVM's InstCombine pass then %and would have been replaced
> with the "expected" result of true.
> >
> > If we instead ran SROA over @g, we would be left with:
> >   %cmp0 = icmp eq i1 undef, false
> >   %cmp1 = icmp eq i1 undef, true
> >   %and = xor i1 %cmp0, %cmp1
> >
> > Now that %cmp0 and %cmp1 have different undef values, %and is now undef.
> > This result is sufficient to say that the contents of %mem are neither
> true nor false!
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150208/2d437e81/attachment.html>


More information about the llvm-dev mailing list