[LLVMdev] RFC: Proposal to Remove Poison

Owen Anderson resistor at mac.com
Sun Feb 8 13:57:05 PST 2015


Hi David,

I have serious doubts about this approach.  This is essentially how undef was originally implemented in LLVM, back in the mists of time.  It led to a lot of bugs that Dan (and others) fixed in the 2008-2010 timeframe, culminating in the modern definition of undef, and poison as a complementary concept.  Unfortunately, I don’t have pointers to those bugs, but I imagine some codebase archeology could turn them up.

—Owen

> On Feb 8, 2015, at 2:23 AM, David Majnemer <david.majnemer at gmail.com> wrote:
> 
> Hello,
> 
> I'd like to offer an alternative solution to the "poison problem": remove it.
> 
> What follows is rather informal.  I'd happily write up a nicer document if this RFC stands up to scrutiny.
> 
> The idea was born from two observations:
> - undef was introduced to model a load of uninitialized memory, a form of undefined behavior.
> - poison was introduced to model integer overflow, another form of undefined behavior.
> 
> I find it awkward that one form of uninitialized behavior is more or less powerful than another.
> 
> We rely on the following properties for undef:
> - undef is permitted to be an arbitrary bit-pattern.
> - Instructions may or may not be value dependent on their undef operands.
> 
> We rely on the following properties for poison:
> - Instructions which have poison operands are capable of providing results which are not consistent with a two's complement operand.
> 
> Here is a case where we believed distinguishing between poison and undef is essential:
> %add = add nsw i32 %a, %b
> %cmp = icmp sgt i32 %add, %a
> 
> If %a is INT_MAX and %b is 1, then %add results in overflow.
> If overflow results in undef, then %cmp would be equivalent to:
> %cmp = icmp sgt i32 undef, INT_MIN
> 
> There is no bit-pattern we could substitute for undef which would make %cmp true, this means that we cannot optimize %cmp to 'icmp sgt i32 %b, 0'.
> 
> Poison was created to allow us to produce whatever value we'd like for %cmp if %add resulted in overflow.
> 
> What I would like to do is to remove poison and replace it with undef while still optimizing comparison instructions.  The quick-and-dirty way of stating this is: an icmp with an undef operand produces undef regardless of its other operand.
> 
> I believe this would allow us to exploit and optimize overflow in arithmetic flowing into comparison operations.
> 
> Could this be problematic?
> 
> Whether we intended to or not, LLVM's implementation of undef's semantics already allows us to say that a memory location is not equivalent to any possible bit-pattern.
> 
> Consider the following program:
> define i1 @f() {
>   %mem = alloca i1
>   %load = load i1* %mem
>   %cmp = icmp eq i1 %load, false
>   ret i1 %cmp
> }
> 
> Because it is possible for %load's undef to be either true or false, %cmp must be equivalent to undef as well.  This is completely consistent with the above rules.
> 
> The following program is a little more interesting:
> define i1 @g() {
>   %mem = alloca i1
>   %load = load i1* %mem
>   %cmp0 = icmp eq i1 %load, 0
>   %cmp1 = icmp eq i1 %load, 1
>   %and = xor i1 %cmp0, %cmp1
>   ret i1 %and
> }
> 
> The intent of this program is to compare a memory location against all possible values the location might have.
> 
> If we ran LLVM's InstCombine pass then %and would have been replaced with the "expected" result of true.
> 
> If we instead ran SROA over @g, we would be left with:
>   %cmp0 = icmp eq i1 undef, false
>   %cmp1 = icmp eq i1 undef, true
>   %and = xor i1 %cmp0, %cmp1
> 
> Now that %cmp0 and %cmp1 have different undef values, %and is now undef.
> This result is sufficient to say that the contents of %mem are neither true nor false!
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev





More information about the llvm-dev mailing list