<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 1, 2015 at 1:57 AM, David Majnemer <span dir="ltr"><<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="adM"><br></div><div class="gmail_quote"><div class="adM"><div class="">On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das <span dir="ltr"><<a href="mailto:sanjoy@playingwithpointers.com" target="_blank">sanjoy@playingwithpointers.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>> Ah, yes.  You are right, we cannot always assume that %y would be zero in<br>

> the second case.<br>

> This wouldn't be the first time we've lost information that we could use to<br>

> optimize a program by transforming it.<br>

><br>

> Do you think this result would be problematic?  It seems consistent with the<br>

> RFC and LLVM's current behavior.<br>

><br>

<br>

</span>The problem is not that we're losing information, the problem is that<br>

we're changing the behavior of a well-defined program.<br>

<br>

I'll try to put the whole argument in one place:<br>

<br>

We start with<br>

<br>

  %x = add nuw i32 %m, %n<br>

  %y = zext i32 %x to i64<br>

  %s = lshr i64 %y, 32<br>

  %addr = gep %some_global, %s<br>

  store i32 42, i32* %addr<br>

<br>

In the above program, for all values of %x, %s is 0.  This means the<br>

program is well-defined when %x is poison (since you don't need to<br>

look at %x to determine the value of %addr, in the same sense as you<br>

don't need to look at X to determine the value of "and X, 0"); and it<br>

stores 42 to &(%some_global)[0].  Specifically, the above program is<br>

well defined for "%m = %n = 2^32-1".<br>

<br>

Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw<br>

(zext X) (zext Y)" then we get<br>

<br>

  %m.wide = zext i32 %m to i64<br>

  %n.wide = zext i32 %n to i64<br>

  %z = add nuw i64 %m.wide, %n.wide<br>

  %s = lshr i64 %y, 32<br>

  %addr = gep %some_global, %s<br>

  store i32 42, i32* %addr<br>

<br>

The new program does *not* have the same behavior as the old program<br>

for "%m = %n = 2^32-1".  We have changed the behavior of a<br>

well-defined program by doing the "zext (add nuw X Y)" => "add nuw<br>

(zext X) (zext Y)" transform.<br></blockquote><div><br></div></div></div><div>After some pondering and combing through LLVM's implementation, I think we must conclude that zexting a value with any poison bits creates poison in every new bit.</div><div><br></div><div>Considering the following program:</div><div><br></div><div>%zext = zext i32 %x to i64</div><div>%icmp = icmp i64 %zext, i64 1</div><div><br></div><div>we'd like to transform this to:</div><div><br></div><div>%icmp = icmp i32 %x, i32 1</div><div><br></div><div>Is it reasonable to say that '%icmp' in the before case is not poison but '%icmp' in the after case is poison?  LLVM assumes it can remove casts with impunity, I think this is a useful property to maintain.</div></div></blockquote></div><br>FWIW, I agree with your statement.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Here is the line of reasoning that I find troubling.</div><div class="gmail_extra"><br></div><div class="gmail_extra">If we accept the above, we have a surprising result (using small bit-width integers to make it easier to read)</div><div class="gmail_extra"><br></div><div class="gmail_extra">%zext = zext i1 %x to i2</div><div class="gmail_extra">%and = and i2 %zext, 1</div><div class="gmail_extra"><br></div><div class="gmail_extra">We cannot replace %and with %zext because the %and might be removing poison.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Perhaps this restriction is OK though. I just find it somewhat troubling.</div></div>