<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 1, 2015 at 1:57 AM, David Majnemer <span dir="ltr">&lt;<a href="mailto:david.majnemer@gmail.com" target="_blank">david.majnemer@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="adM"><br></div><div class="gmail_quote"><div class="adM"><div class="">On Tue, Jan 27, 2015 at 8:58 PM, Sanjoy Das <span dir="ltr">&lt;<a href="mailto:sanjoy@playingwithpointers.com" target="_blank">sanjoy@playingwithpointers.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>> Ah, yes. You are right, we cannot always assume that %y would be zero in<br>
> the second case.<br>
> This wouldn't be the first time we've lost information that we could use to<br>
> optimize a program by transforming it.<br>
><br>
> Do you think this result would be problematic? It seems consistent with the<br>
> RFC and LLVM's current behavior.<br>
><br>
<br>
</span>The problem is not that we're losing information; the problem is that<br>
we're changing the behavior of a well-defined program.<br>
<br>
I'll try to put the whole argument in one place:<br>
<br>
We start with<br>
<br>
%x = add nuw i32 %m, %n<br>
%y = zext i32 %x to i64<br>
%s = lshr i64 %y, 32<br>
%addr = gep %some_global, %s<br>
store i32 42, i32* %addr<br>
<br>
In the above program, for all values of %x, %s is 0. This means the<br>
program is well-defined when %x is poison (since you don't need to<br>
look at %x to determine the value of %addr, in the same sense as you<br>
don't need to look at X to determine the value of "and X, 0"); and it<br>
stores 42 to &(%some_global)[0]. Specifically, the above program is<br>
well defined for "%m = %n = 2^32-1".<br>
<br>
Now if we do the usual transform of "zext (add nuw X Y)" => "add nuw<br>
(zext X) (zext Y)" then we get<br>
<br>
%m.wide = zext i32 %m to i64<br>
%n.wide = zext i32 %n to i64<br>
%z = add nuw i64 %m.wide, %n.wide<br>
%s = lshr i64 %z, 32<br>
%addr = gep %some_global, %s<br>
store i32 42, i32* %addr<br>
<br>
The new program does *not* have the same behavior as the old program<br>
for "%m = %n = 2^32-1". We have changed the behavior of a<br>
well-defined program by doing the "zext (add nuw X Y)" => "add nuw<br>
(zext X) (zext Y)" transform.<br></blockquote><div><br></div></div></div><div>After some pondering and combing through LLVM's implementation, I think we must conclude that zexting a value with any poison bits creates poison in every new bit.</div><div><br></div><div>Consider the following program:</div><div><br></div><div>%zext = zext i32 %x to i64</div><div>%icmp = icmp i64 %zext, i64 1</div><div><br></div><div>We'd like to transform this to:</div><div><br></div><div>%icmp = icmp i32 %x, i32 1</div><div><br></div><div>Is it reasonable to say that '%icmp' in the before case is not poison but '%icmp' in the after case is poison? LLVM assumes it can remove casts with impunity, and I think this is a useful property to maintain.</div></div></blockquote></div><br>FWIW, I agree with your statement.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Here is the line of reasoning that I find troubling.</div><div class="gmail_extra"><br></div><div class="gmail_extra">If we accept the above, we have a surprising result (using small bit-width integers to make it easier to read):</div><div class="gmail_extra"><br></div><div class="gmail_extra">%zext = zext i1 %x to i2</div><div class="gmail_extra">%and = and i2 %zext, 1</div><div class="gmail_extra"><br></div><div class="gmail_extra">We cannot replace %and with %zext because the %and might be removing poison.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Perhaps this restriction is OK though. I just find it somewhat troubling.</div></div>