<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Thanks, Arnold, Sanjay!<div class=""><br class=""></div><div class="">I trimmed the testcase and committed the patch in r222451.</div><div class=""><br class=""></div><div class="">Michael</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Nov 20, 2014, at 9:33 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" class="">spatel@rotateright.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Thank you for the explanation! I'll abandon my patch on Phab.<br class=""><br class=""></div>My only suggestion would be to trim the test case if possible.<br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Thu, Nov 20, 2014 at 10:03 AM, Michael Zolotukhin <span dir="ltr" class=""><<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi Sanjay,<div class=""><br class=""></div><div class="">Indeed, we’ve been working on very similar problems, and actually your fix is included into my patch. However, along with the issue your fix is targeting, there is one more. It shows up when trip-count isn’t known at compile time (replace <span style="font-family:Menlo,Consolas,Monaco,monospace;font-size:10px;line-height:16px;white-space:pre-wrap;background-color:rgb(170,255,170)" class="">%cmp = icmp eq i32 %inc, 0 </span>with <span style="font-family:Menlo,Consolas,Monaco,monospace;font-size:10px;line-height:16px;white-space:pre-wrap;background-color:rgb(170,255,170)" class="">%cmp = icmp eq i32 %inc, %N</span> in your example) and we use unroll-factor=2. Using unroll-factor=2 is crucial to expose this bug, because for other cases the prologue is a loop, and we use this loop if at runtime we find out that the trip count overflows. To make this possible for unroll-factor=2 as well we need to stop unrolling the prologue if overflow is possible.</div><div class=""><br class=""></div><div class="">I included a test-case to my patch, and return values of @foo differs depending on whether the loop was unrolled or not when trunk compiler is used. You can inspect that by adding a @main function to print the return value.</div><div class=""><br class=""></div><div class="">Below you can find outputs with and without my change, along with my annotation:</div><div class=""><br class=""></div><div class=""><div class="">Without my change (please not that it’s actually not a loop):</div><div class=""><font style="font-size:11px" face="Menlo" class=""><div class="">entry:</div><div class=""> %0 = add i32 %N, 1 ; <<< If %N is -1, %0 overflows and becomes 0 here.</div><div class=""> %xtraiter = and i32 %0, 1</div><div class=""> %lcmp.mod = icmp ne i32 %xtraiter, 0</div><div class=""> %lcmp.overflow = icmp eq i32 %0, 0 ; <<< Here we actually detect the overflow, and our</div><div class=""> ; plan is to go to prologue loop to handle that.</div><div class=""> %lcmp.or = or i1 %lcmp.overflow, %lcmp.mod</div><div class=""> br i1 %lcmp.or, label %while.body.prol, label %entry.split ; <<< Here we jump to the prologue</div><div class=""><br class=""></div></font></div><div class=""><font style="font-size:11px" face="Menlo" class="">while.body.prol: ; preds = %entry</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %1 = bitcast i32 0 to float</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp.prol = fcmp oge float %1, 0.000000e+00</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp2.prol = fcmp ole float %1, 1.000000e+00</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %or.cond.prol = and i1 %cmp.prol, %cmp2.prol</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc.prol = zext i1 %or.cond.prol to i32</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc.count.0.prol = add i32 0, %inc.prol</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp4.prol = icmp eq i32 0, %N ; <<< But it’s not a loop any more!</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc7.prol = add i32 0, 1 ; <<< We execute a single iteration and proceed</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> br label %entry.split ; <<<</font></div></div><div class=""><font style="font-size:11px" face="Menlo" class=""><br class=""></font></div><div class=""><font style="font-size:11px" face="Menlo" class=""><div class="">entry.split: ; preds = %while.body.prol, %entry</div><div class=""> %inc.count.0.lcssa.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]</div><div class=""> %storemerge.unr = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol ]</div><div class=""> %count.0.unr = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]</div><div class=""> %2 = icmp ult i32 %0, 2</div><div class=""> br i1 %2, label %while.end, label %entry.split.split ; <<< And here we skip the unrolled loop and go to</div><div class=""> ; function exit</div><div class=""><br class=""></div><div class="">entry.split.split: ; preds = %entry.split</div><div class=""> br label %while.body</div><div class=""><br class=""></div><div class="">while.body: ; preds = %while.body, %entry.split.split</div><div class=""> %storemerge = phi i32 [ %storemerge.unr, %entry.split.split ], [ %inc7.1, %while.body ]</div><div class=""> %count.0 = phi i32 [ %count.0.unr, %entry.split.split ], [ %inc.count.0.1, %while.body ]</div><div class=""> %3 = bitcast i32 %storemerge to float</div><div class=""> %cmp = fcmp oge float %3, 0.000000e+00</div><div class=""> %cmp2 = fcmp ole float %3, 1.000000e+00</div><div class=""> %or.cond = and i1 %cmp, %cmp2</div><div class=""> %inc = zext i1 %or.cond to i32</div><div class=""> %inc.count.0 = add i32 %count.0, %inc</div><div class=""> %inc7 = add i32 %storemerge, 1</div><div class=""> %4 = bitcast i32 %inc7 to float</div><div class=""> %cmp.1 = fcmp oge float %4, 0.000000e+00</div><div class=""> %cmp2.1 = fcmp ole float %4, 1.000000e+00</div><div class=""> %or.cond.1 = and i1 %cmp.1, %cmp2.1</div><div class=""> %inc.1 = zext i1 %or.cond.1 to i32</div><div class=""> %inc.count.0.1 = add i32 %inc.count.0, %inc.1</div><div class=""> %cmp4.1 = icmp eq i32 %inc7, %N</div><div class=""> %inc7.1 = add i32 %inc7, 1</div><div class=""> br i1 %cmp4.1, label %while.end.unr-lcssa, label %while.body</div></font></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">With my change (the difference is only in <span style="font-family:Menlo" class="">while.body.prol</span> - it’s a loop now):</div><div class=""><div class=""><font face="Menlo" class=""><div class=""><br class=""></div><div class=""><span style="font-size:11px" class="">while.body.prol: ; preds = %while.body.prol, %entry</span></div></font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %storemerge.prol = phi i32 [ 0, %entry ], [ %inc7.prol, %while.body.prol ]</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %count.0.prol = phi i32 [ 0, %entry ], [ %inc.count.0.prol, %while.body.prol ]</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %prol.iter = phi i32 [ %xtraiter, %entry ], [ %prol.iter.sub, %while.body.prol ]</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %1 = bitcast i32 %storemerge.prol to float</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp.prol = fcmp oge float %1, 0.000000e+00</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp2.prol = fcmp ole float %1, 1.000000e+00</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %or.cond.prol = and i1 %cmp.prol, %cmp2.prol</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc.prol = zext i1 %or.cond.prol to i32</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc.count.0.prol = add i32 %count.0.prol, %inc.prol</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %cmp4.prol = icmp eq i32 %storemerge.prol, %N</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %inc7.prol = add i32 %storemerge.prol, 1</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %prol.iter.sub = sub i32 %prol.iter, 1</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> %prol.iter.cmp = icmp ne i32 %prol.iter.sub, 0</font></div><div class=""><font style="font-size:11px" face="Menlo" class=""> br i1 %prol.iter.cmp, label %while.body.prol, label %entry.split, !llvm.loop !0</font></div></div><div class=""><font face="Menlo" class=""><div class=""><br class=""></div></font></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><div class="h5"><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div class=""><blockquote type="cite" class=""><div class="">On Nov 20, 2014, at 8:19 AM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" target="_blank" class="">spatel@rotateright.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" class=""><div class="">Hi Michael,<br class=""><br class="">I have a similar patch to fix this here:<br class=""><a href="http://reviews.llvm.org/D6200" target="_blank" class="">http://reviews.llvm.org/D6200</a><br class=""><br class="">Based on this newer bug report:<br class=""><a href="http://llvm.org/bugs/show_bug.cgi?id=21409" target="_blank" class="">http://llvm.org/bugs/show_bug.cgi?id=21409</a><br class=""><br class=""></div>How does your change to the prologue check affect the transform? Can you post example code with and without that change? I minimized the test case in my patch, so it might be easier to explain using that as the example code.<br class=""><br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Wed, Nov 19, 2014 at 6:36 PM, Michael Zolotukhin <span dir="ltr" class=""><<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="">Hi,<div class=""><br class=""></div><div class="">This patch fixes an issue with possible integer overflow that might be caused by LoopUnroll. The problem was reported here: [1], and than it was partially solved (see [2]), but not completely.</div><div class=""><br class=""></div><div class="">This fix aims the following issue in LoopUnroll pass: currently LoopUnroll generates a prologue loop before the main loop body to execute first N%UnrollFactor iterations. When UnrollFactor is 2, then this loop performs a single iteration, and thus its back-edge is optimized away, and the loop actually becomes just a linear code. </div><div class=""><br class=""></div><div class="">That would work fine, but it turns out that we also want to use this un-unrolled loop as a last resort if for some reason we can’t use unrolled loop. One of such reasons is possible trip-count overflow.</div><div class=""><br class=""></div><div class="">The attached patch makes the condition when we fully unroll the prologue more strict, and thusly fix the issue in question.</div><div class=""><br class=""></div><div class="">Does it look ok?</div><div class=""><br class=""></div><div class=""></div></div><br class=""><div style="word-wrap:break-word" class=""><div class=""></div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""></div><div class="">[1]: <a href="http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out" target="_blank" class="">http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out</a></div><div class="">[2]: <a href="http://llvm.org/bugs/show_bug.cgi?id=19846" target="_blank" class="">http://llvm.org/bugs/show_bug.cgi?id=19846</a></div><div class=""><br class=""></div></div><br class="">_______________________________________________<br class="">
llvm-commits mailing list<br class="">
<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank" class="">llvm-commits@cs.uiuc.edu</a><br class="">
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br class="">
<br class=""></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></div></div></div></blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></div></body></html>