<div dir="ltr"><div><div>Hi Sanjoy,</div><div><br></div><div>Could you let me know if you have had a chance to check the patch and the LLVM test corrections, please?<br></div><div><br></div><div>Thanks!</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Apr 28, 2016 at 11:51 PM, Oleg Ranevskyy <span dir="ltr"><<a href="mailto:llvm.mail.list@gmail.com" target="_blank">llvm.mail.list@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Sanjoy,</div><div><br></div><div>Attached is the patch that fixes the LLVM test regressions caused by this fix. It just adds <nsw> entries the fix has introduced.</div><div><br></div><div>Would you have a couple of minutes to check it, please?</div><div><br></div><div>I would also like to share two differences in the opt logs I noticed for these tests.</div><div><br></div><div>*1.* The fixed LLVM might split sext of a sum into a sum of two sext's while doing SCEV analysis. This does not affect the final IR however - it is the same for all the patched tests with and w/o your fix. E.g., this can be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix prints this:</div><div><br></div><div>--------</div><div>Printing analysis 'Scalar Evolution Analysis' for function 'test-sub-nsw':</div><div>Classifying expressions for: @test-sub-nsw</div><div>  ...</div><div>  %index64 = sext i32 %index32 to i64</div><div>  -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to i64),+,1}<nsw><%loop> U: <a href="tel:%5B-2147483648" value="+12147483648" target="_blank">[-2147483648</a>,6442450943) S: <a href="tel:%5B-2147483648" value="+12147483648" target="_blank">[-2147483648</a>,6442450943)<span style="white-space:pre-wrap">    </span>Exits: ((zext i32 (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 * %halfsub)<nsw> + %start) to i64))</div><div>--------</div><div><br></div><div>The fixed one prints a sum of sext's:</div><div><br></div><div>--------</div><div>  %index64 = sext i32 %index32 to i64</div><div>  -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S: [-3221225471,7516192767)<span style="white-space:pre-wrap">    </span>Exits: ((zext i32 (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to i64))</div><div>--------</div><div><br></div><div>*2.* The fixed LLVM seems to be more aggressive in extracting constants from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add function. W/o the fix:</div><div><br></div><div>--------</div><div>  %ptr = getelementptr inbounds float, float* %input, i32 %index32</div><div>  -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) + %input),+,4}<nsw><%loop> U: full-set S: full-set<span style="white-space:pre-wrap">    </span>Exits: ((4 * (zext i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to i64)) + %input)</div><div>--------</div><div><br></div><div>With the fix:</div><div><br></div><div>--------</div><div>  %ptr = getelementptr inbounds float, float* %input, i32 %index32</div><div>  -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) + %input),+,4}<nsw><%loop> U: full-set S: full-set<span style="white-space:pre-wrap">      </span>Exits: (4 + (4 * (zext i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64)) + %input)</div><div>--------</div><div><br></div><div>Hope this helps.</div></div><div class=""><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <span dir="ltr"><<a href="mailto:llvm.mail.list@gmail.com" target="_blank">llvm.mail.list@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Sanjoy,</div><div><br></div><div>Thank you for looking into this!</div><div>Yes, your patch does fix my larger test case too. My algorithm gets double performance improvement with the patch, as the loop now has a smaller instruction set and succeeds to unroll w/o any extra #pragma's.</div><div><br></div><div>I also ran the LLVM tests against the patch. There are 6 new failures:</div><div>    Analysis/LoopAccessAnalysis/number-of-memchecks.ll</div><div>    Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll</div><div>    Analysis/ScalarEvolution/flags-from-poison.ll</div><div>    Analysis/ScalarEvolution/nsw-offset-assume.ll</div><div>    Analysis/ScalarEvolution/nsw-offset.ll</div><div>    Analysis/ScalarEvolution/nsw.ll</div><div><br></div><div>I haven't inspected these failures in detail yet, but it's likely the tests merely need to be adjusted to handle the new no-wrap flags the patch introduced. I will double-check this soon.</div><div><br></div><div>Kind regards,</div><div>Oleg</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <span dir="ltr"><<a href="mailto:sanjoy@playingwithpointers.com" target="_blank">sanjoy@playingwithpointers.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">Hi Oleg,<br>

<br>

I think the problem here is that SCEV forgets to propagate no-wrap<br>

flags when folding "{S,+,X}+T ==> {S+T,+,X}".<br>

<br>

I haven't carefully thought about the implications and whether the<br>

change is even correct, but the appended patch fixes the test case<br>

you've attached.  I'll give it some more thought and if it holds up<br>

I'll check it in in the next few days.  Meanwhile if you have a larger<br>

test case that you extracted indvar_test.cpp from, I'd be interested<br>

in hearing if this change works there as well.<br>

<br>

diff --git a/lib/Analysis/ScalarEvolution.cpp b/lib/Analysis/ScalarEvolution.cpp<br>

index 39ced1e..2e87902 100644<br>

--- a/lib/Analysis/ScalarEvolution.cpp<br>

+++ b/lib/Analysis/ScalarEvolution.cpp<br>

@@ -2274,19 +2274,19 @@ const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,<br>

       }<br>

<br>

     // If we found some loop invariants, fold them into the recurrence.<br>

     if (!LIOps.empty()) {<br>

       //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}<br>

       LIOps.push_back(AddRec->getStart());<br>

<br>

       SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),<br>

                                              AddRec->op_end());<br>

-      AddRecOps[0] = getAddExpr(LIOps);<br>

+      AddRecOps[0] = getAddExpr(LIOps, Flags);<br>

<br>

       // Build the new addrec. Propagate the NUW and NSW flags if both the<br>

       // outer add and the inner addrec are guaranteed to have no overflow.<br>

       // Always propagate NW.<br>

       Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));<br>

       const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);<br>

<br>

       // If all of the other operands were loop invariant, we are done.<br>

       if (Ops.size() == 1) return NewRec;<br>

<br>

Thanks!<span><font color="#888888"><br>

-- Sanjoy<br>

</font></span></blockquote></div><br></div>

</div></div></blockquote></div><br></div>

</div></div></blockquote></div><br></div></div>