[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Fri May 6 18:30:43 PDT 2016

Hi Oleg,

Sorry for letting this slide -- the easiest way to shame me into looking 
at a patch is to put it on Phabricator [1], and add me as a reviewer. 
That way the patch will keep showing up on on my "pending action" queue 
till I review it. :)

On the surface, the test corrections look fine to me, and I think the 
original fix is correct.  Do you mind taking all of this and putting it 
up on Phabricator?  Phabricator will also make it easy for me to more 
thoroughly check if the test fixes are correct.

[1]: http://llvm.org/docs/Phabricator.html

Thanks,
-- Sanjoy

Oleg Ranevskyy wrote:
> Hi Sanjoy,
>
> Could you let me know if you have had a chance to check the patch and
> the LLVM test corrections, please?

>
> Thanks!
>
> On Thu, Apr 28, 2016 at 11:51 PM, Oleg Ranevskyy
> <llvm.mail.list at gmail.com <mailto:llvm.mail.list at gmail.com>> wrote:
>
>     Hi Sanjoy,
>
>     Attached is the patch that fixes the LLVM test regressions caused by
>     this fix. It just adds <nsw> entries the fix has introduced.
>
>     Would you have a couple of minutes to check it, please?
>
>     I would also like to share two differences in the opt logs I noticed
>     for these tests.
>
>     *1.* The fixed LLVM might split sext of a sum into a sum of two
>     sext's while doing SCEV analysis. This does not affect the final IR
>     however - it is the same for all the patched tests with and w/o your
>     fix. E.g., this can be seen on
>     Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix
>     prints this:
>
>     --------
>     Printing analysis 'Scalar Evolution Analysis' for function
>     'test-sub-nsw':
>     Classifying expressions for: @test-sub-nsw
>        ...
>        %index64 = sext i32 %index32 to i64
>        -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to
>     i64),+,1}<nsw><%loop> U: [-2147483648
>     <tel:%5B-2147483648>,6442450943) S: [-2147483648
>     <tel:%5B-2147483648>,6442450943)Exits: ((zext i32 (-1 + (-1 *
>     %start) + %numIterations) to i64) + (sext i32 ((-1 * %halfsub)<nsw>
>     + %start) to i64))
>     --------
>
>     The fixed one prints a sum of sext's:
>
>     --------
>        %index64 = sext i32 %index32 to i64
>        -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start
>     to i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S:
>     [-3221225471,7516192767)Exits: ((zext i32 (-1 + (-1 * %start) +
>     %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) +
>     (sext i32 %start to i64))
>     --------
>
>     *2.* The fixed LLVM seems to be more aggressive in extracting
>     constants from sext. E.g., the same test flags-from-poison.ll, the
>     test-sub-with-add function. W/o the fix:
>
>     --------
>        %ptr = getelementptr inbounds float, float* %input, i32 %index32
>        -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
>     %input),+,4}<nsw><%loop> U: full-set S: full-setExits: ((4 * (zext
>     i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 *
>     %offset)) to i64)) + %input)
>     --------
>
>     With the fix:
>
>     --------
>        %ptr = getelementptr inbounds float, float* %input, i32 %index32
>        -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) +
>     %input),+,4}<nsw><%loop> U: full-set S: full-setExits: (4 + (4 *
>     (zext i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 *
>     %offset) to i64)) + %input)
>     --------
>
>     Hope this helps.
>
>     On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy
>     <llvm.mail.list at gmail.com <mailto:llvm.mail.list at gmail.com>> wrote:
>
>         Hi Sanjoy,
>
>         Thank you for looking into this!
>         Yes, your patch does fix my larger test case too. My algorithm
>         gets double performance improvement with the patch, as the loop
>         now has a smaller instruction set and succeeds to unroll w/o any
>         extra #pragma's.
>
>         I also ran the LLVM tests against the patch. There are 6 new
>         failures:
>              Analysis/LoopAccessAnalysis/number-of-memchecks.ll
>              Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
>              Analysis/ScalarEvolution/flags-from-poison.ll
>              Analysis/ScalarEvolution/nsw-offset-assume.ll
>              Analysis/ScalarEvolution/nsw-offset.ll
>              Analysis/ScalarEvolution/nsw.ll
>
>         I haven't inspected these failures in detail yet, but it's
>         likely the tests merely need to be adjusted to handle the new
>         no-wrap flags the patch introduced. I will double-check this soon.
>
>         Kind regards,
>         Oleg
>
>         On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das
>         <sanjoy at playingwithpointers.com
>         <mailto:sanjoy at playingwithpointers.com>> wrote:
>
>             Hi Oleg,
>
>             I think the problem here is that SCEV forgets to propagate
>             no-wrap
>             flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>
>             I haven't carefully thought about the implications and
>             whether the
>             change is even correct, but the appended patch fixes the
>             test case
>             you've attached.  I'll give it some more thought and if it
>             holds up
>             I'll check it in in the next few days.  Meanwhile if you
>             have a larger
>             test case that you extracted indvar_test.cpp from, I'd be
>             interested
>             in hearing if this change works there as well.
>
>             diff --git a/lib/Analysis/ScalarEvolution.cpp
>             b/lib/Analysis/ScalarEvolution.cpp
>             index 39ced1e..2e87902 100644
>             --- a/lib/Analysis/ScalarEvolution.cpp
>             +++ b/lib/Analysis/ScalarEvolution.cpp
>             @@ -2274,19 +2274,19 @@ const SCEV
>             *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
>                     }
>
>                   // If we found some loop invariants, fold them into
>             the recurrence.
>                   if (!LIOps.empty()) {
>                     //  NLI + LI + {Start,+,Step}  -->  NLI +
>             {LI+Start,+,Step}
>                     LIOps.push_back(AddRec->getStart());
>
>                     SmallVector<const SCEV *, 4>
>             AddRecOps(AddRec->op_begin(),
>
>             AddRec->op_end());
>             -      AddRecOps[0] = getAddExpr(LIOps);
>             +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>
>                     // Build the new addrec. Propagate the NUW and NSW
>             flags if both the
>                     // outer add and the inner addrec are guaranteed to
>             have no overflow.
>                     // Always propagate NW.
>                     Flags = AddRec->getNoWrapFlags(setFlags(Flags,
>             SCEV::FlagNW));
>                     const SCEV *NewRec = getAddRecExpr(AddRecOps,
>             AddRecLoop, Flags);
>
>                     // If all of the other operands were loop invariant,
>             we are done.
>                     if (Ops.size() == 1) return NewRec;
>
>             Thanks!
>             -- Sanjoy
>
>
>
>