[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Thu Apr 28 13:51:01 PDT 2016

Hi Sanjoy,

Attached is the patch that fixes the LLVM test regressions caused by this
fix. It just adds <nsw> entries the fix has introduced.

Would you have a couple of minutes to check it, please?

I would also like to share two differences in the opt logs I noticed for
these tests.

*1.* The fixed LLVM might split sext of a sum into a sum of two sext's
while doing SCEV analysis. This does not affect the final IR however - it
is the same for all the patched tests with and w/o your fix. E.g., this can
be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix
prints this:

--------
Printing analysis 'Scalar Evolution Analysis' for function 'test-sub-nsw':
Classifying expressions for: @test-sub-nsw
  ...
  %index64 = sext i32 %index32 to i64
  -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to i64),+,1}<nsw><%loop>
U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32
(-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 *
%halfsub)<nsw> + %start) to i64))
--------

The fixed one prints a sum of sext's:

--------
  %index64 = sext i32 %index32 to i64
  -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to
i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S:
[-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) +
%numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext
i32 %start to i64))
--------

*2.* The fixed LLVM seems to be more aggressive in extracting constants
from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add
function. W/o the fix:

--------
  %ptr = getelementptr inbounds float, float* %input, i32 %index32
  -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
%input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 * (zext i32 (-1
+ %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
%input)
--------

With the fix:

--------
  %ptr = getelementptr inbounds float, float* %input, i32 %index32
  -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) +
%input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4 * (zext i32
(-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64)) +
%input)
--------

Hope this helps.

On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com>
wrote:

> Hi Sanjoy,
>
> Thank you for looking into this!
> Yes, your patch does fix my larger test case too. My algorithm gets double
> performance improvement with the patch, as the loop now has a smaller
> instruction set and succeeds to unroll w/o any extra #pragma's.
>
> I also ran the LLVM tests against the patch. There are 6 new failures:
>     Analysis/LoopAccessAnalysis/number-of-memchecks.ll
>     Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
>     Analysis/ScalarEvolution/flags-from-poison.ll
>     Analysis/ScalarEvolution/nsw-offset-assume.ll
>     Analysis/ScalarEvolution/nsw-offset.ll
>     Analysis/ScalarEvolution/nsw.ll
>
> I haven't inspected these failures in detail yet, but it's likely the
> tests merely need to be adjusted to handle the new no-wrap flags the patch
> introduced. I will double-check this soon.
>
> Kind regards,
> Oleg
>
> On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <
> sanjoy at playingwithpointers.com> wrote:
>
>> Hi Oleg,
>>
>> I think the problem here is that SCEV forgets to propagate no-wrap
>> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>>
>> I haven't carefully thought about the implications and whether the
>> change is even correct, but the appended patch fixes the test case
>> you've attached.  I'll give it some more thought and if it holds up
>> I'll check it in in the next few days.  Meanwhile if you have a larger
>> test case that you extracted indvar_test.cpp from, I'd be interested
>> in hearing if this change works there as well.
>>
>> diff --git a/lib/Analysis/ScalarEvolution.cpp
>> b/lib/Analysis/ScalarEvolution.cpp
>> index 39ced1e..2e87902 100644
>> --- a/lib/Analysis/ScalarEvolution.cpp
>> +++ b/lib/Analysis/ScalarEvolution.cpp
>> @@ -2274,19 +2274,19 @@ const SCEV
>> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
>>        }
>>
>>      // If we found some loop invariants, fold them into the recurrence.
>>      if (!LIOps.empty()) {
>>        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
>>        LIOps.push_back(AddRec->getStart());
>>
>>        SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),
>>                                               AddRec->op_end());
>> -      AddRecOps[0] = getAddExpr(LIOps);
>> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>>
>>        // Build the new addrec. Propagate the NUW and NSW flags if both
>> the
>>        // outer add and the inner addrec are guaranteed to have no
>> overflow.
>>        // Always propagate NW.
>>        Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));
>>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);
>>
>>        // If all of the other operands were loop invariant, we are done.
>>        if (Ops.size() == 1) return NewRec;
>>
>> Thanks!
>> -- Sanjoy
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scev-tests.patch
Type: application/octet-stream
Size: 8858 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160428/f41485df/attachment.obj>