[llvm-dev] [IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

Fri May 6 16:47:06 PDT 2016

Hi Sanjoy,

Could you let me know if you have had a chance to check the patch and the
LLVM test corrections, please?

Thanks!

On Thu, Apr 28, 2016 at 11:51 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com>
wrote:

> Hi Sanjoy,
>
> Attached is the patch that fixes the LLVM test regressions caused by this
> fix. It just adds <nsw> entries the fix has introduced.
>
> Would you have a couple of minutes to check it, please?
>
> I would also like to share two differences in the opt logs I noticed for
> these tests.
>
> *1.* The fixed LLVM might split sext of a sum into a sum of two sext's
> while doing SCEV analysis. This does not affect the final IR however - it
> is the same for all the patched tests with and w/o your fix. E.g., this can
> be seen on Analysis/ScalarEvolution/flags-from-poison.ll. LLVM w/o the fix
> prints this:
>
> --------
> Printing analysis 'Scalar Evolution Analysis' for function 'test-sub-nsw':
> Classifying expressions for: @test-sub-nsw
>   ...
>   %index64 = sext i32 %index32 to i64
>   -->  {(sext i32 ((-1 * %halfsub)<nsw> + %start) to i64),+,1}<nsw><%loop>
> U: [-2147483648,6442450943) S: [-2147483648,6442450943) Exits: ((zext i32
> (-1 + (-1 * %start) + %numIterations) to i64) + (sext i32 ((-1 *
> %halfsub)<nsw> + %start) to i64))
> --------
>
> The fixed one prints a sum of sext's:
>
> --------
>   %index64 = sext i32 %index32 to i64
>   -->  {((sext i32 (-1 * %halfsub)<nsw> to i64) + (sext i32 %start to
> i64))<nsw>,+,1}<nsw><%loop> U: [-3221225471,7516192767) S:
> [-3221225471,7516192767) Exits: ((zext i32 (-1 + (-1 * %start) +
> %numIterations) to i64) + (sext i32 (-1 * %halfsub)<nsw> to i64) + (sext
> i32 %start to i64))
> --------
>
> *2.* The fixed LLVM seems to be more aggressive in extracting constants
> from sext. E.g., the same test flags-from-poison.ll, the test-sub-with-add
> function. W/o the fix:
>
> --------
>   %ptr = getelementptr inbounds float, float* %input, i32 %index32
>   -->  {((4 * (sext i32 (1 + (-1 * %offset)) to i64)) +
> %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: ((4 * (zext i32
> (-1 + %numIterations) to i64)) + (4 * (sext i32 (1 + (-1 * %offset)) to
> i64)) + %input)
> --------
>
> With the fix:
>
> --------
>   %ptr = getelementptr inbounds float, float* %input, i32 %index32
>   -->  {(4 + (4 * (sext i32 (-1 * %offset) to i64)) +
> %input),+,4}<nsw><%loop> U: full-set S: full-set Exits: (4 + (4 * (zext
> i32 (-1 + %numIterations) to i64)) + (4 * (sext i32 (-1 * %offset) to i64))
> + %input)
> --------
>
> Hope this helps.
>
> On Sat, Apr 23, 2016 at 4:48 PM, Oleg Ranevskyy <llvm.mail.list at gmail.com>
> wrote:
>
>> Hi Sanjoy,
>>
>> Thank you for looking into this!
>> Yes, your patch does fix my larger test case too. My algorithm gets
>> double performance improvement with the patch, as the loop now has a
>> smaller instruction set and succeeds to unroll w/o any extra #pragma's.
>>
>> I also ran the LLVM tests against the patch. There are 6 new failures:
>>     Analysis/LoopAccessAnalysis/number-of-memchecks.ll
>>     Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
>>     Analysis/ScalarEvolution/flags-from-poison.ll
>>     Analysis/ScalarEvolution/nsw-offset-assume.ll
>>     Analysis/ScalarEvolution/nsw-offset.ll
>>     Analysis/ScalarEvolution/nsw.ll
>>
>> I haven't inspected these failures in detail yet, but it's likely the
>> tests merely need to be adjusted to handle the new no-wrap flags the patch
>> introduced. I will double-check this soon.
>>
>> Kind regards,
>> Oleg
>>
>> On Fri, Apr 22, 2016 at 11:02 AM, Sanjoy Das <
>> sanjoy at playingwithpointers.com> wrote:
>>
>>> Hi Oleg,
>>>
>>> I think the problem here is that SCEV forgets to propagate no-wrap
>>> flags when folding "{S,+,X}+T ==> {S+T,+,X}".
>>>
>>> I haven't carefully thought about the implications and whether the
>>> change is even correct, but the appended patch fixes the test case
>>> you've attached.  I'll give it some more thought and if it holds up
>>> I'll check it in in the next few days.  Meanwhile if you have a larger
>>> test case that you extracted indvar_test.cpp from, I'd be interested
>>> in hearing if this change works there as well.
>>>
>>> diff --git a/lib/Analysis/ScalarEvolution.cpp
>>> b/lib/Analysis/ScalarEvolution.cpp
>>> index 39ced1e..2e87902 100644
>>> --- a/lib/Analysis/ScalarEvolution.cpp
>>> +++ b/lib/Analysis/ScalarEvolution.cpp
>>> @@ -2274,19 +2274,19 @@ const SCEV
>>> *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
>>>        }
>>>
>>>      // If we found some loop invariants, fold them into the recurrence.
>>>      if (!LIOps.empty()) {
>>>        //  NLI + LI + {Start,+,Step}  -->  NLI + {LI+Start,+,Step}
>>>        LIOps.push_back(AddRec->getStart());
>>>
>>>        SmallVector<const SCEV *, 4> AddRecOps(AddRec->op_begin(),
>>>                                               AddRec->op_end());
>>> -      AddRecOps[0] = getAddExpr(LIOps);
>>> +      AddRecOps[0] = getAddExpr(LIOps, Flags);
>>>
>>>        // Build the new addrec. Propagate the NUW and NSW flags if both
>>> the
>>>        // outer add and the inner addrec are guaranteed to have no
>>> overflow.
>>>        // Always propagate NW.
>>>        Flags = AddRec->getNoWrapFlags(setFlags(Flags, SCEV::FlagNW));
>>>        const SCEV *NewRec = getAddRecExpr(AddRecOps, AddRecLoop, Flags);
>>>
>>>        // If all of the other operands were loop invariant, we are done.
>>>        if (Ops.size() == 1) return NewRec;
>>>
>>> Thanks!
>>> -- Sanjoy
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160507/6f25149b/attachment.html>