[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
Friedman, Eli via llvm-dev
llvm-dev at lists.llvm.org
Wed Oct 12 17:25:27 PDT 2016
On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
> Hi all,
>
> Attached herewith is a simple vectorized function with loops
> performing a simple shuffle.
>
> I want all loops (inner and outer) to be unrolled by 2, so I used
> -unroll-count=2.
> The inner loops (with k as the induction variable and constant trip
> counts) unroll fully, but the outer loop (over j) fails to unroll.
>
> The LLVM IR is also attached, with the inner loops fully unrolled.
>
> To inspect further, I added the following to PassManagerBuilder.cpp
> to run some canonicalization passes and then redo unrolling. I have
> turned partial unrolling on, set a huge threshold, and allowed
> expensive loop trip counts. It still didn't unroll by 2.
>
> MPM.add(createLoopUnrollPass());
> MPM.add(createCFGSimplificationPass());
> MPM.add(createLoopSimplifyPass());
> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
> MPM.add(createLCSSAPass());
> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
> MPM.add(createLoopUnrollPass());
>
> Digging deeper, I found that it fails in the UnrollRuntimeLoopRemainder
> function, where it is unable to compute the backedge-taken count.
>
> Can anybody explain what is needed to get the outer loop unrolled by 2?
> It would be a great help.
Well, I can at least explain what is happening: runtime unrolling
needs to be able to symbolically compute the trip count, so that it
doesn't have to insert a branch after every iteration. SCEV isn't able
to prove that your loop isn't an infinite loop (consider the case of
vectorizable_elements == SIZE_MAX), so it can't compute the trip
count, and we don't unroll.
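To make the wraparound concrete, here is a minimal sketch that uses an 8-bit counter as a stand-in for size_t. The attached source isn't reproduced here, so the loop shape (a `j < n` exit test with a stride greater than 1), the function name count_iterations, and the cap parameter are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts iterations of `for (j = 0; j < n; j += step)`
// with an 8-bit unsigned counter. Returns `cap` if the loop has not exited
// after `cap` iterations, i.e. it would spin forever.
unsigned count_iterations(std::uint8_t n, std::uint8_t step, unsigned cap) {
    assert(step != 0 && "a zero stride never makes progress");
    unsigned iters = 0;
    for (std::uint8_t j = 0; j < n; j = static_cast<std::uint8_t>(j + step)) {
        if (++iters == cap)
            return cap;  // give up: j wrapped around and keeps passing the test
    }
    return iters;
}
```

With n == 254 and step == 2 the loop exits after 127 iterations. With n == 255 (the 8-bit analogue of SIZE_MAX) and step == 2, j goes from 254 back to 0 and the exit test never fails, so no finite trip count exists for that input.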
There are a few different angles you could use to attack this: you could
teach the unroller to unroll loops with an uncomputable trip count, or
you could make the trip count of your loop computable somehow. Changing
the unroller is probably straightforward (see the recently committed
r284044). Making the trip count computable is more complicated: it's
probably possible to teach SCEV to reason about the overflow in the
pointer computation, or maybe you could version the loop.
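As a rough source-level picture of what versioning the loop could look like, the sketch below guards a fast copy with a runtime check that rules out wraparound. Inside the guard the trip count can be computed up front, so that copy can be unrolled by 2 with a single remainder step. This is a hand-written illustration, not what any LLVM pass emits; the function name sum_strided_versioned, the strided-sum body, and the exact guard are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: sums data[0], data[step], data[2*step], ... below n.
long sum_strided_versioned(const int *data, std::size_t n, std::size_t step) {
    assert(step != 0 && "a zero stride never makes progress");
    long sum = 0;
    if (n <= SIZE_MAX - step) {
        // Fast version: j += step cannot wrap before j >= n, so the trip
        // count is known before the loop starts: ceil(n / step).
        const std::size_t trip = (n + step - 1) / step;
        std::size_t i = 0, j = 0;
        for (; i + 1 < trip; i += 2, j += 2 * step) {  // unrolled by 2
            sum += data[j];
            sum += data[j + step];
        }
        if (i < trip)  // remainder iteration when trip is odd
            sum += data[j];
    } else {
        // Fallback version: the original loop, exit test on every iteration.
        for (std::size_t j = 0; j < n; j += step)
            sum += data[j];
    }
    return sum;
}
```

For example, with a 7-element array {1, 2, 3, 4, 5, 6, 7} and step 2, the fast path computes a trip count of 4 and visits indices 0, 2, 4, and 6. The fallback keeps the original semantics for the inputs the guard rejects, which is the price of versioning: two copies of the loop body.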
-Eli
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project