[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

Charith Mendis via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 12 20:28:40 PDT 2016


Thanks for the explanation, but I am still a little confused about one
point. Can't LLVM keep vectorizable_elements as a symbolic value and convert
the loop to, say:

for (unsigned i = 0; i < vectorizable_elements; i += 2) {
    // main loop
}

for (unsigned i = 0; i < vectorizable_elements % 2; i++) {
    // fix up
}

Why does it have to reason about the range of vectorizable_elements?
Wouldn't the above decomposition work even if vectorizable_elements == SIZE_MAX?
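
For concreteness, here is a minimal sketch of the kind of decomposition I have in mind, written by hand in C++. The function name and the loop body (adding 1 to each element) are made up for illustration and are not from the attached code. The main-loop bound is rounded down to an even value so the unrolled body never touches an element past the end; this is only a sketch of the idea, not what the unroller actually emits.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: add 1 to every element, two elements per iteration.
// The element count n stays symbolic; nothing is assumed about its range.
void add_one_unrolled_by_2(std::vector<int>& data) {
    const std::size_t n = data.size();
    // Largest even value <= n. Subtracting n % 2 from n cannot wrap.
    const std::size_t main_end = n - n % 2;

    std::size_t i = 0;
    // Main loop: i and main_end are both even, so i + 2 <= main_end here
    // and the increment cannot overflow.
    for (; i < main_end; i += 2) {
        data[i] += 1;
        data[i + 1] += 1;
    }
    // Fix-up loop: runs exactly n % 2 times (zero or one iteration).
    for (; i < n; ++i) {
        data[i] += 1;
    }
}
```

With this shape the remainder is handled once after the main loop, so no exit check is needed between the two copies of the body.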

On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org>
wrote:

> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
>
> Hi all,
>
> Attached herewith is a simple vectorized function with loops performing a
> simple shuffle.
>
> I want all loops (inner and outer) to be unrolled by 2, so I used
> -unroll-count=2.
> The inner loops (with k as the induction variable and constant trip
> counts) unroll fully, but the outer loop (with j) fails to unroll.
>
> The llvm code is also attached with inner loops fully unrolled.
>
> To inspect further, I added the following to PassManagerBuilder.cpp to
> run some canonicalization passes and then redo unrolling. I have turned
> partial unrolling on, set a huge threshold, and allowed expensive loop
> trip counts. It still didn't unroll by 2.
>
> MPM.add(createLoopUnrollPass());
> MPM.add(createCFGSimplificationPass());
> MPM.add(createLoopSimplifyPass());
> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
> MPM.add(createLCSSAPass());
> MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
> MPM.add(createLoopUnrollPass());
>
>
> Digging deeper, I found that it fails in the UnrollRuntimeLoopRemainder
> function, where it is unable to compute the backedge-taken count.
>
> Can anybody explain what is needed to get the outer loop unrolled by 2? It
> would be a great help.
>
>
> Well, I can at least explain what is happening... runtime unrolling needs
> to be able to symbolically compute the trip count to avoid inserting a
> branch after every iteration.  SCEV isn't able to prove that your loop
> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX),
> therefore it can't compute the trip count.  Therefore, we don't unroll.
>
> There are a few different angles you could use to attack this: you could
> teach the unroller to unroll loops with an uncomputable trip count, or you
> can make the trip count of your loop computable somehow.  Changing the
> unroller is probably straightforward (see the recently committed r284044).
> Making the trip count computable is more complicated... it's probably
> possible to teach SCEV to reason about the overflow in the pointer
> computation, or maybe you could version the loop.
>
> -Eli
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
>

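To illustrate the SIZE_MAX case mentioned in the reply above, here is a small sketch using an 8-bit counter so that the wraparound is easy to see. It assumes a loop of the shape `for (i = 0; i < n; i += 2)`; the original source was an attachment and is not reproduced here, so that shape is an assumption. The helper name and the step cap are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: simulates `for (i = 0; i < n; i += 2)` on an 8-bit
// counter and gives up after max_steps iterations.
// Returns true if the loop condition became false within that budget.
bool stride2_loop_terminates(std::uint8_t n, unsigned max_steps) {
    std::uint8_t i = 0;
    for (unsigned steps = 0; steps < max_steps; ++steps) {
        if (!(i < n)) {
            return true;  // loop exit reached
        }
        // Unsigned arithmetic wraps modulo 256: 254 + 2 becomes 0.
        i = static_cast<std::uint8_t>(i + 2);
    }
    return false;  // still looping when the step budget ran out
}
```

When n is the maximum value of the type (255 here, which is odd), the even counter steps from 254 back to 0 without ever reaching n, so the exit condition never becomes false. The same reasoning applies to a size_t counter and SIZE_MAX, which is why the trip count of such a loop cannot be computed without extra information about the bound.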

-- 
Kind regards,
Charith Mendis

Graduate Student,
CSAIL,
Massachusetts Institute of Technology