[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Mon Jan 27 17:22:56 PST 2014

In r200270 I added support to unroll conditional stores in the loop vectorizer.

It is currently off pending further benchmarking and can be enabled with "-mllvm -vectorize-num-stores-pred=1".
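The shape this enables, with each conditional store kept behind its own branch after unrolling (Arnold spells it out as pseudocode further down in this thread), can be sketched at the source level. This is a minimal scalar sketch with an unroll factor of 2, assuming the stored value flips some bits of the element, roughly like the bit-flipping gate loops in libquantum. The function names, `Mask`, and `Flip` are hypothetical, and the vectorizer itself works on IR rather than on source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original shape: A[I] is stored to only when the predicate holds.
void conditionalStore(uint64_t *A, std::size_t N, uint64_t Mask,
                      uint64_t Flip) {
  assert(A != nullptr || N == 0);
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] & Mask)
      A[I] ^= Flip; // hypothetical "val": A[I] with the Flip bits toggled
}

// Unrolled by 2. Predicates and values for both copies are computed up
// front, but each store stays behind its own branch, so no store is added
// on a path where the original loop had none.
void conditionalStoreUnrolled2(uint64_t *A, std::size_t N, uint64_t Mask,
                               uint64_t Flip) {
  assert(A != nullptr || N == 0);
  std::size_t I = 0;
  for (; I + 1 < N; I += 2) {
    bool Pred0 = (A[I] & Mask) != 0;
    bool Pred1 = (A[I + 1] & Mask) != 0;
    uint64_t Val0 = A[I] ^ Flip;
    uint64_t Val1 = A[I + 1] ^ Flip;
    if (Pred0)
      A[I] = Val0;
    if (Pred1)
      A[I + 1] = Val1;
  }
  // Remainder iteration when N is odd.
  if (I < N && (A[I] & Mask))
    A[I] ^= Flip;
}
```

Calling both functions on `{1, 2, 3, 4, 5}` with `Mask = 1` and `Flip = 8` leaves `{9, 2, 11, 4, 13}` in each case; only the odd elements are written.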

Furthermore, I added a heuristic that unrolls until the load/store ports are saturated, instead of the purely size-based heuristic; it can be enabled with "-mllvm -enable-loadstore-runtime-unroll".

With those two options enabled, together with a patch that slightly changes the register heuristic (below), libquantum's three hot loops will unroll and goodness will ensue (at least for libquantum).
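The difference between a size-based and a load/store-based unroll factor can be pictured with a toy model. This is a minimal sketch under stated assumptions, not the LoopVectorize code: `MaxUF` (a register-limited upper bound), `LoopCost`, `SmallLoopCost`, and the per-iteration load/store counts are hypothetical inputs, and `powerOf2Floor` is a stand-in for LLVM's `PowerOf2Floor`. The size-based rule stops once the unrolled body reaches a cost budget; the load/store rule instead divides `MaxUF` among the loop's memory operations, so a small loop with one load and one store per iteration can unroll further.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for llvm::PowerOf2Floor: the largest power of two
// that is <= X, or 0 when X == 0.
unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Toy size-based rule: unroll until the body reaches the (assumed)
// SmallLoopCost budget, capped by MaxUF and never below 1.
unsigned sizeBasedUF(unsigned MaxUF, unsigned LoopCost,
                     unsigned SmallLoopCost) {
  assert(LoopCost > 0 && "loop body must have a nonzero cost");
  unsigned UF = powerOf2Floor(SmallLoopCost / LoopCost);
  return std::max(1u, std::min(MaxUF, UF));
}

// Toy load/store rule: spread MaxUF over the memory operations of one
// iteration, so loops with few loads/stores get a larger factor.
unsigned loadStoreUF(unsigned MaxUF, unsigned NumLoads, unsigned NumStores) {
  unsigned LoadsUF = MaxUF / std::max(1u, NumLoads);
  unsigned StoresUF = MaxUF / std::max(1u, NumStores);
  return std::max(1u, powerOf2Floor(std::max(LoadsUF, StoresUF)));
}
```

With `MaxUF = 8`, a loop of cost 10 against a budget of 20 gets a size-based factor of 2, while the load/store rule with one load and one store returns 8.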


commit 6b908b8b1084c97238cc642a3404a4285c21286f
Author: Arnold Schwaighofer <aschwaighofer at apple.com>
Date:   Mon Jan 27 13:21:55 2014 -0800

    Subtract one for loop induction variable. It is unlikely to be unrolled.

diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7867495..978c5a1 100644
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5142,8 +5142,8 @@ LoopVectorizationCostModel::selectUnrollFactor(bool OptForSize,
   // fit without causing spills. All of this is rounded down if necessary to be
   // a power of two. We want power of two unroll factors to simplify any
   // addressing operations or alignment considerations.
-  unsigned UF = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
-                              R.MaxLocalUsers);
+  unsigned UF = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs - 1) /
+                              (R.MaxLocalUsers - 1));
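The effect of this change can be checked with a few numbers. The sketch below re-expresses the old and new formulas from the diff as standalone functions; `powerOf2Floor` is a hypothetical stand-in for LLVM's `PowerOf2Floor`, the register counts are made-up inputs, and the guard against `MaxLocalUsers - 1 == 0` is an addition of this sketch, not part of the diff. Following the commit message, the induction variable is assumed not to be duplicated by unrolling, so it is removed from both the register budget and the per-copy register demand.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::PowerOf2Floor: the largest power of two
// that is <= X, or 0 when X == 0.
unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Old formula: every local register is assumed to be replicated per copy.
unsigned oldUF(unsigned TargetNumRegisters, unsigned LoopInvariantRegs,
               unsigned MaxLocalUsers) {
  assert(TargetNumRegisters > LoopInvariantRegs && MaxLocalUsers > 0);
  return powerOf2Floor((TargetNumRegisters - LoopInvariantRegs) /
                       MaxLocalUsers);
}

// New formula: one register (the induction variable) is shared by all
// unrolled copies, so subtract it from the budget and from the per-copy users.
unsigned newUF(unsigned TargetNumRegisters, unsigned LoopInvariantRegs,
               unsigned MaxLocalUsers) {
  assert(TargetNumRegisters > LoopInvariantRegs + 1);
  // Guard added by this sketch: avoid dividing by zero when the induction
  // variable is the only local user.
  unsigned PerCopy = MaxLocalUsers > 1 ? MaxLocalUsers - 1 : 1;
  return powerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) / PerCopy);
}
```

For example, with 16 registers, one loop-invariant register, and at most four local users, the old formula computes 15 / 4 = 3 (integer division), rounded down to 2, while the new one computes 14 / 3 = 4, so the unroll factor doubles.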



On Jan 21, 2014, at 11:46 AM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> 
> On Jan 21, 2014, at 6:18 AM, Diego Novillo <dnovillo at google.com> wrote:
> 
>> On 16/01/2014, 23:47 , Andrew Trick wrote:
>>> 
>>> On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com> wrote:
>>> 
>>>> Chandler also pointed me at the vectorizer, which has its own
>>>> unroller. However, the vectorizer only unrolls enough to serve the
>>>> target, it's not as general as the runtime-triggered unroller. From
>>>> what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
>>>> AVX targets). Additionally, the vectorizer only unrolls to aid
>>>> reduction variables. When I forced the vectorizer to unroll these
>>>> loops, the performance effects were nil.
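For reference, unrolling to aid a reduction variable means, at the source level, splitting one accumulator into independent partial sums so consecutive additions no longer wait on each other. This is a minimal sketch with hypothetical function names. Integer addition wraps modulo 2^64, so reassociating it is exact; a floating-point sum could change its result and normally needs fast-math-style permission.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain reduction: each addition depends on the previous one.
uint64_t sumSimple(const uint64_t *A, std::size_t N) {
  assert(A != nullptr || N == 0);
  uint64_t Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += A[I];
  return Sum;
}

// Unrolled by 2 with two independent accumulators, so the two additions in
// each iteration can execute in parallel; the partial sums are combined at
// the end.
uint64_t sumUnrolled2(const uint64_t *A, std::size_t N) {
  assert(A != nullptr || N == 0);
  uint64_t Sum0 = 0, Sum1 = 0;
  std::size_t I = 0;
  for (; I + 1 < N; I += 2) {
    Sum0 += A[I];
    Sum1 += A[I + 1];
  }
  if (I < N) // remainder iteration when N is odd
    Sum0 += A[I];
  return Sum0 + Sum1;
}
```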
>>> 
>>> Vectorization and partial unrolling (aka runtime unrolling) for ILP should be the same pass. The profitability analysis required in each case is very closely related, and you never want to do one before or after the other. The analysis makes sense even for targets without vector units. The “vector unroller” has an extra restriction (unlike the LoopUnroll pass) in that it must be able to interleave operations across iterations. This is usually a good thing to check before unrolling, but the compiler’s dependence analysis may be too conservative in some cases.
>> 
>> In addition to tuning the cost model, I found that the vectorizer does not even get that far into its analysis for some of the loops I need unrolled. In this particular case, there are three loops that need to be unrolled to get the performance I'm looking for. Of the three, only one gets far enough in the analysis to decide whether we unroll it or not.
>> 
> 
> I assume the other two loops are quantum_cnot's <http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00054> and quantum_toffoli's <http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00082>.
> 
> The problem for the unroller in the loop vectorizer is that it wants to if-convert those loops. The conditional store prevents if-conversion because we can’t introduce a store on a path where there was none before: <http://llvm.org/docs/Atomics.html#optimization-outside-atomic>.
> 
> for (…)
>   if (A[i] & mask)
>     A[i] = val;
> 
> If we wanted the unroller in the vectorizer to handle such loops we would have to teach it to leave the store behind an if:
> 
> 
> for (…)
>   if (A[i] & mask)
>     A[i] = val;
> 
> =>
> 
> for (… i += 2) {
>   pred<0,1> = A[i:i+1] & mask<0,1>;
>   val<0,1> = ...;
>   if (pred<0>)
>     A[i]   = val<0>;
>   if (pred<1>)
>     A[i+1] = val<1>;
> }
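For contrast, here is the if-conversion that the message rules out, as a minimal sketch with hypothetical names: the conditional store is replaced by a select feeding an unconditional store. Both versions compute the same array in a single thread, but the if-converted one writes every element, including those the original never stored to; under the C++11 memory model (see the Atomics link above) such a newly introduced store can race with another thread writing the same element. `StoreCounter` exists only to make the extra stores visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts stores performed by each version (illustration only).
struct StoreCounter {
  std::size_t Stores = 0;
};

// Original: A[I] is written only when the predicate holds.
void guarded(uint64_t *A, std::size_t N, uint64_t Mask, uint64_t Val,
             StoreCounter &C) {
  assert(A != nullptr || N == 0);
  for (std::size_t I = 0; I < N; ++I)
    if (A[I] & Mask) {
      A[I] = Val;
      ++C.Stores;
    }
}

// Naive if-conversion: a select feeding an unconditional store. Same result
// single-threaded, but every element is written, adding stores on paths
// where the original loop had none.
void ifConverted(uint64_t *A, std::size_t N, uint64_t Mask, uint64_t Val,
                 StoreCounter &C) {
  assert(A != nullptr || N == 0);
  for (std::size_t I = 0; I < N; ++I) {
    A[I] = (A[I] & Mask) ? Val : A[I];
    ++C.Stores;
  }
}
```

On `{1, 2, 3, 4}` with `Mask = 1` and `Val = 7`, both leave `{7, 2, 7, 4}`, but the guarded loop performs 2 stores and the if-converted loop performs 4.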
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev