[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Tue Jan 21 11:46:37 PST 2014

On Jan 21, 2014, at 6:18 AM, Diego Novillo <dnovillo at google.com> wrote:

> On 16/01/2014, 23:47 , Andrew Trick wrote:
>> 
>> On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com> wrote:
>> 
>>> Chandler also pointed me at the vectorizer, which has its own
>>> unroller. However, the vectorizer only unrolls enough to serve the
>>> target, it's not as general as the runtime-triggered unroller. From
>>> what I've seen, it will get a maximum unroll factor of 2 on x86 (4 on
>>> avx targets). Additionally, the vectorizer only unrolls to aid
>>> reduction variables. When I forced the vectorizer to unroll these
>>> loops, the performance effects were nil.
>> 
>> Vectorization and partial unrolling (aka runtime unrolling) for ILP should be the same pass. The profitability analysis required in each case is very closely related, and you never want to do one before or after the other. The analysis makes sense even for targets without vector units. The “vector unroller” has an extra restriction (unlike the LoopUnroll pass) in that it must be able to interleave operations across iterations. This is usually a good thing to check before unrolling, but the compiler’s dependence analysis may be too conservative in some cases.
> 
> In addition to tuning the cost model, I found that the vectorizer does not even choose to get that far into its analysis for some loops that I need unrolled. In this particular case, there are three loops that need to be unrolled to get the performance I'm looking for. Of the three, only one gets far enough in the analysis to decide whether we unroll it or not.
> 

I assume the other two loops are quantum_cnot's <http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00054> and quantum_toffoli's <http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00082>.

The problem for the unroller in the loop vectorizer is that it wants to if-convert those loops. The conditional store prevents if-conversion because we can’t introduce a store on a path where there was none before: <http://llvm.org/docs/Atomics.html#optimization-outside-atomic>.

for (…)
  if (A[i] & mask)
    A[i] = val
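
To make the restriction concrete, here is a minimal C sketch (not LLVM code) contrasting the loop above with what a naive if-conversion would produce. The function names and the store counters are hypothetical, added only for illustration. The if-converted form writes A[i] on every iteration, including iterations where the source program never stored; if another thread writes A[i] between the load and that store, the transformed program has a race the original did not have.

```c
#include <stddef.h>

/* Original form: the store happens only when the mask test passes.
 * Returns the number of stores performed (illustration only). */
size_t store_if_set(unsigned *a, size_t n, unsigned mask, unsigned val)
{
    size_t stores = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] & mask) {
            a[i] = val;
            ++stores;
        }
    }
    return stores;
}

/* Naive if-conversion: the branch becomes a select, so the store is
 * unconditional. Single-threaded results match store_if_set, but a
 * store now exists on paths where the source had none. */
size_t store_if_converted(unsigned *a, size_t n, unsigned mask, unsigned val)
{
    size_t stores = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned old = a[i];
        a[i] = (old & mask) ? val : old; /* writes back old when the test fails */
        ++stores;
    }
    return stores;
}
```

On the same input both functions leave the array in the same state, but the second performs one store per element rather than one per matching element.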

If we wanted the unroller in the vectorizer to handle such loops, we would have to teach it to leave the store behind an if:

for (…)
  if (A[i] & mask)
    A[i] = val

=>

for ( … i+=2) {
   pred<0,1> = A[i:i+1] & mask<0, 1>
   val<0,1> = ...
   if (pred<0>)
       A[i]   = val<0>
   if (pred<1>)
       A[i+1] = val<1>
}
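
Written out as plain C, the transformed loop could look roughly like the sketch below. This is a hand-written illustration of the shape, not what the vectorizer emits; the function names are hypothetical, and the remainder loop for odd trip counts is an addition the pseudocode above does not show. The loads and mask tests of both iterations are grouped first, and each store stays guarded by its own predicate, so no store is introduced on a path that lacked one.

```c
#include <stddef.h>

/* Reference loop: conditional store, one element per iteration. */
void masked_store(unsigned *a, size_t n, unsigned mask, unsigned val)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] & mask)
            a[i] = val;
}

/* Unrolled by 2 with the stores left behind their ifs. */
void masked_store_unroll2(unsigned *a, size_t n, unsigned mask, unsigned val)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        /* Both predicates are computed up front; a[i] and a[i+1] are
         * distinct locations, so the tests can be interleaved. */
        unsigned pred0 = a[i] & mask;
        unsigned pred1 = a[i + 1] & mask;
        if (pred0)
            a[i] = val;
        if (pred1)
            a[i + 1] = val;
    }
    /* Remainder iteration when n is odd (assumed handling). */
    for (; i < n; ++i)
        if (a[i] & mask)
            a[i] = val;
}
```

Both functions perform exactly the same set of stores; only the order in which the loads and tests are issued differs.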