[PATCH] [LoopInterchange] Add support to interchange loops with reductions.

Renato Golin renato.golin at linaro.org
Sat Mar 21 08:25:49 PDT 2015

In http://reviews.llvm.org/D8314#141802, @karthikthecool wrote:

> Refactored some common code into functions. I have currently borrowed and modified some functions from the loop vectorizer. Do I need to refactor them into a common utility as well? These functions, such as AddReductionVar, seem to be a bit tightly bound to the loop vectorizer code.

Yes, they are, and I can see what the problem is. But there is a lot of duplication added by this patch and I'm still uncomfortable. I've added Nadav and Arnold, our loop vectorizer experts, to assist on what to do next.

I strongly advise against duplication, and the only option I can think of is to spot the pattern while creating the reduction variable. You can create a function that iterates over all containing loops and inspects their ranges to make sure they match your pattern. It should exit early if the loop nest is not deep enough, or if the outer loops don't iterate through any of the induction variables affected by your reduction.
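
To make the suggestion concrete, here is a rough sketch of the kind of check I mean. It is only a toy model: `ToyLoop`, `nestDepth` and `outerLoopTouchesReduction` are hypothetical names invented for illustration, not the LLVM `Loop` API and not code from the patch. It assumes each loop knows its induction variable and its parent, and that the reduction can report which induction variables its accesses depend on.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a loop in a nest; this is NOT llvm::Loop.
struct ToyLoop {
    std::string inductionVar;  // name of this loop's induction variable
    const ToyLoop *parent;     // enclosing loop, or nullptr for the outermost
};

// Number of loops in the nest, counted from `inner` up to the outermost loop.
unsigned nestDepth(const ToyLoop *inner) {
    unsigned depth = 0;
    for (const ToyLoop *l = inner; l != nullptr; l = l->parent)
        ++depth;
    return depth;
}

// Returns true when some loop enclosing `inner` iterates over one of the
// induction variables that the reduction's accesses depend on.
// `reductionIVs` is an assumed input: the set of affected induction variables.
bool outerLoopTouchesReduction(const ToyLoop *inner,
                               const std::set<std::string> &reductionIVs) {
    // Early exit: a single loop (or no loop) has nothing to interchange with.
    if (inner == nullptr || nestDepth(inner) < 2)
        return false;

    // Walk the containing loops from the innermost parent outwards.
    for (const ToyLoop *l = inner->parent; l != nullptr; l = l->parent)
        if (reductionIVs.count(l->inductionVar) != 0)
            return true;

    // Early exit: no outer loop iterates through an affected variable.
    return false;
}
```

For the i/j/k nest quoted further down, the reduction on `A[i][j]` would report `{"i", "j"}`, so starting the walk from the `k` loop finds a match at `j`. A real implementation would have to inspect the actual loop bounds and SCEV expressions rather than variable names.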

> Second change is in PassManagerBuilder. Running SimplifyCFGPass after LoopInterchange is sufficient to merge and remove redundant basic blocks (blocks with just an unconditional branch) produced after loop interchange. Updated the code to reflect the same.

This is good news. It means that the pass is a lot less dramatic than you anticipated. :) This gives me hope that doing this inside the loop vectorizer can be managed.

> I ran a few Phoronix benchmarks and LNT benchmarks but unfortunately didn't see any improvement/regression due to this patch.

I'd say "fortunately", since you haven't introduced any regressions, and that's a great thing!

> As mentioned in previous comments, after this change, code such as:
>
>   void matrixMult(int N, int M, int K) {
>     for(int i=0;i<N;i++)
>       for(int j=0;j<M;j++)
>         for(int k=0;k<K;k++)
>           A[i][j]+=B[i][k]*C[k][j];
>   }
>
> gets vectorized, giving some execution time improvement during large matrix multiplication.

It seems we don't have that kind of benchmark in our test suite, and it would be good to have one. I don't know of one off the top of my head, but maybe Hal/Nadav/Arnold could help.
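
For reference, this is roughly what the quoted matrixMult nest looks like before and after one possible interchange (swapping the j and k loops). It is a hand-written illustration, not the output of the pass; the `Matrix` alias, the function names and passing the arrays as parameters are assumptions made so the snippet stands alone. In the original order the innermost loop accumulates into a single `A[i][j]` while striding down a column of `C`. After the swap the innermost loop walks `A[i][j]` and `C[k][j]` contiguously with `B[i][k]` invariant, which is the shape a vectorizer typically handles well.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Original loop order (i, j, k): the k loop is a reduction into A[i][j]
// and reads C[k][j] with a stride of one full row per iteration.
void matrixMultOriginal(Matrix &A, const Matrix &B, const Matrix &C,
                        int N, int M, int K) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            for (int k = 0; k < K; k++)
                A[i][j] += B[i][k] * C[k][j];
}

// Interchanged loop order (i, k, j): the innermost j loop touches
// consecutive elements of A[i] and C[k]; B[i][k] is loop-invariant.
void matrixMultInterchanged(Matrix &A, const Matrix &B, const Matrix &C,
                            int N, int M, int K) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < K; k++)
            for (int j = 0; j < M; j++)
                A[i][j] += B[i][k] * C[k][j];
}
```

Both versions compute the same result for integer matrices, since each `A[i][j]` receives the same set of products, only in a different order. Something along these lines, timed for large N, M and K, could serve as a starting point for the missing benchmark.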




