[LLVMdev] [RFC] LoopInterchange Pass for llvm

Thu Feb 5 05:34:05 PST 2015

Hi All,
I have been working on a loop interchange pass for llvm. The motivation is to improve the cache hit to improve performance.
Would like to get your inputs on the same.

Currently this pass only handles loop of depth 2 other loops are ignored. Going forward we would like to fix this.
This opt is disabled by default.

Patch: http://reviews.llvm.org/D7432

LoopInterchange Pass is divided into 3 parts-
1) LoopInterchangeLegality
2) LoopInterchangeProfitability
3) LoopInterchangeTransform

LoopInterchangeLegality: 
This class checks all the memory instructions in the loop and uses DependenceAnalysis to conclude if we can interchange the loop or not.

LoopInterchangeLegality Functions:
  canInterchangeLoops - Checks if the loops can be interchanged.
  checkDependence - Called by canInterchangeLoops. Does the actual DependenceAnalysis to conclude if we can interchange the loops
  currentLimiations - This function marks loops are illegal due to current limitation in the way the transform is written. I intend to fix these issues.

LoopInterchangeProfitability:
This class checks if it is profitable to interchange the loops. Currently i use only 1 heuristic which is the order in which the array elements are accessed (i.e. row major/column major) and count the good and bad order. If we have bad order more than good order we interchange. We can improve the heuristics here later on.

LoopInterchangeProfitability Functions:
  isProfitable - Concludes if it is profitable to interchange.
  getInstrOrderCost - Calculates the array access order heuristics.

LoopInterchangeTransform:
This transforms the loop and interchanges the inner loop with the outer loop. I'm writing a loop optimization for the first time and have few doubts here.
The way we have interchanged the loop is - 
  1) Split the inner loop header and move all phi nodes into a seperate block. This will be the new outer header.
  2) Split the loop latch at the indvar.next instruction( This may need improvement as indicated in a TODO in the code) and this will be the new outer loop latch.
  3) Adjust the loop links by adjusting the branch instructions so that we move the inner loop outside.
  4) Fix PHi nodes due to the step 3 as previous basicblock from which it branched will have changed.

After this step the loop is interchanged. It gives the correct results for few of the sample loops which i checked but I'm not sure if this is the right way to perform interchange transform.
I suppose i will have to update the dom tree etc?  Could someone please guide me if this is the right way to go forward or we need to transform in some other way?

I checked the transform with the following and some other examples it seems to interchange the loops when legal and profitable.I checked the o/p with and without the transform. 
They seem to be the same in the cases which i have checked.

#include <iostream>
using namespace std;
int N,M;
int A[100][100],B[100][100];
int k;
int main() {

 cin >> N >> M;
 for(int i=0;i<N;i++)
   for(int j=0;j<M;j++)
    cin >> A[i][j];

 for(int j=0;j<N;j++) {
  for(int i=0;i<M;i++) {
        if(i%2)
          A[i+2][j+2] = A[i][j]+k;
        else
         A[i][j+1] = A[i][j]+k;
    }
  }

 for(int i=0;i<N;i++)
   for(int j=0;j<M;j++)
    cout << A[i][j] <<"\n";

  return 0;
}

Running loop-interchange-
  opt -basicaa -loop-interchange test.ll -o test.bc

Awaiting comments and inputs.

Thanks and Regards
Karthik Bhat