[LLVMdev] A question about GetElementPtr common subexpression elimination/loop invariant code motion

Mon Jan 29 01:29:57 PST 2007

Hello.

I have a problem which is quite basic for array optimization, amd I 
wonder whether I am missing something, but I could not
find the LLVM pass that does it.

Consider the following code snippet:

int test()
{
  int mat[7][7][7];
  int i,j,k,sum=0;
  for(i=0;i<7;i++){
    for(j=0;j<7;j++){
      for(k=0;k<7;k++){
        sum+=mat[i][j][k]^mat[i][j][k^1];
      }
    }
  }
  return sum;
}

When I run llvm on it (I am using a rather dated llvm 1.7, but I have 
the feeling this optimization is already there somewhere)
I get the following llvm code :

int %test() {
entry:
        %mat = alloca [7 x [7 x [7 x int]]], align 16           ; <[7 x 
[7 x [7 x int]]]*> [#uses=2]
        br label %cond_true

cond_true:              ; preds = %bb31, %bb22, %cond_true, %entry
        %j.1.2.ph = phi int [ 0, %entry ], [ %j.1.2.ph, %cond_true ], [ 
%tmp24, %bb22 ], [ 0, %bb31 ]           ; <int> [#uses=4]
        %i.0.0.ph = phi int [ 0, %entry ], [ %i.0.0.ph, %cond_true ], [ 
%i.0.0.ph, %bb22 ], [ %tmp33, %bb31 ]           ; <int> [#uses=5]
        %k.2.4 = phi int [ 0, %entry ], [ %tmp19, %cond_true ], [ 0, 
%bb22 ], [ 0, %bb31 ]              ; <int> [#uses=3]
        %sum.0.4 = phi int [ 0, %entry ], [ %tmp17, %cond_true ], [ 
%tmp17, %bb22 ], [ %tmp17, %bb31 ]          ; <int> [#uses=1]
        %tmp5 = getelementptr [7 x [7 x [7 x int]]]* %mat, int 0, int 
%i.0.0.ph, int %j.1.2.ph, int %k.2.4              ; <int*> [#uses=1]
        %tmp6 = load int* %tmp5         ; <int> [#uses=1]
        %tmp10 = xor int %k.2.4, 1              ; <int> [#uses=1]
        %tmp13 = getelementptr [7 x [7 x [7 x int]]]* %mat, int 0, int 
%i.0.0.ph, int %j.1.2.ph, int %tmp10             ; <int*> [#uses=1]
        %tmp14 = load int* %tmp13               ; <int> [#uses=1]
        %tmp15 = xor int %tmp14, %tmp6          ; <int> [#uses=1]
        %tmp17 = add int %tmp15, %sum.0.4               ; <int> [#uses=4]
        %tmp19 = add int %k.2.4, 1              ; <int> [#uses=2]
        %tmp13 = setgt int %tmp19, 6            ; <bool> [#uses=1]
        br bool %tmp13, label %bb22, label %cond_true

bb22:           ; preds = %cond_true
        %tmp24 = add int %j.1.2.ph, 1           ; <int> [#uses=2]
        %tmp2719 = setgt int %tmp24, 6          ; <bool> [#uses=1]
        br bool %tmp2719, label %bb31, label %cond_true

bb31:           ; preds = %bb22
        %tmp33 = add int %i.0.0.ph, 1           ; <int> [#uses=2]
        %tmp36 = setgt int %tmp33, 6            ; <bool> [#uses=1]
        br bool %tmp36, label %bb40, label %cond_true

bb40:           ; preds = %bb31
        ret int %tmp17
}

Now the problem with this code , is that the calculation of the address 
mat[i][j] which is done by the (two) getelementptr instructions
is quite expensive (involving at least two multiplications and one 
addition) hence it actualy should have been moved out of the inner loop.
At the very least inside the loop it should have been calculated once , 
and not twice. Anyway this is just a syptom of a general
phenomenon in which it seems my LLVM assumes that getelemenptrs are 
atomic and it does not try to do any optimization on
them.  
Can someone tell me if there is a known  solution /existing code 
transformation that does  the optimization ?