[llvm-dev] Is llvm capable of doing loop interchange optimization?

Mon Nov 20 04:10:24 PST 2017

Hi Denis,

On 18/11/2017 15:26, Denis Bakhvalov via llvm-dev wrote:
> Hello,
> 
> I've been playing around with the really simple example of not cache
> friendly loop like this:
> 
> #define N 100
> 
> void foo(int** __restrict__ a,
>           int** __restrict__ b)
> {
>      for (int i = 0; i < N; ++i)
>          for (int j = 0; j < N; ++j)
>              a[j][i] += b[j][i];
> }
> 
> link to compiler explorer:
> https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,source:'%23define+N+100%0A%0Avoid+foo(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bj%5D%5Bi%5D+%2B%3D+b%5Bj%5D%5Bi%5D%3B%0A%7D%0A%0Avoid+bar(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bi%5D%5Bj%5D+%2B%3D+b%5Bi%5D%5Bj%5D%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:49.999182365253795,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang500,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',trim:'0'),libs:!(),options:'-O3+-funroll-loops',source:1),l:'5',n:'0',o:'x86-64+clang+5.0.0+(Editor+%231,+Compiler+%231)',t:'0')),k:50.000817634746205,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4
> 
> I was thinking that modern compilers are optimizing such cases very
> aggressively. I quickly checked gcc, there is special option
> -floop-interchange, but it requires properly configured gcc compiler
> with Graphite infrastructure enabled. I didn't have that at hand, so I
> just stopped right there.
> 
> Loop has no side effects and arrays do not alias, so from my point of
> view it's safe to do interchange here. I tried different march options
> as well with no success.
> I'm interested if llvm is capable of doing loop interchange in
> general, at least the most simple cases like the mentioned one?
> 

There is a (relatively basic) loop interchange pass in LLVM. It is not 
enabled by default however, but you can enable it by passing `-mllvm 
-enable-loopinterchange` to Clang.

LoopInterchange in Clang does not trigger in your example, because a[j] 
and b[i] may alias. I am no expert on restrict, but it seems both GCC 
and Clang cannot prove that the access do not alias with the annotations 
you provided (also `int *restrict *restrict` does not do the trick).

In the code below, accesses to aa and bb cannot alias, and both GCC 
(with -O3) and Clang (recent master build, with `-O3 -mllvm 
-enable-loopinterchange`) interchange the 2 loops.

int aa[N][N];
int bb[N][N];

void foo1() {
     for (int i = 0; i < N; ++i)
         for (int j = 0; j < N; ++j)
             aa[j][i] += bb[j][i];
}

Cheers,
Florian