[llvm-dev] Is llvm capable of doing loop interchange optimization?

Mon Nov 20 04:35:27 PST 2017

On 20/11/2017, Florian Hahn <florian.hahn at arm.com> wrote:
> Hi Denis,
>
> On 18/11/2017 15:26, Denis Bakhvalov via llvm-dev wrote:
>> Hello,
>>
>> I've been playing around with the really simple example of not cache
>> friendly loop like this:
>>
>> #define N 100
>>
>> void foo(int** __restrict__ a,
>>           int** __restrict__ b)
>> {
>>      for (int i = 0; i < N; ++i)
>>          for (int j = 0; j < N; ++j)
>>              a[j][i] += b[j][i];
>> }
>>
>> link to compiler explorer:
>> https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,source:'%23define+N+100%0A%0Avoid+foo(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bj%5D%5Bi%5D+%2B%3D+b%5Bj%5D%5Bi%5D%3B%0A%7D%0A%0Avoid+bar(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bi%5D%5Bj%5D+%2B%3D+b%5Bi%5D%5Bj%5D%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:49.999182365253795,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang500,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',trim:'0'),libs:!(),options:'-O3+-funroll-loops',source:1),l:'5',n:'0',o:'x86-64+clang+5.0.0+(Editor+%231,+Compiler+%231)',t:'0')),k:50.000817634746205,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4
>>
>> I was thinking that modern compilers are optimizing such cases very
>> aggressively. I quickly checked gcc, there is special option
>> -floop-interchange, but it requires properly configured gcc compiler
>> with Graphite infrastructure enabled. I didn't have that at hand, so I
>> just stopped right there.
>>
>> Loop has no side effects and arrays do not alias, so from my point of
>> view it's safe to do interchange here. I tried different march options
>> as well with no success.
>> I'm interested if llvm is capable of doing loop interchange in
>> general, at least the most simple cases like the mentioned one?
>>
>
> There is a (relatively basic) loop interchange pass in LLVM. It is not
> enabled by default however, but you can enable it by passing `-mllvm
> -enable-loopinterchange` to Clang.
>
> LoopInterchange in Clang does not trigger in your example, because a[j]
> and b[i] may alias. I am no expert on restrict, but it seems both GCC
> and Clang cannot prove that the access do not alias with the annotations
> you provided (also `int *restrict *restrict` does not do the trick).
>
> In the code below, accesses to aa and bb cannot alias, and both GCC
> (with -O3) and Clang (recent master build, with `-O3 -mllvm
> -enable-loopinterchange`) interchange the 2 loops.
>
>
> int aa[N][N];
> int bb[N][N];
>
> void foo1() {
>      for (int i = 0; i < N; ++i)
>          for (int j = 0; j < N; ++j)
>              aa[j][i] += bb[j][i];
> }
>
>
>
> Cheers,
> Florian
>

Hi Florian,
Thank you very much for answering the question.
Looks like for now gcc does a better job with the example you provided:
https://godbolt.org/g/Z5Cgu3

-- 
Best regards,
Denis.