[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Tue Feb 10 13:09:09 PST 2015

In http://reviews.llvm.org/D7514#121534, @ohsallen wrote:

> Hi Michael,
>
> > I'm probably missing something, but which dependencies would unrolling the outer loop break?
>
>
> It would break the dependencies between the reduction operations. With the example below, we can dispatch twice as many instructions (if the target permits), which exposes more ILP.
>
>   // Original loop.
>   for (int i = 0; i < n; i++) 
>       for (int j = 0; j < 3; j++)
>           r += arr[i][j];
>  
>   // After unrolling innermost loop.
>   for (int i = 0; i < n; i++)  {
>       r += arr[i][0];
>       r += arr[i][1];
>       r += arr[i][2];
>   }

We should do some experimentation on other architectures. An out-of-order (OoO) core might be able to hide latency by dispatching loop iterations before the previous ones retire; however, I'm not sure what limits exist on this in general, or how we should model it. Do the register dependencies from the reduction kill this in general, or only on certain targets? Even when it is possible, it might be limited by register-renaming resources: if there are many reduction steps in the loop body and each step requires a rename, our ability to speculate is significantly limited. And there are in-order targets that can't speculate into future loop iterations at all. Thoughts?

>   // After unrolling outermost loop (in vectorizer, which breaks dependencies).
>   for (int i = 0; i < n; i += 2)  {
>       r += arr[i][0];
>       r_0 += arr[i+1][0];
>       r += arr[i][1];
>       r_0 += arr[i+1][1];
>       r += arr[i][2];
>       r_0 += arr[i+1][2];
>   }
>   r += r_0;
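
For experimenting on other targets, here is a minimal self-contained C sketch of the two-accumulator form quoted above, next to the original single-accumulator loop. The function names (sum_rows_serial, sum_rows_two_acc) and the odd-n remainder handling are my own additions for illustration; this is hand-written source, not what the vectorizer actually emits.

```c
#include <stddef.h>

/* Baseline: a single accumulator, so every add depends on the previous one. */
int sum_rows_serial(const int arr[][3], size_t n) {
    int r = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < 3; j++)
            r += arr[i][j];
    return r;
}

/* Outer loop unrolled by 2 with separate accumulators r and r_0.
 * The two add chains are independent until the final combine. */
int sum_rows_two_acc(const int arr[][3], size_t n) {
    int r = 0, r_0 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        r   += arr[i][0];
        r_0 += arr[i + 1][0];
        r   += arr[i][1];
        r_0 += arr[i + 1][1];
        r   += arr[i][2];
        r_0 += arr[i + 1][2];
    }
    /* Remainder row when n is odd (assumption: not part of the quoted example). */
    if (i < n)
        r += arr[i][0] + arr[i][1] + arr[i][2];
    return r + r_0;
}
```

Integers are used so that the reassociated sum matches the serial one exactly; with floating-point data the two versions can differ in rounding. Timing both functions on an in-order and an out-of-order target would be one way to get data for the questions above.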

http://reviews.llvm.org/D7514
