[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

Tue Jan 29 12:25:28 PST 2013

On 01/29/2013 08:58 PM, Nadav Rotem wrote:
>
> On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>>
>> # ignore assumed dependences.
>> for (i = 0; i < 4; i++) {
>>   tmp1 = A[3i+1];
>>   tmp2 = A[3i+2];
>>   tmp3 = tmp1 + tmp2;
>>   A[3i] = tmp3;
>> }
>>
>> Now I apply for whatever reason a partial reg2mem transformation.
>>
>> float tmp3[1];
>>
>> # ignore assumed dependences. // Still valid?
>> for (i = 0; i < 4; i++) {
>>   tmp1 = A[3i+1];
>>   tmp2 = A[3i+2];
>>   tmp3[0] = tmp1 + tmp2;
>>   A[3i] = tmp3[0];
>> }
>
>
> The transformation that you described is illegal because it changes the behavior
> of the loop. In the first version only A is modified, and in the second version
> of the loop both A and tmp3 are modified. Can you think of another example that
> demonstrates why the per-instruction attribute is needed ?

The problem here is that a "parallel loop" and a traditional C "sequential loop"
are different beasts semantically and should be treated as such by all
optimizations.

I'm afraid the above is a legal transformation in sequential C code. tmp3 is a
stack object added by the compiler, so the programmer's view of the program
behavior doesn't change (similar to register spilling). Because C loops have
sequential semantics, the program still behaves as originally intended despite
the additional loop-carried dependence through tmp3[0] (but that dependence
breaks the parallel loop semantics).
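
To make the difference concrete, here is a rough sketch in plain C. It is only
my illustration, not what any pass actually emits: loop_reg2mem_widened() is a
hypothetical hand-written model of what a vectorizer could do if it widened all
four iterations while ignoring the dependence through tmp3[0], and the helper
names (fill, demo) are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define N 4          /* trip count of the loop in the example */
#define LEN (3 * N)  /* number of elements of A that the loop touches */

/* Original loop: tmp3 is a value private to each iteration. */
static void loop_scalar(float *A) {
    for (int i = 0; i < N; i++) {
        float tmp1 = A[3 * i + 1];
        float tmp2 = A[3 * i + 2];
        float tmp3 = tmp1 + tmp2;
        A[3 * i] = tmp3;
    }
}

/* After the partial reg2mem: tmp3 is one stack slot shared by all
 * iterations, but the loop still runs in sequential order. */
static void loop_reg2mem_sequential(float *A) {
    float tmp3[1];
    for (int i = 0; i < N; i++) {
        float tmp1 = A[3 * i + 1];
        float tmp2 = A[3 * i + 2];
        tmp3[0] = tmp1 + tmp2;
        A[3 * i] = tmp3[0];
    }
}

/* Hypothetical model (an assumption, see above): each statement is
 * executed for all N iterations before the next statement starts, so
 * every store to tmp3[0] happens before any load from it. */
static void loop_reg2mem_widened(float *A) {
    float tmp3[1];
    float sum[N];
    for (int i = 0; i < N; i++)
        sum[i] = A[3 * i + 1] + A[3 * i + 2];
    for (int i = 0; i < N; i++)
        tmp3[0] = sum[i];      /* the last iteration's value wins */
    for (int i = 0; i < N; i++)
        A[3 * i] = tmp3[0];    /* every iteration reads that same value */
}

/* Fill A with 0, 1, 2, ... so the sums are easy to check by hand. */
static void fill(float *A) {
    for (int i = 0; i < LEN; i++)
        A[i] = (float)i;
}

void demo(void) {
    float a[LEN], b[LEN], c[LEN];
    fill(a);
    fill(b);
    fill(c);
    loop_scalar(a);
    loop_reg2mem_sequential(b);
    loop_reg2mem_widened(c);

    /* Sequentially, the reg2mem version is indistinguishable. */
    assert(memcmp(a, b, sizeof a) == 0);
    /* Widened without honoring tmp3[0], the results differ. */
    assert(memcmp(a, c, sizeof a) != 0);

    for (int i = 0; i < N; i++)
        printf("A[%d]: scalar=%g widened=%g\n", 3 * i, a[3 * i], c[3 * i]);
}
```

Calling demo() from a main() shows that the sequential reg2mem loop matches the
original, while the widened model writes the last iteration's sum into every
A[3*i].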

The correct way would be to treat the loop as a parallel loop, not a
sequential loop (as the programmer wanted), and thus to add and *respect* the
parallel loop metadata throughout LLVM, i.e., to disallow optimizations like
this one because they change the loop's semantics.

I think that's the best way to go, but it is quite intrusive and risky (it
takes time to stabilize), so the other way around (the per-instruction
attribute) might make more sense until the cases have been shaken out. Or, we
can just "jump straight into the cold water", add the parallel loop information
to the loop branch only, and fix illegal optimizations as they appear. I'm OK
with this also.

> I am afraid that so many different llvm transformations will have to be modified
> to preserve parallelism. This is not something that I want to slip in. If we
> want to add new parallelism semantics to LLVM then we need to discuss the bigger
> picture. We need to plan a mechanism that will allow us to implement support for
> a number of different models (Vectorizers, SPMD languages such as GL and CL,
> parallel threads such as OpenMP, etc).

In my opinion we should start with the lowest-hanging fruit (the parallel
loops) and improve on this as we gain experience with it.

-- 
--Pekka