[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

Tobias Grosser tobias at grosser.es
Wed Jan 30 15:45:56 PST 2013


On 01/29/2013 07:58 PM, Nadav Rotem wrote:
>
> On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es> wrote:
>
>>
>> # ignore assumed dependences.
>> for (i = 0; i < 4; i++) {
>>   tmp1 = A[3i+1];
>>   tmp2 = A[3i+2];
>>   tmp3 = tmp1 + tmp2;
>>   A[3i] = tmp3;
>> }
>>
>> Now I apply for whatever reason a partial reg2mem transformation.
>>
>> float tmp3[1];
>>
>> # ignore assumed dependences. // Still valid?
>> for (i = 0; i < 4; i++) {
>>   tmp1 = A[3i+1];
>>   tmp2 = A[3i+2];
>>   tmp3[0] = tmp1 + tmp2;
>>   A[3i] = tmp3[0];
>> }
>
>
> The transformation that you described is illegal because it changes the
> behavior of the loop. In the first version only A is modified, and in
> the second version of the loop both A and tmp3 are modified. Can you
> think of another example that demonstrates why the per-instruction
> attribute is needed?

Hi Nadav,

I cannot directly follow why this transformation would be illegal by
itself. Introducing stack memory and performing calculations there is
something -reg2mem does, and that should be legal in the context of
sequential LLVM-IR. Did I miss something?

I think the transformation I describe is only 'illegal' in the sense
that it makes the llvm.loop.parallel metadata incorrect. This is exactly
what I wanted to point out. Until now, metadata was always optional,
meaning that transformations that don't understand a piece of metadata
would never transform code in a way that makes the metadata incorrect.
Instead, transformations either know the metadata and update it
accordingly, or the metadata is automatically removed as soon as the
instructions are touched. My impression here comes, e.g., from the blog
post describing LLVM metadata [1]: "A subtle point that was touched on
above is that we don't want the optimizers to have to know about
metadata."
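
To make the concern concrete, here is a minimal C sketch of the reg2mem'd
loop from above. It is not taken from the patch: the function names
run_sequential and run_lockstep are made up for illustration, and the
lock-step version is only a hand-written model of what a width-4
vectorizer might do if it trusted an "ignore assumed dependences"
annotation. It assumes A has at least 12 elements.

```c
#define N 4

/* The loop after the partial reg2mem step, run one iteration at a time. */
void run_sequential(float *A) {
    float tmp3[1];
    for (int i = 0; i < N; i++) {
        float tmp1 = A[3 * i + 1];
        float tmp2 = A[3 * i + 2];
        tmp3[0] = tmp1 + tmp2;
        A[3 * i] = tmp3[0];
    }
}

/* Hypothetical model of a width-4 vectorization: each statement is
 * executed for all four iterations before the next statement starts. */
void run_lockstep(float *A) {
    float tmp3[1];
    float tmp1[N], tmp2[N];
    for (int i = 0; i < N; i++) tmp1[i] = A[3 * i + 1];
    for (int i = 0; i < N; i++) tmp2[i] = A[3 * i + 2];
    /* All iterations write the same slot, so only the last sum survives. */
    for (int i = 0; i < N; i++) tmp3[0] = tmp1[i] + tmp2[i];
    for (int i = 0; i < N; i++) A[3 * i] = tmp3[0];
}
```

With A[k] = k for k = 0..11, run_sequential stores 3, 9, 15 and 21 into
A[0], A[3], A[6] and A[9], while run_lockstep stores 21 into all four.
The accesses to A are still independent; the dependence that breaks the
annotation is carried only by the newly introduced tmp3[0].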

You asked for another example. I had the feeling clang should generate
this metadata automatically given certain user-defined pragmas, right?
Here is a simple C example:

void foo(float *A) {
         # pragma vectorize
         for (long i = 0; i < 4; i++) {
                 float tmp3 = A[i];
                 A[i + 4] = tmp3;
         }
}

Do you agree this code would be something we want to execute in
parallel? Looking at the LLVM-IR that 'clang -O0 -S -emit-llvm'
generates from it, we actually get the following:

 > define void @foo(float* %A) nounwind uwtable {
 > entry:
 >   %A.addr = alloca float*, align 8
 >   %i = alloca i64, align 8
 >   %tmp3 = alloca float, align 4
 >   store float* %A, float** %A.addr, align 8
 >   store i64 0, i64* %i, align 8
 >   br label %for.cond
 >
 > for.cond:                                         ; preds = %for.inc, %entry
 >   %0 = load i64* %i, align 8
 >   %cmp = icmp slt i64 %0, 4
 >   br i1 %cmp, label %for.body, label %for.end
 >
 > for.body:                                         ; preds = %for.cond
 >   %1 = load i64* %i, align 8
 >   %2 = load float** %A.addr, align 8
 >   %arrayidx = getelementptr inbounds float* %2, i64 %1
 >   %3 = load float* %arrayidx, align 4
 >   store float %3, float* %tmp3, align 4

By default, clang produces a lot of temporary stack allocations. Every
iteration stores to the same %tmp3 slot, so this loop is not
vectorizable before -mem2reg is executed. Attaching the loop.parallel
metadata would either be incorrect, or we would need to define precisely
which memory references need to be moved to registers before the
parallelism declared by the metadata is actually there.
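
As a rough illustration of why the -O0 shape matters, the following C
sketch models foo in three ways. The helper names
foo_lockstep_shared_slot and foo_lockstep_private are hypothetical, and
the lock-step loops are only a hand-written stand-in for a width-4
vectorizer, not what the LoopVectorizer actually emits. The pragma is
left out because the sketch does not depend on it.

```c
#define WIDTH 4

/* Reference: the original loop, executed sequentially. */
void foo(float *A) {
    for (long i = 0; i < WIDTH; i++) {
        float tmp3 = A[i];
        A[i + 4] = tmp3;
    }
}

/* Lock-step model with tmp3 kept as one shared stack slot, as in the
 * -O0 IR: every iteration overwrites the slot before any of them reads it. */
void foo_lockstep_shared_slot(float *A) {
    float tmp3 = 0.0f;
    for (long i = 0; i < WIDTH; i++) tmp3 = A[i];
    for (long i = 0; i < WIDTH; i++) A[i + 4] = tmp3;
}

/* Lock-step model after tmp3 has been promoted to a per-iteration
 * value, which is roughly the situation once -mem2reg has run. */
void foo_lockstep_private(float *A) {
    float tmp3[WIDTH];
    for (long i = 0; i < WIDTH; i++) tmp3[i] = A[i];
    for (long i = 0; i < WIDTH; i++) A[i + 4] = tmp3[i];
}
```

For A = {1, 2, 3, 4, 0, 0, 0, 0}, foo and foo_lockstep_private both leave
1, 2, 3, 4 in A[4..7], whereas foo_lockstep_shared_slot leaves 4, 4, 4, 4.
So the parallelism is only really there once the accesses to tmp3 no
longer go through memory.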

> I am afraid that so many different llvm transformations will have to be
> modified to preserve parallelism. This is not something that I want to
> slip in. If we want to add new parallelism semantics to LLVM then we
> need to discuss the bigger picture. We need to plan a mechanism that
> will allow us to implement support for a number of different models
> (Vectorizers, SPMD languages such as GL and CL, parallel threads such as
> OpenMP, etc).

I am not proposing to change the types of parallelism the proposed
metadata should cover. I just want to make sure the semantics of the
proposed metadata are well defined.

Cheers
Tobi

[1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html


