[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer

Wed Jan 30 16:17:04 PST 2013

Tobi, 

I completely agree with everything you wrote here. I share your concerns and I would also like to see a clear definition of the 'llvm.parallel' pragma. 

Thanks,
Nadav

On Jan 30, 2013, at 3:45 PM, Tobias Grosser <tobias at grosser.es> wrote:

> On 01/29/2013 07:58 PM, Nadav Rotem wrote:
>> 
>> On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es> wrote:
>> 
>>> 
>>> # ignore assumed dependences.
>>> for (i = 0; i < 4; i++) {
>>>  tmp1 = A[3*i+1];
>>>  tmp2 = A[3*i+2];
>>>  tmp3 = tmp1 + tmp2;
>>>  A[3*i] = tmp3;
>>> }
>>> 
>>> Now I apply for whatever reason a partial reg2mem transformation.
>>> 
>>> float tmp3[1];
>>> 
>>> # ignore assumed dependences. // Still valid?
>>> for (i = 0; i < 4; i++) {
>>>  tmp1 = A[3*i+1];
>>>  tmp2 = A[3*i+2];
>>>  tmp3[0] = tmp1 + tmp2;
>>>  A[3*i] = tmp3[0];
>>> }
>> 
>> 
>> The transformation that you described is illegal because it changes the
>> behavior of the loop. In the first version only A is modified, and in
>> the second version of the loop both A and tmp3 are modified. Can you
>> think of another example that demonstrates why the per-instruction
>> attribute is needed ?
> 
> Hi Nadav,
> 
> I cannot quite follow why this transformation would be illegal by itself. Introducing stack memory and performing calculations there is what -reg2mem does, and that should be legal in the context of sequential LLVM-IR. Did I miss something?
> 
> I think the transformation I describe is only 'illegal' in the sense that it makes the llvm.loop.parallel metadata incorrect. This is exactly what I wanted to point out. Until now, metadata has always been optional: a transformation that does not understand a piece of metadata would never transform code in a way that makes the metadata incorrect. Instead, transformations either know the metadata and update it accordingly, or the metadata is automatically removed as soon as instructions are touched. My impression here comes, e.g., from the blog post describing LLVM metadata [1]: "A subtle point that was touched on above is that we don't want the optimizers to have to know about metadata."
> 
> You asked for another example. I had the feeling clang should generate this metadata automatically given certain user-defined pragmas, right?
> Here is a simple C example:
> 
> void foo(float *A) {
>        # pragma vectorize
>        for (long i = 0; i < 4; i++) {
>                float tmp3 = A[i];
>                A[i + 4] = tmp3;
>        }
> }
> 
> Do you agree this code would be something we want to execute in parallel? Looking at the LLVM-IR 'clang -O0 -S' generates from it, we actually get the following:
> 
> > define void @foo(float* %A) nounwind uwtable {
> > entry:
> >   %A.addr = alloca float*, align 8
> >   %i = alloca i64, align 8
> >   %tmp3 = alloca float, align 4
> >   store float* %A, float** %A.addr, align 8
> >   store i64 0, i64* %i, align 8
> >   br label %for.cond
> >
> > for.cond:                                         ; preds = %for.inc, >
> >   %0 = load i64* %i, align 8
> >   %cmp = icmp slt i64 %0, 4
> >   br i1 %cmp, label %for.body, label %for.end
> >
> > for.body:                                         ; preds = %for.cond
> >   %1 = load i64* %i, align 8
> >   %2 = load float** %A.addr, align 8
> >   %arrayidx = getelementptr inbounds float* %2, i64 %1
> >   %3 = load float* %arrayidx, align 4
> >   store float %3, float* %tmp3, align 4
> 
> clang produces a lot of temporary stack slots (allocas) by default. This loop is not vectorizable before -mem2reg is executed. Attaching the loop.parallel metadata would either be incorrect, or we would need to define precisely which memory references need to be moved to registers before the parallelism that was declared by the metadata is actually there.
> 
>> I am afraid that so many different llvm transformations will have to be
>> modified to preserve parallelism. This is not something that I want to
>> slip in. If we want to add new parallelism semantics to LLVM then we
>> need to discuss the bigger picture.
>> We need to plan a mechanism that
>> will allow us to implement support for a number of different models
>> (Vectorizers, SPMD languages such as GL and CL, parallel threads such as
>> OpenMP, etc).
> 
> I am not proposing to change the types of parallelism the proposed metadata should cover. I just want to make sure the semantics of the proposed metadata are well defined.
> 
> Cheers
> Tobi
> 
> [1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html
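
The reg2mem concern discussed in this thread can be made concrete with a small C sketch. It is not taken from the patch or from LLVM. It models one possible way a vectorizer with vector width 4 might execute the loop if it ignored assumed dependences: all four lanes run the first statement, then all four lanes run the next one. The function names, the helper same() and the macros N and LEN are hypothetical and exist only for illustration.

```c
#include <assert.h>

#define N 4          /* trip count from the example in this thread */
#define LEN (3 * N)  /* the loop touches A[0] .. A[11] */

/* Sequential reference: the original loop, tmp3 is a scalar. */
void loop_sequential(float *A) {
    for (int i = 0; i < N; i++) {
        float tmp1 = A[3 * i + 1];
        float tmp2 = A[3 * i + 2];
        float tmp3 = tmp1 + tmp2;
        A[3 * i] = tmp3;
    }
}

/* Assumed model of vectorized execution while tmp3 is still a register:
 * each lane has its own private value, so the lanes do not interfere. */
void loop_wide_register(float *A) {
    float tmp3[N]; /* one element per lane, stands in for a vector register */
    for (int i = 0; i < N; i++)
        tmp3[i] = A[3 * i + 1] + A[3 * i + 2];
    for (int i = 0; i < N; i++)
        A[3 * i] = tmp3[i];
}

/* Same execution model after the partial reg2mem: every lane stores to
 * and loads from the single memory slot tmp3[0], so the last lane wins. */
void loop_wide_shared_slot(float *A) {
    float tmp3[1];
    for (int i = 0; i < N; i++)
        tmp3[0] = A[3 * i + 1] + A[3 * i + 2];
    for (int i = 0; i < N; i++)
        A[3 * i] = tmp3[0];
}

/* Returns 1 if both arrays hold the same LEN values, 0 otherwise. */
int same(const float *x, const float *y) {
    for (int k = 0; k < LEN; k++)
        if (x[k] != y[k])
            return 0;
    return 1;
}
```

Called from a small driver with A[k] = k as input, loop_sequential and loop_wide_register both leave A[0] == 3, while loop_wide_shared_slot leaves A[0] == 21. Under this model the transformed loop is still correct when run sequentially, but it no longer matches what the "ignore assumed dependences" annotation promises.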