[LLVMdev] [PATCH] parallel loop awareness to the LoopVectorizer
Nadav Rotem
nrotem at apple.com
Wed Jan 30 16:17:04 PST 2013
Tobi,
I completely agree with everything you wrote here. I share your concerns and I would also like to see a clear definition of the 'llvm.parallel' pragma.
Thanks,
Nadav
On Jan 30, 2013, at 3:45 PM, Tobias Grosser <tobias at grosser.es> wrote:
> On 01/29/2013 07:58 PM, Nadav Rotem wrote:
>>
>> On Jan 29, 2013, at 12:51 AM, Tobias Grosser <tobias at grosser.es
>> <mailto:tobias at grosser.es>> wrote:
>>
>>>
>>> # ignore assumed dependences.
>>> for (i = 0; i < 4; i++) {
>>> tmp1 = A[3i+1];
>>> tmp2 = A[3i+2];
>>> tmp3 = tmp1 + tmp2;
>>> A[3i] = tmp3;
>>> }
>>>
>>> Now I apply for whatever reason a partial reg2mem transformation.
>>>
>>> float tmp3[1];
>>>
>>> # ignore assumed dependences. // Still valid?
>>> for (i = 0; i < 4; i++) {
>>> tmp1 = A[3i+1];
>>> tmp2 = A[3i+2];
>>> tmp3[0] = tmp1 + tmp2;
>>> A[3i] = tmp3[0];
>>> }
>>
>>
>> The transformation that you described is illegal because it changes the
>> behavior of the loop. In the first version only A is modified, and in
>> the second version of the loop both A and tmp3 are modified. Can you
>> think of another example that demonstrates why the per-instruction
>> attribute is needed ?
>
> Hi Nadav,
>
> I can not directly follow why this transformation would be illegal by itself. Introducing stack memory and performing calculations there is something -reg2mem does and that should be legal in the context of sequential LLVM-IR. Did I miss something?
>
> I think the transformation I describe is only 'illegal' in the sense that it makes the llvm.loop.parallel metadata incorrect. This is exactly what I wanted to point out. Metadata was until now always optional, meaning transformations that don't understand a piece of metadata would never transform code in a way that the metadata becomes incorrect. Instead, transformations either know the metadata and
> update it accordingly or the metadata will be automatically removed as soon as instructions are touched. My impression here comes e.g. from the blog post describing LLVM meta data [1]: "A subtle point that was touched on above is that we don't want the optimizers to have to know about metadata."
>
> You asked for another example. I had the feeling clang should generate this metadata automatically given certain user defined pragmas, right?
> Here a simple ".c" code:
>
> void foo(float *A) {
> # pragma vectorize
> for (long i = 0; i < 4; i++) {
> float tmp3 = A[i];
> A[i + 4] = tmp3;
> }
> }
>
> Do you agree this code would be something we want to execute in parallel? Looking at the LLVM-IR 'clang -O0 -S' generates from it, we actually get the following:
>
> > define void @foo(float* %A) nounwind uwtable {
> > entry:
> > %A.addr = alloca float*, align 8
> > %i = alloca i64, align 8
> > %tmp3 = alloca float, align 4
> > store float* %A, float** %A.addr, align 8
> > store i64 0, i64* %i, align 8
> > br label %for.cond
> >
> > for.cond: ; preds = %for.inc, >
> > %0 = load i64* %i, align 8
> > %cmp = icmp slt i64 %0, 4
> > br i1 %cmp, label %for.body, label %for.end
> >
> > for.body: ; preds = %for.cond
> > %1 = load i64* %i, align 8
> > %2 = load float** %A.addr, align 8
> > %arrayidx = getelementptr inbounds float* %2, i64 %1
> > %3 = load float* %arrayidx, align 4
> > store float %3, float* %tmp3, align 4
>
> clang produces by default a lot of temporary stack arrays. This loop is not vectorizable before -mem2reg is executed. Attaching the loop.parallel metadata would either be incorrect or we would need to define precisely which memory references need to be moved to registers
> before the parallelism that was declared by the metadata is actually there.
>
>> I am afraid that so many different llvm transformations will have to be
>> modified to preserve parallelism. This is not something that I want to
>> slip in. If we want to add new parallelism semantics to LLVM them we
>> need to discuss the bigger picture.
> > We need to plan a mechanism that
>> will allow us to implement support for a number of different models
>> (Vectorizers, SPMD languages such as GL and CL, parallel threads such as
>> OpenMP, etc).
>
> I am not proposing to change the types of parallelism the proposed meta-data should cover. I just want to make sure the semantics of the proposed meta-data are well defined.
>
> Cheers
> Tobi
>
> [1] http://blog.llvm.org/2010/04/extensible-metadata-in-llvm-ir.html
More information about the llvm-dev
mailing list