[RFC] #pragma ivdep
Redmond, Paul
paul.redmond at intel.com
Wed Mar 6 15:35:31 PST 2013
On 2013-03-06, at 3:42 PM, Pekka Jääskeläinen wrote:
> Hi Paul,
>
> On 03/06/2013 10:11 PM, Redmond, Paul wrote:
>> I have updated the patch to not add metadata on loads and stores where the
>> pointer comes from an alloca. I wonder if the check should be more
>> conservative and only include pointers coming from Arguments and GlobalValues
>> (perhaps Constants too?)
>
> I think it's safer that way around for now.
Hmm, I'm not sure that this approach (or my original one) is general enough. Here is a simple loop and the corresponding IR as generated by clang:
void test0(int *a, int k, int c, int m) {
#pragma ivdep
  for (int i = 0; i < m; ++i)
    a[i] = a[i + k] * c;
}

define void @test0(i32* %a, i32 %k, i32 %c, i32 %m) #0 {
entry:
  %a.addr = alloca i32*, align 8
  %k.addr = alloca i32, align 4
  %c.addr = alloca i32, align 4
  %m.addr = alloca i32, align 4
  %i = alloca i32, align 4
  store i32* %a, i32** %a.addr, align 8
  store i32 %k, i32* %k.addr, align 4
  store i32 %c, i32* %c.addr, align 4
  store i32 %m, i32* %m.addr, align 4
  store i32 0, i32* %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32* %i, align 4
  %1 = load i32* %m.addr, align 4
  %cmp = icmp slt i32 %0, %1
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %2 = load i32* %i, align 4
  %3 = load i32* %k.addr, align 4
  %add = add nsw i32 %2, %3
  %idxprom = sext i32 %add to i64
  %4 = load i32** %a.addr, align 8
  %arrayidx = getelementptr inbounds i32* %4, i64 %idxprom
  %5 = load i32* %arrayidx, align 4
  %6 = load i32* %c.addr, align 4
  %mul = mul nsw i32 %5, %6
  %7 = load i32* %i, align 4
  %idxprom1 = sext i32 %7 to i64
  %8 = load i32** %a.addr, align 8
  %arrayidx2 = getelementptr inbounds i32* %8, i64 %idxprom1
  store i32 %mul, i32* %arrayidx2, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %9 = load i32* %i, align 4
  %inc = add nsw i32 %9, 1
  store i32 %inc, i32* %i, align 4
  br label %for.cond, !llvm.loop.parallel !0

for.end:                                          ; preds = %for.cond
  ret void
}

Here we see that clang creates an alloca (%a.addr) to hold 'a' and reloads the pointer from it before every access. The modified AnnotateParallelLoopAccess therefore won't find any loads or stores that satisfy the conditions, since every pointer traces back to an alloca anyway.
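
To make the problem concrete, here is a toy model of the check. It is only a sketch and does not use the real LLVM API: the Value struct, the Kind enum and every function name below are hypothetical stand-ins. It assumes the check strips GEPs and then accepts only addresses based directly on an Argument or a GlobalValue.

```cpp
#include <cassert>

// Toy stand-in for LLVM values (hypothetical, NOT the real LLVM classes).
enum class Kind { Argument, Global, Alloca, Load, GEP };

struct Value {
  Kind kind;
  const Value *ptr; // pointer operand of a Load or GEP, nullptr otherwise
};

// Walk through GEPs to reach the value the address is based on.
const Value *underlying(const Value *v) {
  while (v->kind == Kind::GEP) {
    assert(v->ptr && "a GEP needs a pointer operand");
    v = v->ptr;
  }
  return v;
}

// Conservative rule (assumed): annotate only if the address is based
// directly on an Argument or a GlobalValue.
bool canAnnotate(const Value *addr) {
  const Value *base = underlying(addr);
  return base->kind == Kind::Argument || base->kind == Kind::Global;
}

// Shape of the unoptimized IR above:
//   %a.addr   = alloca i32*
//   %4        = load i32** %a.addr
//   %arrayidx = getelementptr inbounds i32* %4, ...
bool unoptimizedAccessIsAnnotated() {
  Value aAddr{Kind::Alloca, nullptr};
  Value reload{Kind::Load, &aAddr};
  Value arrayidx{Kind::GEP, &reload};
  return canAnnotate(&arrayidx); // base is a Load, so this is rejected
}

// Hypothetical shape if 'a' were used directly instead of being reloaded:
//   %arrayidx = getelementptr inbounds i32* %a, ...
bool promotedAccessIsAnnotated() {
  Value a{Kind::Argument, nullptr};
  Value arrayidx{Kind::GEP, &a};
  return canAnnotate(&arrayidx); // base is an Argument, so this is accepted
}
```

In this model the access from the unoptimized IR is rejected because its base is a load from the stack slot, while the same access written against the argument itself is accepted. That is the gap: at the point where clang emits the IR, only the first shape exists.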
Any ideas?
paul
>
> +/// Assuming I is in a parallel loop, return true if I needs
> +/// llvm.mem.parallel_loop_access metadata.
> +static bool AnnotateParallelLoopAccess(llvm::Instruction *I) {
>
> So maybe this comment should be
> "...return true if llvm.mem.parallel_loop_access metadata
> can be added safely to I."
>
> --
> --Pekka
>