[RFC] #pragma ivdep

Redmond, Paul paul.redmond at intel.com
Wed Mar 6 15:35:31 PST 2013


On 2013-03-06, at 3:42 PM, Pekka Jääskeläinen wrote:

> Hi Paul,
> 
> On 03/06/2013 10:11 PM, Redmond, Paul wrote:
>> I have updated the patch to not add metadata on loads and stores where the
>> pointer comes from an alloca. I wonder if the check should be more
>> conservative and only include pointers coming from Arguments and GlobalValues
>> (perhaps Constants too?)
> 
> I think it's safer that way around for now.

Hmm, I'm not sure that this approach (or my original one) is general enough. Here's a simple loop and the corresponding IR generated by clang:

void test0(int *a, int k, int c, int m) {
  #pragma ivdep
  for (int i = 0; i < m; ++i)
    a[i] = a[i + k] * c;
}

define void @test0(i32* %a, i32 %k, i32 %c, i32 %m) #0 {
entry:
  %a.addr = alloca i32*, align 8
  %k.addr = alloca i32, align 4
  %c.addr = alloca i32, align 4
  %m.addr = alloca i32, align 4
  %i = alloca i32, align 4
  store i32* %a, i32** %a.addr, align 8
  store i32 %k, i32* %k.addr, align 4
  store i32 %c, i32* %c.addr, align 4
  store i32 %m, i32* %m.addr, align 4
  store i32 0, i32* %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32* %i, align 4
  %1 = load i32* %m.addr, align 4
  %cmp = icmp slt i32 %0, %1
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %2 = load i32* %i, align 4
  %3 = load i32* %k.addr, align 4
  %add = add nsw i32 %2, %3
  %idxprom = sext i32 %add to i64
  %4 = load i32** %a.addr, align 8
  %arrayidx = getelementptr inbounds i32* %4, i64 %idxprom
  %5 = load i32* %arrayidx, align 4
  %6 = load i32* %c.addr, align 4
  %mul = mul nsw i32 %5, %6
  %7 = load i32* %i, align 4
  %idxprom1 = sext i32 %7 to i64
  %8 = load i32** %a.addr, align 8
  %arrayidx2 = getelementptr inbounds i32* %8, i64 %idxprom1
  store i32 %mul, i32* %arrayidx2, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %9 = load i32* %i, align 4
  %inc = add nsw i32 %9, 1
  store i32 %inc, i32* %i, align 4
  br label %for.cond, !llvm.loop.parallel !0

for.end:                                          ; preds = %for.cond
  ret void
}

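Here we see that clang emits an alloca (%a.addr) to hold 'a', stores the argument into it, and reloads it before each use. The modified AnnotateParallelLoopAccess therefore won't find any loads or stores that satisfy the conditions: the address of every access in the loop traces back to an alloca, either directly (%i, %k.addr) or through a load from one (%arrayidx, %arrayidx2), and never to an Argument.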

Any ideas?

paul

> 
> +/// Assuming I is in a parallel loop, return true if I needs
> +/// llvm.mem.parallel_loop_access metadata.
> +static bool AnnotateParallelLoopAccess(llvm::Instruction *I) {
> 
> So maybe this comment should be
> "...return true if llvm.mem.parallel_loop_access metadata
> can be added safely to I."
> 
> -- 
> --Pekka
> 





More information about the cfe-commits mailing list