[PATCH] D13529: [Polly] Allow alloca instructions in the SCoP

Michael Kruse via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 16 01:11:51 PST 2016


Meinersbur added a comment.

> We should move this discussion to the patch page http://reviews.llvm.org/D13529 once phab is
>  available again.


Here we are.

> You can deal with the inlining once we cross that bridge. However, just
>  as a first idea (which has __nothing__ to do with http://reviews.llvm.org/D13529):
>
>   An alloca that is inlined and was in the function entry of the callee
>   (which is the case most of the time) [or even at any other block that
>   was not part of a loop] can be placed in the function entry of the
>   caller, thus not increasing the maximal stack usage. We can even place
>   appropriate llvm.lifetime.start/end markers to help further analyses.




> Let's focus on one thing at a time (which would be a simple alloca).


This is what I was suggesting as well (at the end of the email). But there is no need to model allocas in SCoPs in these simple cases. Hongbin's original email started out trying that.

> You mention stacksave/stackrestore a lot but I am still not sure why.


stacksave/stackrestore is essential for correctly handling alloca in the general case. Clang inserts them for variable-length arrays (CodeGenFunction::EmitAutoVarAlloca); all other allocas are placed in the entry block. The only other way I know of to get such an alloca (without stacksave/stackrestore) is __builtin_alloca()/alloca(), which are hopefully not used in loops, where they will likely trigger undefined behaviour. We should ignore such cases.
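To make the variable-length case concrete, here is a minimal sketch in C. The function name `sum_with_vla` and its contents are made up for illustration and are not taken from any benchmark. The array is declared inside the loop body, so its storage has to be released at the end of every iteration; that per-scope release is what the stacksave/stackrestore pair expresses in the IR.

```c
#include <assert.h>

/* Hypothetical example: a variable-length array declared inside a loop body.
   `buf` goes out of scope at the closing brace of each iteration, so its
   stack storage must be released there. For such a scope Clang brackets the
   dynamic alloca with llvm.stacksave / llvm.stackrestore. */
long sum_with_vla(int n) {
    assert(n >= 0);
    long total = 0;
    for (int m = 1; m <= n; m++) {
        int buf[m];                      /* size depends on the iteration */
        for (int j = 0; j < m; j++)
            buf[j] = j;
        for (int j = 0; j < m; j++)
            total += buf[j];
    }                                    /* storage of buf released here */
    return total;
}
```

Because the storage is released per iteration, the peak stack usage is bounded by the largest single array rather than by the sum of all of them.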

Are those benchmarks in which you encountered them of any importance for Polly? (Barcelona OpenMP Task Suite is not part of LLVM's test suite and there is no alloca() in its alignment test. I could find one inside an OpenMP loop in each of nqueens and floorplan, which is evil enough, but OpenMP compilation should put them into separate functions.)

In test-suite, I found that oggenc.c contains such allocas, but to allocate array elements:

  for(i=0;i<vi1->channels;i++)
    lappcm[i]=alloca(sizeof(**lappcm)*n1);

That's something we cannot loop-optimize.

Another occurrence is in gawk's builtin.c, with a comment excusing its use. (There is also a memcpy call in the loop, so Polly won't optimize it either.)

For inlining, blacklisting stacksave/stackrestore in ScopDetection is not enough. Handled naively, we would compile

  void foo(int m) {
  	float A[m];
  }
  
  void bar(int n) {
  	for (int i = 0; i < n; i += 1) {
  		foo(i);
  	}
  }

into

  define void @bar(i32 %n) #0 {
  entry:
    %n.addr = alloca i32, align 4
    %i = alloca i32, align 4
    store i32 %n, i32* %n.addr, align 4
    store i32 0, i32* %i, align 4
    br label %for.cond
  
  for.cond:                                         ; preds = %for.inc, %entry
    %0 = load i32, i32* %i, align 4
    %1 = load i32, i32* %n.addr, align 4
    %cmp = icmp slt i32 %0, %1
    br i1 %cmp, label %for.body, label %for.end
  
  for.body:                                         ; preds = %for.cond
    %2 = load i32, i32* %i, align 4
    %3 = zext i32 %2 to i64
    %vla = alloca float, i64 %3, align 16
    br label %for.inc
  
  for.inc:                                          ; preds = %for.body
    %4 = load i32, i32* %i, align 4
    %add = add nsw i32 %4, 1
    store i32 %add, i32* %i, align 4
    br label %for.cond
  
  for.end:                                          ; preds = %for.cond
    ret void
  }

which will overflow the stack for large enough n where the source program did not, i.e. this is a miscompile.
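A rough back-of-the-envelope sketch of the difference, in C. The helper names are hypothetical, and alignment padding (the `align 16` on the alloca) is ignored as a simplifying assumption. With the stack restored after each call, the peak is the size of the largest array; without it, every array stays live until bar returns and the sizes add up quadratically in n.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for `float A[m]`, ignoring alignment padding (assumption). */
static uint64_t vla_bytes(uint64_t m) { return m * sizeof(float); }

/* Peak stack usage of the arrays when each A is released before the next
   iteration, as in the source program: only the largest one counts. */
uint64_t peak_with_restore(uint64_t n) {
    return n == 0 ? 0 : vla_bytes(n - 1);
}

/* Stack usage when no array is released until bar returns, as in the
   naively inlined IR: the allocations of all iterations accumulate. */
uint64_t peak_without_restore(uint64_t n) {
    assert(n <= (UINT64_C(1) << 30));    /* keeps the sum within 64 bits */
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; i++)
        total += vla_bytes(i);
    return total;
}
```

For n = 1000 this gives 999 floats in the first case and 499500 floats in the second, so the naive version needs roughly n/2 times as much stack.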


http://reviews.llvm.org/D13529




