[PATCH] D81765: Don't inline dynamic allocas that simplify to huge static allocas.

Tue Jun 23 19:59:36 PDT 2020

aemerson marked an inline comment as done.
aemerson added inline comments.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:838
   if (I.isArrayAllocation()) {
     Constant *Size = SimplifiedValues.lookup(I.getArraySize());
----------------
efriedma wrote:
> aemerson wrote:
> > efriedma wrote:
> > > aemerson wrote:
> > > > efriedma wrote:
> > > > > Hhow is the handling for array allocation supposed to work if the alloca isn't in the entry block of the function?  It looks like this code assumes that constant-propagation is enough to turn any dynamic alloca into a static alloca?
> > > > > 
> > > > > Not really a problem with this change, exactly, but I want to make sure I'm understanding the surrounding code correctly.
> > > > I'm not entirely sure what you mean, but if an array allocation isn't in the entry block, and if constant prop can't simplify it to a static alloca, then it remains a dynamic alloca and the inliner gives up (see the end of this function).
> > > My question is what happens if an array allocation isn't in the entry block, but we can simplify the size to a constant.
> > That was the original motivation for this fix actually. If its simplified to a constant, then it can be converted to a static alloca and then later moved into the entry block. If that constant allocation is too large then we can get overflows.
> Take the following testcase.  On trunk, we inline into caller1 and caller3, but not caller2 and caller4.  I think your patch stops inlining in caller1, but not caller3.  That seems weird to me...
> 
> ```
> target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-apple-macosx10.15.0"
> 
> define void @caller1(i8 *%p1, i1 %b) {
> entry:
>   %cond = icmp eq i1 %b, true
>   br i1 %cond, label %exit, label %split
> 
> split:
>   call void @callee(i8* %p1, i32 0, i32 -1)
>   br label %exit
> 
> exit:
>   ret void
> }
> 
> define void @caller2(i8 *%p1, i1 %b, i32 %len) {
> entry:
>   %cond = icmp eq i1 %b, true
>   br i1 %cond, label %exit, label %split
> 
> split:
>   call void @callee(i8* %p1, i32 0, i32 %len)
>   br label %exit
> 
> exit:
>   ret void
> }
> 
> define void @caller3(i8 *%p1, i1 %b, i32 %len) {
> entry:
>   %cond = icmp eq i1 %b, true
>   br i1 %cond, label %exit, label %split
> 
> split:
>   call void @callee(i8* %p1, i32 0, i32 300)
>   br label %exit
> 
> exit:
>   ret void
> }
> 
> define void @callee(i8* %p1, i32 %l1, i32 %l2) {
> entry:
>   %ext = zext i32 %l2 to i64
>   br label %foo
> foo:
>   %vla = alloca float, i64 %ext, align 16
>   %call = call i1 @extern_call(float* nonnull %vla) nounwind 
>   br i1 %call, label %foo, label %exit
>   
> exit:
>   ret void
> }
> 
> define void @caller4(i8 *%p1, i1 %b, i32 %len) {
> entry:
>   %cond = icmp eq i1 %b, true
>   br i1 %cond, label %exit, label %split
> 
> split:
>   call void @callee2(i8* %p1, i32 0)
>   br label %exit
> 
> exit:
>   ret void
> }
> 
> define void @callee2(i8* %p1, i32 %l1) {
> entry:
>   br label %foo
> foo:
>   %vla = alloca float, i64 300, align 16
>   %call = call i1 @extern_call(float* nonnull %vla) nounwind 
>   br i1 %call, label %foo, label %exit
>   
> exit:
>   ret void
> }
> 
> declare i1 @extern_call(float*)
> ```
At first I thought that was because the 300 value in caller3 was below the threshold for preventing inlining, which it is. But there's actually a bug here where the patch isn't scaling the allocation size by the size of the type. That's why it still allows inlining even if you change it to 30000. I'll fix that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81765/new/

https://reviews.llvm.org/D81765