[PATCH] D20003: X86CallFrameOpt: a first step towards optimizing inalloca calls (PR27076)
David Kreitzer via llvm-commits
llvm-commits at lists.llvm.org
Fri May 6 08:34:50 PDT 2016
DavidKreitzer added a comment.
Hi Hans,
I think X86CallFrameOptimization is the wrong place to be trying to eliminate the _chkstk calls for inalloca. The ability to do the store-to-push optimization has no bearing on whether the _chkstk call is needed. Your comment about the pushes naturally probing the stack is true, but it is also true of the original stores.
I think David had the right idea in r262370. But I assume the problem he ran into is that clang is nesting these inalloca calls, so it isn't easy to tell how much stack space is ultimately going to be allocated. Consider a case like this:
struct S
{
  S(const S&);
  char a[3000];
};
void f1(S, int);
int f2(S);
void f3(S *s)
{
  f1(*s, f2(*s));
}
We get this from clang:
define void @"\01?f3@@YAXPAUS@@@Z"(%struct.S* %s) #0 {
entry:
  %argmem4 = alloca inalloca <{ %struct.S, i32 }>, align 4
  %inalloca.save1 = tail call i8* @llvm.stacksave()
  %argmem = alloca inalloca <{ %struct.S }>, align 4
  %0 = getelementptr inbounds <{ %struct.S }>, <{ %struct.S }>* %argmem, i32 0, i32 0
  %call = call x86_thiscallcc %struct.S* @"\01??0S@@QAE@ABU0@@Z"(%struct.S* %0, %struct.S* dereferenceable(3000) %s)
  %call2 = call i32 @"\01?f2@@YAHUS@@@Z"(<{ %struct.S }>* inalloca nonnull %argmem)
  call void @llvm.stackrestore(i8* %inalloca.save1)
  %1 = getelementptr inbounds <{ %struct.S, i32 }>, <{ %struct.S, i32 }>* %argmem4, i32 0, i32 0
  %call3 = call x86_thiscallcc %struct.S* @"\01??0S@@QAE@ABU0@@Z"(%struct.S* %1, %struct.S* dereferenceable(3000) %s)
  %2 = getelementptr inbounds <{ %struct.S, i32 }>, <{ %struct.S, i32 }>* %argmem4, i32 0, i32 1
  store i32 %call2, i32* %2, align 4, !tbaa !1
  call void @"\01?f1@@YAXUS@@H@Z"(<{ %struct.S, i32 }>* inalloca nonnull %argmem4)
  ret void
}
See how there are two inalloca allocas at the top? Together they grow the stack by about 6000 bytes, which is big enough to require _chkstk even though each 3000-byte allocation on its own is not. Neither MSVC nor ICC does it like this. They both allocate space for the call to f2, call f2, and clean up the stack from calling f2 before allocating space for the call to f1. I think we need to fix clang to do something similar, and then David's solution ought to work.
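To make the arithmetic concrete, here is a rough sketch of the size check. It assumes a 4096-byte page as the probe threshold and reads the argument sizes off the packed IR types (3000 bytes for the f2 argument block, 3004 for the f1 block). The names kPageSize and needs_stack_probe are made up for illustration and are not LLVM APIs, and the exact boundary condition is an assumption.

```cpp
#include <cstddef>

// Assumption: stack growth of a page or more (4096 bytes here) must be probed.
constexpr std::size_t kPageSize = 4096;

// Hypothetical helper: would a stack adjustment of this size need _chkstk?
constexpr bool needs_stack_probe(std::size_t bytes) {
    return bytes >= kPageSize;
}

// Sizes read off the IR: <{ %struct.S }> and <{ %struct.S, i32 }> (packed).
constexpr std::size_t kArgsF2 = 3000;
constexpr std::size_t kArgsF1 = 3000 + 4;

// Nested allocation (what clang emits): both blocks are live at once.
constexpr std::size_t kNestedPeak = kArgsF1 + kArgsF2;

// Sequential allocation (the MSVC/ICC ordering): only one block is live.
constexpr std::size_t kSequentialPeak = kArgsF1 > kArgsF2 ? kArgsF1 : kArgsF2;

static_assert(!needs_stack_probe(kArgsF2), "f2 block alone fits in a page");
static_assert(!needs_stack_probe(kArgsF1), "f1 block alone fits in a page");
static_assert(needs_stack_probe(kNestedPeak), "nested blocks exceed a page");
static_assert(!needs_stack_probe(kSequentialPeak), "sequential stays under a page");
```

Under these assumptions, only the nested layout crosses the threshold, which is why the allocation order matters here.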
For the example in pr27076, another thing clang could do is just avoid inalloca altogether. We shouldn't need it for passing objects that use the default copy constructor.
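As a small illustration of the distinction, the sketch below contrasts a struct with an implicit copy constructor against one with a user-declared copy constructor like S above. Reading "default copy constructor" as "trivial copy constructor" is an assumption, and the standard type trait is only a stand-in for whatever ABI rule clang actually applies. The struct names Plain and UserCopy are made up for illustration.

```cpp
#include <type_traits>

// Implicit (compiler-generated) copy constructor: a plain byte copy suffices.
struct Plain {
    char a[3000];
};

// User-declared copy constructor, as in struct S from the example above.
struct UserCopy {
    UserCopy(const UserCopy&);
    char a[3000];
};

static_assert(std::is_trivially_copy_constructible<Plain>::value,
              "Plain can be copied bytewise into the argument area");
static_assert(!std::is_trivially_copy_constructible<UserCopy>::value,
              "UserCopy must be constructed in place by calling its constructor");
```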
http://reviews.llvm.org/D20003