[PATCH] D80951: [GlobalOpt] Remove preallocated calls when possible
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 18 12:01:27 PDT 2020
efriedma added inline comments.
================
Comment at: llvm/lib/Transforms/IPO/GlobalOpt.cpp:2334
+
+ // FIXME: This doesn't handle invoke
+ Builder.SetInsertPoint(NewCB->getNextNonDebugInstruction());
----------------
aeubanks wrote:
> aeubanks wrote:
> > efriedma wrote:
> > > aeubanks wrote:
> > > > efriedma wrote:
> > > > > aeubanks wrote:
> > > > > > efriedma wrote:
> > > > > > > "This doesn't handle invoke" is rough...
> > > > > > >
> > > > > > > In the normal destination, you can stick a stackrestore as the first instruction (breaking the critical edge if necessary). In the unwind destination, I'm not sure. For Itanium-style unwinding, you could just stick it immediately after the landingpad, but I'm not sure what you can do on Windows; you can't stackrestore in a funclet.
> > > > > > >
> > > > > > > I guess ultimately, we really want to use a static alloca (in the entry block) in most cases, but it's not safe in all cases, as we've discussed before.
> > > > > > >
> > > > > > > At the very least, this probably should bail out somehow, not crash.
> > > > > > Bailed out for any invokes.
> > > > > >
> > > > > > I've been trying to understand how inalloca handles this and still don't quite understand.
> > > > > >
> > > > > >
> > > > > > ```
> > > > > > void bar2() {
> > > > > > for (int i = 0; i < 1000000; ++i) {
> > > > > > try {
> > > > > > foo(A(), 5);
> > > > > > } catch (int) {}
> > > > > > }
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > becomes
> > > > > >
> > > > > > ```
> > > > > > define dso_local void @"?bar2@@YAXXZ"() local_unnamed_addr #0 personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
> > > > > > entry:
> > > > > > %agg.tmp.ensured = alloca %struct.A, align 8
> > > > > > br label %for.body
> > > > > >
> > > > > > for.cond.cleanup: ; preds = %for.inc
> > > > > > ret void
> > > > > >
> > > > > > for.body: ; preds = %for.inc, %entry
> > > > > > %i.05 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
> > > > > > %inalloca.save = call i8* @llvm.stacksave()
> > > > > > %argmem = alloca inalloca <{ %struct.A*, %struct.A, i32 }>, align 4
> > > > > > %0 = getelementptr inbounds <{ %struct.A*, %struct.A, i32 }>, <{ %struct.A*, %struct.A, i32 }>* %argmem, i32 0, i32 1
> > > > > > %call = call x86_thiscallcc %struct.A* @"??0A@@QAE at XZ"(%struct.A* nonnull %0) #2
> > > > > > %1 = getelementptr inbounds <{ %struct.A*, %struct.A, i32 }>, <{ %struct.A*, %struct.A, i32 }>* %argmem, i32 0, i32 0
> > > > > > store %struct.A* %agg.tmp.ensured, %struct.A** %1, align 4
> > > > > > %2 = getelementptr inbounds <{ %struct.A*, %struct.A, i32 }>, <{ %struct.A*, %struct.A, i32 }>* %argmem, i32 0, i32 2
> > > > > > store i32 5, i32* %2, align 4, !tbaa !3
> > > > > > %call1 = invoke %struct.A* @"?foo@@YA?AUA@@U1 at H@Z"(<{ %struct.A*, %struct.A, i32 }>* inalloca nonnull %argmem)
> > > > > > to label %invoke.cont unwind label %catch.dispatch
> > > > > >
> > > > > > catch.dispatch: ; preds = %for.body
> > > > > > %3 = catchswitch within none [label %catch] unwind to caller
> > > > > >
> > > > > > catch: ; preds = %catch.dispatch
> > > > > > %4 = catchpad within %3 [%rtti.TypeDescriptor2* @"??_R0H at 8", i32 0, i8* null]
> > > > > > catchret from %4 to label %for.inc
> > > > > >
> > > > > > for.inc: ; preds = %invoke.cont, %catch
> > > > > > %inc = add nuw nsw i32 %i.05, 1
> > > > > > %exitcond = icmp eq i32 %inc, 1000000
> > > > > > br i1 %exitcond, label %for.cond.cleanup, label %for.body
> > > > > >
> > > > > > invoke.cont: ; preds = %for.body
> > > > > > call void @llvm.stackrestore(i8* %inalloca.save)
> > > > > > call x86_thiscallcc void @"??1A@@QAE at XZ"(%struct.A* nonnull %agg.tmp.ensured) #2
> > > > > > br label %for.inc
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > The `@llvm.stackrestore()` only gets called in the non-exceptional case, but I verified that this doesn't leak stack memory when `foo()` always throws. I'm not sure how that happens.
> > > > > You're not testing what you want to test; the problem isn't when the function with the inalloca argument itself throws. The behavior is obvious in that case: the alloca gets consumed by the call. The issue is what happens if an exception gets thrown before that. Expanding out your example:
> > > > >
> > > > > ```
> > > > > struct A { A() { throw 1; } ~A(); };
> > > > > void foo(A a, int z); // Never gets called
> > > > > void bar() {
> > > > > for (int i = 0; i < 1000000; ++i) {
> > > > > try {
> > > > > foo(A(), 5);
> > > > > } catch (int) {}
> > > > > }
> > > > > }
> > > > > ```
> > > > >
> > > > > I'm pretty sure with inalloca, the memory just leaks. I guess preallocated has the same problem,
> > > > >
> > > > > Not sure how we solve it with preallocated. I guess in the backend, we need to apply some sort of adjustment after a catchret. Not sure how we compute the size of that adjustment, though. I guess we need to note on each catchret/cleanupret which allocations die at that point.
> > > > I realized that at some point and I've been trying out different variations of `A()` and `foo()` both randomly throwing, plus nested calls to `foo()`, but it seems like there's never a stack leak with inalloca with `@llvm.stackrestore`s removed. The `@llvm.stackrestore` only appears in the non-exceptional path, so it can't possibly help prevent stack leaks due to exceptions? And in the non-exceptional case the stack will get cleaned up as expected. So I'm not sure that the stack restores are necessary for inalloca?
> > > >
> > > > I am still trying to understand how stack cleanups work in an exception handler though.
> > > >
> > > > ```
> > > > void bar() {
> > > > for (int i = 0; i < 1000000; ++i) {
> > > > try {
> > > > foo(foo(A(), A(), 5), A(), 6);
> > > > } catch (int) {
> > > > }
> > > > }
> > > > }
> > > > ```
> > > Try the following for some interesting results at -O0.
> > >
> > > ```
> > > #include <cstdio>
> > > #define INLINE __attribute((always_inline))
> > > //#define INLINE __attribute((noinline))
> > > struct A {
> > > __attribute((optnone)) A(int) {z[1]=10;}
> > > __attribute((optnone)) A() { z[1]=5; throw 1; }
> > > __attribute((optnone)) ~A() { printf("%d\n", z[1]); }
> > > int z[10000];
> > > };
> > > __attribute((optnone)) void foo(int z, A a) { a.z[0] = 100; }
> > > INLINE static int inner() { try {foo(1, A());}catch(int){} return 3; }
> > > void bar() {
> > > for (int i = 0; i < 10; ++i) {
> > > foo(inner(), A(1));
> > > }
> > > }
> > > int main() { bar(); }
> > > ```
> > Interesting...
> > noinline is fine (with and without removing the stack restores), but always_inline produces incorrect results. Keeping the stack restores, the printfs print a random value (consistent within a run), and removing the outer stack restore makes it crash after the first iteration.
> >
> > So this is a bug with the current implementation of inalloca? I can't tell if it's specifically an inalloca/exception/inlining issue (the LLVM IR looks fine at a quick glance) or a deeper exception handling issue.
> >
> > This reminds me of https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html which Reid has brought up to me before, where you can apparently jump out while in the middle of constructing arguments.
> I tried putting the outer stackrestore in the exception path instead of the non-exceptional path and that makes it produce the right output without stack leaks.
>
> Filed https://bugs.llvm.org/show_bug.cgi?id=46386.
One, it looks like llvm::CloneBasicBlock doesn't recognize inalloca allocations as dynamic in some cases. That's pretty clearly a bug, and I think fixing that might be enough to deal with my earlier testcase. But really, it shouldn't matter: the frontend should wrap inalloca allocations with appropriate stacksave/stackrestore operations anyway.
Two, it looks like clang has some general issues with the interaction between dynamic allocation and exceptions. Take the following testcase:
```
int z = 10000;
__attribute((optnone)) int bar(int*z) { int a = *z; throw a;}
int main() {
for (int i = 0; i < 1000; ++i) {
try {
int a[z];
bar(a);
__builtin_trap();
} catch (...) {
}
}
}
```
This currently overflows the stack on Linux because there isn't an appropriate stackrestore anywhere.
Three, I suspect that unwinding on 32-bit Windows is freeing dynamic allocations in cases where it shouldn't, and this is hiding some inalloca issues. Don't have a good testcase for this; this is just what I suspect from not seeing stack overflows in cases where the IR indicates there should be stack overflows.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80951/new/
https://reviews.llvm.org/D80951
More information about the llvm-commits
mailing list