[PATCH] D37093: [coroutines] Promote cleanup.dest.slot-like allocas to registers to avoid storing them in the coroutine frame

Tue Sep 5 14:53:15 PDT 2017

hfinkel added a comment.

In https://reviews.llvm.org/D37093#860646, @GorNishanov wrote:

> > That's fine, but does not in itself explain why the variable cannot be extracted from the frame, or otherwise made available to the cleanup code. I'm not worried about the size of the stack frame at -O0 being larger than necessary (it will, compared to optimized code, be larger regardless). Moreover, the lowering algorithm must be sound at the IR level (not just in a state where it happens to work for some patterns that Clang happens to create).
>
> For cleanup, clang creates a pattern that the current coroutine frame building algorithm cannot handle. To handle it, we will probably need to convert allocas to SSA and see def/use chains, but, once converted to SSA, the problematic pattern disappears.
>  Another approach is to prove that, for a particular alloca, store-load pairs never cross a suspend point; if that is the case, we can duplicate the alloca (so that every coroutine part has its own copy), since the value is never carried over from one suspend/resume to another. However, this algorithm seems complicated and would have to be hand-built, whereas we can reuse the well-tested alloca promotion algorithm to eliminate the problematic pattern.
>
> Moving the promotion from CoroSplit to CoroEarly, which runs immediately after the front-end, would probably help make sure that later optimization passes do not interfere with cleanup.dest alloca promotion.

I'm missing something here. So the problem is this code, right:

  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
  call void @free(i8* %mem)
  %x = load i32, i32* %slot

If we have an alloca, %slot in this case, that is promoted to live in the coroutine frame, why don't you just extract it again if it's needed after the call to coro.free?

Currently, this gets transformed into:

  define internal fastcc void @happy.case.destroy(%happy.case.Frame* %FramePtr) {
  entry.destroy:
    %vFrame = bitcast %happy.case.Frame* %FramePtr to i8*
    %slot.reload.addr = getelementptr inbounds %happy.case.Frame, %happy.case.Frame* %FramePtr, i32 0, i32 4
    call void @free(i8* %vFrame)
    %x = load i32, i32* %slot.reload.addr
    call void @print.i32(i32 %x)
    ret void
  }

How about this: for all uses of the alloca after the original call to @llvm.coro.free, create a new alloca in the foo.destroy function, copy the value from the frame into that new alloca, and then use that new alloca in the later code? Like this:

  define internal fastcc void @happy.case.destroy(%happy.case.Frame* %FramePtr) {
  entry.destroy:
    %slot = alloca i32
    %vFrame = bitcast %happy.case.Frame* %FramePtr to i8*
    %slot.reload.addr = getelementptr inbounds %happy.case.Frame, %happy.case.Frame* %FramePtr, i32 0, i32 4
    call void @llvm.memcpy.p0i32.p0i32.i32(i32* %slot, i32* %slot.reload.addr, i32 4, i32 4, i1 false)
    call void @free(i8* %vFrame)
    %x = load i32, i32* %slot
    call void @print.i32(i32 %x)
    ret void
  }
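
To make the ordering concrete outside of IR, here is a minimal C++ sketch of the same idea. It is only an analogy, not what CoroSplit emits: the Frame struct and the make_frame and destroy_and_read_slot helpers are hypothetical names introduced for illustration, and the frame is modeled as a plain malloc'ed struct holding just the spilled slot.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for %happy.case.Frame; only the spilled slot is modeled.
struct Frame {
    int slot;  // plays the role of the alloca that was moved into the frame
};

// Hypothetical helper: allocate a frame and store a value into the spilled slot.
Frame* make_frame(int value) {
    Frame* frame = static_cast<Frame*>(std::malloc(sizeof(Frame)));
    assert(frame != nullptr);
    frame->slot = value;
    return frame;
}

// Analogue of the proposed destroy function: copy first, free second.
int destroy_and_read_slot(Frame* frame) {
    int slot;                                        // %slot = alloca i32
    std::memcpy(&slot, &frame->slot, sizeof(slot));  // copy the value out of the frame
    std::free(frame);                                // call void @free(i8* %vFrame)
    return slot;                                     // %x = load i32, i32* %slot
}
```

The point of the sketch is only the order of operations: the read from the frame happens before the frame is freed, and every later use goes through the local copy rather than through the freed frame.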

https://reviews.llvm.org/D37093