[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Tue Jun 22 00:02:23 PDT 2021

ychen marked 5 inline comments as done.
ychen added a comment.

In D97915#2829581 <https://reviews.llvm.org/D97915#2829581>, @ChuanqiXu wrote:

> A remained question.
>
> - what's the semantics if user specified their allocation/deallocation functions?
>
> Previously, we discussed for the ::operator new and ::operator delete. But what would we do for user specified allocation/deallocation functions?
> It looks like we would treat them just like `::operator new`. And it makes sense at the first glance. But the problem comes from that we judge whether
> or not to over allocate a frame by this condition:
>
>   coro.align > align of new
>
> But if the user uses their new, it makes no sense to use the `align of new` as the condition. On the other hand, if user specified their new function and the 
> alignment requirement for their promise type, would it be better really that the compiler do the extra transformation?
>
> May be we need to discuss with other guys who are more familiar with the C++ standard to answer this.

I think @rjmccall could answer these. My understanding is that user-specified allocation/deallocation has the same semantics as their standard library's counterparts. `align of new` (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:743-744
+        CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
+    auto *AlignedUpAddr = EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign,
+                                             S.getAlignedAllocate(), true);
+    // rawFrame = frame;
----------------
ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > ChuanqiXu wrote:
> > > > > Maybe we could calculate it in place instead of trying to call a function which is not designed for llvm::value*. It looks like the calculation isn't too complex.
> > > > I'm open to not calling `EmitBuiltinAlignTo`, which basically inline the useful parts of `EmitBuiltinAlignTo`.  The initial intention is code sharing and easy readability. What's the benefit of not calling it?
> > > Reusing code is good. But my main concern is that the design for the interfaces. The current design smells bad to me. Another reason for implement it in place is I think it is not very complex and easy to understand.
> > > 
> > > Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this function could use it.
> > > Reusing code is good. But my main concern is that the design for the interfaces. The current design smells bad to me. Another reason for implement it in place is I think it is not very complex and easy to understand.
> > > 
> > > Another option I got is to implement `EmitBuitinAlign` in LLVM (someplace like `Local`), then the CodeGenFunction:: EmitBuitinAlign and this function could use it.
> > 
> > 
> I guess you forgot to reply what you want to say.
Yep, I meant to say the use of "void *" is removed.

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:431-433
+  auto *Diff = Builder.CreateNSWSub(AlignCall, AlignOfNewInt);
+  auto *NewCoroSize = Builder.CreateAdd(CI->getArgOperand(CoroSizeIdx), Diff);
+  CI->setArgOperand(CoroSizeIdx, NewCoroSize);
----------------
ChuanqiXu wrote:
> In other comments, I find 'size += align - NEW_ALIGN + sizeof(void*);'. But I don't find sizeof(void*) in this function.
Sorry, that's a stale comment. It should be `size += align - NEW_ALIGN`. The `sizeof(void*)` was supposed for the newly added raw memory pointer stored in the frame. In the current implementation, `sizeof(void*)` is factored into the `llvm.coro.size()` calculation because CoroFrame is responsible for allocating the extra raw memory pointer if it is needed at all.

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+      llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+      CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+      FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());
----------------
ChuanqiXu wrote:
> We allocate overaligned-frame like:
> ```
> | --- for align --- | --- true frame --- |
> ```
> And here we set the argument to the address of true frame. Then I wonder how about the memory for the `for align` part. Would we still free them correctly? Maybe I missed something.
> Would we still free them correctly? 
Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory pointer to the coroutine frame.  Later at dealloc time, `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory pointer back and free it.

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:466-472
+  if (Dealloc->getNumArgOperands() > 1) {
+    // Size may only be the second argument of allocator call.
+    if (auto *CoroSize =
+            dyn_cast<llvm::IntrinsicInst>(Dealloc->getArgOperand(1)))
+      if (CoroSize->getIntrinsicID() == llvm::Intrinsic::coro_size)
+        OverAllocateFrame(CGF, Dealloc, /*IsAlloc*/ false);
+  }
----------------
ChuanqiXu wrote:
> We don't need to handle this condition in this patch.
This handling is for `sized delete` (`void T::operator delete  ( void* ptr, std::size_t sz );`) instead of `aligned delete`. `sized delete` needs the same `size` that is used for `new`. Please check the `f3` test in `coro-alloc.cpp` (The test was missing the CHECK lines for this, I've added it.).

================
Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:1133-1137
   // Create an entry for every spilled value.
   for (auto &S : FrameData.Spills) {
     FieldIDType Id = B.addField(S.first->getType(), None);
     FrameData.setFieldIndex(S.first, Id);
   }
----------------
ChuanqiXu wrote:
> Why we move this snippet to the front? Although it is not defined, the layout for the current frame would be:
> ```
> | resume func addr | destroy func addr | promise | other things needed |
> ```
> Move this to the front may break this.
The intent is to structure the code better, no intention to change the frame layout here. My understanding is that `promise` already has a fixed offset ahead of this. `FrameData::Allocas` is ordered but there is no defined semantics. There seem no tests failing due to reordered frame layout. However, I might be wrong. Could you describe how it changes the layout?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915