[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Wed Jun 23 00:00:26 PDT 2021

ychen added a comment.

In D97915#2832667 <https://reviews.llvm.org/D97915#2832667>, @ChuanqiXu wrote:

> In D97915#2832446 <https://reviews.llvm.org/D97915#2832446>, @ychen wrote:
>
>> In D97915#2829581 <https://reviews.llvm.org/D97915#2829581>, @ChuanqiXu wrote:
>>
>>> A remained question.
>>>
>>> - what's the semantics if user specified their allocation/deallocation functions?
>>>
>>> Previously, we discussed for the ::operator new and ::operator delete. But what would we do for user specified allocation/deallocation functions?
>>> It looks like we would treat them just like `::operator new`. And it makes sense at the first glance. But the problem comes from that we judge whether
>>> or not to over allocate a frame by this condition:
>>>
>>>   coro.align > align of new
>>>
>>> But if the user uses their new, it makes no sense to use the `align of new` as the condition. On the other hand, if user specified their new function and the 
>>> alignment requirement for their promise type, would it be better really that the compiler do the extra transformation?
>>>
>>> May be we need to discuss with other guys who are more familiar with the C++ standard to answer this.
>>
>> I think @rjmccall could answer these. My understanding is that user-specified allocation/deallocation has the same semantics as their standard library's counterparts. `align of new` (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.
>
> Yeah, I just curious about the question and not sure about the answer yet. I agree with that it should be safe if we treat user-specified allocation/deallocation as ::operator new. Maybe I am a little bit of pedantry. I just not sure if the developer would be satisfied when they find we allocate padding space they didn't want when they offered a new/delete method. (Maybe I am too anxious).
>
> ---
>
> Another problem I find in this patch is that the introduction for `raw frame` makes the frame confusing. For example, the following codes is aim to calculate the size for the over allocated  frame:
>
>   %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
>   %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
>   %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
>
> It makes sense only if `llvm.coro.size` stands for the size of 'true frame' (I am not sure what's the meaning for raw frame now. Let me use 'true frame' temporarily). But the document now says that '@llvm.coro.size' returns the size of the frame. It's confusing
> I am not require to fix it by rename 'llvm.coro.size' or rephrase the document simply. I am thinking about the the semantics of coroutine intrinsics after we introduce the concept of 'raw frame'.

These are great points. The semantics of some coroutine intrinsics needs a little bit of tweaking otherwise they are confusing with the introduction of `raw frame` (suggestions for alternative names are much appreciated) which I defined in the updated patch (Coroutines.rst). Docs for `llvm.coro.size` mentions `coroutine frame` which I've used verbatim in the docs for `llvm.coro.raw.frame.ptr.*`.
Using your diagram below: `raw frame` is  `| --- for align --- | --- true frame --- |`. `Coroutine frame` is `| --- true frame --- |`. (BTW, `llvm.coro.begin` docs are stale which I updated in the patch, please take a look @ChuanqiXu @rjmccall @lxfind).

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:452-461
+  llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
+      llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
+  auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
+  auto *FramePtrAddrStart =
+      CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
+  auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
+      FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());
----------------
ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > We allocate overaligned-frame like:
> > > ```
> > > | --- for align --- | --- true frame --- |
> > > ```
> > > And here we set the argument to the address of true frame. Then I wonder how about the memory for the `for align` part. Would we still free them correctly? Maybe I missed something.
> > > Would we still free them correctly? 
> > Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory pointer to the coroutine frame.  Later at dealloc time, `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory pointer back and free it.
> To make it clear, what's the definition for 'raw ptr'? From the context, I think it means the true frame in above diagram from the context.
> 
> So this confuses me:
> > llvm.coro.raw.frame.ptr.addr is called again to load the raw memory pointer back and free it.
> 
> If the raw memory means the true frame, it may not be right. Since the part for 'for-align' wouldn't be freed.
I've updated the `coroutine.rst` to a hopefully better explanation of the semantics of the newly added intrinsics.

With the above diagram, `raw frame` is the whole thing `| --- for align --- | --- true frame --- |`, `raw frame ptr` points to the left bar of `for align`. 

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:424
+// dynamically adjust the frame start address at runtime.
+void OverAllocateFrame(CodeGenFunction &CGF, llvm::CallInst *CI, bool IsAlloc) {
+  unsigned CoroSizeIdx = IsAlloc ? 0 : 1;
----------------
ChuanqiXu wrote:
> So if this would be called for deallocate. the function name is confusing. I think it may be better to rename it as something like 'GeSizeOFtOverAlignedFrame' (The name suggested looks not good too).
> By the way, now I am wondering why wouldn't we use llvm.coro.size directly? And make the middle end to handle it. How do you think about it?
> So if this would be called for deallocate. the function name is confusing. I think it may be better to rename it as something like 'GeSizeOFtOverAlignedFrame' (The name suggested looks not good too).

Renamed to `GrowFrameSize` since there are similar uses of `grow` in LLVM. Please let me know if it makes sense.

> By the way, now I am wondering why wouldn't we use llvm.coro.size directly? And make the middle end to handle it. How do you think about it?

I think it works to a certain degree and attempted it with D100739. In theory, the over-alignment handling should be dealt with in the front-end since that's an ABI issue (not specified in any ABI documentation though). However, coroutine is special since the optimizer could change the alignment so there must be some work that needs to be delayed until CoroSplit(CoroFrame) time. In D100739, I was trying to argue that more than one frontend may need this over-alignment handling hence it *might* be ok to implement all work in LLVM. @rjmccall (https://reviews.llvm.org/D100739#2718681) seems not a fan of that and thinks `llvm.coro.raw.frame.ptr.offset` could be the way forward hence the current design which let front-end do all the work and leave the required piece to LLVM. That required piece is, at CoroSplit(CoroFrame) time, decide if the frame is over-aligned and if so add the `raw frame pointer` to the frame itself.

================
Comment at: clang/lib/CodeGen/CGCoroutine.cpp:719
+  auto *AlignedUpAddr =
+      EmitBuiltinAlignTo(AlignedAllocateCall, CoroAlign, S.getAllocate(), true);
+  // rawFrame = frame
----------------
ChuanqiXu wrote:
> How do your think about to replace EmitBuiltinAlignTo inplace?
I think with the interface issue being fixed, it is preferable to call it but I don't feel strongly about it so I just went ahead inlined `EmitBuiltinAlignTo` to help review/discussion.

================
Comment at: clang/test/CodeGenCoroutines/coro-alloc.cpp:106
+  // CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
+  // CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
+  // CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
----------------
ChuanqiXu wrote:
> It defines variable 'MEM' again in conflicting with the line at 89. Does it matter?
It should not matter. It works like variable definitions. Uses always get the most recent definitions. "FileCheck variables can be defined multiple times, and substitutions always get the latest value. Variables can also be substituted later on the same line they were defined on."

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915