[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

Tue Jul 6 23:07:31 PDT 2021

ChuanqiXu added a comment.

In D97915#2861036 <https://reviews.llvm.org/D97915#2861036>, @ychen wrote:

> In D97915#2860984 <https://reviews.llvm.org/D97915#2860984>, @ChuanqiXu wrote:
>
>> In D97915#2860916 <https://reviews.llvm.org/D97915#2860916>, @ychen wrote:
>>
>>> In D97915#2859237 <https://reviews.llvm.org/D97915#2859237>, @ChuanqiXu wrote:
>>>
>>>> In D97915#2848816 <https://reviews.llvm.org/D97915#2848816>, @ychen wrote:
>>>>
>>>>>> Thanks for clarifying. Let's solve the semantics problem first.
>>>>>> With the introduction about 'raw frame', I think it's necessary to introduce this concept in the section 'Switched-Resume Lowering' or even the section 'Introduction' in the document. Add a section to tell the terminology is satisfied too.
>>>>>
>>>>> Done.
>>>>>
>>>>>> Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to solve the problem about memory leak. But I think we could use llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe we need to add an intrinsic `llvm.coro.raw.size`).  Then we can omit a field in the frame to save space.
>>>>>
>>>>> ("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address instead of raw frame pointer)
>>>>>
>>>>> Apologies for the confusion. I've briefly explained it here https://reviews.llvm.org/D102145#2752445 I think it is not clear. "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine frame field storing the `raw frame pointer`" only after `insertSpills` in CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the `raw frame pointer` (try grepping "alloc.frame.ptr" in this review page). Using  "llvm.coro.raw.frame.ptr.offset" instead of  "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please check line 31. The downside is that the write to coroutine frame is not through an alloca but a direct write. It is unusual because all fields in the frame are stored as 1. special/header fields 2. alloca 3. splills. Doing the write indirectly as Alloca makes me comfortable. The tradeoff is one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
>>>>>
>>>>>   19 coro.alloc.align:                                 ; preds = %coro.alloc.check.align
>>>>>   20   %3 = sub nsw i64 64, 16
>>>>>   21   %4 = add i64 128, %3
>>>>>   22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
>>>>>   23   %mask = sub i64 64, 1
>>>>>   24   %intptr = ptrtoint i8* %call1 to i64
>>>>>   25   %over_boundary = add i64 %intptr, %mask
>>>>>   26   %inverted_mask = xor i64 %mask, -1
>>>>>   27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
>>>>>   28   %diff = sub i64 %aligned_intptr, %intptr
>>>>>   29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
>>>>>   30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) ]
>>>>>   31   store i8* %call1, i8** %alloc.frame.ptr, align 8                     
>>>>>   
>>>>>        ; Replace line 31 with below, and must makes sure line 46~line 48 is skipped.
>>>>>        ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
>>>>>        ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
>>>>>        ; %addr1 = bitcast i8* %addr to i8**
>>>>>        ; store i8* %call1, i8** %addr1, align 8
>>>>>   
>>>>>   
>>>>>   32   br label %coro.init.from.coro.alloc.align
>>>>>   33
>>>>>   34 coro.init.from.coro.alloc.align:                  ; preds = %coro.alloc.align
>>>>>   35   %aligned_result.coro.init = phi i8* [ %aligned_result, %coro.alloc.align ]
>>>>>   36   br label %coro.init
>>>>>   37
>>>>>   38 coro.init:                                        ; preds = %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
>>>>>      o.init.from.coro.alloc
>>>>>   39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
>>>>>      .coro.init, %coro.init.from.coro.alloc.align ]
>>>>>   40   %FramePtr = bitcast i8* %5 to %f0.Frame*
>>>>>   41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 0
>>>>>   42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, align 8
>>>>>   43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void (%f0.Frame*)* @f0.cleanup
>>>>>   44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 1
>>>>>   45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
>>>>>   46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 2
>>>>>   47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
>>>>>   48   store i8* %8, i8** %7, align 8
>>>>>   49   br label %AllocaSpillBB
>>>>>   50
>>>>>   51 AllocaSpillBB:                                    ; preds = %coro.init
>>>>>   52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 4
>>>>>   53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 5
>>>>>   54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 6
>>>>>   55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 7
>>>>>   56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 8
>>>>>   57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 10
>>>>>   58   br label %PostSpill
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Then I am a little confused for the design again, since we would treat the value for CoroBegin as the address of coroutine frame in the past and it looks like to be the raw frame now. Let me reconsider if it is OK.
>>>>>
>>>>> The returned value of CoroBegin is still coroutine frame not a raw frame even if the frame is overaligned. You could check the above code.
>>>>
>>>> Thanks for clarifying!
>>>>
>>>> I don't understand why we need to store the address for coroutine raw frame in the coroutine frame. For example, `%call1` in your example marks the address for the raw frame. Then can we use the value `%call1` in every place where we want to use the address for coroutine frame?
>>>> If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in the frontend if we need to use the address for the raw frame. Then in the middle end, we could replace `llvm.coro.raw.frame` with `%call1` simply. Similarly, we could define intrinsic `llvm.coro.raw.frame.size`. As far as I know from the codes, the address for the coroutine frame is mainly used for deallocation. So it should be fine I guess.
>>>>
>>>> ---
>>>>
>>>> Then the code generated now looks roughly like:
>>>>
>>>>   if (should over align) {
>>>>      /// ...
>>>>      mem = ...
>>>>   } else {
>>>>      /// ...
>>>>      mem = ...
>>>>   }
>>>>   coro.begin(id, mem);
>>>>
>>>> It looks redundant since the `then` part and `else` part looks  very similar. I understand it would be eliminated in the middle end. But another problem is that the redundant implementation in clang. Maybe we could solve it by refactoring.
>>>> But I am wondering if it is possible to use another pattern (assume `llvm.coro.alloc` returns true):
>>>>
>>>>   %raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
>>>>   %true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we need a better name
>>>>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>>>>
>>>> Then for `llvm.coro.frame`, we could return `@raw.frame.ptr ` simply if the alignment could be satisfied (alignment needed is less than NEW_ALIGN). Or we could do simply to align up for the coroutine frame. There are many APIs in Align.h.
>>>> And for the destruction, we could emit:
>>>>
>>>>   call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size()) 
>>>>
>>>> In this way, I guess we would get simpler implementation and generated codes.
>>>>
>>>> BTW, if we choose to do so, the semantics for llvm.coro.raw.frame.ptr and llvm.coro.size would change slightly. They would stands for the address and size for the coroutine frame if we don't need over alignment.
>>>>
>>>> How do you think about this?
>>>
>>> I was confused by this and @rjmccall explained it here https://reviews.llvm.org/D97915/new/#2604871. Basically, we could not recover "raw frame pointer" (`%call1`) from coroutine frame pointer statically at deallocation time.
>>
>> Oh, I understand why we need to store the address for raw frame now. Another question is that how do you think combine the pattern:
>>
>>   if (should over align) {
>>      /// ...
>>      mem = ...
>>   } else {
>>      /// ...
>>      mem = ...
>>   }
>>   coro.begin(id, mem);
>>
>> into this one:
>>
>>   %true.frame.ptr = call @llvm.coro.create.frame(new(call @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
>>                                                                                                                                                                            ; It would be lowered to store the address of the raw frame to the alloca in the middle end if needed
>>   call @llvm.coro.begin(coro.id, %true.frame.ptr)
>>
>> and this one:
>>
>>   call @delete(call @llvm.coro.raw.frame.ptr(), call @llvm.coro.raw.frame.size())                                         ; Then use  `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever we want.
>>
>> It looks like we could generate the same code in the front for normal and over aligned coroutines.
>
> Yeah, I think it works for this patch alone. It shifts the semantic lowering from Clang to LLVM but does not perform less work. For future language support like D102147 <https://reviews.llvm.org/D102147>, @llvm.coro.create.frame needs to be repurposed based on the new semantics and that seems a sign that it should be implemented in frontend.

For language support, `::operator new(size_t, align_t)`, I think it could be implemented like:

  %allocated = call @new(call @llvm.coro.raw.frame.size(), align_val)
  %true.frame.ptr = call @llvm.coro.create.frame(%allocated, 0) ; if the second argument is 0, it means `llvm.coro.create.frame` could be lowered to `%allocated` simply.
  call @llvm.coro.begin(coro.id, %true.frame.ptr)

It looks not hard to implement. And we don't need to refactor the CodeGen part a lot. In this way, I think the main effort to support `::operator new(size_t, align_t)` would be in the Sema part and the works remained in CodeGen part would be little. It wouldn't touch the middle end part neither.

> It shifts the semantic lowering from Clang to LLVM but does not perform less work.

I think it would be simpler. At least, we don't need to emit `getReturnStmtOnAllocFailure` twice and we don't need to touch `CallCoroDelete`  neither. And we don't organize the basic blocks in the CodeGenCoroutineBody. And we could emit simpler AlignupTo (Although it could be simplified further, I believe).

And the extra work we need to do is to compare the alignment requirement for the coroutine frame with the second argument of `llvm.coro.create.frame` to see if we need to over align coroutine frame.
If yes, we need to lower the `llvm.coro.create.frame` to compute the true address for the coroutine frame and store the raw frame address.
If no, we could return `%allocated` simply.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915