[PATCH] D102107: [OpenMP] Codegen aggregate for outlined function captures

Jose Manuel Monsalve Diaz via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 9 11:04:14 PDT 2021


josemonsalve2 added a comment.

In D102107#2867417 <https://reviews.llvm.org/D102107#2867417>, @ABataev wrote:

> In D102107#2867382 <https://reviews.llvm.org/D102107#2867382>, @jdoerfert wrote:
>
>> In D102107#2832740 <https://reviews.llvm.org/D102107#2832740>, @ABataev wrote:
>>
>>> In D102107#2832286 <https://reviews.llvm.org/D102107#2832286>, @jdoerfert wrote:
>>>
>>>> In D102107#2824581 <https://reviews.llvm.org/D102107#2824581>, @ABataev wrote:
>>>>
>>>>> In D102107#2823706 <https://reviews.llvm.org/D102107#2823706>, @jdoerfert wrote:
>>>>>
>>>>>> In D102107#2821976 <https://reviews.llvm.org/D102107#2821976>, @ABataev wrote:
>>>>>>
>>>>>>> We used this kind of codegen initially but later found out that it causes a large overhead when gathering pointers into a record. What about hybrid scheme where the first args are passed as arguments and others (if any) are gathered into a record?
>>>>>>
>>>>>> I'm confused, maybe I misunderstand the problem. The parallel function arguments need to go from the main thread to the workers somehow, I don't see how this is done w/o a record. This patch makes it explicit though.
>>>>>
>>>>> Pass it in a record for workers only? And use a hybrid scheme for all other parallel regions.
>>>>
>>>> I still do not follow. What does it mean for workers only? What is a hybrid scheme? And, probably most importantly, how would we not eventually put everything into a record anyway?
>>>
>>> On the host you don’t need to put everything into a record, especially for small parallel regions. Pass some first args in registers and only the remaining args gather into the record. For workers just pass all args in the record.
>>
>> Could you please respond to my question so we make progress here. We *always* have to pass things in a record, do you agree?
>
> On the GPU device, yes. And I'm absolutely fine with packing args for the GPU device. But the patch packs the args not only for the GPU devices but also for the host and other devices which may not require packing/unpacking. For such devices/host better to avoid packing/unpacking as it introduces overhead in many cases.

Hi Alexey,

Wouldn't you always need to pack the arguments in order to pass them to the outlined function? What is the benefit of avoiding the packing in the runtime call if you then have to pack them for the outlined function anyway?

I would really appreciate an example, since I am still getting familiar with OpenMP in LLVM.

Thanks!

>> If we pack the things eventually to pass it to the workers, why would we not pack it right away and avoid complexity? Passing varargs, then packing them later (with the same thread) into a record to give it to the workers is arguably introducing cost. What is the benefit of a hybrid approach given that it is (theoretically) more costly and arguably more complex?
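
To make sure we are talking about the same thing, here is a rough sketch of how I understand the two schemes being compared. It is a simplified illustration only: every name in it (`Captures`, `Rest`, `outlined_*`, `fake_fork_*`, `run_*`) is hypothetical, it is not what Clang emits, and the "fork" helpers are stand-ins that just call the outlined function on the calling thread rather than a real runtime entry point.

```cpp
#include <cassert>

// Imagined region being outlined (hypothetical):
//   #pragma omp parallel
//   { out = a + b; }

// Scheme 1: aggregate. Every capture is gathered into one record and the
// outlined function receives a single pointer to it.
struct Captures {
  int *a;
  int *b;
  int *out;
};

static void outlined_aggregate(void *raw) {
  auto *c = static_cast<Captures *>(raw);
  *c->out = *c->a + *c->b;
}

// Stand-in for a runtime fork call (assumption: runs on the same thread).
static void fake_fork_aggregate(void (*fn)(void *), void *record) {
  fn(record);
}

int run_aggregate(int a, int b) {
  int out = 0;
  Captures rec{&a, &b, &out};  // pack once, at the call site
  fake_fork_aggregate(outlined_aggregate, &rec);
  return out;
}

// Scheme 2: hybrid. The first captures travel as ordinary arguments and
// only the remaining ones are gathered into a record.
struct Rest {
  int *out;
};

static void outlined_hybrid(int *a, int *b, void *rest_raw) {
  auto *rest = static_cast<Rest *>(rest_raw);
  *rest->out = *a + *b;
}

// Stand-in for a runtime fork call that forwards the leading arguments.
static void fake_fork_hybrid(void (*fn)(int *, int *, void *), int *a, int *b,
                             void *rest) {
  fn(a, b, rest);
}

int run_hybrid(int a, int b) {
  int out = 0;
  Rest rest{&out};  // only the trailing captures are packed
  fake_fork_hybrid(outlined_hybrid, &a, &b, &rest);
  return out;
}
```

Both variants compute the same result; the open question in this thread is only where the packing happens and what it costs on the host versus on the device.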




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102107/new/

https://reviews.llvm.org/D102107


