[Openmp-commits] [PATCH] D102107: [OpenMP] Codegen aggregate for outlined function captures
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Jul 13 14:59:15 PDT 2021
ABataev added a comment.
In D102107#2874123 <https://reviews.llvm.org/D102107#2874123>, @jdoerfert wrote:
> In D102107#2867693 <https://reviews.llvm.org/D102107#2867693>, @ABataev wrote:
>
>> In D102107#2867670 <https://reviews.llvm.org/D102107#2867670>, @josemonsalve2 wrote:
>>
>>> In D102107#2867417 <https://reviews.llvm.org/D102107#2867417>, @ABataev wrote:
>>>
>>>> In D102107#2867382 <https://reviews.llvm.org/D102107#2867382>, @jdoerfert wrote:
>>>>
>>>>> In D102107#2832740 <https://reviews.llvm.org/D102107#2832740>, @ABataev wrote:
>>>>>
>>>>>> In D102107#2832286 <https://reviews.llvm.org/D102107#2832286>, @jdoerfert wrote:
>>>>>>
>>>>>>> In D102107#2824581 <https://reviews.llvm.org/D102107#2824581>, @ABataev wrote:
>>>>>>>
>>>>>>>> In D102107#2823706 <https://reviews.llvm.org/D102107#2823706>, @jdoerfert wrote:
>>>>>>>>
>>>>>>>>> In D102107#2821976 <https://reviews.llvm.org/D102107#2821976>, @ABataev wrote:
>>>>>>>>>
>>>>>>>>>> We used this kind of codegen initially but later found out that it causes a large overhead when gathering pointers into a record. What about hybrid scheme where the first args are passed as arguments and others (if any) are gathered into a record?
>>>>>>>>>
>>>>>>>>> I'm confused, maybe I misunderstand the problem. The parallel function arguments need to go from the main thread to the workers somehow, I don't see how this is done w/o a record. This patch makes it explicit though.
>>>>>>>>
>>>>>>>> Pass it in a record for workers only? And use a hybrid scheme for all other parallel regions.
>>>>>>>
>>>>>>> I still do not follow. What does it mean for workers only? What is a hybrid scheme? And, probably most importantly, how would we not eventually put everything into a record anyway?
>>>>>>
>>>>>> On the host you don’t need to put everything into a record, especially for small parallel regions. Pass some first args in registers and only the remaining args gather into the record. For workers just pass all args in the record.
>>>>>
>>>>> Could you please respond to my question so we make progress here. We *always* have to pass things in a record, do you agree?
>>>>
>>>> On the GPU device, yes. And I'm absolutely fine with packing args for the GPU device. But the patch packs the args not only for the GPU devices but also for the host and other devices which may not require packing/unpacking. For such devices/host better to avoid packing/unpacking as it introduces overhead in many cases.
>>>
>>> Hi Alexey,
>>>
>>> Wouldn't you always need to pack to pass the arguments to the outlined function? What is the benefit of avoiding packing the arguments in the runtime call, if then you have to pack them for the outlined function?
>>>
>>> I would really appreciate an example, since I am just getting an understanding of OpenMP in LLVM.
>>>
>>> Thanks!
>>
>> Hi, generally speaking, no, you don't need to pack them. Initially, we packed/unpacked args, but then decided not to do it.
>> Here is an example:
>>
>> int a, b;
>> #pragma omp parallel
>> printf("%d %d\n", a, b);
>>
>> What we generate currently is something like this:
>>
>> %a = alloca i32
>> %b = alloca i32
>> call __kmpc_fork_call(..., @outlined, %a, %b)
>> ...
>> internal @outlined(i32 *%a, i32 *%b) {
>> printf(....);
>> }
>>
>> `__kmpc_fork_call` inside calls `@outlined` function with the passed args.
>
> While on the user facing side this does not pack the arguments, we still do, and that is the point. In the runtime `__kmp_fork_call` we do
>
> for (i = argc - 1; i >= 0; --i)
> *argv++ = va_arg(kmp_va_deref(ap), void *);
>
> which packs the variadic arguments into a buffer. So eventually we have to walk them and store them into a consecutive buffer. What
> happens before is that we pass some of them in registers but at the end of the day we put them in memory. After all, that is what
> `extern int __kmp_invoke_microtask(microtask_t pkfn, int gtid, int npr, int argc, void *argv[], ...`
> expects.
Ok, then it should be good for the host too. Would be good if you could check that it really does not affect the performance and then land it if so.
> If you believe we do not pack/unpack on the host, please walk me through that. As far as I can tell, the "user facing side" might
> look like variadic calls all the way but that is not what is happening in the runtime. Thus, there is no apparent reason to complicate
> the scheme. I'm also happy if you have timing results that indicate otherwise.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D102107/new/
https://reviews.llvm.org/D102107
More information about the Openmp-commits
mailing list