[Openmp-commits] [PATCH] D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad
George Rokos via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Apr 5 11:05:59 PDT 2018
grokos added inline comments.
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:75-76
+ int8_t ExecutionMode;
+ int32_t NumReductionVars;
+ int32_t ReductionVarsSize;
+
----------------
ABataev wrote:
> Why do you need all that data before starting the outlined function? Can we allocate the memory during execution of the outlined function by some runtime function call?
> Like this:
> ```
> __omp_offloading....
> <master>
> %Scratchpad = call i8 *__kmpc_allocate_scratchpad(<Size_of_the_reductions>);
> ....
> __kmpc_deallocate_scratchpad(i8 *%Scratchpad);
> <end_master>
> ```
>
We can go down that route if you prefer. I haven't been able to find official documentation about which type of memory allocation is faster (`cuadMalloc` on the host vs `malloc` on the device), so I assume they perform equally fast.
Any thoughts on that?
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:638-641
+ if (KernelInfo->ExecutionMode == GENERIC) {
+ // Leave room for the master warp which will be added below.
+ cudaThreadsPerBlock -= DeviceInfo.WarpSize[device_id];
+ }
----------------
Hahnfeld wrote:
> I think this shouldn't be in this patch?
Removed.
================
Comment at: libomptarget/plugins/cuda/src/rtl.cpp:709-712
+#ifdef OMPTARGET_DEBUG
+ if (Scratchpad == NULL)
+ DP("Failed to allocate reduction scratchpad\n");
+#endif
----------------
Hahnfeld wrote:
> Maybe error out completely?
Correct, if allocating the scratchpad fails the kernel cannot be executed, so we'll return `OFFLOAD_FAIL`.
Repository:
rOMP OpenMP
https://reviews.llvm.org/D45326
More information about the Openmp-commits
mailing list