[Openmp-commits] [PATCH] D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad
George Rokos via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Apr 5 11:13:19 PDT 2018
grokos added a comment.
One caveat regarding Alexey's proposal: According to the CUDA programming guide <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations>, `malloc` on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.
More information about the Openmp-commits