[Openmp-commits] [PATCH] D45326: [OpenMP] [CUDA plugin] Add support for teams reduction via scratchpad

Thu Apr 5 12:31:45 PDT 2018

ABataev added a comment.

In https://reviews.llvm.org/D45326#1058799, @grokos wrote:

> In https://reviews.llvm.org/D45326#1058740, @ABataev wrote:
>
> > In https://reviews.llvm.org/D45326#1058730, @grokos wrote:
> >
> > > One caveat regarding Alexey's proposal: According to the CUDA programming guide <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations>, `malloc` on the device allocates space from a fixed-size heap. The default size of this heap is 8MB. If we run into a scenario where more than 8MB will be required for the reduction scratchpad, allocating the scratchpad from the device will fail. The heap size can be user-defined from the host, but for that to happen the host must know how large the scratchpad needs to be, which defeats the purpose of moving scratchpad allocation from the plugin to the nvptx runtime.
> >
> >
> > But you can change the limit using `cudaThreadSetLimit`.
>
>
> That's what I'm saying. You can increase the limit, but how large will you set it? How will you know how many bytes are needed for the scratchpad if the compiler doesn't provide this information?

We are already using global memory allocation, so I don't see any reason why we can't use it for the scratchpad. We just need to set an initial size that is big enough and, probably, add an option that allows increasing this size.
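A rough host-side sketch of that idea, not taken from the patch: pick a built-in initial heap size and let the user override it. The environment variable name `EXAMPLE_SCRATCHPAD_HEAP_BYTES`, the helper names, and the 4x multiplier are all hypothetical. The resulting value would be passed to `cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes)` (the non-deprecated spelling of `cudaThreadSetLimit`) before any kernel that calls device `malloc` is launched. The CUDA call itself is left as a comment so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Default device malloc heap size documented in the CUDA programming guide (8MB).
constexpr std::size_t kDefaultHeapBytes = 8u * 1024u * 1024u;

// Hypothetical override variable; NOT an existing libomptarget option.
constexpr const char *kHeapEnvVar = "EXAMPLE_SCRATCHPAD_HEAP_BYTES";

// Returns the heap size to request: the override if it is a positive decimal
// integer with no trailing characters, otherwise the built-in initial value.
std::size_t chooseHeapBytes(const char *overrideText, std::size_t initialBytes) {
  if (overrideText == nullptr ||
      !std::isdigit(static_cast<unsigned char>(overrideText[0])))
    return initialBytes;
  char *end = nullptr;
  unsigned long long parsed = std::strtoull(overrideText, &end, 10);
  if (*end != '\0' || parsed == 0)
    return initialBytes;
  return static_cast<std::size_t>(parsed);
}

// Reads the hypothetical override from the environment.
std::size_t scratchpadHeapBytesFromEnv() {
  // Arbitrary "big enough" initial guess: four times the CUDA default.
  const std::size_t initialBytes = 4 * kDefaultHeapBytes;
  assert(initialBytes > 0);
  const std::size_t bytes = chooseHeapBytes(std::getenv(kHeapEnvVar), initialBytes);
  // In the plugin, before the first kernel launch:
  //   cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes);
  return bytes;
}
```

This does not answer the question of how large the initial value should be. It only shows that a fixed default plus a user override needs no size information from the compiler.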

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D45326