[PATCH] D38978: [OpenMP] Enable the lowering of implicitly shared variables in OpenMP GPU-offloaded target regions to the GPU shared memory

Gheorghe-Teodor Bercea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 5 11:21:58 PST 2018

gtbercea added a comment.

In https://reviews.llvm.org/D38978#968565, @tra wrote:

> In https://reviews.llvm.org/D38978#968222, @gtbercea wrote:
> > > I'm still curious to hear what you plan to do when your depot use grows beyond a certain limit. At the very least there's the physical limit on shared memory size. Shared memory use also affects how many threads can be launched, which has a large impact on performance. IMO having some sort of user-controllable threshold would be very desirable.
> >
> > When shared memory isn't enough to hold the shared depot, global memory will be used instead. That is a scheme which will be covered by a future patch.
> Good luck with that. IMO if your kernel requires all shared memory available per multiprocessor, you are almost guaranteed suboptimal performance because you will not have enough threads running -- neither for peak compute, nor to hide global memory access latency. My bet is that you will eventually end up limiting shared memory use to a fairly small fraction of it.

I completely agree. This scheme will be efficient only when modest amounts of shared memory are required; for larger memory footprints, a global memory scheme will be used instead.

> Given that impact is limited to explicitly annotated functions only,  this lack of tune-ability is OK with me for now. I'd add a TODO item somewhere to describe that tuning specific limits is WIP.

I'll choose a sensible default for the cut-off point/condition and make it tunable by the user once we have the global memory scheme is in place.

