[Openmp-dev] nested parallelism in libomptarget-nvptx

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Fri Sep 7 08:29:28 PDT 2018


Hi all,

I've started some cleanups in libomptarget-nvptx, the OpenMP runtime 
implementation on Nvidia GPUs. The ultimate motivation is reducing the 
memory overhead: at the moment the runtime statically allocates ~660MiB
of global memory, which applications then cannot use. This might not
sound like much, but wasting precious memory doesn't seem wise.
I found that 448MiB of this comes from buffers for data sharing. In
particular, they appear to be so large because the code is prepared to
handle nested parallelism, where every thread would be in a position to
share data with its nested worker threads.
From what I've seen so far, this doesn't seem to be necessary for Clang
trunk: nested parallel regions are serialized, so only the initial
thread needs to share data with one set of worker threads. That's in
line with comments saying that there is no support for nested
parallelism.
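
For reference, a minimal sketch of the kind of test that makes this
visible. The helper name nested_team_size and the num_threads values are
only assumptions for illustration, not code from the runtime or from my
actual test applications. It records the team size that outer thread 0
observes inside a nested parallel region of a target region:

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallbacks so the sketch also builds without OpenMP (pragmas are ignored). */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: returns the team size that outer thread 0 observes
 * inside a nested parallel region of a target region. */
int nested_team_size(void) {
  int inner = 0;
#pragma omp target map(tofrom : inner)
#pragma omp parallel num_threads(64)
  {
    /* Thread number within the outer team. */
    int outer_id = omp_get_thread_num();
#pragma omp parallel num_threads(32)
    {
      /* Only one thread writes, so there is no race on 'inner'. */
      if (outer_id == 0 && omp_get_thread_num() == 0)
        inner = omp_get_num_threads();
    }
  }
  return inner;
}
```

If the nested region is serialized, I would expect this to return 1; a
larger value would indicate that the nested region really runs with
multiple threads.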

However, I found that my test applications compiled with clang-ykt
support two levels of parallelism. My guess is that this is
related to "convergent parallelism": parallel.cu explains that this is 
meant for a "team of threads in a warp only". And indeed, each nested 
parallel region seems to be executed by 32 threads.
I'm not really sure how this works because I seem to get one OpenMP 
thread per CUDA thread in the outer parallel region. So where are the 
nested worker threads coming from?

In any case: If my analysis is correct, I'd like to propose adding a 
CMake flag which disables this (seemingly) legacy support [1]. That 
would avoid the memory overhead for users of Clang trunk and enable 
future optimizations (I think).
Thoughts, opinions?

Cheers,
Jonas


[1] Provided that IBM still wants to keep the code and we can't just go
ahead and drop it. I guess that this can happen at some point in time, 
but I'm not sure if we are in that position right now.

