[Openmp-dev] OpenMP GPU shared memory

Johannes Doerfert via Openmp-dev openmp-dev at lists.llvm.org
Sat Apr 18 16:11:07 PDT 2020


Hi Xin,


I think what you found is some runtime code that lives in shared memory. 
This is not to be confused with user data put into shared memory.

To do the latter, you can use the allocate directive, e.g.,


int Global[32];

#pragma omp allocate(Global) allocator(omp_pteam_mem_alloc)


Wrt. the feedback, I don't think there is anything in place. You could
maybe use nvprof when you run the program. However, I agree we should have
a flag that provides better information.


I hope this helps.


Cheers,

   Johannes




On 4/18/20 5:37 AM, ichbinwu via Openmp-dev wrote:
> hello everybody,
>
> I have a question about GPU shared memory in the OpenMP implementation 
> in LLVM.
>
> In the paper by Grinberg, Bertolli, and Haque (Hands on with OpenMP 
> 4.5 and Unified Memory: Developing Applications for IBM's Hybrid CPU + 
> GPU systems (Part II), IWOMP 2017) I found "3. Clang's Extension for 
> OpenMP 4.5 for device On-chip Memory Allocation" and learnt that the 
> GPU shared memory can be used in a tricky manner with OpenMP 
> directives. In order to find the compiler limit for this static memory 
> allocation I looked at the source code files under `openmp`. It seems 
> the relevant files are:
>
> 1. openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
>     * commit: 197b7b24
>     * line: DS_Slot_Size = 256,
>
> 2. openmp/libomptarget/deviceRTLs/common/omptarget.h
>     * commit: d0b9ed5c
>     * line: char Data[DS_Slot_Size];
>
> My questions are:
>
> 1. Is the hard-coded limit for GPU shared memory 256 Bytes or (256 * 
> 4) Bytes? I ask because I see this comment in 
> `openmp/libomptarget/deviceRTLs/common/omptarget.h`:
>
> // Additional master slot type which is initialized with the default master slot
> // size of 4 bytes.
>
> 2. Could we enlarge this limit to, e.g., 512 Bytes or even 1024 Bytes? 
> Concerning the hardware specification of green GPUs, if we assume the 
> shared memory per multiprocessor is 48 KB and at most 32 thread blocks 
> (or contention groups) reside on one multiprocessor, this limit could be 
> as large as 1536 Bytes, couldn't it?
>
> 3. How could we check/verify that the static memory allocation is in 
> GPU shared memory (not in global memory) when an OpenMP source file 
> is compiled by Clang/LLVM? My current approach is to look at the 
> generated assembly code (`-S`), which is not really convenient. It 
> would be good if the compiler could print some message or give a short 
> report during compilation.
>
> Thank you in advance!
>
> Best wishes!
>
> Xin