[PATCH] D59424: [OpenMP][NVPTX] Replace void** buffer by byte-wise buffer

Fri Mar 15 12:08:35 PDT 2019

ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+/// Note: Only the team master is allowed to call non-const functions!
+struct shared_bytes_buffer {
+
----------------
jdoerfert wrote:
> > What is this buffer used for? Transferring pointers to the shread variables to the parallel regions? If so, it must be handled by the compiler. There are several reasons to do this:
> > 1) You're using malloc/free functions for large buffers. The fact is that the size of this buffer is known at the compile time and compiler can generate the fixed size buffer in the global memory if required. We already have similar implementation for target regions, globalized variables etc. You can take a look and adapt it for your purpose.
> > 2) Malloc/free are not very fast on the GPU, so it will get an additional performance with the preallocated buffers.
> > 3) Another one problem with malloc/free is that they are using preallocated memory and the size of this memory is limited by 8Mb (if I do recall correctly). This memory is required for the correct support of the local variables globalization and we alredy ran into the situation when malloc could not allocate enough memory for it with some previous implementations.
> > 4) You can reused the shared memory buffers already generated by the compiler and save shared memory.
> 
> [Quote by ABataev copied from https://reviews.llvm.org/D59319?id=190767#inline-525900 after the patch was split.]
> 
> 
> This buffer is supposed to be used to communicate variables in shared and firstprivate clauses between threads in a team. In this patch it is simply used to implement the old `void**` buffer. How, when, if we use it is part of the interface implementation. For now, this buffer simply serves the users of the `omptarget_nvptx_globalArgs` global.
> 
> If you want to provide compiler allocated memory to avoid the buffer use, no problem,
> the `__kmpc_target_region_kernel_parallel` function allows to do so, see the `SharedMemPointers` flag. I wouldn't want to put the logic to generate these buffers in the front-end though.
Why?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59424/new/

https://reviews.llvm.org/D59424