[Openmp-commits] [PATCH] D59424: [OpenMP][NVPTX] Replace void** buffer by byte-wise buffer

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Mar 15 12:48:10 PDT 2019

ABataev added inline comments.

Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+/// Note: Only the team master is allowed to call non-const functions!
+struct shared_bytes_buffer {
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > jdoerfert wrote:
> > > > > > What is this buffer used for? Transferring pointers to the shread variables to the parallel regions? If so, it must be handled by the compiler. There are several reasons to do this:
> > > > > > 1) You're using malloc/free functions for large buffers. The fact is that the size of this buffer is known at the compile time and compiler can generate the fixed size buffer in the global memory if required. We already have similar implementation for target regions, globalized variables etc. You can take a look and adapt it for your purpose.
> > > > > > 2) Malloc/free are not very fast on the GPU, so it will get an additional performance with the preallocated buffers.
> > > > > > 3) Another one problem with malloc/free is that they are using preallocated memory and the size of this memory is limited by 8Mb (if I do recall correctly). This memory is required for the correct support of the local variables globalization and we alredy ran into the situation when malloc could not allocate enough memory for it with some previous implementations.
> > > > > > 4) You can reused the shared memory buffers already generated by the compiler and save shared memory.
> > > > > 
> > > > > [Quote by ABataev copied from https://reviews.llvm.org/D59319?id=190767#inline-525900 after the patch was split.]
> > > > > 
> > > > > 
> > > > > This buffer is supposed to be used to communicate variables in shared and firstprivate clauses between threads in a team. In this patch it is simply used to implement the old `void**` buffer. How, when, if we use it is part of the interface implementation. For now, this buffer simply serves the users of the `omptarget_nvptx_globalArgs` global.
> > > > > 
> > > > > If you want to provide compiler allocated memory to avoid the buffer use, no problem,
> > > > > the `__kmpc_target_region_kernel_parallel` function allows to do so, see the `SharedMemPointers` flag. I wouldn't want to put the logic to generate these buffers in the front-end though.
> > > > Why?
> > > Why what?
> > Why you don't want to put the buffers generation to the compiler?
> > Why you don't want to put the buffers generation to the compiler?
> I did never say that. I explicitly explained how the new interface allows you to do exactly that. Maybe you confuse it with me not wanting the generation to be part of the front-end (=Clang). That is because it is an implementation choice for performance, as such it should not be done by Clang if it might obstruct analyses later on, or could be done better with more analysis support.
> If we conclude we actually need to share values, after we tried to eliminate the need for sharing altogether, we can decide if a global buffer is preferable. If so, we can check if
> there is already one that is unused or if we would need to create a new one.
> //Regarding this patch:// It is not supposed to change the behavior at all. This patch just introduces a more general buffer which is then used to implement the old buffer. How/when the buffer is used is not affected.
Only Clang can do better analysis for the shared variables, runtime is not. This part must be implemented in Clang, not in the runtime. I had no time to fix the existing implementation of parameters passing in non-SPMD mode. But if you working on this, you should definetely implement this in the compiler, not in the library.
It does not brea the analysis or anything else. It really produces better code with the better performance and less memory use.
If you're going to implement it, you need to implement it in the best possible way. Who else is going to fix this later, when we ran into the problems with the shared memory and/or `malloc`ed memory?

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list