[Openmp-commits] [PATCH] D59424: [OpenMP][NVPTX] Replace void** buffer by byte-wise buffer
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Mar 15 13:02:39 PDT 2019
jdoerfert marked an inline comment as done.
jdoerfert added inline comments.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+/// Note: Only the team master is allowed to call non-const functions!
+struct shared_bytes_buffer {
+
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > > What is this buffer used for? Transferring pointers to the shread variables to the parallel regions? If so, it must be handled by the compiler. There are several reasons to do this:
> > > > > > > 1) You're using malloc/free functions for large buffers. The fact is that the size of this buffer is known at the compile time and compiler can generate the fixed size buffer in the global memory if required. We already have similar implementation for target regions, globalized variables etc. You can take a look and adapt it for your purpose.
> > > > > > > 2) Malloc/free are not very fast on the GPU, so it will get an additional performance with the preallocated buffers.
> > > > > > > 3) Another one problem with malloc/free is that they are using preallocated memory and the size of this memory is limited by 8Mb (if I do recall correctly). This memory is required for the correct support of the local variables globalization and we alredy ran into the situation when malloc could not allocate enough memory for it with some previous implementations.
> > > > > > > 4) You can reused the shared memory buffers already generated by the compiler and save shared memory.
> > > > > >
> > > > > > [Quote by ABataev copied from https://reviews.llvm.org/D59319?id=190767#inline-525900 after the patch was split.]
> > > > > >
> > > > > >
> > > > > > This buffer is supposed to be used to communicate variables in shared and firstprivate clauses between threads in a team. In this patch it is simply used to implement the old `void**` buffer. How, when, if we use it is part of the interface implementation. For now, this buffer simply serves the users of the `omptarget_nvptx_globalArgs` global.
> > > > > >
> > > > > > If you want to provide compiler allocated memory to avoid the buffer use, no problem,
> > > > > > the `__kmpc_target_region_kernel_parallel` function allows to do so, see the `SharedMemPointers` flag. I wouldn't want to put the logic to generate these buffers in the front-end though.
> > > > > Why?
> > > > Why what?
> > > Why you don't want to put the buffers generation to the compiler?
> > > Why you don't want to put the buffers generation to the compiler?
> >
> > I did never say that. I explicitly explained how the new interface allows you to do exactly that. Maybe you confuse it with me not wanting the generation to be part of the front-end (=Clang). That is because it is an implementation choice for performance, as such it should not be done by Clang if it might obstruct analyses later on, or could be done better with more analysis support.
> >
> > If we conclude we actually need to share values, after we tried to eliminate the need for sharing altogether, we can decide if a global buffer is preferable. If so, we can check if
> > there is already one that is unused or if we would need to create a new one.
> >
> > //Regarding this patch:// It is not supposed to change the behavior at all. This patch just introduces a more general buffer which is then used to implement the old buffer. How/when the buffer is used is not affected.
> Only Clang can do better analysis for the shared variables, runtime is not. This part must be implemented in Clang, not in the runtime. I had no time to fix the existing implementation of parameters passing in non-SPMD mode. But if you working on this, you should definetely implement this in the compiler, not in the library.
> It does not brea the analysis or anything else. It really produces better code with the better performance and less memory use.
> If you're going to implement it, you need to implement it in the best possible way. Who else is going to fix this later, when we ran into the problems with the shared memory and/or `malloc`ed memory?
> Only Clang can do better analysis for the shared variables, runtime is not. This part must be implemented in Clang, not in the runtime. I had no time to fix the existing implementation of parameters passing in non-SPMD mode. But if you working on this, you should definetely implement this in the compiler, not in the library.
Clang is not the right place for analysis. //That is what the patch set is all about.//
I **do not** propose to only use the library implementation to handle sharing.
The initially generated code will utilize a library based solution that potentially calls malloc and memcpy. However, the later LLVM passes can change the behavior by providing a buffer that is accessible by all threads in the team, eliminating the need for malloc and memcpy.
This is a *generalization* of the scheme you currently employ, eliminating the need for mallocs completely (which is not the case right now).
> It does not brea the analysis or anything else. It really produces better code with the better performance and less memory use.
Adding (global) state makes analysis results rarely better but almost always worse. If you do not believe me, please feel free to ask a compiler developer that you trust.
> If you're going to implement it, you need to implement it in the best possible way.
I totally agree.
> Who else is going to fix this later, when we ran into the problems with the shared memory and/or malloced memory?
Please read my comments again. I explain multiple times that:
A) This patch has nothing to do with your problem.
B) How the proposed TRegion interface tackles your problem.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59424/new/
https://reviews.llvm.org/D59424
More information about the Openmp-commits
mailing list