[PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface
Johannes Doerfert via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 14 15:03:59 PDT 2019
jdoerfert marked 3 inline comments as done.
jdoerfert added inline comments.
================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100
+///
+EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode,
+ bool RequiresOMPRuntime,
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > ABataev wrote:
> > > > > > > Better to use `ident_loc` for passing info about execution mode and full/lightweight runtime.
> > > > > > Could you please explain why you think that? Adding indirection through a structure does not really seem beneficial to me.
> > > > > Almost all functions from libomp rely on `ident_loc`. The functions that were added for NVPTX without this parameter had a lot of problems later, and most of them were replaced with functions that take this parameter type. Plus, this parameter is used for OMPD/OMPT and it may be important for future OMPD/OMPT support.
> > > > > Almost all functions from libomp rely on ident_loc.
> > > >
> > > > If you look at the implementation of this interface for NVPTX you will see that the called functions do not take `ident_loc` values. When you create the calls from the existing NVPTX code generation in clang, the current code **does not use** `ident_loc` for similar functions, see:
> > > > `__kmpc_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime)`,
> > > > `__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`,
> > > > `__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime, int16_t RequiresDataSharing)`,
> > > > `__kmpc_kernel_parallel(void **outlined_function, int16_t IsOMPRuntimeInitialized)`,
> > > > ...
> > > >
> > > >
> > > >
> > > > > Plus, this parameter is used for OMPD/OMPT and it may be important for future OMPD/OMPT support.
> > > >
> > > > If we at some point need to make the options permanent in an `ident_loc` we can simply pass an `ident_loc` and require it to be initialized by the call. Cluttering the user code with stores and indirection is exactly what I do want to avoid.
> > > 1. The new functions rely on `ident_loc`. We had to add those new functions because the old ones did not use it, and that was a bad design decision. Now we need to fix this. I suggest you do everything right from the very beginning rather than fixing this later by adding extra entry points to support OMPT/OMPD or something else, for example.
> > > 2. No, you cannot simply change the interface of the library, because compatibility with previous versions of the compiler/library has to be kept. You will need to add new entry points.
> > Let's start this one again because I still haven't understood. Why do we need to populate the `ident_loc` again? What information has to be in there at which point? I want this to be clear because a lot of other "design decisions" of the existing code base are in my opinion not necessary and consequently missing here. That includes, for example, various global variables. If we have a description of the problem you try to solve with the `ident_loc` we might be able to find a way that cuts down on state.
> >
> >
> > Regarding the "compatibility", this is not a stable interface people can rely on. Whatever is committed in this first patch __is not__ set in stone. Also, we can _always_ add a `__kmpc_init_ident_loc(....)` function after the fact.
> `ident_loc` holds data about the current source code location, the execution mode, and whether the full runtime is required. Also, it is used in OMPT/OMPD support.
> Regarding "compatibility": libraries must be the most stable part of the compiler, because the user might need to link an old object file/library with the new one. Because of this, new versions of the libraries must be compatible with old ones, and you need to maintain the deprecated parts to keep compatibility with previous versions. All these libs already have a lot of old code that exists because of the initial poor design, and we need to maintain it. I would like to avoid this situation with this patch.
> `ident_loc` holds data about the current source code location, the execution mode, and whether the full runtime is required. Also, it is used in OMPT/OMPD support.
We can store that information through a `__kmpc_init_ident_loc(....)` call once needed.
> Regarding "compatibility": libraries must be the most stable part of the compiler, because the user might need to link an old object file/library with the new one. Because of this, new versions of the libraries must be compatible with old ones, and you need to maintain the deprecated parts to keep compatibility with previous versions. All these libs already have a lot of old code that exists because of the initial poor design, and we need to maintain it. I would like to avoid this situation with this patch.
The way I understand you now is that you want a way to extend the interface in the future, and adding a changeable `ident_loc` pointer is your proposed way. Do I understand your reasoning for `ident_loc` here correctly, or is it (this and) something else?
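
For readers following this thread, the two interface shapes under discussion can be contrasted in a small host-side sketch. Everything in it is hypothetical: `ident_sketch_t`, `kernel_config_sketch`, `init_with_flags`, and `init_with_ident` are illustration-only names, not libomptarget entry points, and the field layout is assumed and does not mirror the real `ident_loc` structure.

```cpp
#include <cassert>

// Hypothetical stand-in for an ident-style descriptor. Field names and layout
// are assumptions for illustration, not the real runtime structure.
struct ident_sketch_t {
  const char *SourceLoc;   // source location string
  bool UseSPMDMode;        // execution mode
  bool RequiresOMPRuntime; // full vs. lightweight runtime
};

// What the kernel knows once initialization has run.
struct kernel_config_sketch {
  bool UseSPMDMode;
  bool RequiresOMPRuntime;
  const char *SourceLoc; // nullptr when no location was provided
};

// Shape A: the flags are passed directly, so the caller needs no stores.
inline kernel_config_sketch init_with_flags(bool UseSPMDMode,
                                            bool RequiresOMPRuntime) {
  return {UseSPMDMode, RequiresOMPRuntime, nullptr};
}

// Shape B: the caller populates a descriptor first and passes a pointer to it.
inline kernel_config_sketch init_with_ident(const ident_sketch_t *Loc) {
  assert(Loc != nullptr && "descriptor must be populated by the caller");
  return {Loc->UseSPMDMode, Loc->RequiresOMPRuntime, Loc->SourceLoc};
}
```

Under these assumptions, shape B carries the extra source location (and leaves room for future fields) at the cost of a populated structure at every call site, while shape A keeps the call site free of stores and indirection. That is the trade-off the two positions above weigh differently.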
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+////////////////////////////////////////////////////////////////////////////////
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
+
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > It would be good to store it in the global memory rather than in the shared memory, to save the shared memory. Also, we are already using several shared memory buffers for different purposes; it would be good to merge them somehow to reduce pressure on shared memory.
> > > > I would have reused your buffer, but it is, for reasons unclear to me, not a byte-wise buffer but an array of `void *`, and it is also used as such. Using it as a byte-wise buffer might cause problems or at least confusion. Changing it to a byte-wise buffer would be fine with me. I don't need a separate buffer, just one with the functionality implemented in this one.
> > > I don't know what `my` buffer you are talking about. I'm just saying that we are already using a lot of shared memory, and adding another shared memory buffer of ~150 bytes per team increases pressure on the shared memory. It would be good to reuse the existing buffers somehow. It was just a suggestion.
> > > I don't know what my buffer you are talking about.
> >
> > Sorry, my bad. The one you see in the (last part of the) implementation below in the beginning of the shown lines of `openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h`. It is called `omptarget_nvptx_SharedArgs` and it does (a subset of) what this new buffer does, providing space for shared variables in parallel regions.
> >
> > > I'm just saying that we are already using a lot of shared memory, and adding another shared memory buffer of ~150 bytes per team increases pressure on the shared memory. It would be good to reuse the existing buffers somehow. It was just a suggestion.
> >
> > I understand and I agree. My comment explained why I didn't do that in the first place, hoping that you see the problem and agree we should rewrite the users of `omptarget_nvptx_SharedArgs` to use `target_region_shared_buffer`[1], thereby reducing the required shared memory.
> >
> > [1] The name is subject to change! I don't care much.
> >
> This is not `my` buffer. Unfortunately, I was not involved in this library from the very beginning. There are some other buffers, generated by the compiler, for example, and we can try to reuse them.
> This is not my buffer.
My "you" was not directed at you but a general one. The wording was bad, my apologies.
> There are some other buffers, generated by the compiler, for example, and we can try to reuse them.
I'm not 100% sure which buffers you refer to here, but I think those are the ones the new code generation does not emit anymore.
I'm all for merging/replacing multiple buffers implemented in the device RTL. I didn't do it because it either breaks compatibility or forces me to inherit design choices I dislike (the void** buffer). From my perspective we could get rid of the existing `omptarget_nvptx_SharedArgs` space by letting it use the `target_region_shared_buffer` internally. That solves the problem for now, and once `omptarget_nvptx_SharedArgs` isn't directly needed anymore it can be removed.
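
The layering proposed above can be sketched on the host as a byte-wise buffer with a pointer-array view on top of it. This is a minimal sketch under stated assumptions, not the code in the patch: `byte_buffer_sketch` and `shared_args_sketch` are hypothetical names, the 160-byte capacity is an arbitrary illustration value, and a real device implementation would place the storage in `__shared__` memory and synchronize the team around its use.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical byte-wise buffer. The capacity is an illustrative assumption.
struct byte_buffer_sketch {
  static constexpr std::size_t Capacity = 160;
  alignas(void *) unsigned char Data[Capacity];
  std::size_t Used = 0;

  // Reserve NumBytes, rounded up to pointer alignment so that any slot can
  // also hold pointers. Returns nullptr if the request does not fit.
  void *reserve(std::size_t NumBytes) {
    const std::size_t Align = alignof(void *);
    const std::size_t Rounded = (NumBytes + Align - 1) / Align * Align;
    if (Rounded > Capacity - Used)
      return nullptr;
    void *Ptr = Data + Used;
    Used += Rounded;
    return Ptr;
  }

  void reset() { Used = 0; }
};

// Hypothetical pointer-array view that takes its storage from the byte-wise
// buffer instead of owning a separate `void *` array.
struct shared_args_sketch {
  byte_buffer_sketch &Buffer;
  void **Args = nullptr;
  std::size_t NumArgs = 0;

  explicit shared_args_sketch(byte_buffer_sketch &B) : Buffer(B) {}

  // Carve out room for N pointers; fails if the buffer is exhausted.
  bool init(std::size_t N) {
    Args = static_cast<void **>(Buffer.reserve(N * sizeof(void *)));
    NumArgs = Args ? N : 0;
    return Args != nullptr;
  }

  void set(std::size_t I, void *P) {
    assert(I < NumArgs);
    Args[I] = P;
  }

  void *get(std::size_t I) const {
    assert(I < NumArgs);
    return Args[I];
  }
};
```

With this arrangement only one block of storage exists, and the pointer-array interface becomes a thin wrapper that could be dropped once nothing calls it directly.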
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59319/new/
https://reviews.llvm.org/D59319