[Openmp-commits] [PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface

Thu Mar 14 13:09:00 PDT 2019

ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:27
+
+/// The target region _kernel_ interface for GPUs
+///
----------------
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > All exported functions are declared in the `interface.h` file. I don't think we need an extra interface file here
> > > `interface.h`, or to be more precise for people that do not know, `deviceRTLs/nvptx/src/interface.h`, is nvptx specific. This file, `deviceRTLs/common/target_region.h`, is by design target agnostic and not placed _under_ the nvptx subfolder. If you are willing to move `interface.h` into a common space and remove the nvptx specific functions we can merge the two. Otherwise, I have strong reservations agains that and good reason not to do it.
> > I see that currently it is written in Cuda. It means, it targets NVidia GPUs, at least at the moment. I'm fine to put this header file into the common directory, if you're sure that this is really target agnostic. But maybe just for a start we should put it to NVPTX directory? Later, when you or somebody else will add support for other GPUs and he/she will find out that these functions are really target agnostic, they can be moved into the common directory?
> > I see that currently it is written in Cuda. It means, it targets NVidia GPUs, at least at the moment
> 
> How do you see that? (I hope we both talk about this file, correct?)
> 
> 
> > But maybe just for a start we should put it to NVPTX directory?
> 
> Why? What is the benefit? If we want it to be agnostic, regardless of the current state, it should be developed _outside_ of the target specific directories.
> 
I'm not talking about this particular file, just like I said we can put it into `common` subdirectory. I'm talking about the implementation files. They all are written in Cuda, no?
But it is not proved yet that this solution is target agnostic. Did you test it for AMD?

================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100
+///
+EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode,
+                                               bool RequiresOMPRuntime,
----------------
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > Better to use `ident_loc` for passing info about execution mode and full/lightweight runtime.
> > > Could you please explain why you think that? Adding indirection through a structure does not really seem beneficial to me.
> > Almost all function from libomp rely on `ident_loc`. The functions, which were added for NVPTX without this parameter had a lot of problems later and most of them were replaced with the functions with this parameter type. Plus, this parameter is used for OMPD/OMPT and it may be important for future OMPD/OMPT support.
> > Almost all function from libomp rely on ident_loc.
> 
> If you look at the implementation of this interface for NVPTX you will see that the called functions do not take `ident_loc` values. When you create the calls from the existing NVPTX code generation in clang, the current code **does not use** `ident_loc` for similar functions, see:
> `___kmpc_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime)`,
> `__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`,
> `__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime, int16_t RequiresDataSharing)`,
> `__kmpc_kernel_parallel(void **outlined_function, int16_t IsOMPRuntimeInitialized)`,
> ...
> 
> 
> 
> > Plus, this parameter is used for OMPD/OMPT and it may be important for future OMPD/OMPT support.
> 
> If we at some point need to make the options permanent in an `ident_loc` we can simply pass an `ident_loc` and require it to be initialized by the call. Cluttering the user code with stores and indirection is exactly what I do want to avoid.
1. The new functions rely on `ident_loc`. We had to add those new functions because the old ones did not use it and it was bad design decision. Now we need to fix this. I suggest you do everything right from the very beginning rather than fixing this later by adding extra entry points to support OMPT/OMPD or something else, for example.
2. No, you cannot simply change the interface of the library to keep the compatibility with the previous versions of the compiler/library. You will need to add the new entries.  

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+////////////////////////////////////////////////////////////////////////////////
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
+
----------------
jdoerfert wrote:
> ABataev wrote:
> > It would be good to store it the global memory rather than in the shared to save th shared memory. Also, we already are using several shared memory buffers for different purposes, it would be good to merge them somehow to reduce pressure on shared memory.
> I would have reused your buffer but it is for reasons unclear to me, not a byte-wise buffer but an array of `void *` and also used as such. Using it as a byte-wise buffer might cause problems or at least confusion. Changing it to a byte-wise buffer would be fine with me. I don't need a separate buffer but just one with the functionality implemented in this one.
I don't know what `my` buffer are talking about. I'm just saying that we already using a lot of shared memory and adding another one shared memory buffer of ~150 bytes per team increases pressure on the shared memory. It would be good to reuse the existing buffers somehow. It was just a suggestion.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:64
+
+/// Filter threads into masters and workers. If \p UseStateMachine is true,
+/// required workers will enter a state machine through and be trapped there.
----------------
jdoerfert wrote:
> ABataev wrote:
> > What is the criteria for `UseStateMachine`? Under what conditions it can be set to `true` and `false`? Also, what if have several parallel regions in non-SPMD kernel and `UseStateMachine` is `true`?
> > What is the criteria for UseStateMachine? Under what conditions it can be set to true and false? 
> 
> `UseStateMachine` is an option exposed to the outer world through the `__kmpc_target_region_kernel_init` call. The semantics are explained here and in the declaration of `__kmpc_target_region_kernel_init`.
> 
> > Also, what if have several parallel regions in non-SPMD kernel and UseStateMachine is true?
> 
> I don't see the problem, I expect all kernels having threads in their own state machine and no interference between them. That is at least what should happen. Maybe I miss something. Do you see a problem?
> 
1. I see its semantics, I'm asking when it must be set to `true` and when to `false`. Maybe I missed something, but currently, it is always set to `true` in the compiler patch. Do you really need it?
2. What if you have a single kernel with several consecutive parallel regions? Can you handle this?

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:166
+      // Copy the shared and private variables into shared memory.
+      char *SVMemory = __kmpc_target_region_kernel_get_shared_memory();
+      char *PVMemory = __kmpc_target_region_kernel_get_private_memory();
----------------
Use `void *` also, better to keep the same coding style across the whole library

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59319/new/

https://reviews.llvm.org/D59319