[PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface
Alexey Bataev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Mar 14 08:32:28 PDT 2019
ABataev added inline comments.
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:27
+/// The target region _kernel_ interface for GPUs
> ABataev wrote:
> > All exported functions are declared in the `interface.h` file. I don't think we need an extra interface file here
> `interface.h`, or to be more precise for people that do not know, `deviceRTLs/nvptx/src/interface.h`, is nvptx specific. This file, `deviceRTLs/common/target_region.h`, is by design target agnostic and not placed _under_ the nvptx subfolder. If you are willing to move `interface.h` into a common space and remove the nvptx specific functions we can merge the two. Otherwise, I have strong reservations agains that and good reason not to do it.
I see that currently it is written in Cuda. It means, it targets NVidia GPUs, at least at the moment. I'm fine to put this header file into the common directory, if you're sure that this is really target agnostic. But maybe just for a start we should put it to NVPTX directory? Later, when you or somebody else will add support for other GPUs and he/she will find out that these functions are really target agnostic, they can be moved into the common directory?
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100
+EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode,
+ bool RequiresOMPRuntime,
> ABataev wrote:
> > Better to use `ident_loc` for passing info about execution mode and full/lightweight runtime.
> Could you please explain why you think that? Adding indirection through a structure does not really seem beneficial to me.
Almost all function from libomp rely on `ident_loc`. The functions, which were added for NVPTX without this parameter had a lot of problems later and most of them were replaced with the functions with this parameter type. Plus, this parameter is used for OMPD/OMPT and it may be important for future OMPD/OMPT support.
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:124
+/// unpacking code.
+typedef void (*ParallelWorkFnTy)(char * /* SharedValues */,
+ char * /* PrivateValues */);
We used `void *` for buffers usually, I think it is better to use `void *` here too instead of `char *`.
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
It would be good to store it the global memory rather than in the shared to save th shared memory. Also, we already are using several shared memory buffers for different purposes, it would be good to merge them somehow to reduce pressure on shared memory.
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:64
+/// Filter threads into masters and workers. If \p UseStateMachine is true,
+/// required workers will enter a state machine through and be trapped there.
What is the criteria for `UseStateMachine`? Under what conditions it can be set to `true` and `false`? Also, what if have several parallel regions in non-SPMD kernel and `UseStateMachine` is `true`?
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
More information about the cfe-commits