[PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface
Alexey Bataev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Mar 15 10:40:34 PDT 2019
ABataev added inline comments.
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:104
+EXTERN int8_t __kmpc_target_region_kernel_init(ident_t *Ident, bool UseSPMDMode,
+ bool RequiresOMPRuntime,
If you're using `ident_t`, the `UseSPMDMode` and `RequiresOMPRuntime` parameters are no longer needed. Both are already passed in the `ident_t` structure.
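The sketch below shows the idea: the compiler encodes the execution mode as bits in the location structure, and the runtime decodes them, so the init entry point only needs the `ident_t` pointer. It is a minimal host-C model. `ident_sketch_t`, its `mode_flags` field, the `IDENT_*` bit values, `kernel_mode`, and `decode_kernel_mode` are all hypothetical stand-ins. They are not the device runtime's actual field names or constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for ident_t: only what this sketch needs. */
typedef struct {
  int32_t flags;        /* ordinary ident flags (unused here) */
  int32_t mode_flags;   /* hypothetical: execution-mode bits set by the compiler */
  const char *psource;  /* source location string */
} ident_sketch_t;

/* Hypothetical mode bits; the real runtime defines its own constants. */
enum {
  IDENT_SPMD_MODE = 0x01,      /* kernel runs in SPMD mode */
  IDENT_SIMPLE_RT_MODE = 0x02  /* full OpenMP runtime is NOT required */
};

/* Decoded form of the two values that were separate bool parameters. */
typedef struct {
  bool spmd;
  bool requires_omp_runtime;
} kernel_mode;

/* Read the mode from the location structure instead of extra arguments. */
static kernel_mode decode_kernel_mode(const ident_sketch_t *loc) {
  assert(loc != NULL);
  kernel_mode m;
  m.spmd = (loc->mode_flags & IDENT_SPMD_MODE) != 0;
  m.requires_omp_runtime = (loc->mode_flags & IDENT_SIMPLE_RT_MODE) == 0;
  return m;
}
```

With this encoding, an init function declared as taking only `ident_t *Ident` could call a decoder like this one. The caller then cannot pass flags that disagree with what the compiler recorded in the structure.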
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
What is this buffer used for? Transferring pointers to the shared variables to the parallel regions? If so, the compiler must handle it, for several reasons:
1. You're using malloc/free for large buffers. However, the size of this buffer is known at compile time, so the compiler can generate a fixed-size buffer in global memory if required. We already have a similar implementation for target regions, globalized variables, etc. You can take a look and adapt it for your purpose.
2. Malloc/free are not very fast on the GPU, so preallocated buffers will also improve performance.
3. Another problem with malloc/free is that they draw from a preallocated heap, and the size of this heap is limited to 8 MB (if I recall correctly). This memory is required for correct support of local variable globalization. With some previous implementations we already ran into situations where malloc could not allocate enough memory for it.
4. You can reuse the shared memory buffers the compiler already generates and save shared memory.
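The points above argue for a fixed-size buffer whose size the compiler knows, instead of malloc/free at run time. Here is a minimal host-C sketch of that idea. It assumes the buffer only carries pointers to shared variables, and that a compile-time bound `MAX_SHARED_ARGS` exists. That constant, the `shared_arg_slots` array, and `publish_shared_args` are hypothetical names. On the device this storage would be a statically allocated global or shared-memory array, and per-team indexing is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time bound: in a compiler-driven scheme this would be
   computed from the variables the parallel region actually captures. */
#define MAX_SHARED_ARGS 8

/* Statically allocated slots: no malloc/free, no dependence on the device heap. */
static void *shared_arg_slots[MAX_SHARED_ARGS];

/* Publish pointers to the shared variables so the parallel region can read them. */
static void **publish_shared_args(void *const *args, size_t nargs) {
  assert(nargs <= MAX_SHARED_ARGS); /* the bound is known, so this never fails for correct codegen */
  for (size_t i = 0; i < nargs; ++i)
    shared_arg_slots[i] = args[i];
  return shared_arg_slots;
}
```

Because the buffer size is fixed when the code is generated, there is no allocation latency and no way to fail when the malloc heap is exhausted. Both are concerns raised in points 2 and 3.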