[Openmp-commits] [PATCH] D60918: [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Apr 22 12:23:51 PDT 2019

ABataev added a comment.

In D60918#1473401 <https://reviews.llvm.org/D60918#1473401>, @Hahnfeld wrote:

> Why is it enough to have one counter per warp, what happens if threads within a warp diverge? Before D55773 <https://reviews.llvm.org/D55773> we had a counter per thread...

A per-thread counter significantly hurts performance. It requires a class allocated in global memory, plus a dynamically controlled queue to manage that object in global memory. Doru removed this at the cost of some functionality. At the moment, that solution does not work properly even when warps diverge; to handle L2 parallelism correctly you must compile your code with full runtime support. This patch fixes that problem. It supports divergence between warps, and it also fixes the problem with thread divergence within a warp. Because divergent threads are serialized, it is OK to keep the parallelism level counter on a per-warp basis. The test demonstrates thread divergence and produces the correct result in SPMD mode without the full runtime.
We could use a per-thread counter, but it would require 1K of shared memory. This solution saves shared memory and uses only 32 bytes per block.
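
The 1K versus 32 bytes figures can be checked with a small host-side sketch. It assumes one-byte counters, 1024 threads per block, and 32 threads per warp; these values and all names below are illustrative assumptions, not code from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed limits, typical of NVIDIA GPUs; not taken from the patch.
constexpr std::size_t kMaxThreadsPerBlock = 1024;
constexpr std::size_t kWarpSize = 32;

// Hypothetical one-byte parallelism level counter.
using LevelCounter = std::uint8_t;

// Shared memory needed when every thread owns a counter.
constexpr std::size_t perThreadBytes() {
  return kMaxThreadsPerBlock * sizeof(LevelCounter);
}

// Shared memory needed when every warp owns a counter.
constexpr std::size_t perWarpBytes() {
  return (kMaxThreadsPerBlock / kWarpSize) * sizeof(LevelCounter);
}

// Index of the counter a given thread would use in the per-warp layout.
constexpr std::size_t warpSlot(std::size_t threadId) {
  return threadId / kWarpSize;
}

static_assert(perThreadBytes() == 1024, "per-thread layout: 1K per block");
static_assert(perWarpBytes() == 32, "per-warp layout: 32 bytes per block");
```

Under these assumptions, all 32 threads of a warp map to the same slot (for example, threads 32 through 63 all use slot 1), which is only safe because divergent threads within a warp are serialized.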

  rOMP OpenMP


