[Openmp-commits] [PATCH] D65013: [OPENMP][NVPTX]Fix parallel level counter in Cuda 9.0.

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri Aug 23 08:46:34 PDT 2019


ABataev added a comment.

In D65013#1643020 <https://reviews.llvm.org/D65013#1643020>, @jdoerfert wrote:

> Generally, this seems fine but I was hoping we could say what configuration and test file can be used to reproduce this error.


Generally speaking, this patch just switches from `shfl_sync` to the `__syncwarp` function. (In CUDA <= 8 we don't need to reconverge the threads at all because of the architecture; in CUDA 9+ there is a dedicated function, `__syncwarp`, for this.) The change only improves performance in CUDA 8 and does not affect CUDA 9+ at all.
The underlying problem is thread divergence within the warp. We keep the parallel level counter on a per-warp basis (because we are very limited in shared memory). Previously it was even worse: we kept the parallel level counter on a per-block basis.
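
To make the reconvergence point concrete, here is a minimal sketch of the pattern, not the actual libomptarget code. The names `bumpParallelLevel`, `parallelLevel`, `runDemo`, `kWarpSize` and `kMaxWarps` are made up for illustration, and the sketch assumes CUDA 9 or newer, a single block, and a block size that is a multiple of 32. Lane 0 of each warp updates a per-warp counter in shared memory inside a divergent branch, and `__syncwarp()` then reconverges the warp so the other lanes can read the value.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpSize = 32;
constexpr int kMaxWarps = 32;  // 1024 threads per block / 32 lanes per warp

// Hypothetical kernel: one "parallel level" counter per warp in shared memory.
__global__ void bumpParallelLevel(int *seenByLane) {
  __shared__ int parallelLevel[kMaxWarps];

  const int warpId = threadIdx.x / kWarpSize;
  const int laneId = threadIdx.x % kWarpSize;

  // Divergent region: only lane 0 of each warp writes the counter.
  if (laneId == 0)
    parallelLevel[warpId] = 1;

  // Reconverge the warp. __syncwarp() also orders lane 0's shared-memory
  // write before the reads done by the other lanes below.
  __syncwarp();

  seenByLane[threadIdx.x] = parallelLevel[warpId];
}

// Host helper (hypothetical): launches one block of numWarps warps and
// returns the counter value observed by every thread.
// CUDA error checking is omitted to keep the sketch short.
std::vector<int> runDemo(int numWarps) {
  assert(numWarps >= 1 && numWarps <= kMaxWarps);
  const int numThreads = numWarps * kWarpSize;

  int *dSeen = nullptr;
  cudaMalloc(&dSeen, numThreads * sizeof(int));
  bumpParallelLevel<<<1, numThreads>>>(dSeen);

  std::vector<int> seen(numThreads, -1);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(seen.data(), dSeen, numThreads * sizeof(int),
             cudaMemcpyDeviceToHost);
  cudaFree(dSeen);
  return seen;
}
```

Calling `runDemo(4)` launches 128 threads. In this sketch the counter costs one `int` per warp, which is why it fits in the limited shared memory, and each warp only ever reads its own slot.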

I think it would be better to rename the patch. As I said, this is not a fix, just a small improvement in the way we reconverge the threads.


Repository:
  rOMP OpenMP

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65013/new/

https://reviews.llvm.org/D65013




