[Openmp-commits] [PATCH] D65013: [OPENMP][NVPTX]Fix parallel level counter in Cuda 9.0.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Aug 23 08:46:34 PDT 2019
ABataev added a comment.
In D65013#1643020 <https://reviews.llvm.org/D65013#1643020>, @jdoerfert wrote:
> Generally, this seems fine but I was hoping we could say what configuration and test file can be used to reproduce this error.
Generally speaking, this patch just switches from shfl_sync to the `__syncwarp` function. (In CUDA <= 8 we don't need to reconverge the threads at all because of the architecture; in CUDA 9+ there is a special function, `__syncwarp`, for this.) It will only improve performance in CUDA 8 and won't affect CUDA 9+ at all.
The problem is thread divergence within the warp. We keep the parallel level counter on a per-warp basis (because we're very limited in shared memory). Previously it was even worse: we kept the parallel level counter on a per-block basis.
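For illustration only, here is a minimal CUDA sketch of the reconvergence pattern described above. It is not the code from D65013: the names `reconvergeWarp`, `bumpLevel`, `runDemo`, and the `level` array are hypothetical, and it assumes the block size is a multiple of 32 so that a full-warp mask is valid. One lane per warp updates a per-warp counter in shared memory inside a divergent branch, and the warp is reconverged before every lane reads the counter.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr int kThreads = 64;                  // two full warps (assumption)
constexpr int kWarps = kThreads / kWarpSize;

// Hypothetical helper: reconverge the lanes named in `mask`.
__device__ inline void reconvergeWarp(unsigned mask) {
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
  __syncwarp(mask);  // CUDA 9+: explicit warp-level barrier
#else
  (void)mask;        // CUDA <= 8: nothing to do, per the comment above
#endif
}

__global__ void bumpLevel(int *out) {
  // One counter per warp, standing in for the parallel level counter.
  __shared__ volatile int level[kWarps];
  const int warp = threadIdx.x / kWarpSize;
  const int lane = threadIdx.x % kWarpSize;

  if (lane == 0) {                 // divergent branch: only the leader writes
    level[warp] = 0;
    level[warp] = level[warp] + 1;
  }
  reconvergeWarp(0xffffffffu);     // all 32 lanes meet here before reading
  out[threadIdx.x] = level[warp];
}

// Launches one block and checks that every lane observed the leader's write.
bool runDemo() {
  int *dev = nullptr;
  int host[kThreads] = {0};
  if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
  bumpLevel<<<1, kThreads>>>(dev);
  bool ok = cudaMemcpy(host, dev, sizeof(host),
                       cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(dev);
  for (int i = 0; ok && i < kThreads; ++i) ok = (host[i] == 1);
  return ok;
}

int main() {
  const bool ok = runDemo();
  std::printf("%s\n", ok ? "all lanes saw level 1" : "mismatch or CUDA error");
  return ok ? 0 : 1;
}
```

The version check on `CUDART_VERSION` is just one way to select between the two cases; the actual runtime may use a different macro or mask.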
I think it would be better to rename the patch. As I said, this is not a fix, just a small improvement in the way we reconverge the threads.
Repository:
rOMP OpenMP
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D65013/new/
https://reviews.llvm.org/D65013