[Openmp-commits] [PATCH] D65112: [OPENMP][NVPTX]Make the test compatible with CUDA9+, NFC.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Jul 23 02:04:36 PDT 2019
ABataev added a comment.
In D65112#1596911 <https://reviews.llvm.org/D65112#1596911>, @Hahnfeld wrote:
> So if a user does this, the application will also hang?
Yes, unfortunately. They changed the way threads in a warp are handled: before CUDA 9 the threads in a warp were implicitly convergent, but in CUDA 9+ they are not.
CUDA introduced a whole set of new _sync versions of the primitives and a __syncwarp function, which take an additional parameter: a mask of the active threads in the warp. All threads named in this mask must execute those primitives; otherwise the result is undefined. It is recommended to define this mask at the start of your code block because, since the threads are no longer implicitly convergent, you cannot rely on the results of the __ballot(1) or __activemask() functions.
Because of this, we have to hardcode the mask to 0xffffffff, i.e. the minimal granularity in CUDA 9+ is the whole warp (for functionality that requires synchronizing the threads in a warp, such as critical sections, reductions, etc.). Otherwise the behavior is undefined.
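To illustrate, here is a minimal sketch of a warp-level reduction written with the CUDA 9+ primitives and a hardcoded full-warp mask. It is not the code from this patch or from the runtime. The names kFullMask, warpReduceSum and reduceKernel are hypothetical and chosen only for this example, which assumes a single block of exactly 32 threads so that every lane named in the mask really executes the primitives.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical name: the mask is fixed up front to cover all 32 lanes,
// instead of being derived from __activemask() at the call site.
constexpr unsigned kFullMask = 0xffffffffu;

// Sums `value` across the warp; lane 0 ends up holding the total.
// Every lane named in kFullMask must reach each __shfl_down_sync call.
__device__ int warpReduceSum(int value) {
  for (int offset = warpSize / 2; offset > 0; offset /= 2)
    value += __shfl_down_sync(kFullMask, value, offset);
  return value;
}

__global__ void reduceKernel(const int *in, int *out) {
  int value = in[threadIdx.x];
  // Explicitly synchronize the lanes of the warp; convergence is no
  // longer implied in CUDA 9+.
  __syncwarp(kFullMask);
  int total = warpReduceSum(value);
  if (threadIdx.x == 0)
    *out = total;
}

int main() {
  int hostIn[32], hostOut = 0;
  for (int i = 0; i < 32; ++i)
    hostIn[i] = i + 1;  // 1, 2, ..., 32

  int *devIn = nullptr, *devOut = nullptr;
  cudaMalloc(&devIn, sizeof(hostIn));
  cudaMalloc(&devOut, sizeof(int));
  cudaMemcpy(devIn, hostIn, sizeof(hostIn), cudaMemcpyHostToDevice);

  // One block of exactly one warp, so all lanes in kFullMask exist.
  reduceKernel<<<1, 32>>>(devIn, devOut);
  cudaMemcpy(&hostOut, devOut, sizeof(int), cudaMemcpyDeviceToHost);

  printf("sum = %d\n", hostOut);
  cudaFree(devIn);
  cudaFree(devOut);
  return 0;
}
```

With the inputs 1 through 32 this should print a sum of 528. If only some lanes of the warp reached warpReduceSum (for example, when it is called from a divergent branch), the full mask would name threads that never execute the primitive, which is the undefined behavior, and possible hang, described above.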