[Openmp-commits] [PATCH] D65112: [OPENMP][NVPTX]Make the test compatible with CUDA9+, NFC.

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Jul 23 03:50:37 PDT 2019

jdoerfert requested changes to this revision.
jdoerfert added a comment.
This revision now requires changes to proceed.

In D65112#1596995 <https://reviews.llvm.org/D65112#1596995>, @ABataev wrote:

> In D65112#1596911 <https://reviews.llvm.org/D65112#1596911>, @Hahnfeld wrote:
> > So if a user does this, the application will also hang?
> Yes, unfortunately. They changed the way threads in a warp are handled. Before CUDA 9 the threads in a warp were implicitly convergent; in CUDA 9+ they are not.
>  CUDA introduced a whole set of new _sync versions of the primitives and a __syncwarp function, which take an additional parameter: a mask of the active threads in the warp. All threads named in this mask must execute those primitives, otherwise the result is undefined. It is recommended to define this mask at the start of your code block because, since the threads are no longer implicitly convergent, you cannot rely on the results of the __ballot(1) or __activemask() functions.
>  We have to hardcode this mask to 0xffffffff because of this, i.e. the minimal granularity in CUDA 9+ is the whole warp (for functionality that requires synchronizing the threads in the warp, such as critical sections, reductions, etc.). Otherwise the behavior is undefined.
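
For illustration only, here is a minimal sketch of the full-warp mask pattern described in the quote above. It is not code from this patch or from the OpenMP runtime; the names warpReduceSum, sumLanes and kFullMask are hypothetical, and the kernel assumes it is launched with exactly one full warp of 32 threads so that every lane named in the mask reaches the _sync call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the patch): sums one int per lane
// across a full warp using the CUDA 9+ _sync shuffle primitive.
__device__ int warpReduceSum(int value) {
  // Hardcoded full-warp mask: all 32 lanes must execute each shuffle.
  const unsigned kFullMask = 0xffffffffu;
  for (int offset = 16; offset > 0; offset /= 2)
    value += __shfl_down_sync(kFullMask, value, offset);
  return value; // only lane 0 holds the complete sum
}

// Assumes a launch with exactly 32 threads (one warp) and no divergence
// before the reduction, so the full mask matches the executing lanes.
__global__ void sumLanes(int *out) {
  int total = warpReduceSum(static_cast<int>(threadIdx.x));
  if (threadIdx.x == 0)
    *out = total;
}

int main() {
  int *dOut = nullptr;
  int hOut = 0;
  if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess)
    return 1;
  sumLanes<<<1, 32>>>(dOut);
  if (cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost) !=
      cudaSuccess)
    return 1;
  cudaFree(dOut);
  printf("%d\n", hOut); // lanes 0..31 sum to 496
  return 0;
}
```

If only part of the warp reached warpReduceSum (for example, under a divergent branch), the lanes named in kFullMask that never arrive would make the result undefined, which is the hazard discussed in the quote.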

I have the feeling this just hides bugs in the runtime. I am also confused why this shows up now. What changed?

  rOMP OpenMP


