[Openmp-commits] [PATCH] D65112: [OPENMP][NVPTX]Make the test compatible with CUDA9+, NFC.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Jul 23 04:04:05 PDT 2019
ABataev added a comment.
In D65112#1597111 <https://reviews.llvm.org/D65112#1597111>, @jdoerfert wrote:
> In D65112#1596995 <https://reviews.llvm.org/D65112#1596995>, @ABataev wrote:
> > In D65112#1596911 <https://reviews.llvm.org/D65112#1596911>, @Hahnfeld wrote:
> > > So if a user does this, the application will also hang?
> > Yes, unfortunately. They changed the way they handle threads in warps. Before CUDA 9 the threads in a warp were implicitly convergent; in CUDA 9+ they are not.
> > CUDA introduced a whole set of new _sync versions of the primitives and a __syncwarp function, which require an additional parameter: a mask of the active threads in the warp. All threads named in this mask must execute those primitives; otherwise the result is undefined. It is recommended to define this mask at the start of your code block because, since the threads are no longer implicitly convergent, you cannot rely on the results of the __ballot(1) or __activemask() functions.
> > Because of this, we have to hardcode the mask to 0xffffffff, i.e. the minimal granularity in CUDA 9+ is the whole warp (for functionality that requires synchronizing the threads in a warp, such as critical sections, reductions, etc.). Otherwise the behavior is undefined.
> I have the feeling this just hides bugs in the runtime. I am also confused why this shows up now. What changed?
Read the docs, please, before saying anything like that.
Nothing has changed; it has not worked with CUDA 9+ from the very beginning.
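
For illustration only, here is a minimal sketch of the full-warp mask convention described above. It is not code from the runtime or from this patch; the names warpReduceSum, kFullMask and kernel are hypothetical. It assumes a single block of exactly 32 threads, so every lane named in the mask really does reach each _sync call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the OpenMP runtime): sums one int per
// lane across a full warp using the CUDA 9+ _sync shuffle primitive.
__device__ int warpReduceSum(int val) {
  // The mask is fixed up front to the whole warp instead of being derived
  // from __activemask(); every lane named in it must execute each call.
  const unsigned kFullMask = 0xffffffffu;
  for (int offset = warpSize / 2; offset > 0; offset /= 2)
    val += __shfl_down_sync(kFullMask, val, offset);
  return val; // only lane 0 holds the complete sum
}

__global__ void kernel(int *out) {
  // Called unconditionally, so all 32 lanes take part in the reduction.
  int sum = warpReduceSum(static_cast<int>(threadIdx.x));
  __syncwarp(0xffffffffu); // explicit warp-level barrier with the same mask
  if (threadIdx.x == 0)
    *out = sum;
}

int main() {
  int *dOut = nullptr;
  int hOut = 0;
  cudaMalloc(&dOut, sizeof(int));
  kernel<<<1, 32>>>(dOut); // exactly one full warp
  cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(dOut);
  printf("%d\n", hOut); // sum of the lane ids 0..31
  return 0;
}
```

If the call to warpReduceSum were placed inside a branch that only some lanes take, the lanes named in the mask but absent from the call would make the result undefined, which is the situation discussed in this thread.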