[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jun 13 09:33:50 PDT 2019
ABataev added a comment.
In D62393#1541969 <https://reviews.llvm.org/D62393#1541969>, @tra wrote:
> > @reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 <https://reviews.llvm.org/D50391> (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.
> Do I understand it correctly that the argument is about using C++ `volatile` with the assumption that those will map to ld.volatile/st.volatile, which happen to provide sufficiently strong guarantees to be equivalent of LLVM's `atomic monotonic` operations?
> If that's the case, then I would agree that it's an implementation detail one should not be relying on. If atomicity is needed, we should figure out a way to express that in C++.
> In practice, the standard C++ library support in cuda is somewhat limited. I don't think atomic<> works, so __volatile__ might be a 'happens to work' short-term workaround. If that's the case, then there should a comment describing what's going on here and TODO to fix it when better support for atomics on GPU is available.
> Another option would be to use atomic*() functions provided by CUDA, but those do have limited functionality on older GPUs.
> Yet another alternative is to explicitly use ld.volatile/st.volatile from inline assembly. I don't think we have it plumbed as clang builtin.
Artem, thanks for your comment. But we need not only atomicity, bht also we need to command the ptxas compiler to not optimize accesses to parallelLevel. According to several comments, (like this one https://stackoverflow.com/a/1533115) it is recommended to use volatile modifier in Cuda in such cases. If clang does not provide the required level of support for Cuda volatile, I think this is an incompatibility with nvcc.
Also, I already thought about using PTX instructions directly. Probably, you're right that it would be better to use them if you're sure that there is a difference between nvcc and clang.
CHANGES SINCE LAST ACTION
More information about the Openmp-commits