[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.

Thu Jun 13 09:18:28 PDT 2019

tra added a comment.

> @reames , @tra , @Hahnfeld , @jlebar , @chandlerc,  I see that this was discussed in D50391 <https://reviews.llvm.org/D50391> (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.

Do I understand it correctly that the argument is about using C++ `volatile` with the assumption that volatile accesses will map to ld.volatile/st.volatile, which happen to provide sufficiently strong guarantees to be equivalent to LLVM's `atomic monotonic` operations?

If that's the case, then I would agree that it's an implementation detail one should not be relying on. If atomicity is needed, we should figure out a way to express that in C++.
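
For illustration only, here is a minimal host-side sketch of what "expressing atomicity in C++" could look like. It uses relaxed `std::atomic` operations, which clang lowers to LLVM `atomic monotonic` operations. The names `parallel_level`, `enter_parallel`, `leave_parallel`, and `current_level` are hypothetical and are not taken from the patch, and this assumes a toolchain where `std::atomic` is usable, which may not hold for CUDA device code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the parallel level counter (not the runtime's code).
static std::atomic<unsigned> parallel_level{0};

// Relaxed read-modify-write: atomic, but imposes no ordering on other memory.
inline void enter_parallel() {
  parallel_level.fetch_add(1, std::memory_order_relaxed);
}

inline void leave_parallel() {
  // fetch_sub returns the value held before the decrement.
  unsigned previous = parallel_level.fetch_sub(1, std::memory_order_relaxed);
  assert(previous > 0 && "leave_parallel called without matching enter_parallel");
  (void)previous;  // silence unused-variable warnings when NDEBUG is defined
}

// Relaxed load: the atomicity is stated in the source rather than implied by `volatile`.
inline unsigned current_level() {
  return parallel_level.load(std::memory_order_relaxed);
}
```

The point of the sketch is that the atomicity requirement is stated in the source, so it does not depend on how `volatile` happens to be lowered for a particular target.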

In practice, the standard C++ library support in CUDA is somewhat limited. I don't think atomic<> works, so __volatile__ might be a 'happens to work' short-term workaround. If that's the case, then there should be a comment describing what's going on here and a TODO to fix it when better support for atomics on GPU is available.

Another option would be to use atomic*() functions provided by CUDA, but those do have limited functionality on older GPUs.

Yet another alternative is to explicitly use ld.volatile/st.volatile from inline assembly. I don't think we have it plumbed as a clang builtin.

Repository:
  rOMP OpenMP

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62393/new/

https://reviews.llvm.org/D62393