[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Jun 12 08:59:28 PDT 2019
ABataev added a comment.
In D62393#1539985 <https://reviews.llvm.org/D62393#1539985>, @Hahnfeld wrote:
> Trying to catch up on all the comments, replies to some:
> In D62393#1530780 <https://reviews.llvm.org/D62393#1530780>, @ABataev wrote:
> > In D62393#1530340 <https://reviews.llvm.org/D62393#1530340>, @jdoerfert wrote:
> > > I would have assumed `omp_in_parallel` to return `true` if the parallelLevel is not the default (0 or 1). Why isn't that so?
> > according tk the standard, it should return 1 only if we ar3 in the active parallel region. Active parallel region is the region with number of threaded greater than 1.
> I don't agree with this statement and its implication that we need to track both levels and active-levels as I already argued in D61380 <https://reviews.llvm.org/D61380>: In my understanding of the standard, `omp_in_parallel` should return true iff it's nested somewhere in an active parallel region.
Maybe I did not express myself clear, but this is exactly how it works currently. I agreed with you and implemented it exactly just like you said: 1 is returned if at least 1 parallel region is active.
And we count both, active and inactive regions. We may have only 1 active region, the very first one. If the first region is active, the MSB is set to 1 to mark this situation.
> In D62393#1538255 <https://reviews.llvm.org/D62393#1538255>, @hfinkel wrote:
>> @reames , @tra , @Hahnfeld , @jlebar , @chandlerc, I see that this was discussed in D50391 <https://reviews.llvm.org/D50391> (in terms of using PTX's volatile to provide atomic lowering), but it's not clear to me that using volatile with atomic semantics in LLVM is something we guarantee will work (even in CUDA mode). I imagine that we'd need to lower these in terms of relaxed atomics in Clang for this to really be guaranteed to work correctly.
> That patch was about lowering atomics in LLVM IR to PTX's volatile which is clearly documented to work. This has nothing to do with the volatile modifier in C/C++ which does not get translated into atomics in LLVM IR by default AFAIK (at least not without switches for compatibility with MS). I'm not sure about the semantics in CUDA and it may not be documented at all (as others have noted). I'd probably agree with what @__simt__ wrote about this.
I also agree that clang in Cuda mode should translate it to relaxed atomic, if it is not. Otherwise, it means the compiler incorrectly handles volatile from CUDA point of view.
CHANGES SINCE LAST ACTION
More information about the Openmp-commits