[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Jun 5 04:37:47 PDT 2019
ABataev added a comment.
In D62393#1530340 <https://reviews.llvm.org/D62393#1530340>, @jdoerfert wrote:
> > It marks if the parallel region has more than 1 thread. Only the very 1 parallel region may have >1 threads. Required to correctly implement omp_in_parallel function, at least.
> I would have assumed `omp_in_parallel` to return `true` if the parallelLevel is not the default (0 or 1). Why isn't that so?
according tk the standard, it should return 1 only if we ar3 in the active parallel region. Active parallel region is the region with number of threaded greater than 1.
>> Because it is per warp, it is required to be volatile, because being combined with the atomic operations + fully inlined runtime functions, the memory ordering is not preserved, and thus it leads to undefined behavior in full runtime mode, which uses atomic operations.
> The accesses to the parallelLevel array are non-atomic, it is not surprising that they can behave "unexpected" when atomic accesses are present. Why don't we access parallelLevel atomically? Why don't we synchronize the warp after the parallel level is changed? If we want to communicate the updated value to all thread it seems natural to do that explicitly and not to rely on `volatile`.
According to PTX ISA, volatile modifier has the same semantics as relaxed modifier and thus the load/store operations become morally strong. Plus, volatile synchronizes memory accesses on each read/write operations forthe variable.
The extra synchronization is not very good, because it slows down the execution of the whole program.
CHANGES SINCE LAST ACTION
More information about the Openmp-commits