[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Jun 6 16:21:46 PDT 2019
jdoerfert added a comment.
In D62393#1530780 <https://reviews.llvm.org/D62393#1530780>, @ABataev wrote:
> In D62393#1530340 <https://reviews.llvm.org/D62393#1530340>, @jdoerfert wrote:
> > > It marks if the parallel region has more than 1 thread. Only the very 1 parallel region may have >1 threads. Required to correctly implement omp_in_parallel function, at least.
> > I would have assumed `omp_in_parallel` to return `true` if the parallelLevel is not the default (0 or 1). Why isn't that so?
> according tk the standard, it should return 1 only if we ar3 in the active parallel region. Active parallel region is the region with number of threaded greater than 1.
The standard has "levels" and "active-levels". Both need to be tracked but it seems we only have a single array?
>>> Because it is per warp, it is required to be volatile, because being combined with the atomic operations + fully inlined runtime functions, the memory ordering is not preserved, and thus it leads to undefined behavior in full runtime mode, which uses atomic operations.
>> The accesses to the parallelLevel array are non-atomic, it is not surprising that they can behave "unexpected" when atomic accesses are present. Why don't we access parallelLevel atomically? Why don't we synchronize the warp after the parallel level is changed? If we want to communicate the updated value to all thread it seems natural to do that explicitly and not to rely on `volatile`.
> According to PTX ISA, volatile modifier has the same semantics as relaxed modifier and thus the load/store operations become morally strong. Plus, volatile synchronizes memory accesses on each read/write operations forthe variable.
> The extra synchronization is not very good, because it slows down the execution of the whole program.
I don't get it. You say volatile makes each operation synchronize but you also say synchronization is not very good. I'm also still not following what race condition you try to prevent with volatile in the first place, could you elaborate which accesses are racing without it?
CHANGES SINCE LAST ACTION
More information about the Openmp-commits