[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.

Alexey Bataev via Openmp-commits openmp-commits at lists.llvm.org
Thu Jun 6 16:30:19 PDT 2019



Best regards,
Alexey Bataev

> 6 июня 2019 г., в 19:18, Johannes Doerfert via Phabricator <reviews at reviews.llvm.org> написал(а):
> 
> jdoerfert added a comment.
> 
> In D62393#1530780 <https://reviews.llvm.org/D62393#1530780>, @ABataev wrote:
> 
>> In D62393#1530340 <https://reviews.llvm.org/D62393#1530340>, @jdoerfert wrote:
>> 
>>>> It marks if the parallel region has more than 1 thread. Only the very 1 parallel region may have >1 threads. Required to correctly implement omp_in_parallel function, at least.
>>> 
>>> I would have assumed `omp_in_parallel` to return `true` if the parallelLevel is not the default (0 or 1). Why isn't that so?
>> 
>> 
>> according tk the standard, it should return 1 only if we ar3 in the active parallel region. Active parallel region is the region with number of threaded greater than 1.
> 
> 
> The standard has "levels" and "active-levels". Both need to be tracked but it seems we only have a single array?
> 

They all are tracked on this array.


>>> 
>>> 
>>>> Because it is per warp, it is required to be volatile, because being combined with the atomic operations + fully inlined runtime functions, the memory ordering is not preserved, and thus it leads to undefined behavior in full runtime mode, which uses atomic operations.
>>> 
>>> The accesses to the parallelLevel array are non-atomic, it is not surprising that they can behave "unexpected" when atomic accesses are present. Why don't we access parallelLevel atomically? Why don't we synchronize the warp after the parallel level is changed? If we want to communicate the updated value to all thread it seems natural to do that explicitly and not to rely on `volatile`.
>> 
>> According to PTX ISA, volatile modifier has the same semantics as relaxed modifier and thus the load/store operations become morally strong. Plus, volatile synchronizes memory accesses on each read/write operations forthe variable.
>> The extra synchronization is not very good, because it slows down the execution of the whole program.
> 
> I don't get it. You say volatile makes each operation synchronize but you also say synchronization is not very good. I'm also still not following what race condition you try to prevent with volatile in the first place, could you elaborate which accesses are racing without it?
> 
> 

We don't need to synchronize the whole program, we just need to synchronize accesses to the parallelLevel array. In SPMD mode with full runtime (or SPMD mode eith atomic ops) the ptx compiler generates unsafe code for parallelLevel array without being marked as volatile (i.e. simple atomic-like construct in CUDa ).

> Repository:
>  rOMP OpenMP
> 
> CHANGES SINCE LAST ACTION
>  https://reviews.llvm.org/D62393/new/
> 
> https://reviews.llvm.org/D62393
> 
> 
> 


More information about the Openmp-commits mailing list