[Openmp-commits] [PATCH] D106310: [Libomptarget] Remove volatile from NVPTX work function

Tue Jul 20 09:14:47 PDT 2021

jdoerfert added a comment.

In D106310#2890709 <https://reviews.llvm.org/D106310#2890709>, @JonChesterfield wrote:

> So I think, at present, the 'barrier' call works as a thread fence as far as the compiler is concerned, in that it doesn't move loads across it.

Yes, it does. Potentially impacted accesses are not moved across. Loads and stores of shared memory are impacted by such a fence.

> Given better cross function analysis, would the property of 'any call => fence' hold? I'm not confident it holds today within LLVM though I think it must do in clang.

It would. There is no (non-buggy) analysis that would remove the barriers impact on a shared memory accesses.

> In particular, as far as I can tell C and C++ require synchronisation via atomic variables, not merely fences on non-atomics, and LLVM tends to reflect C++ semantics.

No. As long as there is no race there is no need to do atomics. The barrier synchronizes sufficiently and also acts as a memory fence. There is no problem in C/C++/IR here.

> So I think one thread writing the variable while others read is a data race and we are avoiding being burned by that by conservative optimisations around function calls.

Assuming no synchronization between the reads and write, you are correct. However, no volatile or atomic would actually make this work as it is required to if there was
a race without it. Let's talk about atomics as volatile is way worse. If you do the reads and write atomically there is no race and therefore no UB, great. However, it would
also mean the workers might read the value before it was written, not so great. The only way this scheme works is that we write -> synchronize -> read, and that is what we do.
As such, the reads and write do not need to be atomic (or volatile).

> Changing the loads/stores on this variable to relaxed atomic firmly kills the UB data race, and will interact properly with llvm.fence (and hopefully) the nvptx barrier.

There is no data race nor is there UB. There is also no need for it to "interact properly" with the barrier.

> So I think we should use relaxed load/stores where we currently use volatile variables, instead of relying on clang not miscompiling our data race.

See above.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106310/new/

https://reviews.llvm.org/D106310