[Mlir-commits] [mlir] [mlir][gpu] Add address space modifier to Barrier (PR #110527)

Mon Sep 30 11:23:43 PDT 2024

fabianmcg wrote:

> I'd expect these to not wait for all threads to reach the same program point, right?

A fence never makes thread sync guarantees, only memory ordering (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-fence-functions). That is precisely the issue with this change: the two are completely separate concepts.
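
To make the distinction concrete in CUDA terms, here is a minimal sketch. The kernel names `reverseWithBarrier` and `publishWithFence` are made up for illustration and are not part of the PR. The first kernel needs a thread barrier, the second only needs memory ordering:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Thread barrier: every thread in the block must reach __syncthreads()
// before any of them continues, so reading another thread's tile slot is safe.
__global__ void reverseWithBarrier(int *data, int n) {
  extern __shared__ int tile[];
  int t = threadIdx.x;
  if (t < n) tile[t] = data[t];
  __syncthreads();  // called by all threads, outside the conditionals
  if (t < n) data[t] = tile[n - 1 - t];
}

// Memory fence: __threadfence() only orders this thread's writes
// (payload before flag). No thread waits for any other thread here.
__global__ void publishWithFence(volatile int *payload, volatile int *flag) {
  if (threadIdx.x == 0) {
    *payload = 42;
    __threadfence();
    *flag = 1;
  }
}

int main() {
  const int n = 8;
  int host[n];
  for (int i = 0; i < n; ++i) host[i] = i;

  int *dev = nullptr;
  cudaMalloc(&dev, n * sizeof(int));
  cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
  // One block of n threads, with n ints of dynamic shared memory.
  reverseWithBarrier<<<1, n, n * sizeof(int)>>>(dev, n);
  cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; i < n; ++i) printf("%d ", host[i]);
  printf("\n");

  int *pub = nullptr;  // pub[0] is the payload, pub[1] is the flag
  cudaMalloc(&pub, 2 * sizeof(int));
  cudaMemset(pub, 0, 2 * sizeof(int));
  publishWithFence<<<1, 32>>>(pub, pub + 1);
  int out[2];
  cudaMemcpy(out, pub, sizeof(out), cudaMemcpyDeviceToHost);
  printf("payload=%d flag=%d\n", out[0], out[1]);

  cudaFree(dev);
  cudaFree(pub);
  return 0;
}
```

Replacing `__syncthreads()` with `__threadfence()` in the first kernel would leave a race, because nothing would guarantee that the other threads had written their `tile` slots yet.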

> Would this definition make `gpu.barrier` correspond to HIP/CUDA's `__syncthreads()`? Since I have a suspicion that that was the original semantics of `gpu.barrier`

Yes, `gpu.barrier` corresponds to `__syncthreads()`.

> IE ideally, I'd like to have separate ops for waiting for all threads to reach the execution point (say, `gpu.synchronize`) and for specifying fences (say, `gpu.memfence`). Then I'd imagine for `gpu.barrier` to be defined as doing both (with the maximal memory fence scope).

I'm inclined to keep `barrier` as it is and only add `memfence`. Programming models usually use the name `barrier` to mean a thread barrier, e.g. https://www.openmp.org/spec-html/5.0/openmpsu90.html

https://github.com/llvm/llvm-project/pull/110527