[PATCH] D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Mon Jun 7 20:27:06 PDT 2021

t-tye added a comment.

In D103225#2803733 <https://reviews.llvm.org/D103225#2803733>, @rampitec wrote:

> In D103225#2792240 <https://reviews.llvm.org/D103225#2792240>, @rampitec wrote:
>
>>> In D103225#2785687 <https://reviews.llvm.org/D103225#2785687>, @rampitec wrote:
>>>
>>>> You probably need to wrap all prologue LDS stores into a block to execute it only from lane 0 and add a barrier after. @t-tye correct me if I am wrong.
>>>
>>> But I remember that we had decided to avoid a barrier here, and instead just make sure that each thread within each wave executes the store instructions? In any case, let me clarify it with @t-tye and @b-sumner.
>>
>> I do not remember, but we can probably omit it since it is a single store to read-only memory. Anyway, a confirmation from @t-tye would be nice.
>
> For the record, the agreed way is to do a store from lane 0 of each wave and follow with a wave barrier.

I agree with @rampitec, although we did suggest measuring to confirm that a store from lane 0 of a single wave followed by a work-group barrier is not faster than a store from lane 0 of every wave followed by a wave barrier.
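
For illustration only, the "store from lane 0 of each wave, then wave barrier" shape could look roughly like the LLVM IR below. This is a minimal sketch, not the IR the patch emits: the names @lds, @lds.ptr, and @kernel are hypothetical, the i16 pointer slot is an assumption, and it is written with current opaque-pointer syntax. The only intrinsics used are llvm.amdgcn.mbcnt.lo and llvm.amdgcn.wave.barrier.

```llvm
; Hypothetical names throughout (@lds, @lds.ptr, @kernel); not taken from the patch.
target triple = "amdgcn-amd-amdhsa"

; The LDS variable and the LDS slot that holds its address for non-kernel callees.
@lds = internal addrspace(3) global [4 x i32] undef, align 4
@lds.ptr = internal addrspace(3) global i16 undef, align 2

declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32)
declare void @llvm.amdgcn.wave.barrier()

define amdgpu_kernel void @kernel() {
entry:
  ; Count of lanes below this one in the low 32 bits of the mask; 0 only for lane 0.
  %lane = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
  %is.lane0 = icmp eq i32 %lane, 0
  br i1 %is.lane0, label %store.ptr, label %join

store.ptr:
  ; Every wave executes this block, but only from its lane 0.
  store i16 ptrtoint (ptr addrspace(3) @lds to i16), ptr addrspace(3) @lds.ptr, align 2
  br label %join

join:
  ; Wave-level barrier: no cross-wave synchronization is requested here.
  call void @llvm.amdgcn.wave.barrier()
  ; ... kernel body; callees would load @lds.ptr instead of naming @lds ...
  ret void
}
```

The alternative being measured would guard the store so that only one wave in the work-group performs it and would replace the wave barrier with a work-group barrier (llvm.amdgcn.s.barrier).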

We also observed that using a wave barrier this way is undefined behavior (UB) in the language memory model, although it is well defined in the AMDGPU hardware memory model. However, the current AMDGPU sync-scope definition implies the language rules, so this would become an issue if any future atomic optimizations exploited that UB.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103225/new/

https://reviews.llvm.org/D103225