[PATCH] D146829: [AMDGPU] Remove unnecessary waitcnts

Tue Mar 28 14:49:44 PDT 2023

t-tye added a comment.

In D146829#4227224 <https://reviews.llvm.org/D146829#4227224>, @OutOfCache wrote:

> In D146829#4224849 <https://reviews.llvm.org/D146829#4224849>, @t-tye wrote:
>
>> The waitcnt's serve two purposes. They notify that the result of the operation is available to the thread that requested it, and they ensure that the effect of the operation is visible to other threads before this thread continues to do other operations. This latter purpose is used to ensure the happens-before relationship in the memory model. So for example, if a VMEM release atomic is done at workgroup scope, should these operations be visible to other threads before the result that is store-released onto VMEM?
>>
>> If these operations go down the LDS queues (even if they are not performed in the LDS itself), then there are 2 queues for the waves of a workgroup, but a single L1 shared by all waves of a workgroup for VMEM. So to ensure visibility to all waves in the workgroup the LDS operation must be waited to complete before starting the VMEM operation if there needs to be a happens-before relation. That waiting is achieved by the waitcnt on LGKM before executing the VMEM instruction.
>
> Thank you for taking the time to explain! If I understand correctly, the waitcnt does not only notify the current lane that the result is available, but also the other lanes within the same workgroup. So without the waitcnt, there is a possibility that the other lanes see the result of the VMEM instruction first?

waitcnt causes the thread to stall until the previous operations that change the counter have completed. This can be used to ensure a result has been returned to the thread. It can also be used to delay executing the following instructions until the previous operations have completed and so are globally visible. For the memory model it is the latter use that matters. If the thread ensures that the LDS/... operations are complete before executing a VMEM operation, it ensures all waves will see the updates to LDS and VMEM in the same order, which is a requirement for the seq_cst and release memory orders.
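
As a rough source-level analogy for the ordering requirement described above, here is a minimal C++ sketch of release/acquire message passing. It is not code from this patch and does not model the hardware queues. The name run_message_passing is hypothetical, the plain payload variable only stands in for the earlier LDS/... update, and the atomic flag stands in for the VMEM location that is store-released.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper, not from the patch: returns the payload value the
// consumer observes once it has seen the flag set.
int run_message_passing() {
    int payload = 0;            // assumption: stands in for the earlier LDS/... update
    std::atomic<int> flag{0};   // assumption: stands in for the store-released VMEM location
    int observed = -1;

    std::thread producer([&] {
        payload = 42;  // the earlier operation
        // Release store: the payload write must be visible to other threads
        // before the flag write is. Per the discussion above, the AMDGPU
        // backend gets this ordering by waiting on the counter before the
        // VMEM instruction.
        flag.store(1, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire load: spin until the flag written by the producer is seen.
        while (flag.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        // Release/acquire establishes happens-before, so this reads 42.
        observed = payload;
    });

    producer.join();
    consumer.join();
    assert(flag.load(std::memory_order_relaxed) == 1);
    return observed;
}
```

If the release ordering were dropped (for example, if the wait before the store were removed), the consumer could in principle see the flag set while still reading the old payload, which is the reordering the waitcnt is there to prevent.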

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146829/new/

https://reviews.llvm.org/D146829