[PATCH] D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.

Mahesha S via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 9 08:05:58 PDT 2021


hsmhsm added a comment.

In D103225#2807161 <https://reviews.llvm.org/D103225#2807161>, @hsmhsm wrote:

> In D103225#2805419 <https://reviews.llvm.org/D103225#2805419>, @b-sumner wrote:
>
>> Two approaches for limiting the stores to lane 0 of each wave:
>>
>> 1. Write 1 to exec mask, store, and write -1 to exec mask.  This works since the exec mask at the start of the wave when this happens is -1
>> 2. Check for lane == 0 and branch.  The lane can be computed by a) wave64: __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)) b) wave32: __builtin_amdgcn_mbcnt_lo(~0u, 0u)
>
> OK, let's see, which one is more feasible from the implementation point of view.

Implemented approach (2). We do not actually need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)) here. For both wave64 and wave32, __builtin_amdgcn_mbcnt_lo(~0u, 0u) is enough, because we only want to identify lane 0: with an all-ones mask, mbcnt_lo returns 0 only for lane 0 (in wave64, lanes 32 to 63 all get 32). The mbcnt_hi form would be needed only if, for wave64, we wanted to tell apart the lanes greater than 31.
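
To illustrate the reasoning, here is a rough host-side sketch in plain C that emulates the two builtins for a given lane. It is not the code in this patch. The helper names (emu_mbcnt_lo, emu_mbcnt_hi, is_lane0_lo_only, full_lane_id) are hypothetical, and the emulation assumes the usual semantics: each builtin counts the set bits of its mask among the lanes below the current one (low or high 32 lanes respectively) and adds the second operand.

```c
#include <stdint.h>

/* Count set bits in a 32-bit value (portable, no compiler builtins). */
uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* 64-bit mask with one bit set for every lane below `lane` (0 <= lane < 64). */
uint64_t lanes_below(unsigned lane) {
    return (1ull << lane) - 1;
}

/* Assumed semantics of mbcnt_lo: bits of `mask` among lanes 0..31 that are
   below the current lane, plus `src`. */
uint32_t emu_mbcnt_lo(uint32_t mask, uint32_t src, unsigned lane) {
    uint32_t below_lo = (uint32_t)(lanes_below(lane) & 0xFFFFFFFFu);
    return popcount32(mask & below_lo) + src;
}

/* Assumed semantics of mbcnt_hi: same, for lanes 32..63. */
uint32_t emu_mbcnt_hi(uint32_t mask, uint32_t src, unsigned lane) {
    uint32_t below_hi = (uint32_t)(lanes_below(lane) >> 32);
    return popcount32(mask & below_hi) + src;
}

/* Lane-0 test using only mbcnt_lo, as described in the comment above. */
int is_lane0_lo_only(unsigned lane) {
    return emu_mbcnt_lo(~0u, 0u, lane) == 0;
}

/* Full wave64 lane id, which needs both halves. */
uint32_t full_lane_id(unsigned lane) {
    return emu_mbcnt_hi(~0u, emu_mbcnt_lo(~0u, 0u, lane), lane);
}
```

Under these assumptions, is_lane0_lo_only is true for lane 0 alone, while emu_mbcnt_lo saturates at 32 for lanes 32 to 63, so only full_lane_id can distinguish lanes in the upper half.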


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103225/new/

https://reviews.llvm.org/D103225
