[PATCH] D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Mahesha S via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 9 08:05:58 PDT 2021
hsmhsm added a comment.
In D103225#2807161 <https://reviews.llvm.org/D103225#2807161>, @hsmhsm wrote:
> In D103225#2805419 <https://reviews.llvm.org/D103225#2805419>, @b-sumner wrote:
>
>> Two approaches for limiting the stores to lane 0 of each wave:
>>
>> 1. Write 1 to exec mask, store, and write -1 to exec mask. This works since the exec mask at the start of the wave when this happens is -1
>> 2. Check for lane == 0 and branch. The lane can be computed by a) wave64: __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)) b) wave32: __builtin_amdgcn_mbcnt_lo(~0u, 0u)
>
> OK, let's see, which one is more feasible from the implementation point of view.
Implemented approach (2). We do not actually need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)) here. For both wave64 and wave32, __builtin_amdgcn_mbcnt_lo(~0u, 0u) is enough, because we only want to identify lane 0. With these arguments, mbcnt_lo counts the lanes below the current one among lanes 0-31, so it returns 0 only for lane 0; lanes 32-63 of a wave64 all get 32. We would need the combined __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)) form only if, for wave64, we wanted to identify a specific lane greater than 31.
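To make the argument above concrete, here is a minimal host-side sketch that models the two builtins on the CPU. It is not code from the patch. The `model_*` names and the explicit `lane` parameter are assumptions for illustration only: the real builtins read the lane position from the hardware. The model assumes each builtin returns its second operand plus the number of set mask bits belonging to lanes below the current one, in the low half (lanes 0-31) for `mbcnt_lo` and the high half (lanes 32-63) for `mbcnt_hi`.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
static unsigned popcount32(std::uint32_t v) {
    unsigned n = 0;
    for (; v != 0; v &= v - 1) ++n;  // clear the lowest set bit each step
    return n;
}

// Hypothetical CPU model of __builtin_amdgcn_mbcnt_lo(mask, base) as seen by
// `lane`: base + set bits of `mask` for lanes 0..31 that are below `lane`.
std::uint32_t model_mbcnt_lo(std::uint32_t mask, std::uint32_t base, unsigned lane) {
    assert(lane < 64);
    const std::uint32_t below = lane >= 32 ? ~0u : ((1u << lane) - 1u);
    return base + popcount32(mask & below);
}

// Hypothetical CPU model of __builtin_amdgcn_mbcnt_hi(mask, base) as seen by
// `lane`: base + set bits of `mask` for lanes 32..63 that are below `lane`.
std::uint32_t model_mbcnt_hi(std::uint32_t mask, std::uint32_t base, unsigned lane) {
    assert(lane < 64);
    const std::uint32_t below = lane <= 32 ? 0u : ((1u << (lane - 32)) - 1u);
    return base + popcount32(mask & below);
}

// Full lane id in a wave64: the combined form quoted in the review.
std::uint32_t model_lane_id_wave64(unsigned lane) {
    return model_mbcnt_hi(~0u, model_mbcnt_lo(~0u, 0u, lane), lane);
}

// Lane-0 test using mbcnt_lo alone, as in approach (2).
bool model_is_lane0(unsigned lane) {
    return model_mbcnt_lo(~0u, 0u, lane) == 0u;
}
```

Under this model, `model_mbcnt_lo(~0u, 0u, lane)` saturates at 32 for every lane in the upper half, so it cannot tell lanes 32-63 apart, but it is still nonzero for all of them. That is why the lane-0 check does not need the `mbcnt_hi` step.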
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D103225/new/
https://reviews.llvm.org/D103225