[PATCH] D103225: [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 9 11:55:16 PDT 2021
rampitec added a comment.
In D103225#2808671 <https://reviews.llvm.org/D103225#2808671>, @rampitec wrote:
> In D103225#2808121 <https://reviews.llvm.org/D103225#2808121>, @hsmhsm wrote:
>
>> Implemented approach (2). Here we actually do not need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)). Irrespective of wave64 or wave32, __builtin_amdgcn_mbcnt_lo(~0u, 0u) is enough. The reason is that we only want to identify lane 0. On the other hand, for wave64, if we wanted to identify any lane greater than 31, then we would need __builtin_amdgcn_mbcnt_hi(~0u, __builtin_amdgcn_mbcnt_lo(~0u, 0u)).
>
> As far as I understand, mbcnt_lo will return 0 for any thread >= 32, so you still need to use mbcnt_hi.
>
> @b-sumner why do you suggest nesting the hi and lo calls? I think it should be (__builtin_amdgcn_mbcnt_lo(~0u, 0u) + __builtin_amdgcn_mbcnt_hi(~0u, 0u)) == 0.
Anyhow, the manual says:
Example to compute each thread's position in 0..63:
v_mbcnt_lo_u32_b32 v0, -1, 0
v_mbcnt_hi_u32_b32 v0, -1, v0
// v0 now contains ThreadPosition
So it says to nest them, probably to save one instruction.
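To make the comparison concrete, here is a rough host-side sketch in C of the two forms. It is not the builtins themselves: mbcnt_lo and mbcnt_hi below are hypothetical emulations that take the lane number as an explicit argument, and they assume the semantics as I read them from the ISA manual (count the mask bits belonging to lanes below the current one, in the low or high 32 lanes respectively, then add the second operand).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model; the real instructions read the lane
 * number implicitly. Assumes lane is in 0..63. */

/* 64-bit mask with one bit set for every lane below `lane`. */
uint64_t lanes_below(unsigned lane) {
    return (UINT64_C(1) << lane) - 1;
}

/* Portable population count for a 32-bit value. */
unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Assumed v_mbcnt_lo_u32_b32: src + bits of mask below `lane` in lanes 0..31. */
uint32_t mbcnt_lo(uint32_t mask, uint32_t src, unsigned lane) {
    return src + popcount32(mask & (uint32_t)lanes_below(lane));
}

/* Assumed v_mbcnt_hi_u32_b32: src + bits of mask below `lane` in lanes 32..63. */
uint32_t mbcnt_hi(uint32_t mask, uint32_t src, unsigned lane) {
    return src + popcount32(mask & (uint32_t)(lanes_below(lane) >> 32));
}

/* Nested form from the manual: the lo result feeds hi as its addend. */
uint32_t lane_id_nested(unsigned lane) {
    return mbcnt_hi(~0u, mbcnt_lo(~0u, 0u, lane), lane);
}

/* Summed form: two independent counts plus a separate add. */
uint32_t lane_id_summed(unsigned lane) {
    return mbcnt_lo(~0u, 0u, lane) + mbcnt_hi(~0u, 0u, lane);
}

/* Both forms should agree with the lane number for every lane. */
void self_check(void) {
    for (unsigned lane = 0; lane < 64; ++lane) {
        assert(lane_id_nested(lane) == lane);
        assert(lane_id_summed(lane) == lane);
    }
}
```

Under that assumed model the two forms give the same value for every lane. The nested form folds the addition into the second instruction's addend operand, which is where the saved instruction would come from.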
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D103225/new/
https://reviews.llvm.org/D103225