[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 4 09:06:38 PDT 2023
arsenm added a comment.
In D147408#4243427 <https://reviews.llvm.org/D147408#4243427>, @b-sumner wrote:
> In D147408#4243403 <https://reviews.llvm.org/D147408#4243403>, @foad wrote:
>
>>> Scalar branches may be the most expensive aspect of this algorithm
>>
>> If not-taken conditional branches are cheap then we could do something like this. It only has one taken branch, when we have finished handling all the active lanes.
>>
>> // Inclusive plus-scan v0 into v1. Also leaves the result of the plus-reduction in s3.
>> s_mov s0, exec
>> s_mov s3, 0 // accumulator
>> // repeat this section 32 or 64 times:
>> s_ff1 s1, s0 // find lowest remaining active lane
>> s_cmp_eq s1, -1
>> s_cbranch_scc1 end
>> s_bitset0 s0, s1
>> v_readlane s2, v0, s1
>> s_add s3, s2
>> v_writelane v1, s3, s1
>> // end of repeated section
>> end:
>
> Yes, that looks like what we want. The challenge will be creating IR that will lower to that.
I increasingly think we should just have intrinsics for reduction ops and move all this into codegen.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408