[PATCH] D147408: [AMDGPU] Enable AMDGPU Atomic Optimizer Pass by default.
Ruiling, Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 5 23:52:51 PDT 2023
ruiling added a comment.
> If not-taken conditional branches are cheap then we could do something like this. It only has one taken branch, when we have finished handling all the active lanes.
>
> // Inclusive plus-scan v0 into v1. Also leaves the result of the plus-reduction in s3.
> s_mov s0, exec
> s_mov s3, 0 // accumulator
> // repeat this section 32 or 64 times:
> s_ff1 s1, s0 // find lowest remaining active lane
> s_cmp_eq s1, -1
> s_cbranch_scc1 end
> s_bitset0 s0, s1
> v_readlane s2, v0, s1
> s_add s3, s2
> v_writelane v1, s3, s1
> // end of repeated section
> end:
LLVM IR that implements this kind of loop could look like:
bb0:
  %value = ...
  ; Bitmask of the currently active lanes.
  %ballot = call i32 @llvm.amdgcn.ballot.i32(i1 1)
  br label %bb1
bb1:
  %accum = phi i32 [ 0, %bb0 ], [ %new_accum, %bb1 ]
  %old_value_phi = phi i32 [ poison, %bb0 ], [ %old_value, %bb1 ]
  %active_bits = phi i32 [ %ballot, %bb0 ], [ %new_active_bits, %bb1 ]
  ; Index of the lowest remaining active lane.
  %ff1 = call i32 @llvm.cttz.i32(i32 %active_bits, i1 true)
  %lane_value = call i32 @llvm.amdgcn.readlane(i32 %value, i32 %ff1)
  ; Write the sum accumulated so far into lane %ff1 of the result.
  %old_value = call i32 @llvm.amdgcn.writelane(i32 %accum, i32 %ff1, i32 %old_value_phi)
  %new_accum = add i32 %accum, %lane_value
  ; Clear the bit of the lane that was just handled.
  %mask = shl i32 1, %ff1
  %inverse_mask = xor i32 %mask, -1
  %new_active_bits = and i32 %active_bits, %inverse_mask
  %is_end = icmp eq i32 %new_active_bits, 0
  br i1 %is_end, label %bb2, label %bb1
bb2:
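
To make the data flow of the bb1 loop easier to follow, here is a rough host-side model in Python. It is only a sketch under some assumptions: the wave is modelled as a 32-entry list, `None` stands in for lanes that are never written (poison in the IR), and the names `scan_active_lanes` and `WAVE_SIZE` are made up for illustration. Note that, as written, the IR writes %accum before adding the current lane's value, so each active lane receives the sum of the lower active lanes (an exclusive scan), whereas the quoted assembly writes after the add (an inclusive scan).

```python
WAVE_SIZE = 32  # assumption: wave32, matching the i32 ballot in the IR


def scan_active_lanes(values, exec_mask):
    """Model of the bb1 loop: visit active lanes from lowest to highest.

    Returns (per-lane result, total), where total models %new_accum on exit.
    """
    if exec_mask == 0:
        # The IR loop runs at least once and cttz is poison on zero,
        # so the model rejects an empty mask instead of guessing.
        raise ValueError("at least one lane must be active")

    result = [None] * WAVE_SIZE  # None models lanes left unwritten
    accum = 0
    active_bits = exec_mask & ((1 << WAVE_SIZE) - 1)

    while True:
        # cttz: index of the lowest set bit
        ff1 = (active_bits & -active_bits).bit_length() - 1
        lane_value = values[ff1]                  # readlane
        result[ff1] = accum                       # writelane (before the add)
        accum = (accum + lane_value) & 0xFFFFFFFF  # i32 add wraps
        active_bits &= ~(1 << ff1)                # clear the handled lane
        if active_bits == 0:
            break

    return result, accum


values = [1, 2, 3, 4] + [0] * (WAVE_SIZE - 4)
result, total = scan_active_lanes(values, 0b1011)  # lanes 0, 1 and 3 active
print(result[:4], total)  # [0, 1, None, 3] 7
```

The loop body runs once per active lane, so the trip count is the population count of the ballot rather than the full wave size.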
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147408/new/
https://reviews.llvm.org/D147408