[llvm] [AMDGPU] Improve codegen for GFX10+ DPP reductions and scans (PR #107108)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 3 06:36:53 PDT 2024
================
@@ -421,9 +421,10 @@ Value *AMDGPUAtomicOptimizerImpl::buildReduction(IRBuilder<> &B,
// Reduce within each pair of rows (i.e. 32 lanes).
assert(ST->hasPermLaneX16());
- Value *Permlanex16Call = B.CreateIntrinsic(
- V->getType(), Intrinsic::amdgcn_permlanex16,
- {V, V, B.getInt32(-1), B.getInt32(-1), B.getFalse(), B.getFalse()});
+ Value *Permlanex16Call =
+ B.CreateIntrinsic(AtomicTy, Intrinsic::amdgcn_permlanex16,
+ {PoisonValue::get(AtomicTy), V, B.getInt32(0),
+ B.getInt32(0), B.getFalse(), B.getFalse()});
----------------
jayfoad wrote:
Using 0 instead of -1 for the lane select is purely a stylistic choice: at this point in the reduction every lane within a row already holds the same value, so it doesn't matter which lane of the other row we read. I just think 0 is the least surprising value to use.
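To illustrate the point, here is a minimal host-side sketch of a simplified permlanex16 model, not the actual LLVM or hardware implementation. It assumes a single pair of 16-lane rows with all lanes active, and it ignores the `fi` and `bound_ctrl` operands. The names `permlanex16Model`, `rowUniform` and `checkLaneSelectChoice` are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Lanes = std::array<uint32_t, 32>;

// Simplified model: each lane reads from the *other* row of its pair.
// SelLo packs eight 4-bit lane selects for lanes 0-7 of a row,
// SelHi packs the selects for lanes 8-15.
Lanes permlanex16Model(const Lanes &Src, uint32_t SelLo, uint32_t SelHi) {
  Lanes Out{};
  for (unsigned Lane = 0; Lane < 32; ++Lane) {
    unsigned Row = Lane / 16;
    unsigned Idx = Lane % 16;
    uint32_t Sel = Idx < 8 ? SelLo : SelHi;
    unsigned Pick = (Sel >> (4 * (Idx % 8))) & 0xF;
    Out[Lane] = Src[(1 - Row) * 16 + Pick];
  }
  return Out;
}

// State after the within-row reduction: every lane in a row holds the
// same value (that row's partial result).
Lanes rowUniform(uint32_t Row0, uint32_t Row1) {
  Lanes V{};
  for (unsigned Lane = 0; Lane < 32; ++Lane)
    V[Lane] = Lane < 16 ? Row0 : Row1;
  return V;
}

void checkLaneSelectChoice() {
  // Row-uniform input: select 0 (lane 0) and select -1 (lane 15) agree.
  Lanes V = rowUniform(7, 35);
  assert(permlanex16Model(V, 0u, 0u) ==
         permlanex16Model(V, 0xFFFFFFFFu, 0xFFFFFFFFu));

  // Non-uniform input: the select does matter, so the equivalence relies
  // on the earlier within-row reduction having run.
  Lanes Iota{};
  for (unsigned Lane = 0; Lane < 32; ++Lane)
    Iota[Lane] = Lane;
  assert(permlanex16Model(Iota, 0u, 0u)[0] == 16);
  assert(permlanex16Model(Iota, 0xFFFFFFFFu, 0xFFFFFFFFu)[0] == 31);
}
```

In this model a select of 0 makes every lane read lane 0 of the other row, and a select of -1 makes every lane read lane 15. The two only coincide because the input is uniform within each row.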
https://github.com/llvm/llvm-project/pull/107108