[PATCH] D64328: [AMDGPU] Optimize atomic max/min
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 10 01:24:24 PDT 2019
foad marked 4 inline comments as done.
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:328
+ case AtomicRMWInst::UMax:
+ IdentityVal = APInt::getMinValue(TyBitWidth);
+ break;
----------------
arsenm wrote:
> Shouldn't this just be zero for add/sub?
Yes, getMinValue means min unsigned value which is zero.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:330-338
+ case AtomicRMWInst::UMin:
+ IdentityVal = APInt::getMaxValue(TyBitWidth);
+ break;
+ case AtomicRMWInst::Max:
+ IdentityVal = APInt::getSignedMinValue(TyBitWidth);
+ break;
+ case AtomicRMWInst::Min:
----------------
arsenm wrote:
> The min and max values seem backwards? min(MinValue, X) -> MinValue, not X
I got one of these cases wrong in the first patch but they should all be correct now. The identity for min() is MaxValue and vice versa.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:388
CallInst *const ReadLaneLo = B.CreateIntrinsic(
Intrinsic::amdgcn_readlane, {}, {ExtractLo, B.getInt32(63)});
CallInst *const ReadLaneHi = B.CreateIntrinsic(
----------------
arsenm wrote:
> I don't understand why these are reading lane 63. Why not lane 0? Why not readfirstlane? This won't work with wave64, but I guess that's an existing problem
Here's how the exclusive scan works:
ExclScan lane 0 is 0
ExclScan lane 1 is V lane 0
ExclScan lane 2 is V lane 0+1
ExclScan lane 3 is V lane 0+1+2
...
ExclScan lane 63 is V lane 0+...+62
Here we're reading lane 63 of NewV = ExclScan+V, i.e. the sum of all lanes in V.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64328/new/
https://reviews.llvm.org/D64328
More information about the llvm-commits
mailing list