[llvm] [AMDGPU] - Add constant folding for s_bitreplicate (PR #72366)
Jessica Del via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 15 01:47:34 PST 2023
================
@@ -2422,6 +2423,23 @@ static Constant *ConstantFoldScalarCall1(StringRef Name,
return ConstantFP::get(Ty->getContext(), Val);
}
+
+ case Intrinsic::amdgcn_s_bitreplicate: {
+ uint64_t Val = Op->getZExtValue();
+ uint64_t ReplicatedVal = 0;
+ uint64_t ReplicatedOnes = 0b11;
+ // Input operand is always b32
+ for (unsigned i = 0; i < 32; ++i, ReplicatedOnes <<= 2, Val >>= 1) {
+ uint64_t Bit = Val & 1;
+
+ if (!Bit)
+ continue;
+
+ ReplicatedVal |= ReplicatedOnes;
+ }
----------------
OutOfCache wrote:
My initial version looked more straightforward like this:
```cpp
uint64_t ReplicatedVal = 0;
for (unsigned i = 0; i < 32; ++i) {
uint64_t Bit = (Val >> i) & 1;
if (!Bit)
continue;
ReplicatedVal |= (1ULL << (i * 2));
ReplicatedVal |= (1ULL << ((i * 2) + 1));
}
```
But I realized we don't need to shift the values by `i` every iteration, since `i` increases by 1 every time. So instead, I shift the original value by 1 every iteration and shift a mask with two ones (`0b11`) by 2.
The current version might be harder to read or understand. Let me know what you think.
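For comparison, here is a minimal standalone sketch of both loops, outside of `ConstantFoldScalarCall1`. The function names `replicateByIndex` and `replicateByMask` are hypothetical and do not appear in the patch. It assumes a 32-bit input and uses 64-bit shift operands so the indexed variant does not overflow `int`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names for illustration; neither exists in the patch.

// Indexed variant: shifts by `i` on every iteration.
// The 0b11 pair is widened to 64 bits before shifting.
constexpr uint64_t replicateByIndex(uint32_t Val) {
  uint64_t ReplicatedVal = 0;
  for (unsigned i = 0; i < 32; ++i) {
    if ((Val >> i) & 1)
      ReplicatedVal |= uint64_t{0b11} << (i * 2);
  }
  return ReplicatedVal;
}

// Running-mask variant: Val moves right by 1 and the 0b11 mask
// moves left by 2 on every iteration, so no shift depends on `i`.
constexpr uint64_t replicateByMask(uint32_t Val) {
  uint64_t ReplicatedVal = 0;
  uint64_t ReplicatedOnes = 0b11;
  for (unsigned i = 0; i < 32; ++i, ReplicatedOnes <<= 2, Val >>= 1) {
    if (Val & 1)
      ReplicatedVal |= ReplicatedOnes;
  }
  return ReplicatedVal;
}

// Compile-time spot checks (requires C++14 or later).
static_assert(replicateByMask(0b101) == 0b110011, "each set bit becomes a pair");
static_assert(replicateByMask(0x80000000u) == 0xC000000000000000ull, "top bit");
static_assert(replicateByIndex(0b101) == replicateByMask(0b101), "variants agree");
```

Both functions should return the same result for any input, for example `assert(replicateByIndex(0xDEADBEEFu) == replicateByMask(0xDEADBEEFu));`.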
https://github.com/llvm/llvm-project/pull/72366