[llvm] [AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce excessive codegen (PR #171253)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 18 10:29:36 PST 2025
================
@@ -644,6 +645,36 @@ static Value *promoteAllocaUserToVector(Instruction *Inst, const DataLayout &DL,
auto *SubVecTy = FixedVectorType::get(VecEltTy, NumLoadedElts);
assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));
+ // If idx is dynamic, then sandwich load with bitcasts.
+ // ie. VectorTy SubVecTy AccessTy
+ // <64 x i8> -> <16 x i8> <8 x i16>
+ // <64 x i8> -> <4 x i128> -> i128 -> <8 x i16>
+ // Extracting subvector with dynamic index has very large expansion in
+ // the amdgpu backend. Limit to pow2.
+ FixedVectorType *VectorTy = AA.Vector.Ty;
+ uint64_t NumBits = DL.getTypeStoreSize(SubVecTy) * 8u;
+ uint64_t LoadAlign = cast<LoadInst>(Inst)->getAlign().value();
+ bool IsAlignedLoad = NumBits <= (LoadAlign * 8u);
+ unsigned TotalNumElts = VectorTy->getNumElements();
+ bool IsProperlyDivisible = TotalNumElts % NumLoadedElts == 0;
----------------
arsenm wrote:
Can you keep this in terms of TypeSize operators?
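
For illustration only, here is one possible reading of the request: compare the sizes through named size predicates instead of converting to `uint64_t` and multiplying by 8. In-tree code could presumably use `DL.getTypeStoreSizeInBits(SubVecTy)` together with `TypeSize::isKnownLE`. The standalone sketch below does not use LLVM at all: `FixedBits`, `SandwichCheck`, and `checkSandwich` are hypothetical stand-ins, and the 16-byte load alignment in the usage note is an assumed value, not something stated in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed (non-scalable) size in bits.
// This is NOT llvm::TypeSize; it only mirrors the idea of comparing
// sizes through a named predicate rather than raw integers.
struct FixedBits {
  uint64_t Value;

  static FixedBits fromBytes(uint64_t Bytes) { return FixedBits{Bytes * 8}; }

  static bool isKnownLE(FixedBits LHS, FixedBits RHS) {
    return LHS.Value <= RHS.Value;
  }
};

// The two predicates computed in the quoted hunk.
struct SandwichCheck {
  bool IsAlignedLoad;
  bool IsProperlyDivisible;
};

// SubVecStoreBytes: store size of SubVecTy in bytes.
// LoadAlignBytes:   alignment of the load in bytes.
// TotalNumElts:     element count of the promoted vector type.
// NumLoadedElts:    element count of the loaded subvector.
SandwichCheck checkSandwich(uint64_t SubVecStoreBytes, uint64_t LoadAlignBytes,
                            unsigned TotalNumElts, unsigned NumLoadedElts) {
  FixedBits NumBits = FixedBits::fromBytes(SubVecStoreBytes);
  FixedBits AlignBits = FixedBits::fromBytes(LoadAlignBytes);
  SandwichCheck Result;
  Result.IsAlignedLoad = FixedBits::isKnownLE(NumBits, AlignBits);
  Result.IsProperlyDivisible = TotalNumElts % NumLoadedElts == 0;
  return Result;
}
```

With the types from the code comment (`<64 x i8>` promoted vector, `<16 x i8>` subvector) and an assumed 16-byte alignment, `checkSandwich(16, 16, 64, 16)` sets both flags, which matches the `<64 x i8> -> <4 x i128>` case.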
https://github.com/llvm/llvm-project/pull/171253