[llvm] [AMDGPU] In promote-alloca, if index is dynamic, sandwich load with bitcasts to reduce number of extractelements as they have large expansion in the backend. (PR #171253)
Nicolai Hähnle via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 12 13:32:56 PST 2025
================
@@ -556,6 +557,35 @@ static Value *promoteAllocaUserToVector(
auto *SubVecTy = FixedVectorType::get(VecEltTy, NumLoadedElts);
assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));
+ // If idx is dynamic, then sandwich load with bitcasts.
+ // ie. <64 x i8> -> <16 x i8> instead do
+ // <64 x i8> -> <4 x i128> -> i128 -> <16 x i8>
+ // Extracting subvector with dynamic index has very large expansion in
+ // the amdgpu backend. Limit to pow2 for UDiv.
+ if (!isa<ConstantInt>(Index) && SubVecTy->isIntOrIntVectorTy() &&
+ llvm::isPowerOf2_32(VectorTy->getNumElements()) &&
+ llvm::isPowerOf2_32(SubVecTy->getNumElements())) {
----------------
nhaehnle wrote:
Could this instead check that the subvector's element count is a power of two and that the vector's element count is a multiple of it? That covers more of the cases a reasonable programmer might write, and I would expect (or at least hope) that it still gives reasonable codegen.
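
A minimal sketch of the two conditions, written on plain element counts rather than on the `FixedVectorType` values used in the patch. The helper names `isPow2`, `patchCheck`, and `relaxedCheck` are hypothetical and exist only for illustration; `isPow2` stands in for `llvm::isPowerOf2_32`.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::isPowerOf2_32: true for 1, 2, 4, 8, ...
constexpr bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Condition as written in the patch: both element counts are powers of two.
constexpr bool patchCheck(uint32_t VecElts, uint32_t SubElts) {
  return isPow2(VecElts) && isPow2(SubElts);
}

// Suggested relaxation: only the subvector count must be a power of two,
// and the full vector count must be a multiple of it. The isPow2 test runs
// first, so SubElts is known to be non-zero before the modulo.
constexpr bool relaxedCheck(uint32_t VecElts, uint32_t SubElts) {
  return isPow2(SubElts) && VecElts % SubElts == 0;
}
```

Under this sketch, a 16-element load from a 48-element vector passes `relaxedCheck` but not `patchCheck`; presumably that case would become `<48 x i8> -> <3 x i128> -> i128 -> <16 x i8>`, though whether the backend handles it well would need to be confirmed.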
https://github.com/llvm/llvm-project/pull/171253