[llvm] [SROA] Use tree-structure merge to remove alloca (PR #152793)
Nikita Popov via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 22 14:06:07 PDT 2025
================
@@ -2811,6 +2902,220 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
return CanSROA;
}
+ /// Attempts to rewrite a partition using tree-structured merge optimization.
+ ///
+ /// This function analyzes a partition to determine if it can be optimized
+ /// using a tree-structured merge pattern, where multiple non-overlapping
+ /// stores completely fill an alloca. And there is no load from the alloca in
+ /// the middle of the stores. Such patterns can be optimized by eliminating
+ /// the intermediate stores and directly constructing the final vector by
+ /// using shufflevectors.
+ ///
+ /// Example transformation:
+ /// Before: (stores do not have to be in order)
+ /// %alloca = alloca <8 x float>
+ /// store <2 x float> %val0, ptr %alloca ; offset 0-1
+ /// store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
+ /// store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
+ /// store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
+ ///
+ /// After:
+ /// %alloca = alloca <8 x float>
+ /// %shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2,
+ /// i32 3>
+ /// %shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2,
+ /// i32 3>
+ /// %shuffle2 = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1,
+ /// i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+ /// store %shuffle2, ptr %alloca
+ ///
+ /// The optimization looks for partitions that:
+ /// 1. Have no overlapping split slice tails
+ /// 2. Contain non-overlapping stores that cover the entire alloca
+ /// 3. Have exactly one load that reads the complete alloca structure and not
+ /// in the middle of the stores (TODO: maybe we can relax the constraint
+ /// about reading the entire alloca structure)
+ ///
+ /// \param P The partition to analyze and potentially rewrite
+ /// \return An optional vector of values that were deleted during the rewrite
+ /// process, or std::nullopt if the partition cannot be optimized
+ /// using tree-structured merge
+ std::optional<SmallVector<Value *, 4>>
+ rewriteTreeStructuredMerge(Partition &P) {
+ // No tail slices that overlap with the partition
+ if (P.splitSliceTails().size() > 0)
+ return std::nullopt;
+
+ SmallVector<Value *, 4> DeletedValues;
+ LoadInst *TheLoad = nullptr;
+
+ // Structure to hold store information
+ struct StoreInfo {
+ StoreInst *Store;
+ uint64_t BeginOffset;
+ uint64_t EndOffset;
+ Value *StoredValue;
+ StoreInfo(StoreInst *SI, uint64_t Begin, uint64_t End, Value *Val)
+ : Store(SI), BeginOffset(Begin), EndOffset(End), StoredValue(Val) {}
+ };
+
+ SmallVector<StoreInfo, 4> StoreInfos;
+
+ // The alloca must be a fixed vector type
+ Type *AllocatedEltTy = nullptr;
+ if (auto *FixedVecTy = dyn_cast<FixedVectorType>(NewAI.getAllocatedType()))
+ AllocatedEltTy = FixedVecTy->getElementType();
+ else
+ return std::nullopt;
+ // If the allocated element type is a pointer, we do not handle it
+ // TODO: handle this case by using inttoptr/ptrtoint
+ if (AllocatedEltTy->isPtrOrPtrVectorTy())
----------------
nikic wrote:
```suggestion
if (AllocatedEltTy->isPointerTy())
```
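`AllocatedEltTy` is the element type of a `FixedVectorType`, so it is already scalar and `isPointerTy()` is sufficient. The same applies to the checks below.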
https://github.com/llvm/llvm-project/pull/152793
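
For readers following the doc comment in the quoted hunk, here is a minimal standalone sketch of the tree-structured merge idea. It is not the PR's implementation and uses no LLVM APIs: `Piece`, `concat`, and `treeMerge` are hypothetical names, offsets are in elements rather than bytes, and `concat` stands in for a `shufflevector` with an identity mask. It also ignores the fact that real `shufflevector` operands must have the same type, so unevenly sized pieces would need extra widening in actual IR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one store into the alloca: the element offset it
// starts at and the lanes it writes. Not an LLVM type.
struct Piece {
  uint64_t Begin;
  std::vector<float> Lanes;
};

// Models a shufflevector with an identity mask over two operands, which
// simply concatenates them.
static std::vector<float> concat(const std::vector<float> &A,
                                 const std::vector<float> &B) {
  std::vector<float> R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

// Returns the merged vector, or std::nullopt if the pieces overlap, leave a
// gap, or do not cover exactly NumElts lanes.
std::optional<std::vector<float>> treeMerge(std::vector<Piece> Pieces,
                                            uint64_t NumElts) {
  if (Pieces.empty())
    return std::nullopt;

  // Stores may appear in any order, so sort them by starting offset first.
  std::sort(Pieces.begin(), Pieces.end(),
            [](const Piece &L, const Piece &R) { return L.Begin < R.Begin; });

  // Each piece must start exactly where the previous one ended.
  uint64_t Expected = 0;
  for (const Piece &P : Pieces) {
    if (P.Lanes.empty() || P.Begin != Expected)
      return std::nullopt;
    Expected += P.Lanes.size();
  }
  if (Expected != NumElts)
    return std::nullopt;

  // Merge adjacent pairs level by level until one vector remains.
  std::vector<std::vector<float>> Level;
  for (Piece &P : Pieces)
    Level.push_back(std::move(P.Lanes));
  while (Level.size() > 1) {
    std::vector<std::vector<float>> Next;
    for (std::size_t I = 0; I + 1 < Level.size(); I += 2)
      Next.push_back(concat(Level[I], Level[I + 1]));
    if (Level.size() % 2 == 1) // An odd piece is carried up unchanged.
      Next.push_back(std::move(Level.back()));
    Level = std::move(Next);
  }

  assert(Level.front().size() == NumElts);
  return Level.front();
}
```

Calling `treeMerge` with four two-lane pieces at element offsets 0, 4, 2, and 6 and `NumElts` of 8 mirrors the out-of-order stores in the doc comment: the pieces are sorted, merged pairwise into two four-lane vectors, and then into one eight-lane vector. A set of pieces that leaves a gap is rejected, which corresponds to the requirement that the stores cover the entire alloca.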