[llvm] [SROA] Use tree-structure merge to remove alloca (PR #152793)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 19 15:06:15 PDT 2025
================
@@ -2811,6 +2896,213 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
return CanSROA;
}
+ /// Attempts to rewrite a partition using tree-structured merge optimization.
+ ///
+ /// This function analyzes a partition to determine if it can be optimized
+ /// using a tree-structured merge pattern, where multiple non-overlapping
+ /// stores completely fill an alloca. And there is no load from the alloca in
+ /// the middle of the stores. Such patterns can be optimized by eliminating
+ /// the intermediate stores and directly constructing the final vector by
+ /// using shufflevectors.
+ ///
+ /// Example transformation:
+ /// Before: (stores do not have to be in order)
+ /// %alloca = alloca <8 x float>
+ /// store <2 x float> %val0, ptr %alloca ; offset 0-1
+ /// store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
+ /// store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
+ /// store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
+ ///
+ /// After:
+ /// %alloca = alloca <8 x float>
+ /// %shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2,
+ /// i32 3>
+ /// %shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2,
+ /// i32 3>
+ /// %shuffle2 = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1,
+ /// i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+ /// store %shuffle2, ptr %alloca
+ ///
+ /// The optimization looks for partitions that:
+ /// 1. Have no overlapping split slice tails
+ /// 2. Contain non-overlapping stores that cover the entire alloca
+ /// 3. Have exactly one load that reads the complete alloca structure and not
+ /// in the middle of the stores (TODO: maybe we can relax the constraint
+ /// about reading the entire alloca structure)
+ ///
+ /// \param P The partition to analyze and potentially rewrite
+ /// \return An optional vector of values that were deleted during the rewrite
+ /// process, or std::nullopt if the partition cannot be optimized
+ /// using tree-structured merge
+ std::optional<SmallVector<Value *, 4>>
+ rewriteTreeStructuredMerge(Partition &P) {
+ // No tail slices that overlap with the partition
+ if (P.splitSliceTails().size() > 0)
+ return std::nullopt;
+
+ SmallVector<Value *, 4> DeletedValues;
+ LoadInst *TheLoad = nullptr;
+
+ // Structure to hold store information
+ struct StoreInfo {
+ StoreInst *Store;
+ uint64_t BeginOffset;
+ uint64_t EndOffset;
+ Value *StoredValue;
+ StoreInfo(StoreInst *SI, uint64_t Begin, uint64_t End, Value *Val)
+ : Store(SI), BeginOffset(Begin), EndOffset(End), StoredValue(Val) {}
+ };
+
+ SmallVector<StoreInfo, 4> StoreInfos;
+
+ // If the new alloca is a fixed vector type, we use its element type as the
+ // allocated element type, otherwise we use i8 as the allocated element
+ Type *AllocatedEltTy =
+ isa<FixedVectorType>(NewAI.getAllocatedType())
+ ? cast<FixedVectorType>(NewAI.getAllocatedType())->getElementType()
+ : Type::getInt8Ty(NewAI.getContext());
+
+ // Helper to check if a type is
+ // 1. A fixed vector type
+ // 2. The element type is not a pointer
+ // 3. The element type size is byte-aligned
+ // We only handle the cases that the ld/st meet these conditions
+ auto IsTypeValidForTreeStructuredMerge = [&](Type *Ty) -> bool {
+ auto *FixedVecTy = dyn_cast<FixedVectorType>(Ty);
+ return FixedVecTy &&
+ DL.getTypeSizeInBits(FixedVecTy->getElementType()) % 8 == 0 &&
+ !FixedVecTy->getElementType()->isPointerTy();
+ };
----------------
Chengjunp wrote:
If the element types are not byte-aligned, this transformation may not be safe, and non-byte-aligned cases are not very common. So, to be conservative, I chose not to support non-byte-aligned ld/st.
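
To illustrate the concern, here is a minimal standalone sketch. It is not code from the patch. It assumes slice offsets are expressed in bytes, as the `BeginOffset`/`EndOffset` fields suggest, and both helper names are hypothetical. A byte range maps cleanly onto whole vector elements only when the element size is a multiple of 8 bits:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): mirrors the
// `getTypeSizeInBits(EltTy) % 8 == 0` test on a plain bit width.
bool isByteSizedElement(uint64_t EltBits) {
  return EltBits != 0 && EltBits % 8 == 0;
}

// Hypothetical helper: maps the byte range [BeginByte, EndByte) covered by a
// store onto a run of whole vector elements. Returns false when the element
// size is not byte-sized or the range does not start and end on element
// boundaries, in which case the outputs are left untouched.
bool byteRangeToElements(uint64_t BeginByte, uint64_t EndByte,
                         uint64_t EltBits, uint64_t &FirstElt,
                         uint64_t &NumElts) {
  assert(BeginByte <= EndByte && "malformed byte range");
  if (!isByteSizedElement(EltBits))
    return false;
  const uint64_t EltBytes = EltBits / 8;
  if (BeginByte % EltBytes != 0 || EndByte % EltBytes != 0)
    return false;
  FirstElt = BeginByte / EltBytes;
  NumElts = (EndByte - BeginByte) / EltBytes;
  return true;
}
```

With 32-bit float elements, the store at `%alloca+8` from the doc comment covers bytes 8 to 16 and maps to elements 2 and 3. With i1 or i4 elements, several elements share a byte, so no byte range identifies a run of whole elements and the sketch rejects the case.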
https://github.com/llvm/llvm-project/pull/152793