[llvm] Revert "[LSV] Merge contiguous chains across scalar types" (PR #170381)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 2 14:25:29 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-llvm-globalisel
Author: Drew Kersnar (dakersnar)
Changes:
Reverts llvm/llvm-project#154069. I pointed out a number of issues post-merge, most importantly examples of miscompiles: https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.
While the motivation of the change is clear, I think the implementation approach is flawed. It seems like the goal is to allow elements like `load <2xi16>` and `load i32` to be vectorized together even though the current algorithm does not group them into the same equivalence class. I personally think that if we want to attempt this, it should be a more holistic approach, maybe even redefining the concept of an equivalence class. The current solution seems like it would be really hard to get bug-free, and even if the bugs were fixed, it can only merge chains that happen to be adjacent to each other after `splitChainByContiguity`. But we can discuss more in the re-land. Maybe the broader approach I'm proposing is too difficult and a narrow optimization is worthwhile. Regardless, this should be reverted; it needs more iteration before it is correct.
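To make the motivation concrete, here is a minimal IR sketch (my own illustration, not taken from the patch or its tests) of the pattern the reverted change was trying to handle: two contiguous loads whose scalar element types differ, so LSV's equivalence-class grouping keeps them apart and they are never considered for vectorization together, even though they span a contiguous 8-byte region.

```llvm
define i32 @mixed_scalar_types(ptr addrspace(1) %p) {
  ; Contiguous accesses: %a covers bytes [0, 4), %b covers bytes [4, 8).
  ; Because the classes are keyed on the scalar element type, the <2 x i16>
  ; load and the i32 load land in different equivalence classes.
  %a = load <2 x i16>, ptr addrspace(1) %p, align 4
  %gep = getelementptr inbounds i8, ptr addrspace(1) %p, i64 4
  %b = load i32, ptr addrspace(1) %gep, align 4
  ; Uses that keep both loads alive.
  %a.hi = extractelement <2 x i16> %a, i32 1
  %a.ext = zext i16 %a.hi to i32
  %sum = add i32 %a.ext, %b
  ret i32 %sum
}
```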
---
Patch is 411.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170381.diff
50 Files Affected:
- (modified) llvm/include/llvm/Transforms/Utils/Local.h (+1-1)
- (modified) llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp (+1-1)
- (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+2-2)
- (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+1-1)
- (modified) llvm/lib/Transforms/Utils/Local.cpp (+22-38)
- (modified) llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp (+63-219)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/amdgpu-irtranslator.ll (-2)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+129-133)
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+792-825)
- (modified) llvm/test/CodeGen/AMDGPU/build_vector.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.bf16.ll (+40-18)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.f16.ll (+14-14)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll (+26-28)
- (modified) llvm/test/CodeGen/AMDGPU/fdiv.ll (+162-171)
- (modified) llvm/test/CodeGen/AMDGPU/fnearbyint.ll (+11-12)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll (+39-45)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll (+7-7)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fneg.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_sint.ll (+49-57)
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_uint.ll (+30-39)
- (modified) llvm/test/CodeGen/AMDGPU/fshl.ll (+149-141)
- (modified) llvm/test/CodeGen/AMDGPU/fshr.ll (+206-197)
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics.ll (-82)
- (modified) llvm/test/CodeGen/AMDGPU/half.ll (+109-65)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+27-27)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/kernel-args.ll (+12-12)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll (+120-192)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.ll (+58-59)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp10.ll (+58-59)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.ll (+33-34)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log.ll (+56-57)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log10.ll (+56-57)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log2.ll (+12-14)
- (modified) llvm/test/CodeGen/AMDGPU/max.ll (+16-18)
- (modified) llvm/test/CodeGen/AMDGPU/min.ll (+108-108)
- (modified) llvm/test/CodeGen/AMDGPU/packed-fp32.ll (+30-58)
- (modified) llvm/test/CodeGen/AMDGPU/rotl.ll (+36-38)
- (modified) llvm/test/CodeGen/AMDGPU/rotr.ll (+28-30)
- (modified) llvm/test/CodeGen/AMDGPU/s_addk_i32.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll (+63-64)
- (modified) llvm/test/CodeGen/AMDGPU/store-to-constant.ll (+2-4)
- (modified) llvm/test/CodeGen/AMDGPU/udivrem.ll (+61-65)
- (modified) llvm/test/CodeGen/AMDGPU/uint_to_fp.f64.ll (+1-1)
- (removed) llvm/test/Transforms/InstCombine/copy-access-metadata.ll (-215)
- (removed) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/copy-metadata-load-store.ll (-159)
- (modified) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors-complex.ll (+66-258)
- (modified) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll (+3-281)
``````````diff
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index d0af2d3d2e4c2..9acfd872e574b 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -431,7 +431,7 @@ LLVM_ABI void combineAAMetadata(Instruction *K, const Instruction *J);
/// Copy the metadata from the source instruction to the destination (the
/// replacement for the source instruction).
-LLVM_ABI void copyMetadataForAccess(Instruction &Dest, Instruction &Source);
+LLVM_ABI void copyMetadataForLoad(LoadInst &Dest, const LoadInst &Source);
/// Patch the replacement so that it is not more restrictive than the value
/// being replaced. It assumes that the replacement does not get moved from
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 98884c441096e..fdff21b6ef8df 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1035,7 +1035,7 @@ bool LegalizeBufferContentTypesVisitor::visitLoadImpl(
LoadInst *NewLI = IRB.CreateAlignedLoad(
LoadableType, NewPtr, commonAlignment(OrigLI.getAlign(), ByteOffset),
Name + ".off." + Twine(ByteOffset));
- copyMetadataForAccess(*NewLI, OrigLI);
+ copyMetadataForLoad(*NewLI, OrigLI);
NewLI->setAAMetadata(
AANodes.adjustForAccess(ByteOffset, LoadableType, DL));
NewLI->setAtomic(OrigLI.getOrdering(), OrigLI.getSyncScopeID());
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index 3e04aeb675d2a..9491610190c10 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -415,7 +415,7 @@ void PointerReplacer::replace(Instruction *I) {
LT->getAlign(), LT->getOrdering(),
LT->getSyncScopeID());
NewI->takeName(LT);
- copyMetadataForAccess(*NewI, *LT);
+ copyMetadataForLoad(*NewI, *LT);
IC.InsertNewInstWith(NewI, LT->getIterator());
IC.replaceInstUsesWith(*LT, NewI);
@@ -606,7 +606,7 @@ LoadInst *InstCombinerImpl::combineLoadToNewType(LoadInst &LI, Type *NewTy,
Builder.CreateAlignedLoad(NewTy, LI.getPointerOperand(), LI.getAlign(),
LI.isVolatile(), LI.getName() + Suffix);
NewLoad->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
- copyMetadataForAccess(*NewLoad, LI);
+ copyMetadataForLoad(*NewLoad, LI);
return NewLoad;
}
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index a7c322bfcb981..70afe833c9f47 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -3272,7 +3272,7 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
// Copy any metadata that is valid for the new load. This may require
// conversion to a different kind of metadata, e.g. !nonnull might change
// to !range or vice versa.
- copyMetadataForAccess(*NewLI, LI);
+ copyMetadataForLoad(*NewLI, LI);
// Do this after copyMetadataForLoad() to preserve the TBAA shift.
if (AATags)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index dec2e019333b9..a03cf6e953e35 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3100,70 +3100,54 @@ void llvm::combineAAMetadata(Instruction *K, const Instruction *J) {
combineMetadata(K, J, /*DoesKMove=*/true, /*AAOnly=*/true);
}
-void llvm::copyMetadataForAccess(Instruction &DestI, Instruction &SourceI) {
+void llvm::copyMetadataForLoad(LoadInst &Dest, const LoadInst &Source) {
SmallVector<std::pair<unsigned, MDNode *>, 8> MD;
- SourceI.getAllMetadata(MD);
- MDBuilder MDB(DestI.getContext());
- Type *NewType = DestI.getType();
-
- // Only needed for range metadata on loads.
- const DataLayout *DL = nullptr;
- const LoadInst *LSource = dyn_cast<LoadInst>(&SourceI);
- if (LSource)
- DL = &LSource->getDataLayout();
-
+ Source.getAllMetadata(MD);
+ MDBuilder MDB(Dest.getContext());
+ Type *NewType = Dest.getType();
+ const DataLayout &DL = Source.getDataLayout();
for (const auto &MDPair : MD) {
unsigned ID = MDPair.first;
MDNode *N = MDPair.second;
-
+ // Note, essentially every kind of metadata should be preserved here! This
+ // routine is supposed to clone a load instruction changing *only its type*.
+ // The only metadata it makes sense to drop is metadata which is invalidated
+ // when the pointer type changes. This should essentially never be the case
+ // in LLVM, but we explicitly switch over only known metadata to be
+ // conservatively correct. If you are adding metadata to LLVM which pertains
+ // to loads, you almost certainly want to add it here.
switch (ID) {
- // Applies to both loads and stores as-is.
case LLVMContext::MD_dbg:
+ case LLVMContext::MD_tbaa:
case LLVMContext::MD_prof:
+ case LLVMContext::MD_fpmath:
case LLVMContext::MD_tbaa_struct:
+ case LLVMContext::MD_invariant_load:
case LLVMContext::MD_alias_scope:
case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:
+ case LLVMContext::MD_mem_parallel_loop_access:
case LLVMContext::MD_access_group:
case LLVMContext::MD_noundef:
case LLVMContext::MD_noalias_addrspace:
- case LLVMContext::MD_mem_parallel_loop_access:
- DestI.setMetadata(ID, N);
- break;
-
- // Load-only metadata.
- case LLVMContext::MD_fpmath:
- case LLVMContext::MD_invariant_load:
- if (isa<LoadInst>(DestI))
- DestI.setMetadata(ID, N);
+ // All of these directly apply.
+ Dest.setMetadata(ID, N);
break;
case LLVMContext::MD_nonnull:
- if (auto *LDest = dyn_cast<LoadInst>(&DestI)) {
- if (LSource)
- copyNonnullMetadata(*LSource, N, *LDest);
- }
+ copyNonnullMetadata(Source, N, Dest);
break;
case LLVMContext::MD_align:
case LLVMContext::MD_dereferenceable:
case LLVMContext::MD_dereferenceable_or_null:
- // Applies to both loads and stores only if the new type is also a
- // pointer.
+ // These only directly apply if the new type is also a pointer.
if (NewType->isPointerTy())
- DestI.setMetadata(ID, N);
+ Dest.setMetadata(ID, N);
break;
case LLVMContext::MD_range:
- if (auto *LDest = dyn_cast<LoadInst>(&DestI)) {
- if (LSource && DL)
- copyRangeMetadata(*DL, *LSource, N, *LDest);
- }
- break;
-
- case LLVMContext::MD_tbaa:
- if (isa<LoadInst>(DestI))
- DestI.setMetadata(ID, N);
+ copyRangeMetadata(DL, Source, N, Dest);
break;
}
}
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 114df653bad83..c28314f6ab124 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -112,7 +112,6 @@
#include <optional>
#include <tuple>
#include <type_traits>
-#include <unordered_map>
#include <utility>
#include <vector>
@@ -269,6 +268,11 @@ class Vectorizer {
/// isGuaranteedToTransferExecutionToSuccessor(I) == true.
bool runOnPseudoBB(BasicBlock::iterator Begin, BasicBlock::iterator End);
+ /// Runs the vectorizer on one equivalence class, i.e. one set of loads/stores
+ /// in the same BB with the same value for getUnderlyingObject() etc.
+ bool runOnEquivalenceClass(const EqClassKey &EqClassKey,
+ ArrayRef<Instruction *> EqClass);
+
/// Runs the vectorizer on one chain, i.e. a subset of an equivalence class
/// where all instructions access a known, constant offset from the first
/// instruction.
@@ -334,22 +338,12 @@ class Vectorizer {
EquivalenceClassMap collectEquivalenceClasses(BasicBlock::iterator Begin,
BasicBlock::iterator End);
- /// Inserts a cast instruction to convert Inst to DstTy.
- Value *insertCast(Value *Val, Type *DstTy);
-
/// Partitions Instrs into "chains" where every instruction has a known
/// constant offset from the first instr in the chain.
///
/// Postcondition: For all i, ret[i][0].second == 0, because the first instr
/// in the chain is the leader, and an instr touches distance 0 from itself.
std::vector<Chain> gatherChains(ArrayRef<Instruction *> Instrs);
-
- // Helpers for chain merging.
- std::optional<APInt> computeLeaderDelta(Instruction *I1, Instruction *I2);
- bool chainsOverlapAfterRebase(const Chain &A, const Chain &B,
- const APInt &Delta) const;
- static void rebaseChain(Chain &C, const APInt &Delta);
- void normalizeChainToType(Chain &C, Type *CastTy);
};
class LoadStoreVectorizerLegacyPass : public FunctionPass {
@@ -431,20 +425,6 @@ PreservedAnalyses LoadStoreVectorizerPass::run(Function &F,
return Changed ? PA : PreservedAnalyses::all();
}
-static const Value *getUnderlyingObject(const Value *Ptr) {
- const Value *ObjPtr = llvm::getUnderlyingObject(Ptr);
- if (const auto *Sel = dyn_cast<SelectInst>(ObjPtr)) {
- // The select's themselves are distinct instructions even if they share
- // the same condition and evaluate to consecutive pointers for true and
- // false values of the condition. Therefore using the select's themselves
- // for grouping instructions would put consecutive accesses into different
- // lists and they won't be even checked for being consecutive, and won't
- // be vectorized.
- return Sel->getCondition();
- }
- return ObjPtr;
-}
-
bool Vectorizer::run() {
bool Changed = false;
// Break up the BB if there are any instrs which aren't guaranteed to transfer
@@ -488,88 +468,6 @@ bool Vectorizer::run() {
return Changed;
}
-Value *Vectorizer::insertCast(Value *Val, Type *DstTy) {
- if (DL.getTypeSizeInBits(Val->getType()) == DL.getTypeSizeInBits(DstTy)) {
- return Builder.CreateBitOrPointerCast(Val, DstTy, Val->getName() + ".bc");
- }
-
- // If the types are of different sizes and both are integers, we can use
- // zext or sext to cast.
- if (Val->getType()->isIntegerTy() && DstTy->isIntegerTy()) {
- if (DL.getTypeSizeInBits(Val->getType()) < DL.getTypeSizeInBits(DstTy)) {
- return Builder.CreateZExt(Val, DstTy, Val->getName() + ".bc");
- }
- return Builder.CreateTrunc(Val, DstTy, Val->getName() + ".bc");
- }
-
- return nullptr;
-}
-
-std::optional<APInt> Vectorizer::computeLeaderDelta(Instruction *I1,
- Instruction *I2) {
- assert(((isa<LoadInst>(I1) && isa<LoadInst>(I2)) ||
- (isa<StoreInst>(I1) && isa<StoreInst>(I2))) &&
- "computeLeaderDelta must be called with two load or two store "
- "instructions");
- Instruction *CtxInst = I1->comesBefore(I2) ? I2 : I1;
- const Value *Ptr1 = getLoadStorePointerOperand(I1);
- const Value *Ptr2 = getLoadStorePointerOperand(I2);
- return getConstantOffset(const_cast<Value *>(Ptr1), const_cast<Value *>(Ptr2),
- CtxInst);
-}
-
-bool Vectorizer::chainsOverlapAfterRebase(const Chain &A, const Chain &B,
- const APInt &Delta) const {
- ConstantRange ARange(
- A.front().OffsetFromLeader,
- A.back().OffsetFromLeader +
- DL.getTypeStoreSize(getLoadStoreType(A.back().Inst)));
- ConstantRange BRange(
- B.front().OffsetFromLeader + Delta,
- B.back().OffsetFromLeader + Delta +
- DL.getTypeStoreSize(getLoadStoreType(B.back().Inst)));
- return !ARange.intersectWith(BRange).isEmptySet();
-}
-
-void Vectorizer::rebaseChain(Chain &C, const APInt &Delta) {
- for (ChainElem &E : C)
- E.OffsetFromLeader += Delta;
-}
-
-void Vectorizer::normalizeChainToType(Chain &C, Type *CastTy) {
- for (ChainElem &Elem : C) {
- Instruction *Inst = Elem.Inst;
- Type *OrigValTy = getLoadStoreType(Inst);
- if (OrigValTy == CastTy)
- continue;
-
- if (auto *LI = dyn_cast<LoadInst>(Inst)) {
- Builder.SetInsertPoint(LI);
- LoadInst *NewLoad = Builder.CreateLoad(CastTy, LI->getPointerOperand(),
- LI->getName() + ".mut");
- copyMetadataForAccess(*NewLoad, *LI);
- Value *CastBack = insertCast(NewLoad, OrigValTy);
- if (!CastBack)
- llvm_unreachable("Failed to insert cast");
- LI->replaceAllUsesWith(CastBack);
- ToErase.emplace_back(LI);
- Elem.Inst = NewLoad;
- } else if (auto *SI = dyn_cast<StoreInst>(Inst)) {
- Builder.SetInsertPoint(SI);
- Value *CastVal = insertCast(SI->getValueOperand(), CastTy);
- if (!CastVal)
- llvm_unreachable("Failed to insert cast");
- StoreInst *NewStore =
- Builder.CreateStore(CastVal, SI->getPointerOperand());
- NewStore->setAlignment(SI->getAlign());
- NewStore->setVolatile(SI->isVolatile());
- copyMetadataForAccess(*NewStore, *SI);
- ToErase.emplace_back(SI);
- Elem.Inst = NewStore;
- }
- }
-}
-
bool Vectorizer::runOnPseudoBB(BasicBlock::iterator Begin,
BasicBlock::iterator End) {
LLVM_DEBUG({
@@ -582,120 +480,49 @@ bool Vectorizer::runOnPseudoBB(BasicBlock::iterator Begin,
});
bool Changed = false;
- SmallVector<Chain> ContiguousSubChains;
-
for (const auto &[EqClassKey, EqClass] :
- collectEquivalenceClasses(Begin, End)) {
-
- LLVM_DEBUG({
- dbgs() << "LSV: Running on equivalence class of size " << EqClass.size()
- << " keyed on " << EqClassKey << ":\n";
- for (Instruction *I : EqClass)
- dbgs() << " " << *I << "\n";
- });
-
- for (Chain &C : gatherChains(EqClass)) {
-
- // Split up the chain into increasingly smaller chains, until we can
- // finally vectorize the chains.
- //
- // (Don't be scared by the depth of the loop nest here. These operations
- // are all at worst O(n lg n) in the number of instructions, and splitting
- // chains doesn't change the number of instrs. So the whole loop nest is
- // O(n lg n).)
- for (auto &C : splitChainByMayAliasInstrs(C)) {
- for (auto &C : splitChainByContiguity(C)) {
- ContiguousSubChains.emplace_back(C);
- }
- }
- }
- }
-
- // Merge chains in reverse order, so that the first chain is the largest.
- for (int I = ContiguousSubChains.size() - 1; I > 0; I--) {
- Chain &C1 = ContiguousSubChains[I - 1];
- Chain &C2 = ContiguousSubChains[I];
+ collectEquivalenceClasses(Begin, End))
+ Changed |= runOnEquivalenceClass(EqClassKey, EqClass);
- // If the scalar types of the chains are the same, we can merge them
- // without inserting any casts.
- if (getLoadStoreType(C1[0].Inst)->getScalarType() ==
- getLoadStoreType(C2[0].Inst)->getScalarType())
- continue;
-
- const Value *C1Ptr = getLoadStorePointerOperand(C1[0].Inst);
- const Value *C2Ptr = getLoadStorePointerOperand(C2[0].Inst);
- unsigned AS1 = C1Ptr->getType()->getPointerAddressSpace();
- unsigned AS2 = C2Ptr->getType()->getPointerAddressSpace();
- bool C1IsLoad = isa<LoadInst>(C1[0].Inst);
- bool C2IsLoad = isa<LoadInst>(C2[0].Inst);
-
- // If the chains are mapped to different types, have distinct underlying
- // pointer objects, or include both loads and stores, skip.
- if (C1IsLoad != C2IsLoad || AS1 != AS2 ||
- ::getUnderlyingObject(C1Ptr) != ::getUnderlyingObject(C2Ptr))
- continue;
-
- // Compute constant offset between chain leaders; if unknown, skip.
- std::optional<APInt> DeltaOpt = computeLeaderDelta(C1[0].Inst, C2[0].Inst);
- if (!DeltaOpt)
- continue;
-
- // Check that rebasing C2 into C1's coordinate space will not overlap C1.
- if (chainsOverlapAfterRebase(C1, C2, *DeltaOpt))
- continue;
-
- // Determine the common integer cast type for normalization and ensure total
- // bitwidth matches across all elements of both chains.
- Type *C1ElemTy = getLoadStoreType(C1[0].Inst);
- unsigned TotalBits = DL.getTypeSizeInBits(C1ElemTy);
- auto AllElemsMatchTotalBits = [&](const Chain &C) {
- return llvm::all_of(C, [&](const ChainElem &E) {
- return DL.getTypeSizeInBits(getLoadStoreType(E.Inst)) == TotalBits;
- });
- };
- if (!AllElemsMatchTotalBits(C1) || !AllElemsMatchTotalBits(C2))
- continue;
+ return Changed;
+}
- // Power-of-two span ensures we can form a legal, single vector access
- // without padding or splitting. Many targets and cost models assume POT
- // widths, and it guarantees an integral element count for the chosen
- // VecElemTy.
- APInt Sz = C2.front().OffsetFromLeader +
- DL.getTypeStoreSize(getLoadStoreType(C2.front().Inst)) -
- C1.back().OffsetFromLeader + *DeltaOpt;
- if (!Sz.isPowerOf2())
- continue;
+bool Vectorizer::runOnEquivalenceClass(const EqClassKey &EqClassKey,
+ ArrayRef<Instruction *> EqClass) {
+ bool Changed = false;
- // Rebase C2's offsets into C1's coordinate space prior to merging and
- // merge C2 into C1 by appending all elements of C2 to C1, then erase C2
- // from ContiguousSubChains.
- rebaseChain(C2, *DeltaOpt);
- C1.insert(C1.end(), C2.begin(), C2.end());
- ContiguousSubChains.erase(ContiguousSubChains.begin() + I);
-
- // Normalize the value operand/result type of each instruction in C1 to
- // C1CastTy.
- Type *C1CastTy =
- Type::getIntNTy(C1ElemTy->getContext(), DL.getTypeSizeInBits(C1ElemTy));
- normalizeChainToType(C1, C1CastTy);
- }
+ LLVM_DEBUG({
+ dbgs() << "LSV: Running on equivalence class of size " << EqClass.size()
+ << " keyed on " << EqClassKey << ":\n";
+ for (Instruction *I : EqClass)
+ dbgs() << " " << *I << "\n";
+ });
- for (auto &C : ContiguousSubChains) {
- if (C.size() <= 1)
- continue;
- for (auto &AlignedSubChain : splitChainByAlignment(C))
- Changed |= vectorizeChain(AlignedSubChain);
- }
+ std::vector<Chain> Chains = gatherChains(EqClass);
+ LLVM_DEBUG(dbgs() << "LSV: Got " << Chains.size()
+ << " nontrivial chains.\n";);
+ for (Chain &C : Chains)
+ Changed |= runOnChain(C);
+ return Changed;
+}
- // Erase all instructions scheduled for deletion in this pseudo-BB.
- for (Instruction *I : ToErase) {
- auto *PtrOperand = getLoadStorePointerOperand(I);
- if (I->use_empty())
- I->eraseFromParent();
- RecursivelyDeleteTriviallyDeadInstructions(PtrOperand);
- }
- ToErase.clear();
+bool Vectorizer::runOnChain(Chain &C) {
+ LLVM_DEBUG({
+ dbgs() << "LSV: Running on chain with " << C.size() << " instructions:\n";
+ dumpChain(C);
+ });
+ // Split up the chain into increasingly smaller chains, until we can finally
+ // vectorize the chains.
+ //
+ // (Don't be scared by the depth of the loop nest here. These operations are
+ // all at worst O(n lg n) in the number of instructions, and splitting chains
+ // doesn't change the number of instrs. So the whole loop nest is O(n lg n).)
+ bool Changed = false;
+ for (auto &C : splitChainByMayAliasInstrs(C))
+ for (auto &C : splitChainByContiguity(C))
+ for (auto &C : splitChainByAlignment(C))
+ Changed |= vectorizeChain(C);
return Changed;
}
@@ -756,7 +583,7 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
LLVM_DEBUG(
dbgs() << "LSV: Found intervening may-alias instrs; cannot merge "
<< *ChainIt->Inst << " into " << *ChainBegin->Inst << "\n");
- if (!NewChain.empty()) {
+ if (NewChain.size() > 1) {
LLVM_DEBUG({
dbgs() << "LSV: got nontrivial chain without aliasing instrs:\n";
dumpChain(NewChain);
@@ -768,7 +595,7 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
NewChain = SmallVector<ChainElem, 1>({*ChainIt});
}
}
- if (!NewChain.empty()) {
+ if (NewChain.size() > 1) {
LLVM_DEBUG({
dbgs() << "LSV: got ...
[truncated]
``````````
https://github.com/llvm/llvm-project/pull/170381