[llvm] [MergeICmps] Merge adjacent comparisons to constants (PR #133817)

Philipp Rados via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 31 16:01:37 PDT 2025


https://github.com/PhilippRados created https://github.com/llvm/llvm-project/pull/133817

This pull request aims to fix #117853.

### General idea
It extends the existing `MergeICmps` pass so that it merges not only comparisons between two objects, such as `a.a == b.a && a.b == b.b`, but also comparisons against arbitrary constants, such as `a.a == 245 && a.b == -1`.

### Changes
The original pass assumed that each basic block contains at most one comparison. This had to change to allow multiple comparisons in a single basic block, because constant comparisons get flattened into a single block using a `select` instruction before the `MergeICmps` pass runs.

### How it works
Whenever a matching comparison is encountered, it is added to the cmp-chain. Once all comparisons have been found, the chain is sorted so that the const-comparisons and the BCE-comparisons each form one group (which group comes first depends on which kind was encountered first). The pass then walks the chain and merges the comparisons that are adjacent to each other. Comparisons inside a flattened select-block can only be merged if every comparison in that block is merged (this is a rather defensive approach).
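
The sort-then-merge step can be illustrated with a minimal standalone sketch. This is not the code from the patch: `Cmp`, `Run`, and `mergeAdjacent` are hypothetical names, and a comparison is reduced to a byte offset and a size relative to one shared base pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one comparison in the chain: a load of
// `size` bytes at `offset` from a common base pointer.
struct Cmp {
  std::size_t offset;
  std::size_t size;
};

// One merged region that a single memcmp could cover.
struct Run {
  std::size_t offset;
  std::size_t size;
  std::size_t count; // number of original comparisons folded into this run
};

// Sort by offset, then fold each comparison into the previous run when it
// starts exactly where that run ends.
std::vector<Run> mergeAdjacent(std::vector<Cmp> cmps) {
  std::sort(cmps.begin(), cmps.end(),
            [](const Cmp &l, const Cmp &r) { return l.offset < r.offset; });
  std::vector<Run> runs;
  for (const Cmp &c : cmps) {
    assert(c.size > 0 && "a comparison must load at least one byte");
    if (!runs.empty() && runs.back().offset + runs.back().size == c.offset) {
      runs.back().size += c.size;
      ++runs.back().count;
    } else {
      runs.push_back({c.offset, c.size, 1});
    }
  }
  return runs;
}
```

With three 4-byte comparisons at offsets 8, 16 and 12, the sketch yields a single 12-byte run starting at offset 8. A gap between two comparisons starts a new run instead.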

The const merging works by building a global constant struct for every merge. This needs to be a global constant so that the expand-memcmp pass can constant-fold it, at which point the global is also removed.
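
At the source level, the effect is roughly the following. This is a sketch under assumptions, not what the pass emits: `T`, `kExpected`, `cmpFields`, and `cmpMerged` are hypothetical names, and the struct is assumed to have three adjacent 32-bit fields with no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical struct with three adjacent, padding-free 32-bit fields.
struct T {
  std::int32_t e;
  std::int32_t f;
  std::int32_t g;
};
static_assert(sizeof(T) == 12, "sketch assumes no padding between fields");

// Field-by-field form, as written in the source program.
bool cmpFields(const T &t) { return t.e == 255 && t.f == 200 && t.g == 100; }

// Constant image of the expected values, playing the role of the
// private global constant described above.
static const std::int32_t kExpected[3] = {255, 200, 100};

// Merged form: one memcmp over the whole 12-byte region.
bool cmpMerged(const T &t) {
  return std::memcmp(&t, kExpected, sizeof(kExpected)) == 0;
}

// Small usage check: both forms agree on a match and on a mismatch.
void demo() {
  assert(cmpFields(T{255, 200, 100}) == cmpMerged(T{255, 200, 100}));
  assert(cmpFields(T{255, 200, 101}) == cmpMerged(T{255, 200, 101}));
}
```

Because the fields are plain integers without padding, byte-wise equality of the region is equivalent to equality of every field, which is what makes the single `memcmp` a valid replacement in this sketch.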

### Example
A single comparison chain can now be made up of both BCE-comparisons (two offsets to the same base) and const-comparisons (contiguous offsets to the same base with a constant).
This means that the expression:
```
struct S {
    int a;
    unsigned char b;
    unsigned char c;
    uint16_t d;
    int e;
    int f;
    int g;
};
bool cmp(S& a, S& b) {
    return a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
}
```
can be turned into:
```
// simplified representation, for exact implementation see llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@memcmp_const_op = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
define @cmp(...) {
BB1:
   memcmp(a,b,6);
   br BB2, BB_end
BB2:
   offset = gep ptr %a, 8
   memcmp(offset, memcmp_const_op, 12)
   br BB_end
...
}
```

### Issues in the current implementation, waiting for feedback on this
- This implementation currently doesn't handle a single select block like the one mentioned in the issue above. My open question is when to trigger it: handling this case would require traversing all instructions again, which slows down every function compiled at -O3. I think a good tradeoff would be to only check whether the branch condition is a select and start the optimization from there, instead of checking every single instruction.
- The pattern is pretty strict: it only works when the function returns a bool and the parameters are dereferenceable. These restrictions basically render this optimization obsolete in C. Otherwise this implementation could also merge vectors/arrays.
- Some test cases fail alive2 when the memory is accessed differently. I think these are known issues though (https://github.com/llvm/llvm-project/issues/62459 and https://github.com/llvm/llvm-project/issues/51187)
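
To illustrate the dereferenceability restriction from the second point, here is a small sketch with hypothetical names (`P`, `cmpRef`, `cmpPtr`). Clang attaches a `dereferenceable` attribute to C++ reference parameters, whereas a plain pointer parameter, the only option in C, carries no such guarantee. My understanding is that without it the second load cannot be hoisted above the first comparison, so the two comparisons are not flattened into the single block this pass matches.

```cpp
#include <cassert>

struct P {
  int a;
  int b;
};

// Reference parameter: the whole object is known to be readable, so both
// loads may be performed up front and the comparisons can be merged.
bool cmpRef(const P &p) { return p.a == 245 && p.b == -1; }

// Pointer parameter: semantically identical here, but the compiler has no
// attribute telling it that p->b is readable before p->a has been checked.
bool cmpPtr(const P *p) { return p->a == 245 && p->b == -1; }

// Small usage check: both forms compute the same result.
void demo() {
  P match{245, -1};
  P mismatch{245, 0};
  assert(cmpRef(match) && cmpPtr(&match));
  assert(!cmpRef(mismatch) && !cmpPtr(&mismatch));
}
```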



>From 77578ba1f64f633776a2bcc49765f9fd9a08f1ff Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Thu, 30 Jan 2025 17:25:07 +0100
Subject: [PATCH 01/23] [MergeICmps] First implementation of merging
 comparisons that compare adjacent memory blocks with constants

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 142 ++++++++++++++++++++++
 1 file changed, 142 insertions(+)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 4291f3aee0cd1..d9b2456d40b8e 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -52,6 +52,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/ValueMap.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Transforms/Scalar.h"
@@ -842,6 +843,119 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   return CmpChain.simplify(TLI, AA, DTU);
 }
 
+void removeUnusedOperands(SmallVector<Value *, 8> toCheck) {
+  while (!toCheck.empty()) {
+    Value *V = toCheck.pop_back_val();
+    
+    // Only process instructions (skip constants, globals, etc.)
+    if (Instruction *OpI = dyn_cast<Instruction>(V)) {
+      if (OpI->use_empty()) {
+        toCheck.append(OpI->operands().begin(),OpI->operands().end());
+        OpI->eraseFromParent();
+      }
+    }
+  }
+}
+
+struct CommonCmp {
+  ICmpInst* CmpI;
+  unsigned Offset;
+};
+
+void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
+  auto First = AdjacentMem[0];
+  IRBuilder<> Builder(SelectI);
+  LLVMContext &Context = First.CmpI->getContext();
+  const auto &DL = First.CmpI->getDataLayout();
+
+  auto *CmpType = First.CmpI->getOperand(0)->getType();
+  auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
+  auto ArraySize = DL.getTypeAllocSize(ArrayType);
+  // TODO: check for alignment
+  auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+
+  std::vector<Constant*> Constants;
+  for (const auto& CI : AdjacentMem) {
+    // safe since we checked before that second operand is constantint
+    Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
+  }
+  auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+  Builder.CreateStore(ArrayConstant,ArrayAlloca);
+
+  // TODO: adjust base-ptr to point to start of load-offset
+  // TODO: also have to handle !=
+  Value *const MemCmpCall = emitMemCmp(
+      Base, ArrayAlloca,
+      ConstantInt::get(Type::getInt64Ty(Context), ArraySize),
+      Builder, DL, &TLI);
+  auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
+
+  BasicBlock::iterator ii(SelectI);
+  SmallVector<Value *, 8> deadOperands(SelectI->operands());
+  ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
+  removeUnusedOperands(deadOperands);
+
+  dbgs() << "DONE merging";
+}
+
+// Combines Icmp instructions if they operate on adjacent memory
+// TODO: check that base address' memory isn't modified between comparisons
+bool tryMergeIcmps(SelectInst* SelectI, Value* Base, std::vector<CommonCmp> &Icmps,const TargetLibraryInfo &TLI) {
+  assert(!Icmps.empty() && "if entry exists then has at least one cmp");
+  bool hasMerged = false;
+
+  std::vector<CommonCmp> AdjacentMem{Icmps[0]};
+  auto Prev = Icmps[0];
+  for (auto& Cmp : llvm::drop_begin(Icmps)) {
+    if (Cmp.Offset == (Prev.Offset + 1)) {
+      AdjacentMem.emplace_back(Cmp);
+    } else if (AdjacentMem.size() > 1) {
+      mergeAdjacentComparisons(SelectI,Base, AdjacentMem,TLI);
+      hasMerged = true;
+      AdjacentMem.clear();
+      AdjacentMem.emplace_back(Cmp);
+    }
+    Prev = Cmp;
+  }
+
+  if (AdjacentMem.size() > 1) {
+    mergeAdjacentComparisons(SelectI, Base, AdjacentMem,TLI);
+    hasMerged = true;
+  }
+
+  return hasMerged;
+}
+
+// Given an operand from a load, return the original base pointer and
+// if operand is GEP also it's offset from base pointer
+// but only if offset is known at compile time
+std::tuple<Value*, std::optional<unsigned>> findPtrAndOffset(Value* V, unsigned Offset) {
+  if (const auto& GepI = dyn_cast<GetElementPtrInst>(V)){
+    if (const auto& Index = dyn_cast<ConstantInt>(GepI->getOperand(1))) {
+      if (Index->getBitWidth() <= 64) {
+        return findPtrAndOffset(GepI->getPointerOperand(), Offset + Index->getZExtValue());
+      }
+    }
+    return {V,std::nullopt};
+  }
+
+  return {V,Offset};
+}
+
+    
+std::optional<Value*>  constantCmp(ICmpInst* CmpI,std::vector<CommonCmp>* cmps) {
+  auto const& LoadI = dyn_cast<LoadInst>(CmpI->getOperand(0));
+  auto const& ConstantI = dyn_cast<ConstantInt>(CmpI->getOperand(1));
+  if (!LoadI || !ConstantI)
+    return std::nullopt;
+
+  auto [BasePtr, Offset] = findPtrAndOffset(LoadI->getOperand(0),0);
+  if (Offset)
+    cmps->emplace_back(CommonCmp {CmpI, *Offset});
+
+  return BasePtr;
+}
+
 static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
                     const TargetTransformInfo &TTI, AliasAnalysis &AA,
                     DominatorTree *DT) {
@@ -867,6 +981,34 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
       MadeChange |= processPhi(*Phi, TLI, AA, DTU);
   }
 
+  // merge cmps that load from same address and compare with constant
+  for (BasicBlock &BB : F) {
+    // from bottom up to find the root result of all comparisons
+    for (Instruction &I : llvm::reverse(BB)) {
+      if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
+        auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+        auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+        auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+        if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
+          continue;
+
+        Value* BasePtr;
+        std::vector<CommonCmp> cmps;
+        if (auto bp = constantCmp(Cmp1,&cmps))
+          BasePtr = *bp;
+        if (auto bp = constantCmp(Cmp2,&cmps)) {
+          if (BasePtr != bp) continue;
+        }
+
+        MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
+        break;
+      }
+    }
+  }
+
+  F.dump();
+
   return MadeChange;
 }
 

>From 23042079fd84c1edbea65d4fdd09e9194fa1dfb4 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Fri, 14 Feb 2025 20:43:55 +0100
Subject: [PATCH 02/23] [MergeICmps] SelectCmp checkpoint

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 204 ++++++++++++++++++----
 1 file changed, 167 insertions(+), 37 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index d9b2456d40b8e..2194c4a925162 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -50,6 +50,7 @@
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/ValueMap.h"
@@ -176,24 +177,58 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
   return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
 }
 
+struct Comparison {
+  int SizeBits;
+  const ICmpInst *CmpI;
+
+  using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
+
+  Comparison(int SizeBits, const ICmpInst *CmpI);
+  virtual ~Comparison() {};
+  virtual LoadOperands getLoads() = 0;
+};
+
+// A comparison between a BCE atom and an integer constant.
+// If these BCE atoms are chained and access adjacent memory then they too can be merged, e.g.
+// ```
+// int *p = ...;
+// int a = p[0];
+// int b = p[1];
+// return a == 100 && b == 2;
+// ```
+struct BCEConstCmp : public Comparison {
+  BCEAtom Lhs;
+  Constant* Const;
+
+  BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
+      : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+  
+  Comparison::LoadOperands getLoads() override {
+    return std::make_pair(&Lhs,std::nullopt);
+  }
+};
+
 // A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
 // top.
 // Note: the terminology is misleading: the comparison is symmetric, so there
 // is no real {l/r}hs. What we want though is to have the same base on the
 // left (resp. right), so that we can detect consecutive loads. To ensure this
 // we put the smallest atom on the left.
-struct BCECmp {
+struct BCECmp : public Comparison {
   BCEAtom Lhs;
   BCEAtom Rhs;
-  int SizeBits;
-  const ICmpInst *CmpI;
 
   BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
-      : Lhs(std::move(L)), Rhs(std::move(R)), SizeBits(SizeBits), CmpI(CmpI) {
+      : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R))  {
     if (Rhs < Lhs) std::swap(Rhs, Lhs);
   }
+
+  Comparison::LoadOperands getLoads() override {
+    return std::make_pair(&Lhs,&Rhs);
+  }
 };
 
+
 // A basic block with a comparison between two BCE atoms.
 // The block might do extra work besides the atom comparison, in which case
 // doesOtherWork() returns true. Under some conditions, the block can be
@@ -203,12 +238,12 @@ class BCECmpBlock {
  public:
   typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
 
-  BCECmpBlock(BCECmp Cmp, BasicBlock *BB, InstructionSet BlockInsts)
-      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+  BCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
 
-  const BCEAtom &Lhs() const { return Cmp.Lhs; }
-  const BCEAtom &Rhs() const { return Cmp.Rhs; }
-  int SizeBits() const { return Cmp.SizeBits; }
+  // const BCEAtom &Lhs() const { return Cmp.Lhs; }
+  // const BCEAtom &Rhs() const { return Cmp.Rhs; }
+  // int SizeBits() const { return Cmp.SizeBits; }
 
   // Returns true if the block does other works besides comparison.
   bool doesOtherWork() const;
@@ -238,7 +273,7 @@ class BCECmpBlock {
   unsigned OrigOrder = 0;
 
 private:
-  BCECmp Cmp;
+  std::vector<Comparison*> Cmps;
 };
 
 bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
@@ -301,9 +336,50 @@ bool BCECmpBlock::doesOtherWork() const {
   return false;
 }
 
+class IntraCmpChain {
+  std::vector<Comparison*> CmpChain;
+
+public:
+  IntraCmpChain(Comparison* C) : CmpChain{C} {}
+  IntraCmpChain concat(const IntraCmpChain OtherChain) {
+    CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+    return *this;
+  }
+  BCECmpBlock::InstructionSet getAllInsts() {
+    BCECmpBlock::InstructionSet Insts;
+    for (auto Cmp : CmpChain) {
+      // TODO: this mess should be able to get OOP'd
+      if (auto* BceCmpI = dyn_cast<BCECmp>(&Cmp)) {
+        Insts.insert(BceCmpI->Lhs.LoadI);
+        Insts.insert(BceCmpI->Rhs.LoadI);
+        Insts.insert(BceCmpI->CmpI);
+        if (BceCmpI->Lhs.GEP)
+          Insts.insert(BceCmpI->Lhs.GEP);
+        if (BceCmpI->Rhs.GEP)
+          Insts.insert(BceCmpI->Rhs.GEP);
+      } else if (auto* BceConstCmpI = dyn_cast<BCEConstCmp>(&Cmp)) {
+        Insts.insert(BceCmpI->Lhs.LoadI);
+        Insts.insert(BceCmpI->CmpI);
+        if (BceCmpI->Lhs.GEP)
+          Insts.insert(BceCmpI->Lhs.GEP);
+      }
+    }
+    return Insts;
+  }
+  std::vector<Comparison*> getCmpChain() const {
+    return CmpChain;
+  }
+
+  // Determines if all comparisons in the comparison chain are all either `BCECmp` or all `BCEConstCmp`
+  bool isAllSameCmp() {
+    return llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCECmp>(c);}) || 
+           llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCEConstCmp>(c);});
+  }
+};
+
 // Visit the given comparison. If this is a comparison between two valid
-// BCE atoms, returns the comparison.
-std::optional<BCECmp> visitICmp(const ICmpInst *const CmpI,
+// BCE atoms, or between a BCE atom and a constant, returns the comparison.
+std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
                                 const ICmpInst::Predicate ExpectedPredicate,
                                 BaseIdentifier &BaseId) {
   // The comparison can only be used once:
@@ -320,17 +396,63 @@ std::optional<BCECmp> visitICmp(const ICmpInst *const CmpI,
   LLVM_DEBUG(dbgs() << "cmp "
                     << (ExpectedPredicate == ICmpInst::ICMP_EQ ? "eq" : "ne")
                     << "\n");
+  // First operand is always a load
   auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId);
   if (!Lhs.BaseId)
     return std::nullopt;
-  auto Rhs = visitICmpLoadOperand(CmpI->getOperand(1), BaseId);
+
+  // Second operand can either be load if doing compare between two BCE atoms or 
+  // can be constant if comparing adjacent memory to constant
+  auto* RhsOperand = CmpI->getOperand(1);
+  const auto &DL = CmpI->getDataLayout();
+  int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
+
+  if (auto const& Const = dyn_cast<Constant>(RhsOperand))
+    return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
+
+  auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId);
   if (!Rhs.BaseId)
     return std::nullopt;
-  const auto &DL = CmpI->getDataLayout();
-  return BCECmp(std::move(Lhs), std::move(Rhs),
-                DL.getTypeSizeInBits(CmpI->getOperand(0)->getType()), CmpI);
+  return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
 }
 
+// Chain of comparisons inside a single basic block connected using `select` nodes.
+std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+
+std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+                                  ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
+  if (!SelectI->hasOneUse()) {
+    LLVM_DEBUG(dbgs() << "select has several uses\n");
+    return std::nullopt;
+  }
+  auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+  auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+  auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+  if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+    return std::nullopt;
+
+  auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
+  if (!Lhs)
+    return std::nullopt;
+  auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
+  if (!Rhs)
+    return std::nullopt;
+
+  return Lhs->concat(*Rhs);
+}
+
+std::optional<IntraCmpChain> visitComparison(Value *Cond,
+            ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
+  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+    return visitICmp(CmpI, ExpectedPredicate, BaseId);
+  if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+    return visitSelect(SelectI, ExpectedPredicate, BaseId);
+
+  return std::nullopt;
+}
+
+
 // Visit the given comparison block. If this is a comparison between two valid
 // BCE atoms, returns the comparison.
 std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
@@ -367,22 +489,21 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
         FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
   }
 
-  auto *CmpI = dyn_cast<ICmpInst>(Cond);
-  if (!CmpI)
+  std::optional<IntraCmpChain> CmpChain = visitComparison(Cond, ExpectedPredicate, BaseId);
+  if (!CmpChain)
     return std::nullopt;
-  LLVM_DEBUG(dbgs() << "icmp\n");
 
-  std::optional<BCECmp> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
-  if (!Result)
+  if (!CmpChain->isAllSameCmp())
     return std::nullopt;
 
-  BCECmpBlock::InstructionSet BlockInsts(
-      {Result->Lhs.LoadI, Result->Rhs.LoadI, Result->CmpI, BranchI});
-  if (Result->Lhs.GEP)
-    BlockInsts.insert(Result->Lhs.GEP);
-  if (Result->Rhs.GEP)
-    BlockInsts.insert(Result->Rhs.GEP);
-  return BCECmpBlock(std::move(*Result), Block, BlockInsts);
+  std::vector<Comparison*> SortedCmpChain(CmpChain->getCmpChain());
+  llvm::sort(SortedCmpChain, [](Comparison* l, Comparison* r) {
+    return l->getLoads() < r->getLoads();
+  });
+
+  BCECmpBlock::InstructionSet BlockInsts(CmpChain->getAllInsts());
+  BlockInsts.insert(BranchI);
+  return BCECmpBlock(SortedCmpChain, Block, BlockInsts);
 }
 
 static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
@@ -832,6 +953,7 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
 
   const auto Blocks =
       getOrderedBlocks(Phi, LastBlock, Phi.getNumIncomingValues());
+
   if (Blocks.empty()) return false;
   BCECmpChain CmpChain(Blocks, Phi, AA);
 
@@ -863,16 +985,18 @@ struct CommonCmp {
 };
 
 void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
-  auto First = AdjacentMem[0];
   IRBuilder<> Builder(SelectI);
-  LLVMContext &Context = First.CmpI->getContext();
-  const auto &DL = First.CmpI->getDataLayout();
+  auto* M = SelectI->getModule();
+  LLVMContext &Context = SelectI->getContext();
+  const auto &DL = SelectI->getDataLayout();
 
+  auto First = AdjacentMem[0];
   auto *CmpType = First.CmpI->getOperand(0)->getType();
   auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
-  auto ArraySize = DL.getTypeAllocSize(ArrayType);
+  auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
   // TODO: check for alignment
-  auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+  // auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+  // Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
 
   std::vector<Constant*> Constants;
   for (const auto& CI : AdjacentMem) {
@@ -880,14 +1004,20 @@ void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<Common
     Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
   }
   auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
-  Builder.CreateStore(ArrayConstant,ArrayAlloca);
+M->getOrInsertGlobal("globalKey", ArrayType);
+    GlobalVariable* gVar = M->getNamedGlobal("globalKey");
+    gVar->setLinkage(GlobalValue::PrivateLinkage);
+    gVar->setInitializer(ArrayConstant);
+    gVar->setConstant(true);
+  // Builder.CreateStore(ArrayConstant,ArrayAlloca);
 
   // TODO: adjust base-ptr to point to start of load-offset
   // TODO: also have to handle !=
   Value *const MemCmpCall = emitMemCmp(
-      Base, ArrayAlloca,
-      ConstantInt::get(Type::getInt64Ty(Context), ArraySize),
+      Base, gVar,
+      ArraySize,
       Builder, DL, &TLI);
+  // Builder.CreateLifetimeEnd(ArrayAlloca,ArraySize);
   auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
 
   BasicBlock::iterator ii(SelectI);
@@ -981,7 +1111,7 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
       MadeChange |= processPhi(*Phi, TLI, AA, DTU);
   }
 
-  // merge cmps that load from same address and compare with constant
+  // Try to merge remaining select nodes that haven't been merged from phi-node merging
   for (BasicBlock &BB : F) {
     // from bottom up to find the root result of all comparisons
     for (Instruction &I : llvm::reverse(BB)) {

>From 845569139580b005f3446895ec915ff3e1d5c25a Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Sun, 16 Feb 2025 20:53:58 +0100
Subject: [PATCH 03/23] [MergeICmps] Implemented merge with constant across
 basic blocks

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 270 +++++++++++-------
 .../Transforms/MergeICmps/X86/const-cmp-bb.ll |  37 +++
 2 files changed, 202 insertions(+), 105 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2194c4a925162..93ecfc5d780e4 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -183,9 +183,19 @@ struct Comparison {
 
   using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
 
-  Comparison(int SizeBits, const ICmpInst *CmpI);
-  virtual ~Comparison() {};
+  Comparison(int SizeBits, const ICmpInst *CmpI) : SizeBits(SizeBits), CmpI(CmpI) {}
+  virtual ~Comparison() = default;
   virtual LoadOperands getLoads() = 0;
+  virtual std::optional<Constant*> getConstant() = 0;
+  virtual bool isConstCmp()const = 0;
+  bool operator<(Comparison &O) {
+    auto [Lhs,Rhs] = getLoads();
+    auto [OtherLhs,OtherRhs] = O.getLoads();
+
+    if (!isConstCmp()) 
+      return std::tie(*Lhs,**Rhs) < std::tie(*OtherLhs,**OtherRhs);
+    return *Lhs < *OtherLhs;
+  }
 };
 
 // A comparison between a BCE atom and an integer constant.
@@ -206,6 +216,12 @@ struct BCEConstCmp : public Comparison {
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,std::nullopt);
   }
+  std::optional<Constant*> getConstant() override {
+    return Const;
+  }
+  bool isConstCmp() const override {
+    return true;
+  }
 };
 
 // A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -226,6 +242,12 @@ struct BCECmp : public Comparison {
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,&Rhs);
   }
+  std::optional<Constant*> getConstant() override {
+    return std::nullopt;
+  }
+  bool isConstCmp() const override {
+    return false;
+  }
 };
 
 
@@ -238,12 +260,25 @@ class BCECmpBlock {
  public:
   typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
 
-  BCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
-      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
+  BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
+      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+
+  const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
+  const std::optional<BCEAtom*> Rhs() const { return Cmp->getLoads().second; }
+  std::optional<Constant*> getConstant() const {
+    return Cmp->getConstant();
+  }
+  bool isConstCmp() const {
+    return Cmp->isConstCmp();
+  }
+  Comparison* getCmp() const {
+    return Cmp;
+  }
+  bool operator<(const BCECmpBlock &O) const {
+    return *Cmp < *O.getCmp();
+  }
 
-  // const BCEAtom &Lhs() const { return Cmp.Lhs; }
-  // const BCEAtom &Rhs() const { return Cmp.Rhs; }
-  // int SizeBits() const { return Cmp.SizeBits; }
+  int SizeBits() const { return Cmp->SizeBits; }
 
   // Returns true if the block does other works besides comparison.
   bool doesOtherWork() const;
@@ -273,7 +308,7 @@ class BCECmpBlock {
   unsigned OrigOrder = 0;
 
 private:
-  std::vector<Comparison*> Cmps;
+  Comparison* Cmp;
 };
 
 bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
@@ -287,7 +322,8 @@ bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
     };
-    if (MayClobber(Cmp.Lhs.LoadI) || MayClobber(Cmp.Rhs.LoadI))
+    auto [Lhs,Rhs] = Cmp->getLoads();
+    if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
       return false;
   }
   // Make sure this instruction does not use any of the BCE cmp block
@@ -345,36 +381,9 @@ class IntraCmpChain {
     CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
     return *this;
   }
-  BCECmpBlock::InstructionSet getAllInsts() {
-    BCECmpBlock::InstructionSet Insts;
-    for (auto Cmp : CmpChain) {
-      // TODO: this mess should be able to get OOP'd
-      if (auto* BceCmpI = dyn_cast<BCECmp>(&Cmp)) {
-        Insts.insert(BceCmpI->Lhs.LoadI);
-        Insts.insert(BceCmpI->Rhs.LoadI);
-        Insts.insert(BceCmpI->CmpI);
-        if (BceCmpI->Lhs.GEP)
-          Insts.insert(BceCmpI->Lhs.GEP);
-        if (BceCmpI->Rhs.GEP)
-          Insts.insert(BceCmpI->Rhs.GEP);
-      } else if (auto* BceConstCmpI = dyn_cast<BCEConstCmp>(&Cmp)) {
-        Insts.insert(BceCmpI->Lhs.LoadI);
-        Insts.insert(BceCmpI->CmpI);
-        if (BceCmpI->Lhs.GEP)
-          Insts.insert(BceCmpI->Lhs.GEP);
-      }
-    }
-    return Insts;
-  }
   std::vector<Comparison*> getCmpChain() const {
     return CmpChain;
   }
-
-  // Determines if all comparisons in the comparison chain are all either `BCECmp` or all `BCEConstCmp`
-  bool isAllSameCmp() {
-    return llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCECmp>(c);}) || 
-           llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCEConstCmp>(c);});
-  }
 };
 
 // Visit the given comparison. If this is a comparison between two valid
@@ -489,32 +498,39 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
         FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
   }
 
-  std::optional<IntraCmpChain> CmpChain = visitComparison(Cond, ExpectedPredicate, BaseId);
-  if (!CmpChain)
+  auto* CmpI = dyn_cast<ICmpInst>(Cond);
+  if (!CmpI)
     return std::nullopt;
+  LLVM_DEBUG(dbgs() << "icmp\n");
 
-  if (!CmpChain->isAllSameCmp())
+  std::optional<Comparison*> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
+  if (!Result)
     return std::nullopt;
 
-  std::vector<Comparison*> SortedCmpChain(CmpChain->getCmpChain());
-  llvm::sort(SortedCmpChain, [](Comparison* l, Comparison* r) {
-    return l->getLoads() < r->getLoads();
-  });
-
-  BCECmpBlock::InstructionSet BlockInsts(CmpChain->getAllInsts());
+  BCECmpBlock::InstructionSet BlockInsts;
+  auto [Lhs,Rhs] = (*Result)->getLoads();
+  BlockInsts.insert(Lhs->LoadI);
+  if (Lhs->GEP)
+    BlockInsts.insert(Lhs->GEP);
+  if (Rhs) {
+    BlockInsts.insert((*Rhs)->LoadI);
+    if ((*Rhs)->GEP)
+      BlockInsts.insert((*Rhs)->GEP);
+  }
+  BlockInsts.insert((*Result)->CmpI);
   BlockInsts.insert(BranchI);
-  return BCECmpBlock(SortedCmpChain, Block, BlockInsts);
+  return BCECmpBlock(std::move(*Result), Block, BlockInsts);
 }
 
 static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
                                 BCECmpBlock &&Comparison) {
-  LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-                    << "': Found cmp of " << Comparison.SizeBits()
-                    << " bits between " << Comparison.Lhs().BaseId << " + "
-                    << Comparison.Lhs().Offset << " and "
-                    << Comparison.Rhs().BaseId << " + "
-                    << Comparison.Rhs().Offset << "\n");
-  LLVM_DEBUG(dbgs() << "\n");
+  // LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+  //                   << "': Found cmp of " << Comparison.SizeBits()
+  //                   << " bits between " << Comparison.Lhs().BaseId << " + "
+  //                   << Comparison.Lhs().Offset << " and "
+  //                   << Comparison.Rhs().BaseId << " + "
+  //                   << Comparison.Rhs().Offset << "\n");
+  // LLVM_DEBUG(dbgs() << "\n");
   Comparison.OrigOrder = Comparisons.size();
   Comparisons.push_back(std::move(Comparison));
 }
@@ -544,10 +560,16 @@ class BCECmpChain {
 };
 
 static bool areContiguous(const BCECmpBlock &First, const BCECmpBlock &Second) {
-  return First.Lhs().BaseId == Second.Lhs().BaseId &&
-         First.Rhs().BaseId == Second.Rhs().BaseId &&
-         First.Lhs().Offset + First.SizeBits() / 8 == Second.Lhs().Offset &&
-         First.Rhs().Offset + First.SizeBits() / 8 == Second.Rhs().Offset;
+  bool HasContigLhs = First.Lhs()->BaseId == Second.Lhs()->BaseId &&
+                      First.Lhs()->Offset + First.SizeBits() / 8 == Second.Lhs()->Offset;
+  bool HasContigRhs = true;
+  auto FirstRhs = First.Rhs();
+  auto SecondRhs = Second.Rhs();
+  if (FirstRhs && SecondRhs)
+    HasContigRhs = (*FirstRhs)->BaseId == (*SecondRhs)->BaseId &&
+                   (*FirstRhs)->Offset + First.SizeBits() / 8 == (*SecondRhs)->Offset;
+
+  return HasContigLhs && HasContigRhs;
 }
 
 static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
@@ -566,8 +588,7 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
   // Sort to detect continuous offsets.
   llvm::sort(Blocks,
              [](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
-               return std::tie(LhsBlock.Lhs(), LhsBlock.Rhs()) <
-                      std::tie(RhsBlock.Lhs(), RhsBlock.Rhs());
+              return LhsBlock < RhsBlock;
              });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
@@ -592,6 +613,26 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
   return MergedBlocks;
 }
 
+// A valid comparison chain means that all comparisons are of the same kind (either all `BCECmp` or all `BCEConstCmp`).
+// Additionally if all comparisons are `BCEConstCmp` they all need to have the same type to build a valid LLVM constant array.
+// TODO: Could even build a memory chain of different types using separate allocations
+bool isValidCmpChain(std::vector<BCECmpBlock> Comparisons) {
+  BCECmpBlock* PrevCmp = nullptr;
+  for (BCECmpBlock BceCmpBlock : Comparisons) {
+    if (PrevCmp) {
+      if (PrevCmp->isConstCmp() != BceCmpBlock.isConstCmp())
+        return false;
+      if (PrevCmp->isConstCmp()){
+        if (PrevCmp->Lhs()->LoadI->getType() != BceCmpBlock.Lhs()->LoadI->getType())
+          return false;
+      }
+    }
+
+    PrevCmp = &BceCmpBlock;
+  }
+  return true;
+}
+
 BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
                          AliasAnalysis &AA)
     : Phi_(Phi) {
@@ -670,6 +711,12 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
     LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
     return;
   }
+
+  if(!isValidCmpChain(Comparisons)) {
+    LLVM_DEBUG(dbgs() << "invalid comparison chain");
+    return;
+  }
+
   EntryBlock_ = Comparisons[0].BB;
   MergedBlocks_ = mergeBlocks(std::move(Comparisons));
 }
@@ -738,14 +785,34 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
   IRBuilder<> Builder(BB);
   // Add the GEPs from the first BCECmpBlock.
   Value *Lhs, *Rhs;
-  if (FirstCmp.Lhs().GEP)
-    Lhs = Builder.Insert(FirstCmp.Lhs().GEP->clone());
+
+  // memcmp expects a 'size_t' argument and returns 'int'.
+  unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+  unsigned IntBits = TLI.getIntSize();
+  const unsigned TotalSizeBits = std::accumulate(
+      Comparisons.begin(), Comparisons.end(), 0u,
+      [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
+
+  if (FirstCmp.Lhs()->GEP)
+    Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
   else
-    Lhs = FirstCmp.Lhs().LoadI->getPointerOperand();
-  if (FirstCmp.Rhs().GEP)
-    Rhs = Builder.Insert(FirstCmp.Rhs().GEP->clone());
+    Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
+  // Build constant-array to compare to
+  if (FirstCmp.isConstCmp()) {
+    auto* ArrayType = ArrayType::get(FirstCmp.Lhs()->LoadI->getType(),TotalSizeBits / 8);
+    auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+    std::vector<Constant*> Constants;
+    for (const auto& BceBlock : Comparisons) {
+      // safe since we checked before that second operand is constant-int
+      Constants.emplace_back(*BceBlock.getConstant());
+    }
+    auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+    Builder.CreateStore(ArrayConstant,ArrayAlloca);
+    Rhs = ArrayAlloca;
+  } else if ((*FirstCmp.Rhs())->GEP)
+    Rhs = Builder.Insert((*FirstCmp.Rhs())->GEP->clone());
   else
-    Rhs = FirstCmp.Rhs().LoadI->getPointerOperand();
+    Rhs = (*FirstCmp.Rhs())->LoadI->getPointerOperand();
 
   Value *IsEqual = nullptr;
   LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
@@ -764,21 +831,17 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
   if (Comparisons.size() == 1) {
     LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
     // Use clone to keep the metadata
-    Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs().LoadI->clone());
-    Instruction *const RhsLoad = Builder.Insert(FirstCmp.Rhs().LoadI->clone());
+    Instruction *const LhsLoad = Builder.Insert((*FirstCmp.Lhs()).LoadI->clone());
     LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
-    RhsLoad->replaceUsesOfWith(RhsLoad->getOperand(0), Rhs);
     // There are no blocks to merge, just do the comparison.
-    IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
+    if (FirstCmp.isConstCmp())
+      IsEqual = Builder.CreateICmpEQ(LhsLoad, *FirstCmp.getConstant());
+    else {
+      Instruction *const RhsLoad = Builder.Insert((*FirstCmp.Rhs())->LoadI->clone());
+      RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
+      IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
+    }
   } else {
-    const unsigned TotalSizeBits = std::accumulate(
-        Comparisons.begin(), Comparisons.end(), 0u,
-        [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
-
-    // memcmp expects a 'size_t' argument and returns 'int'.
-    unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
-    unsigned IntBits = TLI.getIntSize();
-
     // Create memcmp() == 0.
     const auto &DL = Phi.getDataLayout();
     Value *const MemCmpCall = emitMemCmp(
@@ -995,7 +1058,6 @@ void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<Common
   auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
   auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
   // TODO: check for alignment
-  // auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
   // Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
 
   std::vector<Constant*> Constants;
@@ -1025,7 +1087,7 @@ M->getOrInsertGlobal("globalKey", ArrayType);
   ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
   removeUnusedOperands(deadOperands);
 
-  dbgs() << "DONE merging";
+  // dbgs() << "DONE merging";
 }
 
 // Combines Icmp instructions if they operate on adjacent memory
@@ -1112,32 +1174,30 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
   }
 
   // Try to merge remaining select nodes that haven't been merged from phi-node merging
-  for (BasicBlock &BB : F) {
-    // from bottom up to find the root result of all comparisons
-    for (Instruction &I : llvm::reverse(BB)) {
-      if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
-        auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-        auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-        auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
-        if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
-          continue;
-
-        Value* BasePtr;
-        std::vector<CommonCmp> cmps;
-        if (auto bp = constantCmp(Cmp1,&cmps))
-          BasePtr = *bp;
-        if (auto bp = constantCmp(Cmp2,&cmps)) {
-          if (BasePtr != bp) continue;
-        }
-
-        MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
-        break;
-      }
-    }
-  }
-
-  F.dump();
+  // for (BasicBlock &BB : F) {
+  //   // from bottom up to find the root result of all comparisons
+  //   for (Instruction &I : llvm::reverse(BB)) {
+  //     if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
+  //       auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+  //       auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+  //       auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+  //       if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
+  //         continue;
+
+  //       Value* BasePtr;
+  //       std::vector<CommonCmp> cmps;
+  //       if (auto bp = constantCmp(Cmp1,&cmps))
+  //         BasePtr = *bp;
+  //       if (auto bp = constantCmp(Cmp2,&cmps)) {
+  //         if (BasePtr != bp) continue;
+  //       }
+
+  //       MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
+  //       break;
+  //     }
+  //   }
+  // }
 
   return MadeChange;
 }
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
new file mode 100644
index 0000000000000..92c1d187aa08f
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --force-update
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+
+; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
+
+define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
+; CHECK-LABEL: @test(
+; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca [3 x i8], align 1
+; CHECK-NEXT:    store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br label [[IF_END5:%.*]]
+; CHECK:       land.end:
+; CHECK-NEXT:    ret i1 [[TMP1]]
+;
+entry:
+  %0 = load i8, ptr %p, align 1
+  %cmp = icmp eq i8 %0, -1
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %cmp5 = icmp eq i8 %1, -56
+  br i1 %cmp5, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true
+  %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 2
+  %2 = load i8, ptr %arrayidx2, align 1
+  %cmp8 = icmp eq i8 %2, -66
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true, %entry
+  %3 = phi i1 [ false, %land.lhs.true ], [ false, %entry ], [ %cmp8, %land.rhs ]
+  ret i1 %3
+}

>From 79b1565a8546b8272944b0da996e76ac096a3c3b Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Thu, 20 Feb 2025 18:54:07 +0100
Subject: [PATCH 04/23] [MergeICmps] Use RTTI; Can merge mixed comparison
 chains

---
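Note: a minimal standalone sketch of the two ideas in this commit, with no LLVM headers. It shows the kind tag plus `classof` that lets `isa`/`dyn_cast` replace the virtual `isConstCmp()`/`getConstant()` queries, and the splitting of a mixed chain by kind before contiguity is checked. All names here (`Cmp`, `ConstCmp`, `BceCmp`, `isaKind`, `dynCastKind`, `splitByKind`) and the plain integer base ids are hypothetical stand-ins. The patch itself uses `Comparison`, `BCEConstCmp`, `BCECmp` and the `isa`/`cast`/`dyn_cast` templates from `llvm/Support/Casting.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for the Comparison hierarchy. The base stores a kind
// tag so the dynamic type can be queried without virtual calls.
struct Cmp {
  enum Kind { CK_ConstCmp, CK_BceCmp };

private:
  const Kind K;

public:
  int LhsBase, LhsOffset, SizeBits;
  Cmp(Kind K, int LhsBase, int LhsOffset, int SizeBits)
      : K(K), LhsBase(LhsBase), LhsOffset(LhsOffset), SizeBits(SizeBits) {}
  virtual ~Cmp() = default;
  Kind getKind() const { return K; }
};

// Load compared against a constant.
struct ConstCmp : Cmp {
  long Const;
  ConstCmp(int Base, int Offset, int SizeBits, long Const)
      : Cmp(CK_ConstCmp, Base, Offset, SizeBits), Const(Const) {}
  static bool classof(const Cmp *C) { return C->getKind() == CK_ConstCmp; }
};

// Load compared against another load.
struct BceCmp : Cmp {
  int RhsBase, RhsOffset;
  BceCmp(int LB, int LO, int RB, int RO, int SizeBits)
      : Cmp(CK_BceCmp, LB, LO, SizeBits), RhsBase(RB), RhsOffset(RO) {}
  static bool classof(const Cmp *C) { return C->getKind() == CK_BceCmp; }
};

// Hand-rolled equivalents of isa<> and dyn_cast<>, built on classof().
template <typename To> bool isaKind(const Cmp *C) { return To::classof(C); }
template <typename To> const To *dynCastKind(const Cmp *C) {
  return To::classof(C) ? static_cast<const To *>(C) : nullptr;
}

// Second starts exactly where First ends, on every side that is a load.
bool areContiguous(const Cmp &First, const Cmp &Second) {
  assert(First.getKind() == Second.getKind() && "comparisons must be of the same kind");
  bool LhsContig = First.LhsBase == Second.LhsBase &&
                   First.LhsOffset + First.SizeBits / 8 == Second.LhsOffset;
  const auto *A = dynCastKind<BceCmp>(&First);
  const auto *B = dynCastKind<BceCmp>(&Second);
  if (!A || !B)
    return LhsContig; // constant comparisons only constrain the left side
  return LhsContig && A->RhsBase == B->RhsBase &&
         A->RhsOffset + A->SizeBits / 8 == B->RhsOffset;
}

using CmpList = std::vector<std::shared_ptr<Cmp>>;

// Splits a mixed chain into {constant comparisons, load/load comparisons}.
std::pair<CmpList, CmpList> splitByKind(const CmpList &Chain) {
  CmpList Consts, Bces;
  std::partition_copy(
      Chain.begin(), Chain.end(), std::back_inserter(Consts),
      std::back_inserter(Bces),
      [](const std::shared_ptr<Cmp> &C) { return isaKind<ConstCmp>(C.get()); });
  return std::make_pair(std::move(Consts), std::move(Bces));
}
```

Each returned list contains a single kind, so it can be sorted and scanned for contiguous runs on its own, which is what the two `mergeBlocks` calls in the constructor rely on.
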
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 319 +++++++++---------
 .../Transforms/MergeICmps/X86/const-cmp-bb.ll |   2 +-
 .../MergeICmps/X86/mixed-comparisons.ll       |  71 ++++
 3 files changed, 236 insertions(+), 156 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 93ecfc5d780e4..df00fff3194c2 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -177,25 +177,31 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
   return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
 }
 
+typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
+
 struct Comparison {
+public:
+  enum CompKind {
+    CK_ConstCmp,
+    CK_BceCmp,
+  };
+private:
+  const CompKind Kind;
+public:
   int SizeBits;
   const ICmpInst *CmpI;
 
   using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
 
-  Comparison(int SizeBits, const ICmpInst *CmpI) : SizeBits(SizeBits), CmpI(CmpI) {}
+  Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
+        : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
+  CompKind getKind() const { return Kind; }
+
   virtual ~Comparison() = default;
   virtual LoadOperands getLoads() = 0;
-  virtual std::optional<Constant*> getConstant() = 0;
-  virtual bool isConstCmp()const = 0;
-  bool operator<(Comparison &O) {
-    auto [Lhs,Rhs] = getLoads();
-    auto [OtherLhs,OtherRhs] = O.getLoads();
-
-    if (!isConstCmp()) 
-      return std::tie(*Lhs,**Rhs) < std::tie(*OtherLhs,**OtherRhs);
-    return *Lhs < *OtherLhs;
-  }
+  virtual InstructionSet getInsts() = 0;
+  bool areContiguous(const Comparison& Other) const;
+  bool operator<(const Comparison &Other) const;
 };
 
 // A comparison between a BCE atom and an integer constant.
@@ -211,17 +217,21 @@ struct BCEConstCmp : public Comparison {
   Constant* Const;
 
   BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
-      : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+      : Comparison(CK_ConstCmp, SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+  static bool classof(const Comparison* C) {
+    return C->getKind() == CK_ConstCmp;
+  }
   
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,std::nullopt);
   }
-  std::optional<Constant*> getConstant() override {
-    return Const;
-  }
-  bool isConstCmp() const override {
-    return true;
+  InstructionSet getInsts() override {
+    InstructionSet BlockInsts{CmpI,Lhs.LoadI};
+    if (Lhs.GEP)
+      BlockInsts.insert(Lhs.GEP);
+    return BlockInsts;
   }
+
 };
 
 // A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -235,21 +245,55 @@ struct BCECmp : public Comparison {
   BCEAtom Rhs;
 
   BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
-      : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R))  {
+      : Comparison(CK_BceCmp, SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R))  {
     if (Rhs < Lhs) std::swap(Rhs, Lhs);
   }
+  static bool classof(const Comparison* C) {
+    return C->getKind() == CK_BceCmp;
+  }
 
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,&Rhs);
   }
-  std::optional<Constant*> getConstant() override {
-    return std::nullopt;
-  }
-  bool isConstCmp() const override {
-    return false;
+  InstructionSet getInsts() override {
+    InstructionSet BlockInsts{CmpI, Lhs.LoadI, Rhs.LoadI};
+    if (Lhs.GEP)
+      BlockInsts.insert(Lhs.GEP);
+    if (Rhs.GEP)
+      BlockInsts.insert(Rhs.GEP);
+    return BlockInsts;
   }
 };
 
+bool Comparison::areContiguous(const Comparison& Other) const {
+  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+  if (isa<BCEConstCmp>(this)) {
+    const auto& First = cast<BCEConstCmp>(this);
+    const auto& Second = cast<BCEConstCmp>(Other);
+
+    return First->Lhs.BaseId == Second.Lhs.BaseId &&
+           First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset;
+  }
+  const auto& First = cast<BCECmp>(this);
+  const auto& Second = cast<BCECmp>(Other);
+
+  return First->Lhs.BaseId == Second.Lhs.BaseId &&
+         First->Rhs.BaseId == Second.Rhs.BaseId &&
+         First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset &&
+         First->Rhs.Offset + First->SizeBits / 8 == Second.Rhs.Offset;
+}
+bool Comparison::operator<(const Comparison& Other) const {
+  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+  if (isa<BCEConstCmp>(this)) {
+    const auto& First = cast<BCEConstCmp>(this);
+    const auto& Second = cast<BCEConstCmp>(Other);
+    return First->Lhs < Second.Lhs;
+  }
+  const auto& First = cast<BCECmp>(this);
+  const auto& Second = cast<BCECmp>(Other);
+  return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
+}
+
 
 // A basic block with a comparison between two BCE atoms.
 // The block might do extra work besides the atom comparison, in which case
@@ -258,27 +302,13 @@ struct BCECmp : public Comparison {
 // (see canSplit()).
 class BCECmpBlock {
  public:
-  typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
-
   BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
       : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
-  const std::optional<BCEAtom*> Rhs() const { return Cmp->getLoads().second; }
-  std::optional<Constant*> getConstant() const {
-    return Cmp->getConstant();
-  }
-  bool isConstCmp() const {
-    return Cmp->isConstCmp();
-  }
-  Comparison* getCmp() const {
+  const Comparison* getCmp() const {
     return Cmp;
   }
-  bool operator<(const BCECmpBlock &O) const {
-    return *Cmp < *O.getCmp();
-  }
-
-  int SizeBits() const { return Cmp->SizeBits; }
 
   // Returns true if the block does other works besides comparison.
   bool doesOtherWork() const;
@@ -426,41 +456,40 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
 }
 
 // Chain of comparisons inside a single basic block connected using `select` nodes.
-std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
-
-std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
-                                  ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
-  if (!SelectI->hasOneUse()) {
-    LLVM_DEBUG(dbgs() << "select has several uses\n");
-    return std::nullopt;
-  }
-  auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-  auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-  auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
-  if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
-    return std::nullopt;
-
-  auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
-  if (!Lhs)
-    return std::nullopt;
-  auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
-  if (!Rhs)
-    return std::nullopt;
-
-  return Lhs->concat(*Rhs);
-}
-
-std::optional<IntraCmpChain> visitComparison(Value *Cond,
-            ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
-  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
-    return visitICmp(CmpI, ExpectedPredicate, BaseId);
-  if (auto *SelectI = dyn_cast<SelectInst>(Cond))
-    return visitSelect(SelectI, ExpectedPredicate, BaseId);
-
-  return std::nullopt;
-}
-
+// std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+
+// std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+//                                   ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
+//   if (!SelectI->hasOneUse()) {
+//     LLVM_DEBUG(dbgs() << "select has several uses\n");
+//     return std::nullopt;
+//   }
+//   auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+//   auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+//   auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+//   if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+//     return std::nullopt;
+
+//   auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
+//   if (!Lhs)
+//     return std::nullopt;
+//   auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
+//   if (!Rhs)
+//     return std::nullopt;
+
+//   return Lhs->concat(*Rhs);
+// }
+
+// std::optional<IntraCmpChain> visitComparison(Value *Cond,
+//             ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
+//   if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+//     return visitICmp(CmpI, ExpectedPredicate, BaseId);
+//   if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+//     return visitSelect(SelectI, ExpectedPredicate, BaseId);
+
+//   return std::nullopt;
+// }
 
 // Visit the given comparison block. If this is a comparison between two valid
 // BCE atoms, returns the comparison.
@@ -507,30 +536,29 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
   if (!Result)
     return std::nullopt;
 
-  BCECmpBlock::InstructionSet BlockInsts;
-  auto [Lhs,Rhs] = (*Result)->getLoads();
-  BlockInsts.insert(Lhs->LoadI);
-  if (Lhs->GEP)
-    BlockInsts.insert(Lhs->GEP);
-  if (Rhs) {
-    BlockInsts.insert((*Rhs)->LoadI);
-    if ((*Rhs)->GEP)
-      BlockInsts.insert((*Rhs)->GEP);
-  }
-  BlockInsts.insert((*Result)->CmpI);
+  InstructionSet BlockInsts((*Result)->getInsts());
   BlockInsts.insert(BranchI);
   return BCECmpBlock(std::move(*Result), Block, BlockInsts);
 }
 
+// void emitDebugInfo(BCECmpBlock &&Comparison) {
+//   LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+//                     << "': Found constant-cmp of " << Comparison.getCmp().SizeBits
+//                     << " bits including " << Comparison.getCmp()->Lhs.BaseId << " + "
+//                     << Comparison.getCmp().Lhs.Offset << "\n");
+
+//   LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+//                     << "': Found cmp of " << Comparison.getCmp().SizeBits
+//                     << " bits between " << Comparison.getCmp().Lhs.BaseId << " + "
+//                     << Comparison.Lhs.Offset << " and "
+//                     << Comparison.Rhs.BaseId << " + "
+//                     << Comparison.Rhs.Offset << "\n");
+//   LLVM_DEBUG(dbgs() << "\n");
+// }
+
 static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
                                 BCECmpBlock &&Comparison) {
-  // LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-  //                   << "': Found cmp of " << Comparison.SizeBits()
-  //                   << " bits between " << Comparison.Lhs().BaseId << " + "
-  //                   << Comparison.Lhs().Offset << " and "
-  //                   << Comparison.Rhs().BaseId << " + "
-  //                   << Comparison.Rhs().Offset << "\n");
-  // LLVM_DEBUG(dbgs() << "\n");
+  // emitDebugInfo(Comparison);
   Comparison.OrigOrder = Comparisons.size();
   Comparisons.push_back(std::move(Comparison));
 }
@@ -554,24 +582,12 @@ class BCECmpChain {
 private:
   PHINode &Phi_;
   // The list of all blocks in the chain, grouped by contiguity.
+  // BCE comparisons come first, followed by all constant comparisons.
   std::vector<ContiguousBlocks> MergedBlocks_;
   // The original entry block (before sorting);
   BasicBlock *EntryBlock_;
 };
 
-static bool areContiguous(const BCECmpBlock &First, const BCECmpBlock &Second) {
-  bool HasContigLhs = First.Lhs()->BaseId == Second.Lhs()->BaseId &&
-                      First.Lhs()->Offset + First.SizeBits() / 8 == Second.Lhs()->Offset;
-  bool HasContigRhs = true;
-  auto FirstRhs = First.Rhs();
-  auto SecondRhs = Second.Rhs();
-  if (FirstRhs && SecondRhs)
-    HasContigRhs = (*FirstRhs)->BaseId == (*SecondRhs)->BaseId &&
-                   (*FirstRhs)->Offset + First.SizeBits() / 8 == (*SecondRhs)->Offset;
-
-  return HasContigLhs && HasContigRhs;
-}
-
 static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
   unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
   for (const BCECmpBlock &Block : Blocks)
@@ -579,7 +595,7 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
   return MinOrigOrder;
 }
 
-/// Given a chain of comparison blocks, groups the blocks into contiguous
+/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
 /// ranges that can be merged together into a single comparison.
 static std::vector<BCECmpChain::ContiguousBlocks>
 mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
@@ -588,12 +604,12 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
   // Sort to detect continuous offsets.
   llvm::sort(Blocks,
              [](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
-              return LhsBlock < RhsBlock;
+              return *LhsBlock.getCmp() < *RhsBlock.getCmp();
              });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
   for (BCECmpBlock &Block : Blocks) {
-    if (!LastMergedBlock || !areContiguous(LastMergedBlock->back(), Block)) {
+    if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
       MergedBlocks.emplace_back();
       LastMergedBlock = &MergedBlocks.back();
     } else {
@@ -613,26 +629,6 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
   return MergedBlocks;
 }
 
-// A valid comparison chain means that all comparisons are of the same kind (either all `BCECmp` or all `BCEConstCmp`).
-// Additionally if all comparisons are `BCEConstCmp` they all need to have the same type to build a valid LLVM constant array.
-// TODO: Could even build a memory chain of different types using separate allocations
-bool isValidCmpChain(std::vector<BCECmpBlock> Comparisons) {
-  BCECmpBlock* PrevCmp = nullptr;
-  for (BCECmpBlock BceCmpBlock : Comparisons) {
-    if (PrevCmp) {
-      if (PrevCmp->isConstCmp() != BceCmpBlock.isConstCmp())
-        return false;
-      if (PrevCmp->isConstCmp()){
-        if (PrevCmp->Lhs()->LoadI->getType() != BceCmpBlock.Lhs()->LoadI->getType())
-          return false;
-      }
-    }
-
-    PrevCmp = &BceCmpBlock;
-  }
-  return true;
-}
-
 BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
                          AliasAnalysis &AA)
     : Phi_(Phi) {
@@ -705,20 +701,28 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
     }
     enqueueBlock(Comparisons, std::move(*Comparison));
   }
-
+  
   // It is possible we have no suitable comparison to merge.
   if (Comparisons.empty()) {
     LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
     return;
   }
 
-  if(!isValidCmpChain(Comparisons)) {
-    LLVM_DEBUG(dbgs() << "invalid comparison chain");
-    return;
-  }
-
   EntryBlock_ = Comparisons[0].BB;
-  MergedBlocks_ = mergeBlocks(std::move(Comparisons));
+
+  std::vector<BCECmpBlock> ConstComparisons, BceComparisons;
+  auto isConstCmp = [](BCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
+  // TODO: too many copies here
+  std::partition_copy(Comparisons.begin(), Comparisons.end(), 
+                      std::back_inserter(ConstComparisons), 
+                      std::back_inserter(BceComparisons),
+                      isConstCmp);
+
+  auto MergedConstCmpBlocks = mergeBlocks(std::move(ConstComparisons));
+  auto MergedBCECmpBlocks = mergeBlocks(std::move(BceComparisons));
+
+  MergedBlocks_.insert(MergedBlocks_.end(),MergedBCECmpBlocks.begin(),MergedBCECmpBlocks.end());
+  MergedBlocks_.insert(MergedBlocks_.end(),MergedConstCmpBlocks.begin(),MergedConstCmpBlocks.end());
 }
 
 namespace {
@@ -786,34 +790,27 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
   // Add the GEPs from the first BCECmpBlock.
   Value *Lhs, *Rhs;
 
-  // memcmp expects a 'size_t' argument and returns 'int'.
-  unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
-  unsigned IntBits = TLI.getIntSize();
-  const unsigned TotalSizeBits = std::accumulate(
-      Comparisons.begin(), Comparisons.end(), 0u,
-      [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
-
   if (FirstCmp.Lhs()->GEP)
     Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
   else
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
   // Build constant-array to compare to
-  if (FirstCmp.isConstCmp()) {
-    auto* ArrayType = ArrayType::get(FirstCmp.Lhs()->LoadI->getType(),TotalSizeBits / 8);
+  if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
+    auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
     auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
     std::vector<Constant*> Constants;
     for (const auto& BceBlock : Comparisons) {
-      // safe since we checked before that second operand is constant-int
-      Constants.emplace_back(*BceBlock.getConstant());
+      Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
     }
     auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
     Builder.CreateStore(ArrayConstant,ArrayAlloca);
     Rhs = ArrayAlloca;
-  } else if ((*FirstCmp.Rhs())->GEP)
-    Rhs = Builder.Insert((*FirstCmp.Rhs())->GEP->clone());
-  else
-    Rhs = (*FirstCmp.Rhs())->LoadI->getPointerOperand();
-
+  } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+    if (FirstBceCmp->Rhs.GEP)
+      Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
+    else
+      Rhs = FirstBceCmp->Rhs.LoadI->getPointerOperand();
+  }
   Value *IsEqual = nullptr;
   LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
                     << BB->getName() << "\n");
@@ -831,17 +828,25 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
   if (Comparisons.size() == 1) {
     LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
     // Use clone to keep the metadata
-    Instruction *const LhsLoad = Builder.Insert((*FirstCmp.Lhs()).LoadI->clone());
+    Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs()->LoadI->clone());
     LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
     // There are no blocks to merge, just do the comparison.
-    if (FirstCmp.isConstCmp())
-      IsEqual = Builder.CreateICmpEQ(LhsLoad, *FirstCmp.getConstant());
-    else {
-      Instruction *const RhsLoad = Builder.Insert((*FirstCmp.Rhs())->LoadI->clone());
+    if (auto* ConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp()))
+      IsEqual = Builder.CreateICmpEQ(LhsLoad, ConstCmp->Const);
+    else if (const auto& BceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+      Instruction *const RhsLoad = Builder.Insert(BceCmp->Rhs.LoadI->clone());
       RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
       IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
     }
   } else {
+    // memcmp expects a 'size_t' argument and returns 'int'.
+    unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+    unsigned IntBits = TLI.getIntSize();
+    const unsigned TotalSizeBits = std::accumulate(
+        Comparisons.begin(), Comparisons.end(), 0u,
+        [](int Size, const BCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+
+
     // Create memcmp() == 0.
     const auto &DL = Phi.getDataLayout();
     Value *const MemCmpCall = emitMemCmp(
@@ -1153,6 +1158,10 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
                     DominatorTree *DT) {
   LLVM_DEBUG(dbgs() << "MergeICmpsLegacyPass: " << F.getName() << "\n");
 
+
+  dbgs() << "after target\n";
+  dbgs() << TTI.enableMemCmpExpansion(F.hasOptSize(), true);
+
   // We only try merging comparisons if the target wants to expand memcmp later.
   // The rationale is to avoid turning small chains into memcmp calls.
   if (!TTI.enableMemCmpExpansion(F.hasOptSize(), true))
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 92c1d187aa08f..24cbceae9173d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -10,7 +10,7 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
 ; CHECK-NEXT:    store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT:    br label [[IF_END5:%.*]]
+; CHECK-NEXT:    br label [[LAND_END5:%.*]]
 ; CHECK:       land.end:
 ; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
new file mode 100644
index 0000000000000..150a0300de947
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -0,0 +1,71 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+%S = type { i32, i1, i1, i16, i32, i32, i32 }
+
+; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
+; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
+
+define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed(
+; This is the classic BCE comparison block
+; CHECK:   "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; This is the new BCE to constant comparison block
+; CHECK:  "entry+land.rhs+land.lhs.true8":
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca [3 x i32], align 4
+; CHECK-NEXT:    store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:    [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[ENTRY_LAND_RHS]] ], [ false, [[LAND_LHS_TRUE10:%.*]] ]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+entry:
+  %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %0 = load i32, ptr %e, align 4
+  %cmp = icmp eq i32 %0, 255
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr %a, align 4
+  %2 = load i32, ptr %b, align 4
+  %cmp3 = icmp eq i32 %1, %2
+  br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4:                                   ; preds = %land.lhs.true
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+  %3 = load i8, ptr %c, align 1
+  %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+  %4 = load i8, ptr %c5, align 1
+  %cmp7 = icmp eq i8 %3, %4
+  br i1 %cmp7, label %land.lhs.true8, label %land.end
+
+land.lhs.true8:                                   ; preds = %land.lhs.true4
+  %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+  %5 = load i32, ptr %g, align 4
+  %cmp9 = icmp eq i32 %5, 100
+  br i1 %cmp9, label %land.lhs.true10, label %land.end
+
+land.lhs.true10:                                  ; preds = %land.lhs.true8
+  %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %6 = load i8, ptr %b11, align 4
+  %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %7 = load i8, ptr %b13, align 4
+  %cmp15 = icmp eq i8 %6, %7
+  br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true10
+  %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+  %8 = load i32, ptr %f, align 4
+  %cmp16 = icmp eq i32 %8, 200
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true8, %land.lhs.true4, %land.lhs.true, %entry
+  %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true8 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+  ret i1 %9
+}
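
The byte offsets and `memcmp` sizes in the `mixed-comparisons.ll` test above can be cross-checked against the layout of `struct S` from the PR description. The following is only a sketch: it assumes a typical x86-64 ABI (4-byte `int`, no unusual packing), spells the 16-bit member as `std::uint16_t`, and the `Range` type and both helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the struct from the PR description.
struct S {
  int a;
  unsigned char b;
  unsigned char c;
  std::uint16_t d;
  int e;
  int f;
  int g;
};

// Hypothetical helper type: a byte range inside S.
struct Range {
  std::size_t offset;
  std::size_t size;
};

// a, b and c are compared against the same fields of another S (BCE part).
constexpr Range bceRange() {
  return {offsetof(S, a), offsetof(S, c) + sizeof(S::c) - offsetof(S, a)};
}

// e, f and g are compared against the constants 255, 200 and 100.
constexpr Range constRange() {
  return {offsetof(S, e), offsetof(S, g) + sizeof(S::g) - offsetof(S, e)};
}
```

Under these assumptions the first range starts at offset 0 and covers 6 bytes, and the second starts at offset 8 and covers 12 bytes, which matches the `i64 6` and `i64 12` length arguments and the `getelementptr ... i64 8` in the CHECK lines.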

>From fdc482fecf42018596ee99f6b4e0b339d32bfbc9 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 26 Feb 2025 20:14:33 +0100
Subject: [PATCH 05/23] [MergeIcmps] Supports basic blocks using select insts

---
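
The recursion that `visitSelect` and `visitComparison` perform in this patch can be pictured with a toy model. This is a minimal sketch that does not use LLVM: `Node`, `collectChain`, `demoChain` and `demoMalformed` are hypothetical names, and the model only captures the shape check `select(cond, cmp, false)`, where `cond` is itself either a comparison or another such select.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for an IR value; not an LLVM type.
struct Node {
  enum Kind { Cmp, Select, ConstFalse, Other } kind;
  int cmpId = -1;                // identifies a leaf comparison
  std::vector<const Node *> ops; // select operands: cond, true value, false value
};

// Collects the comparisons of nested `select(cond, cmp, false)` nodes in
// evaluation order. Returns std::nullopt if any node has another shape.
std::optional<std::vector<int>> collectChain(const Node *n) {
  if (n->kind == Node::Cmp)
    return std::vector<int>{n->cmpId};
  if (n->kind != Node::Select || n->ops.size() != 3)
    return std::nullopt;
  // The true arm must be a plain comparison, the false arm constant false.
  if (n->ops[1]->kind != Node::Cmp || n->ops[2]->kind != Node::ConstFalse)
    return std::nullopt;
  auto chain = collectChain(n->ops[0]); // condition: comparison or select
  if (!chain)
    return std::nullopt;
  chain->push_back(n->ops[1]->cmpId);
  return chain;
}

// Flattens select(select(c0, c1, false), c2, false).
std::vector<int> demoChain() {
  Node c0{Node::Cmp, 0}, c1{Node::Cmp, 1}, c2{Node::Cmp, 2};
  Node f{Node::ConstFalse};
  Node inner{Node::Select, -1, {&c0, &c1, &f}};
  Node outer{Node::Select, -1, {&inner, &c2, &f}};
  return collectChain(&outer).value_or(std::vector<int>{});
}

// A select whose false arm is not constant false is rejected.
bool demoMalformed() {
  Node c0{Node::Cmp, 0}, c1{Node::Cmp, 1}, other{Node::Other};
  Node sel{Node::Select, -1, {&c0, &c1, &other}};
  return collectChain(&sel).has_value();
}
```

In this model `demoChain()` yields the comparisons in the order 0, 1, 2, and a single malformed node makes the whole chain fail, which is the same all-or-nothing behaviour the patch uses for a flattened block.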
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 567 +++++++-----------
 .../Transforms/MergeICmps/X86/const-cmp-bb.ll |   2 +-
 .../MergeICmps/X86/many-const-cmp-select.ll   |  69 +++
 3 files changed, 300 insertions(+), 338 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index df00fff3194c2..4456fbfb9a60a 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -294,46 +294,88 @@ bool Comparison::operator<(const Comparison& Other) const {
   return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
 }
 
+// Represents multiple comparisons inside of a single basic block.
+// This happens if multiple basic blocks have previously been merged into a single block using a select instruction.
+class IntraCmpChain {
+  std::vector<Comparison*> CmpChain;
+
+public:
+  IntraCmpChain(Comparison* C) : CmpChain{C} {}
+  IntraCmpChain combine(const IntraCmpChain OtherChain) {
+    CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+    return *this;
+  }
+  std::vector<Comparison*> getCmpChain() const {
+    return CmpChain;
+  }
+};
+
+
+// A basic block that contains one or more comparisons
+class MultBCECmpBlock {
+ public:
+  MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
+
+  // // Returns true if each comparison in this basic block is being merged.
+  // // Necessary because otherwise would leave basic block in invalid state.
+  // bool hasAllCmpsMerged() const;
+
+  // Returns true if the block does other work besides the comparisons.
+  bool doesOtherWork() const;
+
+  std::vector<Comparison*> getCmps() {
+    return Cmps;
+  }
+
+  // // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
+  // // instructions in the block.
+  // bool canSplit(AliasAnalysis &AA) const;
+
+  // // Returns true if all the relevant instructions in the BCE-cmp-block can
+  // // be sunk below this instruction. By doing this, we know we can separate the
+  // // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
+  // // block.
+  // bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
+
+  // The basic block where this comparison happens.
+  BasicBlock *BB;
+  // Instructions relating to the BCECmp and branch.
+  InstructionSet BlockInsts;
+  // The block requires splitting.
+  bool RequireSplit = false;
+  // Original order of this block in the chain.
+  unsigned OrigOrder = 0;
+
+private:
+  std::vector<Comparison*> Cmps;
+};
 
-// A basic block with a comparison between two BCE atoms.
+// A basic block with a single comparison between two BCE atoms.
 // The block might do extra work besides the atom comparison, in which case
 // doesOtherWork() returns true. Under some conditions, the block can be
 // split into the atom comparison part and the "other work" part
 // (see canSplit()).
-class BCECmpBlock {
+class SingleBCECmpBlock {
  public:
-  BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
-      : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+  SingleBCECmpBlock(MultBCECmpBlock M, unsigned i) {
+    BB = M.BB;
+    Cmp = M.getCmps()[i];
+    OrigOrder = M.OrigOrder;
+  }
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
   const Comparison* getCmp() const {
     return Cmp;
   }
 
-  // Returns true if the block does other works besides comparison.
-  bool doesOtherWork() const;
-
-  // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
-  // instructions in the block.
-  bool canSplit(AliasAnalysis &AA) const;
-
-  // Return true if this all the relevant instructions in the BCE-cmp-block can
-  // be sunk below this instruction. By doing this, we know we can separate the
-  // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
-  // block.
-  bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
-
   // We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
   // instructions. Split the old block and move all non-BCE-cmp-insts into the
   // new parent block.
-  void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
+  // void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
 
   // The basic block where this comparison happens.
   BasicBlock *BB;
-  // Instructions relating to the BCECmp and branch.
-  InstructionSet BlockInsts;
-  // The block requires splitting.
-  bool RequireSplit = false;
   // Original order of this block in the chain.
   unsigned OrigOrder = 0;
 
@@ -341,56 +383,58 @@ class BCECmpBlock {
   Comparison* Cmp;
 };
 
-bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
-                                    AliasAnalysis &AA) const {
-  // If this instruction may clobber the loads and is in middle of the BCE cmp
-  // block instructions, then bail for now.
-  if (Inst->mayWriteToMemory()) {
-    auto MayClobber = [&](LoadInst *LI) {
-      // If a potentially clobbering instruction comes before the load,
-      // we can still safely sink the load.
-      return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
-             isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
-    };
-    auto [Lhs,Rhs] = Cmp->getLoads();
-    if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
-      return false;
-  }
-  // Make sure this instruction does not use any of the BCE cmp block
-  // instructions as operand.
-  return llvm::none_of(Inst->operands(), [&](const Value *Op) {
-    const Instruction *OpI = dyn_cast<Instruction>(Op);
-    return OpI && BlockInsts.contains(OpI);
-  });
-}
+// bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
+//                                     AliasAnalysis &AA) const {
+//   // If this instruction may clobber the loads and is in middle of the BCE cmp
+//   // block instructions, then bail for now.
+//   if (Inst->mayWriteToMemory()) {
+//     auto MayClobber = [&](LoadInst *LI) {
+//       // If a potentially clobbering instruction comes before the load,
+//       // we can still safely sink the load.
+//       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
+//              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
+//     };
+//     for (auto* Cmp : Cmps.getCmpChain()) {
+//       auto [Lhs,Rhs] = Cmp->getLoads();
+//       if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
+//         return false;
+//     }
+//   }
+//   // Make sure this instruction does not use any of the BCE cmp block
+//   // instructions as operand.
+//   return llvm::none_of(Inst->operands(), [&](const Value *Op) {
+//     const Instruction *OpI = dyn_cast<Instruction>(Op);
+//     return OpI && BlockInsts.contains(OpI);
+//   });
+// }
 
-void BCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
-  llvm::SmallVector<Instruction *, 4> OtherInsts;
-  for (Instruction &Inst : *BB) {
-    if (BlockInsts.count(&Inst))
-      continue;
-    assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
-    // This is a non-BCE-cmp-block instruction. And it can be separated
-    // from the BCE-cmp-block instruction.
-    OtherInsts.push_back(&Inst);
-  }
+// void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
+//   llvm::SmallVector<Instruction *, 4> OtherInsts;
+//   for (Instruction &Inst : *BB) {
+//     if (BlockInsts.count(&Inst))
+//       continue;
+//     assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+//     // This is a non-BCE-cmp-block instruction. And it can be separated
+//     // from the BCE-cmp-block instruction.
+//     OtherInsts.push_back(&Inst);
+//   }
 
-  // Do the actual spliting.
-  for (Instruction *Inst : reverse(OtherInsts))
-    Inst->moveBeforePreserving(*NewParent, NewParent->begin());
-}
+//   // Do the actual splitting.
+//   for (Instruction *Inst : reverse(OtherInsts))
+//     Inst->moveBeforePreserving(*NewParent, NewParent->begin());
+// }
 
-bool BCECmpBlock::canSplit(AliasAnalysis &AA) const {
-  for (Instruction &Inst : *BB) {
-    if (!BlockInsts.count(&Inst)) {
-      if (!canSinkBCECmpInst(&Inst, AA))
-        return false;
-    }
-  }
-  return true;
-}
+// bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
+//   for (Instruction &Inst : *BB) {
+//     if (!BlockInsts.count(&Inst)) {
+//       if (!canSinkBCECmpInst(&Inst, AA))
+//         return false;
+//     }
+//   }
+//   return true;
+// }
 
-bool BCECmpBlock::doesOtherWork() const {
+bool MultBCECmpBlock::doesOtherWork() const {
   // TODO(courbet): Can we allow some other things ? This is very conservative.
   // We might be able to get away with anything does not have any side
   // effects outside of the basic block.
@@ -402,25 +446,11 @@ bool BCECmpBlock::doesOtherWork() const {
   return false;
 }
 
-class IntraCmpChain {
-  std::vector<Comparison*> CmpChain;
-
-public:
-  IntraCmpChain(Comparison* C) : CmpChain{C} {}
-  IntraCmpChain concat(const IntraCmpChain OtherChain) {
-    CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
-    return *this;
-  }
-  std::vector<Comparison*> getCmpChain() const {
-    return CmpChain;
-  }
-};
-
 // Visit the given comparison. If this is a comparison between two valid
 // BCE atoms, or between a BCE atom and a constant, returns the comparison.
 std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
                                 const ICmpInst::Predicate ExpectedPredicate,
-                                BaseIdentifier &BaseId) {
+                                BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
   // The comparison can only be used once:
   //  - For intermediate blocks, as a branch condition.
   //  - For the final block, as an incoming value for the Phi.
@@ -456,44 +486,46 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
 }
 
 // Chain of comparisons inside a single basic block connected using `select` nodes.
-// std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&, InstructionSet*);
 
-// std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
-//                                   ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
-//   if (!SelectI->hasOneUse()) {
-//     LLVM_DEBUG(dbgs() << "select has several uses\n");
-//     return std::nullopt;
-//   }
-//   auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-//   auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-//   auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+                                  ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId, InstructionSet *BlockInsts) {
+  if (!SelectI->hasOneUse()) {
+    LLVM_DEBUG(dbgs() << "select has several uses\n");
+    return std::nullopt;
+  }
+  auto* Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+  auto* Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
+  auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+  auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
 
-//   if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
-//     return std::nullopt;
+  if (!(Cmp1 || Sel1) || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+    return std::nullopt;
 
-//   auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
-//   if (!Lhs)
-//     return std::nullopt;
-//   auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
-//   if (!Rhs)
-//     return std::nullopt;
+  auto Lhs = visitComparison(SelectI->getOperand(0),ExpectedPredicate,BaseId,BlockInsts);
+  if (!Lhs)
+    return std::nullopt;
+  auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId,BlockInsts);
+  if (!Rhs)
+    return std::nullopt;
 
-//   return Lhs->concat(*Rhs);
-// }
+  BlockInsts->insert(SelectI);
+  return Lhs->combine(std::move(*Rhs));
+}
 
-// std::optional<IntraCmpChain> visitComparison(Value *Cond,
-//             ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
-//   if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
-//     return visitICmp(CmpI, ExpectedPredicate, BaseId);
-//   if (auto *SelectI = dyn_cast<SelectInst>(Cond))
-//     return visitSelect(SelectI, ExpectedPredicate, BaseId);
+std::optional<IntraCmpChain> visitComparison(Value *Cond,
+            ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+    return visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+  if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+    return visitSelect(SelectI, ExpectedPredicate, BaseId, BlockInsts);
 
-//   return std::nullopt;
-// }
+  return std::nullopt;
+}
 
 // Visit the given comparison block. If this is a comparison between two valid
 // BCE atoms, returns the comparison.
-std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
+std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
                                          BasicBlock *const Block,
                                          const BasicBlock *const PhiBlock,
                                          BaseIdentifier &BaseId) {
@@ -527,18 +559,19 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
         FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
   }
 
-  auto* CmpI = dyn_cast<ICmpInst>(Cond);
-  if (!CmpI)
-    return std::nullopt;
-  LLVM_DEBUG(dbgs() << "icmp\n");
-
-  std::optional<Comparison*> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
-  if (!Result)
+  InstructionSet BlockInsts;
+  std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
+  if (!Result) {
+    LLVM_DEBUG(dbgs() << "invalid result\n");
     return std::nullopt;
+  }
 
-  InstructionSet BlockInsts((*Result)->getInsts());
+  for (auto* Cmp : Result->getCmpChain()) {
+    auto CmpInsts = Cmp->getInsts();
+    BlockInsts.insert(CmpInsts.begin(), CmpInsts.end());
+  }
   BlockInsts.insert(BranchI);
-  return BCECmpBlock(std::move(*Result), Block, BlockInsts);
+  return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
 }
 
 // void emitDebugInfo(BCECmpBlock &&Comparison) {
@@ -556,17 +589,18 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
 //   LLVM_DEBUG(dbgs() << "\n");
 // }
 
-static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
-                                BCECmpBlock &&Comparison) {
+static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
+                                MultBCECmpBlock &&CmpBlock) {
   // emitDebugInfo(Comparison);
-  Comparison.OrigOrder = Comparisons.size();
-  Comparisons.push_back(std::move(Comparison));
+  CmpBlock.OrigOrder = Comparisons.size();
+  for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++)
+    Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i));
 }
 
 // A chain of comparisons.
 class BCECmpChain {
 public:
-  using ContiguousBlocks = std::vector<BCECmpBlock>;
+  using ContiguousBlocks = std::vector<SingleBCECmpBlock>;
 
   BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
               AliasAnalysis &AA);
@@ -582,7 +616,7 @@ class BCECmpChain {
 private:
   PHINode &Phi_;
   // The list of all blocks in the chain, grouped by contiguity.
-  // First all BCE comparisons then all BCE-Const comparisons.
+  // First all BCE comparisons followed by all BCE-Const comparisons.
   std::vector<ContiguousBlocks> MergedBlocks_;
   // The original entry block (before sorting);
   BasicBlock *EntryBlock_;
@@ -590,7 +624,7 @@ class BCECmpChain {
 
 static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
   unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
-  for (const BCECmpBlock &Block : Blocks)
+  for (const SingleBCECmpBlock &Block : Blocks)
     MinOrigOrder = std::min(MinOrigOrder, Block.OrigOrder);
   return MinOrigOrder;
 }
@@ -598,17 +632,17 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
 /// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
 /// ranges that can be merged together into a single comparison.
 static std::vector<BCECmpChain::ContiguousBlocks>
-mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
+mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
   std::vector<BCECmpChain::ContiguousBlocks> MergedBlocks;
 
   // Sort to detect continuous offsets.
   llvm::sort(Blocks,
-             [](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
+             [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
               return *LhsBlock.getCmp() < *RhsBlock.getCmp();
              });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
-  for (BCECmpBlock &Block : Blocks) {
+  for (SingleBCECmpBlock &Block : Blocks) {
     if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
       MergedBlocks.emplace_back();
       LastMergedBlock = &MergedBlocks.back();
@@ -634,46 +668,46 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
     : Phi_(Phi) {
   assert(!Blocks.empty() && "a chain should have at least one block");
   // Now look inside blocks to check for BCE comparisons.
-  std::vector<BCECmpBlock> Comparisons;
+  std::vector<SingleBCECmpBlock> Comparisons;
   BaseIdentifier BaseId;
   for (BasicBlock *const Block : Blocks) {
     assert(Block && "invalid block");
-    std::optional<BCECmpBlock> Comparison = visitCmpBlock(
+    std::optional<MultBCECmpBlock> CmpBlock = visitCmpBlock(
         Phi.getIncomingValueForBlock(Block), Block, Phi.getParent(), BaseId);
-    if (!Comparison) {
+    if (!CmpBlock) {
       LLVM_DEBUG(dbgs() << "chain with invalid BCECmpBlock, no merge.\n");
       return;
     }
-    if (Comparison->doesOtherWork()) {
-      LLVM_DEBUG(dbgs() << "block '" << Comparison->BB->getName()
+    if (CmpBlock->doesOtherWork()) {
+      LLVM_DEBUG(dbgs() << "block '" << CmpBlock->BB->getName()
                         << "' does extra work besides compare\n");
-      if (Comparisons.empty()) {
-        // This is the initial block in the chain, in case this block does other
-        // work, we can try to split the block and move the irrelevant
-        // instructions to the predecessor.
-        //
-        // If this is not the initial block in the chain, splitting it wont
-        // work.
-        //
-        // As once split, there will still be instructions before the BCE cmp
-        // instructions that do other work in program order, i.e. within the
-        // chain before sorting. Unless we can abort the chain at this point
-        // and start anew.
-        //
-        // NOTE: we only handle blocks a with single predecessor for now.
-        if (Comparison->canSplit(AA)) {
-          LLVM_DEBUG(dbgs()
-                     << "Split initial block '" << Comparison->BB->getName()
-                     << "' that does extra work besides compare\n");
-          Comparison->RequireSplit = true;
-          enqueueBlock(Comparisons, std::move(*Comparison));
-        } else {
-          LLVM_DEBUG(dbgs()
-                     << "ignoring initial block '" << Comparison->BB->getName()
-                     << "' that does extra work besides compare\n");
-        }
-        continue;
-      }
+      // if (Comparisons.empty()) {
+      //   // This is the initial block in the chain, in case this block does other
+      //   // work, we can try to split the block and move the irrelevant
+      //   // instructions to the predecessor.
+      //   //
+      //   // If this is not the initial block in the chain, splitting it won't
+      //   // work.
+      //   //
+      //   // As once split, there will still be instructions before the BCE cmp
+      //   // instructions that do other work in program order, i.e. within the
+      //   // chain before sorting. Unless we can abort the chain at this point
+      //   // and start anew.
+      //   //
+      //   // NOTE: we only handle blocks with a single predecessor for now.
+      //   if (Comparison->canSplit(AA)) {
+      //     LLVM_DEBUG(dbgs()
+      //                << "Split initial block '" << Comparison->BB->getName()
+      //                << "' that does extra work besides compare\n");
+      //     Comparison->RequireSplit = true;
+      //     enqueueBlock(Comparisons, std::move(*Comparison));
+      //   } else {
+      //     LLVM_DEBUG(dbgs()
+      //                << "ignoring initial block '" << Comparison->BB->getName()
+      //                << "' that does extra work besides compare\n");
+      //   }
+      //   continue;
+      // }
       // TODO(courbet): Right now we abort the whole chain. We could be
       // merging only the blocks that don't do other work and resume the
       // chain from there. For example:
@@ -699,7 +733,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
       // We could still merge bb1 and bb2 though.
       return;
     }
-    enqueueBlock(Comparisons, std::move(*Comparison));
+    enqueueBlock(Comparisons, std::move(*CmpBlock));
   }
   
   // It is possible we have no suitable comparison to merge.
@@ -710,8 +744,11 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
 
   EntryBlock_ = Comparisons[0].BB;
 
-  std::vector<BCECmpBlock> ConstComparisons, BceComparisons;
-  auto isConstCmp = [](BCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
+  // TODO: check for contiguous comparisons across all blocks; if all cmps in a
+  // bb are part of a contiguous range, split that block into multiple blocks.
+
+  std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
+  auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
   // TODO: too many copies here
   std::partition_copy(Comparisons.begin(), Comparisons.end(), 
                       std::back_inserter(ConstComparisons), 
@@ -734,18 +771,18 @@ class MergedBlockName {
   SmallString<16> Scratch;
 
 public:
-  explicit MergedBlockName(ArrayRef<BCECmpBlock> Comparisons)
+  explicit MergedBlockName(ArrayRef<SingleBCECmpBlock> Comparisons)
       : Name(makeName(Comparisons)) {}
   const StringRef Name;
 
 private:
-  StringRef makeName(ArrayRef<BCECmpBlock> Comparisons) {
+  StringRef makeName(ArrayRef<SingleBCECmpBlock> Comparisons) {
     assert(!Comparisons.empty() && "no basic block");
     // Fast path: only one block, or no names at all.
     if (Comparisons.size() == 1)
       return Comparisons[0].BB->getName();
     const int size = std::accumulate(Comparisons.begin(), Comparisons.end(), 0,
-                                     [](int i, const BCECmpBlock &Cmp) {
+                                     [](int i, const SingleBCECmpBlock &Cmp) {
                                        return i + Cmp.BB->getName().size();
                                      });
     if (size == 0)
@@ -773,14 +810,14 @@ class MergedBlockName {
 } // namespace
 
 // Merges the given contiguous comparison blocks into one memcmp block.
-static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
+static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
                                     BasicBlock *const InsertBefore,
                                     BasicBlock *const NextCmpBlock,
                                     PHINode &Phi, const TargetLibraryInfo &TLI,
                                     AliasAnalysis &AA, DomTreeUpdater &DTU) {
   assert(!Comparisons.empty() && "merging zero comparisons");
   LLVMContext &Context = NextCmpBlock->getContext();
-  const BCECmpBlock &FirstCmp = Comparisons[0];
+  const SingleBCECmpBlock &FirstCmp = Comparisons[0];
 
   // Create a new cmp block before next cmp block.
   BasicBlock *const BB =
@@ -796,15 +833,17 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
   // Build constant-array to compare to
   if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
-    auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
-    auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
-    std::vector<Constant*> Constants;
-    for (const auto& BceBlock : Comparisons) {
-      Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+    if (Comparisons.size() > 1) {
+      auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
+      auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+      std::vector<Constant*> Constants;
+      for (const auto& BceBlock : Comparisons) {
+        Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+      }
+      auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+      Builder.CreateStore(ArrayConstant,ArrayAlloca);
+      Rhs = ArrayAlloca;
     }
-    auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
-    Builder.CreateStore(ArrayConstant,ArrayAlloca);
-    Rhs = ArrayAlloca;
   } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
     if (FirstBceCmp->Rhs.GEP)
       Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
@@ -818,12 +857,12 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
   // If there is one block that requires splitting, we do it now, i.e.
   // just before we know we will collapse the chain. The instructions
   // can be executed before any of the instructions in the chain.
-  const auto ToSplit = llvm::find_if(
-      Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
-  if (ToSplit != Comparisons.end()) {
-    LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
-    ToSplit->split(BB, AA);
-  }
+  // const auto ToSplit = llvm::find_if(
+  //     Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
+  // if (ToSplit != Comparisons.end()) {
+  //   LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
+  //   ToSplit->split(BB, AA);
+  // }
 
   if (Comparisons.size() == 1) {
     LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
@@ -844,7 +883,7 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
     unsigned IntBits = TLI.getIntSize();
     const unsigned TotalSizeBits = std::accumulate(
         Comparisons.begin(), Comparisons.end(), 0u,
-        [](int Size, const BCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+        [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
 
 
     // Create memcmp() == 0.
@@ -916,7 +955,11 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   // Delete merged blocks. This also removes incoming values in phi.
   SmallVector<BasicBlock *, 16> DeadBlocks;
   for (const auto &Blocks : MergedBlocks_) {
-    for (const BCECmpBlock &Block : Blocks) {
+    for (const SingleBCECmpBlock &Block : Blocks) {
+      // Several single blocks can refer to the same multi-block that came from a select instruction.
+      // TODO: preferably use a set instead
+      if (llvm::is_contained(DeadBlocks, Block.BB))
+        continue;
       LLVM_DEBUG(dbgs() << "Deleting merged block " << Block.BB->getName()
                         << "\n");
       DeadBlocks.push_back(Block.BB);
@@ -1033,135 +1076,11 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   return CmpChain.simplify(TLI, AA, DTU);
 }
 
-void removeUnusedOperands(SmallVector<Value *, 8> toCheck) {
-  while (!toCheck.empty()) {
-    Value *V = toCheck.pop_back_val();
-    
-    // Only process instructions (skip constants, globals, etc.)
-    if (Instruction *OpI = dyn_cast<Instruction>(V)) {
-      if (OpI->use_empty()) {
-        toCheck.append(OpI->operands().begin(),OpI->operands().end());
-        OpI->eraseFromParent();
-      }
-    }
-  }
-}
-
-struct CommonCmp {
-  ICmpInst* CmpI;
-  unsigned Offset;
-};
-
-void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
-  IRBuilder<> Builder(SelectI);
-  auto* M = SelectI->getModule();
-  LLVMContext &Context = SelectI->getContext();
-  const auto &DL = SelectI->getDataLayout();
-
-  auto First = AdjacentMem[0];
-  auto *CmpType = First.CmpI->getOperand(0)->getType();
-  auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
-  auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
-  // TODO: check for alignment
-  // Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
-
-  std::vector<Constant*> Constants;
-  for (const auto& CI : AdjacentMem) {
-    // safe since we checked before that second operand is constantint
-    Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
-  }
-  auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
-M->getOrInsertGlobal("globalKey", ArrayType);
-    GlobalVariable* gVar = M->getNamedGlobal("globalKey");
-    gVar->setLinkage(GlobalValue::PrivateLinkage);
-    gVar->setInitializer(ArrayConstant);
-    gVar->setConstant(true);
-  // Builder.CreateStore(ArrayConstant,ArrayAlloca);
-
-  // TODO: adjust base-ptr to point to start of load-offset
-  // TODO: also have to handle !=
-  Value *const MemCmpCall = emitMemCmp(
-      Base, gVar,
-      ArraySize,
-      Builder, DL, &TLI);
-  // Builder.CreateLifetimeEnd(ArrayAlloca,ArraySize);
-  auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
-
-  BasicBlock::iterator ii(SelectI);
-  SmallVector<Value *, 8> deadOperands(SelectI->operands());
-  ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
-  removeUnusedOperands(deadOperands);
-
-  // dbgs() << "DONE merging";
-}
-
-// Combines Icmp instructions if they operate on adjacent memory
-// TODO: check that base address' memory isn't modified between comparisons
-bool tryMergeIcmps(SelectInst* SelectI, Value* Base, std::vector<CommonCmp> &Icmps,const TargetLibraryInfo &TLI) {
-  assert(!Icmps.empty() && "if entry exists then has at least one cmp");
-  bool hasMerged = false;
-
-  std::vector<CommonCmp> AdjacentMem{Icmps[0]};
-  auto Prev = Icmps[0];
-  for (auto& Cmp : llvm::drop_begin(Icmps)) {
-    if (Cmp.Offset == (Prev.Offset + 1)) {
-      AdjacentMem.emplace_back(Cmp);
-    } else if (AdjacentMem.size() > 1) {
-      mergeAdjacentComparisons(SelectI,Base, AdjacentMem,TLI);
-      hasMerged = true;
-      AdjacentMem.clear();
-      AdjacentMem.emplace_back(Cmp);
-    }
-    Prev = Cmp;
-  }
-
-  if (AdjacentMem.size() > 1) {
-    mergeAdjacentComparisons(SelectI, Base, AdjacentMem,TLI);
-    hasMerged = true;
-  }
-
-  return hasMerged;
-}
-
-// Given an operand from a load, return the original base pointer and
-// if operand is GEP also it's offset from base pointer
-// but only if offset is known at compile time
-std::tuple<Value*, std::optional<unsigned>> findPtrAndOffset(Value* V, unsigned Offset) {
-  if (const auto& GepI = dyn_cast<GetElementPtrInst>(V)){
-    if (const auto& Index = dyn_cast<ConstantInt>(GepI->getOperand(1))) {
-      if (Index->getBitWidth() <= 64) {
-        return findPtrAndOffset(GepI->getPointerOperand(), Offset + Index->getZExtValue());
-      }
-    }
-    return {V,std::nullopt};
-  }
-
-  return {V,Offset};
-}
-
-    
-std::optional<Value*>  constantCmp(ICmpInst* CmpI,std::vector<CommonCmp>* cmps) {
-  auto const& LoadI = dyn_cast<LoadInst>(CmpI->getOperand(0));
-  auto const& ConstantI = dyn_cast<ConstantInt>(CmpI->getOperand(1));
-  if (!LoadI || !ConstantI)
-    return std::nullopt;
-
-  auto [BasePtr, Offset] = findPtrAndOffset(LoadI->getOperand(0),0);
-  if (Offset)
-    cmps->emplace_back(CommonCmp {CmpI, *Offset});
-
-  return BasePtr;
-}
-
 static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
                     const TargetTransformInfo &TTI, AliasAnalysis &AA,
                     DominatorTree *DT) {
   LLVM_DEBUG(dbgs() << "MergeICmpsLegacyPass: " << F.getName() << "\n");
 
-
-  dbgs() << "after target\n";
-  dbgs() << TTI.enableMemCmpExpansion(F.hasOptSize(), true);
-
   // We only try merging comparisons if the target wants to expand memcmp later.
   // The rationale is to avoid turning small chains into memcmp calls.
   if (!TTI.enableMemCmpExpansion(F.hasOptSize(), true))
@@ -1182,32 +1101,6 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
       MadeChange |= processPhi(*Phi, TLI, AA, DTU);
   }
 
-  // Try to merge remaining select nodes that haven't been merged from phi-node merging
-  // for (BasicBlock &BB : F) {
-  //   // from bottom up to find the root result of all comparisons
-  //   for (Instruction &I : llvm::reverse(BB)) {
-  //     if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
-  //       auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-  //       auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-  //       auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
-  //       if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
-  //         continue;
-
-  //       Value* BasePtr;
-  //       std::vector<CommonCmp> cmps;
-  //       if (auto bp = constantCmp(Cmp1,&cmps))
-  //         BasePtr = *bp;
-  //       if (auto bp = constantCmp(Cmp2,&cmps)) {
-  //         if (BasePtr != bp) continue;
-  //       }
-
-  //       MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
-  //       break;
-  //     }
-  //   }
-  // }
-
   return MadeChange;
 }
 
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 24cbceae9173d..f05422fd9aea1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -7,7 +7,7 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
 ; CHECK-NEXT:    [[TMP0:%.*]] = alloca [3 x i8], align 1
-; CHECK-NEXT:    store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
+; CHECK-NEXT:    store [3 x i8] c"\FF\C8\BE", ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
new file mode 100644
index 0000000000000..4a91947b0086b
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -0,0 +1,69 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+
+; Can merge contiguous const-comparison basic blocks that include a select statement.
+
+define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
+; CHECK-LABEL: @is_all_ones_many(
+; CHECK-NEXT:  "entry+entry+entry+land.lhs.true11":
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca [4 x i8], align 1
+; CHECK-NEXT:    store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
+; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
+; CHECK:  "land.lhs.true16+land.lhs.true21":
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
+; CHECK-NEXT:    store [2 x i8] c"\02\07", ptr [[TMP3]], align 1
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
+; CHECK:  land.rhs1:
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
+; CHECK-NEXT:    [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
+; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:    [[TMP8:%.*]] = phi i1 [ [[TMP7]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret i1 [[TMP8]]
+;
+entry:
+  %0 = load i8, ptr %p, align 1
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 2
+  %2 = load i8, ptr %arrayidx2, align 1
+  %cmp = icmp eq i8 %0, -1
+  %cmp5 = icmp eq i8 %1, -56
+  %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+  %cmp9 = icmp eq i8 %2, -66
+  %or.cond28 = select i1 %or.cond, i1 %cmp9, i1 false
+  br i1 %or.cond28, label %land.lhs.true11, label %land.end
+
+land.lhs.true11:                                  ; preds = %entry
+  %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 3
+  %3 = load i8, ptr %arrayidx12, align 1
+  %cmp14 = icmp eq i8 %3, 1
+  br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16:                                  ; preds = %land.lhs.true11
+  %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+  %4 = load i8, ptr %arrayidx17, align 1
+  %cmp19 = icmp eq i8 %4, 2
+  br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21:                                  ; preds = %land.lhs.true16
+  %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 7
+  %5 = load i8, ptr %arrayidx22, align 1
+  %cmp24 = icmp eq i8 %5, 7
+  br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true21
+  %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 9
+  %6 = load i8, ptr %arrayidx26, align 1
+  %cmp28 = icmp eq i8 %6, 9
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+  %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+  ret i1 %7
+}

>From 95ccfccf83ee7631de38e01d24754987edf6c86d Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 26 Feb 2025 20:52:10 +0100
Subject: [PATCH 06/23] [MergeIcmps] Only print merged bb-name once

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 41 +++++++++++--------
 .../MergeICmps/X86/many-const-cmp-select.ll   |  2 +-
 2 files changed, 24 insertions(+), 19 deletions(-)
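
Note on the naming change in this patch: a minimal standalone sketch of the de-duplication idea, written without LLVM types so it compiles on its own. The helper name `joinUniqueBlockNames` is hypothetical and not part of the patch. The patch itself uses `llvm::UniqueVector<StringRef>`, whose IDs start at 1. Unlike the patch, this sketch also skips an empty first name.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the name building in MergedBlockName: joins
// basic-block names with '+', keeping only the first occurrence of each
// name and skipping empty names.
std::string joinUniqueBlockNames(const std::vector<std::string> &Names) {
  std::unordered_set<std::string> Seen;
  std::string Result;
  for (const std::string &Name : Names) {
    // insert().second is false when the name was already recorded.
    if (Name.empty() || !Seen.insert(Name).second)
      continue;
    if (!Result.empty())
      Result += '+';
    Result += Name;
  }
  return Result;
}
```

With the block names from many-const-cmp-select.ll, three comparisons from `entry` plus one from `land.lhs.true11` yield `entry+land.lhs.true11`, matching the updated CHECK line.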

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 4456fbfb9a60a..f60c3aabd7547 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -43,6 +43,7 @@
 
 #include "llvm/Transforms/Scalar/MergeICmps.h"
 #include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/UniqueVector.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/Loads.h"
@@ -358,17 +359,18 @@ class MultBCECmpBlock {
 // (see canSplit()).
 class SingleBCECmpBlock {
  public:
-  SingleBCECmpBlock(MultBCECmpBlock M, unsigned i) {
-    BB = M.BB;
-    Cmp = M.getCmps()[i];
-    OrigOrder = M.OrigOrder;
-  }
+  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I)
+      : BB(M.BB), OrigOrder(M.OrigOrder), Cmp(M.getCmps()[I]) {}
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
   const Comparison* getCmp() const {
     return Cmp;
   }
 
+  bool operator<(const SingleBCECmpBlock &O) const {
+    return *Cmp < *O.Cmp;
+  }
+
   // We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
   // instructions. Split the old block and move all non-BCE-cmp-insts into the
   // new parent block.
@@ -638,7 +640,7 @@ mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
   // Sort to detect continuous offsets.
   llvm::sort(Blocks,
              [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
-              return *LhsBlock.getCmp() < *RhsBlock.getCmp();
+              return LhsBlock < RhsBlock;
              });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
@@ -744,9 +746,6 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
 
   EntryBlock_ = Comparisons[0].BB;
 
-  // TODO: check for contiguous comparisons across all blocks and if all cmps in a
-  // bb are part of contiguous then split that block inato multiple
-
   std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
   auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
   // TODO: too many copies here
@@ -781,9 +780,14 @@ class MergedBlockName {
     // Fast path: only one block, or no names at all.
     if (Comparisons.size() == 1)
       return Comparisons[0].BB->getName();
-    const int size = std::accumulate(Comparisons.begin(), Comparisons.end(), 0,
-                                     [](int i, const SingleBCECmpBlock &Cmp) {
-                                       return i + Cmp.BB->getName().size();
+    // Multiple comparisons can come from the same basic block (when it uses
+    // select instructions), so avoid repeating the same name.
+    UniqueVector<StringRef> UniqueNames;
+    for (const auto& B : Comparisons)
+      UniqueNames.insert(B.BB->getName());
+    const int size = std::accumulate(UniqueNames.begin(), UniqueNames.end(), 0,
+                                     [](int i, const StringRef &Name) {
+                                       return i + Name.size();
                                      });
     if (size == 0)
       return StringRef("", 0);
@@ -792,16 +796,17 @@ class MergedBlockName {
     Scratch.clear();
     // We'll have `size` bytes for name and `Comparisons.size() - 1` bytes for
     // separators.
-    Scratch.reserve(size + Comparisons.size() - 1);
+    Scratch.reserve(size + UniqueNames.size() - 1);
     const auto append = [this](StringRef str) {
       Scratch.append(str.begin(), str.end());
     };
-    append(Comparisons[0].BB->getName());
-    for (int I = 1, E = Comparisons.size(); I < E; ++I) {
-      const BasicBlock *const BB = Comparisons[I].BB;
-      if (!BB->getName().empty()) {
+    // UniqueVector's index starts at 1
+    append(UniqueNames[1]);
+    for (int I = 2, E = UniqueNames.size(); I <= E; ++I) {
+      StringRef BBName = UniqueNames[I];
+      if (!BBName.empty()) {
         append("+");
-        append(BB->getName());
+        append(BBName);
       }
     }
     return Scratch.str();
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index 4a91947b0086b..ce8de31134e0f 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -4,7 +4,7 @@
 
 define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
 ; CHECK-LABEL: @is_all_ones_many(
-; CHECK-NEXT:  "entry+entry+entry+land.lhs.true11":
+; CHECK-NEXT:  "entry+land.lhs.true11":
 ; CHECK-NEXT:    [[TMP0:%.*]] = alloca [4 x i8], align 1
 ; CHECK-NEXT:    store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)

>From 9c2c3869a9941dc3e27ebc4aad919a3e52e7317e Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 28 Feb 2025 17:10:34 +0100
Subject: [PATCH 07/23] [MergeIcmps] Added tests for merging
 const-/bce-comparisons using select blocks

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     |  11 +-
 .../MergeICmps/X86/mixed-cmp-bb-select.ll     |  67 +++++
 .../MergeICmps/X86/mixed-comparisons.ll       |   2 +-
 .../X86/not-split-unmerged-select.ll          | 204 ++++++++++++++++
 .../MergeICmps/X86/partial-select-merge.ll    | 230 ++++++++++++++++++
 .../Transforms/MergeICmps/X86/single-block.ll |  23 ++
 6 files changed, 533 insertions(+), 4 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/single-block.ll
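
Note for reading the tests: the `i8` constants in the input IR (for example `-1`, `-56`, `-66`) show up in the CHECK lines as escaped bytes such as `c"\FF\C8\BE"`. The sketch below illustrates that mapping. The helper name `escapeBytes` is hypothetical, and the escaping rule (hex-escape anything non-printable, plus the quote and backslash) is an assumption that only approximates the IR printer; it is sufficient for the bytes used in these tests.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: renders signed 8-bit constants the way they appear
// inside a c"..." string in the CHECK lines of these tests.
std::string escapeBytes(const std::vector<int8_t> &Values) {
  static const char Hex[] = "0123456789ABCDEF";
  std::string Out;
  for (int8_t V : Values) {
    // Reinterpret the signed value as its two's-complement byte.
    const uint8_t Byte = static_cast<uint8_t>(V);
    const bool Printable =
        Byte >= 0x20 && Byte < 0x7F && Byte != '"' && Byte != '\\';
    if (Printable) {
      Out += static_cast<char>(Byte);
      continue;
    }
    Out += '\\';
    Out += Hex[Byte >> 4];
    Out += Hex[Byte & 0x0F];
  }
  return Out;
}
```

For example, the four comparisons against `-1`, `-56`, `-66` and `1` in `is_all_ones_many` correspond to the stored array `c"\FF\C8\BE\01"`.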

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index f60c3aabd7547..779e9325a311a 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -1011,6 +1011,13 @@ std::vector<BasicBlock *> getOrderedBlocks(PHINode &Phi,
   return Blocks;
 }
 
+template<typename T>
+bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
+  auto* IncomingValue = Phi.getIncomingValue(I);
+  return !isa<T>(IncomingValue) ||
+    cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
+}
+
 bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
                 DomTreeUpdater &DTU) {
   LLVM_DEBUG(dbgs() << "processPhi()\n");
@@ -1042,9 +1049,7 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
       LLVM_DEBUG(dbgs() << "skip: several non-constant values\n");
       return false;
     }
-    if (!isa<ICmpInst>(Phi.getIncomingValue(I)) ||
-        cast<ICmpInst>(Phi.getIncomingValue(I))->getParent() !=
-            Phi.getIncomingBlock(I)) {
+    if (isInvalidPrevBlock<ICmpInst>(Phi,I) && isInvalidPrevBlock<SelectInst>(Phi,I)) {
       // Non-constant incoming value is not from a cmp instruction or not
       // produced by the last block. We could end up processing the value
       // producing block more than once.
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
new file mode 100644
index 0000000000000..ad3326cc4df90
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -0,0 +1,67 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
+
+%S = type { i32, i8, i8, i16, i32, i32, i32 }
+
+define dso_local noundef zeroext i1 @cmp_mixed(
+    ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
+    ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed(
+; CHECK:   "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CHECK:  "entry+land.rhs+land.lhs.true4":
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca [3 x i32], align 4
+; CHECK-NEXT:    store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:    [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[ENTRY_LAND_RHS]] ], [ false, [[LAND_LHS_TRUE10:%.*]] ]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+entry:
+  %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %0 = load i32, ptr %e, align 4
+  %cmp = icmp eq i32 %0, 255
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr %a, align 4
+  %2 = load i32, ptr %b, align 4
+  %cmp3 = icmp eq i32 %1, %2
+  br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4:                                   ; preds = %land.lhs.true
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+  %3 = load i8, ptr %c, align 1
+  %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+  %4 = load i8, ptr %c5, align 1
+  %cmp7 = icmp eq i8 %3, %4
+  %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+  %5 = load i32, ptr %g, align 4
+  %cmp9 = icmp eq i32 %5, 100
+  %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+  br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10:                                  ; preds = %land.lhs.true4
+  %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %6 = load i8, ptr %b11, align 4
+  %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %7 = load i8, ptr %b13, align 4
+  %cmp15 = icmp eq i8 %6, %7
+  br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true10
+  %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+  %8 = load i32, ptr %f, align 4
+  %cmp16 = icmp eq i32 %8, 200
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+  %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+  ret i1 %9
+}
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 150a0300de947..0470a24b0ce6c 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
 
-%S = type { i32, i1, i1, i16, i32, i32, i32 }
+%S = type { i32, i8, i8, i16, i32, i32, i32 }
 
 ; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
 ; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
new file mode 100644
index 0000000000000..c160647271fb7
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -0,0 +1,204 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
+
+; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
+
+define dso_local noundef zeroext i1 @unmergable_select(
+    ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @unmergable_select(
+; REG:       entry:
+; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; REG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
+; REG:       land.lhs.true11:
+; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; REG-NEXT:    [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; REG-NEXT:    [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true16:
+; REG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; REG-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true21:
+; REG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT:    br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG:       land.rhs:
+; REG-NEXT:    [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
+; REG-NEXT:    [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; REG-NEXT:    [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; REG-NEXT:    br label [[LAND_END]]
+; REG:  land.end:
+; REG-NEXT:    [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ %cmp28, [[LAND_RHS]] ]
+; REG-NEXT:    ret i1 [[RES]]
+;
+entry:
+  %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
+  %0 = load i8, ptr %arrayidx, align 1
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+  %2 = load i8, ptr %arrayidx2, align 1
+  %cmp = icmp eq i8 %0, -1
+  %cmp5 = icmp eq i8 %1, -56
+  %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+  %cmp9 = icmp eq i8 %2, -66
+  %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+  br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11:                                  ; preds = %entry
+  %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 12
+  %3 = load i8, ptr %arrayidx12, align 1
+  %cmp14 = icmp eq i8 %3, 1
+  br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16:                                  ; preds = %land.lhs.true11
+  %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+  %4 = load i8, ptr %arrayidx17, align 1
+  %cmp19 = icmp eq i8 %4, 2
+  br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21:                                  ; preds = %land.lhs.true16
+  %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+  %5 = load i8, ptr %arrayidx22, align 1
+  %cmp24 = icmp eq i8 %5, 7
+  br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true21
+  %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 14
+  %6 = load i8, ptr %arrayidx26, align 1
+  %cmp28 = icmp eq i8 %6, 9
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+  %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+  ret i1 %7
+}
+
+; p[12] and p[13] are mergeable; the select block is split even though its comparisons aren't merged. simplifycfg merges them back.
+; NOTE: Ideally the pass wouldn't always split and thus wouldn't rely on simplifycfg.
+
+define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @partial_merge_not_select(
+; REG:       entry5:
+; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; REG-NEXT:    br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
+; REG:       entry4:
+; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; REG-NEXT:    br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END]]
+; REG:       entry3:
+; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; REG-NEXT:    br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
+; REG:       "land.lhs.true11+land.rhs":
+; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; REG-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT:    store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; REG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
+; REG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true162:
+; REG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; REG-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true211:
+; REG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT:    br label [[LAND_END]]
+; REG:  land.end:
+; REG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, [[ENTRY_3]] ], [ false, [[ENTRY_4]] ], [ false, %entry5 ]
+; REG-NEXT:    ret i1 [[RES]]
+;
+; CFG-LABEL: @partial_merge_not_select(
+; CFG:       entry5:
+; CFG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CFG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; CFG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CFG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; CFG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CFG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; CFG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CFG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CFG:       "land.lhs.true11+land.rhs":
+; CFG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CFG-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT:    store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; CFG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
+; CFG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CFG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CFG-NEXT:    [[SEL2:%.*]] = select i1 [[CMP3]], i1 [[CMP4]], i1 false
+; CFG-NEXT:    br i1 [[SEL2]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CFG:       land.lhs.true211:
+; CFG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CFG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CFG-NEXT:    br label [[LAND_END]]
+; CFG:  land.end:
+; CFG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry5 ]
+; CFG-NEXT:    ret i1 [[RES]]
+entry:
+  %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
+  %0 = load i8, ptr %arrayidx, align 1
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+  %2 = load i8, ptr %arrayidx2, align 1
+  %cmp = icmp eq i8 %0, -1
+  %cmp5 = icmp eq i8 %1, -56
+  %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+  %cmp9 = icmp eq i8 %2, -66
+  %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+  br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11:                                  ; preds = %entry
+  %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 12
+  %3 = load i8, ptr %arrayidx12, align 1
+  %cmp14 = icmp eq i8 %3, 1
+  br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16:                                  ; preds = %land.lhs.true11
+  %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+  %4 = load i8, ptr %arrayidx17, align 1
+  %cmp19 = icmp eq i8 %4, 2
+  br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21:                                  ; preds = %land.lhs.true16
+  %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+  %5 = load i8, ptr %arrayidx22, align 1
+  %cmp24 = icmp eq i8 %5, 7
+  br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true21
+  %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 13
+  %6 = load i8, ptr %arrayidx26, align 1
+  %cmp28 = icmp eq i8 %6, 9
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+  %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+  ret i1 %7
+}
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
new file mode 100644
index 0000000000000..7cf05d5159b66
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -0,0 +1,230 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
+
+; REG checks the IR when only mergeicmps is run.
+; CFG checks the IR when simplifycfg is run afterwards to merge distinct blocks back together.
+
+; Can merge part of a select block even if not entire block mergable.
+
+%S = type { i32, i8, i8, i16, i32, i32, i32, i8 }
+
+define zeroext i1 @cmp_partially_mergable_select(
+    ptr nocapture readonly align 4 dereferenceable(24) %a,
+    ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
+; REG-LABEL: @cmp_partially_mergable_select(
+; REG:      "land.lhs.true+land.rhs+land.lhs.true4":
+; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; REG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
+; REG:      land.lhs.true103:
+; REG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; REG-NEXT:   [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; REG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
+; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
+; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
+; REG-NEXT:   br i1 [[CMP2]], label [[ENTRY2:%.*]], label [[LAND_END]]
+; REG:      entry2:
+; REG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; REG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
+; REG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
+; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
+; REG:      land.lhs.true41:
+; REG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; REG-NEXT:   [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; REG-NEXT:   [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
+; REG-NEXT:   br label %land.end
+; REG:      land.end:
+; REG-NEXT:   [[TMP8:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[ENTRY2]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
+; REG-NEXT:   ret i1 [[TMP8]]
+;
+; CFG-LABEL: @cmp_partially_mergable_select(
+; CFG:      "land.lhs.true+land.rhs+land.lhs.true4":
+; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
+; CFG:      land.lhs.true103:
+; CFG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; CFG-NEXT:   [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; CFG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
+; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
+; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
+; CFG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CFG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
+; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
+; CFG-NEXT:   [[SEL:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CFG-NEXT:   br i1 [[SEL]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
+; CFG:      land.lhs.true41:
+; CFG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; CFG-NEXT:   [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
+; CFG-NEXT:   br label %land.end
+; CFG:      land.end:
+; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
+; CFG-NEXT:   ret i1 [[RES]]
+;
+entry:
+  %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %0 = load i32, ptr %e, align 4
+  %cmp = icmp eq i32 %0, 255
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr %a, align 4
+  %2 = load i32, ptr %b, align 4
+  %cmp3 = icmp eq i32 %1, %2
+  br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4:                                   ; preds = %land.lhs.true
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+  %3 = load i8, ptr %c, align 1
+  %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+  %4 = load i8, ptr %c5, align 1
+  %cmp7 = icmp eq i8 %3, %4
+  %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+  %5 = load i32, ptr %g, align 4
+  %cmp9 = icmp eq i32 %5, 100
+  %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+  br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10:                                  ; preds = %land.lhs.true4
+  %h = getelementptr inbounds nuw i8, ptr %a, i64 20
+  %6 = load i8, ptr %h, align 4
+  %h12 = getelementptr inbounds nuw i8, ptr %b, i64 20
+  %7 = load i8, ptr %h12, align 4
+  %cmp14 = icmp eq i8 %6, %7
+  br i1 %cmp14, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true10
+  %b15 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %8 = load i8, ptr %b15, align 4
+  %b17 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %9 = load i8, ptr %b17, align 4
+  %cmp19 = icmp eq i8 %8, %9
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+  %10 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp19, %land.rhs ]
+  ret i1 %10
+}
+
+
+; p[12] and p[13] are mergable. p[12] is inside of a select block which will be split up.
+; MergeICmps always splits up matching select blocks. The following simplifycfg pass merges them back together.
+
+define dso_local zeroext i1 @cmp_partially_mergable_select_array(
+    ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @cmp_partially_mergable_select_array(
+; REG: entry5:
+; REG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; REG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
+; REG-NEXT:   br i1 %2, label [[ENTRY4:%.*]], label [[LAND_END:%.*]]
+; REG: entry4:
+; REG-NEXT:   [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
+; REG-NEXT:   br i1 [[CMP1]], label [[ENTRY_LAND:%.*]], label [[LAND_END]]
+; REG: "entry+land.rhs":
+; REG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
+; REG-NEXT:   [[TMP5:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT:   store [2 x i8] c"\FF\09", ptr %7, align 1
+; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
+; REG-NEXT:   [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT:   br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
+; REG: land.lhs.true113:
+; REG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
+; REG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
+; REG: land.lhs.true162:
+; REG-NEXT:   [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
+; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; REG-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; REG: land.lhs.true211:
+; REG-NEXT:   [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT:   [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
+; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; REG-NEXT:   br label [[LAND_END]]
+; REG: land.end:
+; REG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_LAND]] ], [ false, [[ENTRY4]] ], [ false, %entry5 ]
+; REG-NEXT:   ret i1 [[RES]]
+;
+;
+; CFG-LABEL: @cmp_partially_mergable_select_array(
+; CFG:      entry5:
+; CFG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
+; CFG-NEXT:   [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
+; CFG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT:   br i1 [[SEL0]], label [[ENTRY_LAND:%.*]], label [[LAND_END:%.*]]
+; CFG:      "entry+land.rhs":
+; CFG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
+; CFG-NEXT:   [[TMP5:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT:   store [2 x i8] c"\FF\09", ptr %7, align 1
+; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
+; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
+; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; CFG-NEXT:   [[SEL1:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CFG-NEXT:   [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
+; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; CFG-NEXT:   [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP4]], i1 false
+; CFG-NEXT:   br i1 [[SEL2]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; CFG:      land.lhs.true211:
+; CFG-NEXT:   [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT:   [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
+; CFG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; CFG-NEXT:   br label [[LAND_END]]
+; CFG:      land.end:
+; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[ENTRY_LAND]] ], [ false, %entry5 ]
+; CFG-NEXT:   ret i1 [[RES]]
+;
+entry:
+  %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
+  %0 = load i8, ptr %arrayidx, align 1
+  %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+  %2 = load i8, ptr %arrayidx2, align 1
+  %cmp = icmp eq i8 %0, -1
+  %cmp5 = icmp eq i8 %1, -56
+  %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+  %cmp9 = icmp eq i8 %2, -66
+  %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+  br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11:
+  %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 10
+  %3 = load i8, ptr %arrayidx12, align 1
+  %cmp14 = icmp eq i8 %3, 1
+  br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16:
+  %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+  %4 = load i8, ptr %arrayidx17, align 1
+  %cmp19 = icmp eq i8 %4, 2
+  br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21:
+  %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+  %5 = load i8, ptr %arrayidx22, align 1
+  %cmp24 = icmp eq i8 %5, 7
+  br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs:
+  %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 13
+  %6 = load i8, ptr %arrayidx26, align 1
+  %cmp28 = icmp eq i8 %6, 9
+  br label %land.end
+
+land.end:
+  %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+  ret i1 %7
+}
+
diff --git a/llvm/test/Transforms/MergeICmps/X86/single-block.ll b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
new file mode 100644
index 0000000000000..b5735c73ced4c
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
@@ -0,0 +1,23 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Merges adjacent comparisons with constants even if only in single basic block
+
+define i1 @merge_single(ptr nocapture noundef readonly dereferenceable(2) %p) {
+; CHECK-LABEL: @merge_single(
+; CHECK:       entry:
+; CHECK-NEXT:   [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 1
+; CHECK-NEXT:   [[TMP1:%.*]] = alloca [2 x i8], align 1
+; CHECK-NEXT:   store [2 x i8] c"\FF\FF", ptr [[TMP1]], align 1
+; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP1]], i64 2)
+; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:   ret i1 [[CMP0]]
+;
+entry:
+  %0 = load i8, ptr %p, align 1
+  %arrayidx1 = getelementptr inbounds i8, ptr %p, i64 1
+  %1 = load i8, ptr %arrayidx1, align 1
+  %cmp = icmp eq i8 %0, -1
+  %cmp3 = icmp eq i8 %1, -1
+  %2 = select i1 %cmp, i1 %cmp3, i1 false
+  ret i1 %2
+}
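
Note on `single-block.ll`: the test expects two adjacent byte comparisons against `-1` to become one `memcmp` against a two-byte constant. Below is a minimal source-level sketch of that equivalence, not the pass's implementation. The function names are hypothetical. The merged form reads both bytes unconditionally, which is only valid because the pointer is `dereferenceable(2)` in the test.

```cpp
#include <cassert>
#include <cstring>

// Source-level shape of the input IR: two adjacent byte loads compared with -1.
bool cmpSeparate(const signed char *P) { return P[0] == -1 && P[1] == -1; }

// Shape of the expected output: one memcmp against a 2-byte constant buffer.
// Assumes both bytes are readable, as `dereferenceable(2)` guarantees in the test.
bool cmpMerged(const signed char *P) {
  const unsigned char Expected[2] = {0xFF, 0xFF};
  return std::memcmp(P, Expected, 2) == 0;
}

// Exhaustively checks that both forms agree for every pair of byte values.
void checkAllBytePairs() {
  for (int A = 0; A < 256; ++A) {
    for (int B = 0; B < 256; ++B) {
      const unsigned char Raw[2] = {static_cast<unsigned char>(A),
                                    static_cast<unsigned char>(B)};
      signed char Buf[2];
      std::memcpy(Buf, Raw, 2);
      assert(cmpSeparate(Buf) == cmpMerged(Buf));
    }
  }
}
```

Calling `checkAllBytePairs()` walks all 65536 byte pairs, so any disagreement between the two forms trips the assertion.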

>From 52e03dfc88705f20c4b985fcfae776644a00f729 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 28 Feb 2025 19:25:46 +0100
Subject: [PATCH 08/23] [MergeIcmps] Reimplemented block-splitting for
 multbceblocks; fixed block reordering

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 214 +++++++++---------
 .../MergeICmps/X86/mixed-cmp-bb-select.ll     |   2 -
 .../MergeICmps/X86/mixed-comparisons.ll       |   2 -
 .../X86/not-split-unmerged-select.ll          |  28 +--
 .../MergeICmps/X86/partial-select-merge.ll    | 103 ++++-----
 5 files changed, 172 insertions(+), 177 deletions(-)
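
Note on the reworked `enqueueBlock`: each comparison in a multi-comparison block now gets its own entry and its own `OrigOrder`, and only the first entry of a block that needs splitting carries the instructions to hoist. Below is a rough standalone sketch of that bookkeeping. `MultiBlock`, `SingleBlock` and `enqueue` are hypothetical stand-ins using strings instead of LLVM instructions, not the types in `MergeICmps.cpp`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the pass's types; none of these names are LLVM API.
struct MultiBlock {
  std::vector<std::string> Cmps;       // comparisons found in one basic block
  std::vector<std::string> OtherInsts; // instructions unrelated to the comparisons
};

struct SingleBlock {
  std::string Cmp;
  unsigned OrigOrder = 0;
  bool RequireSplit = false;
  std::vector<std::string> SplitInsts; // filled only on the entry that carries the split
};

// Flattens one multi-comparison block into per-comparison entries.
void enqueue(std::vector<SingleBlock> &Out, const MultiBlock &M,
             bool RequireSplit) {
  // Mirrors the patch: only the initial block of a chain is enqueued with a split.
  assert((!RequireSplit || Out.empty()) && "only the initial block may be split");
  for (unsigned I = 0; I < M.Cmps.size(); ++I) {
    SingleBlock S;
    S.Cmp = M.Cmps[I];
    // The order is taken per comparison, not once per basic block.
    S.OrigOrder = static_cast<unsigned>(Out.size());
    if (RequireSplit && I == 0) {
      S.RequireSplit = true;
      S.SplitInsts = M.OtherInsts;
    }
    Out.push_back(std::move(S));
  }
}
```

With this layout, the later `find_if` over `RequireSplit` in `mergeComparisons` matches at most one entry, which owns everything that needs to move to the new parent block.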

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 779e9325a311a..2bf2eaaf3abcc 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -312,16 +312,12 @@ class IntraCmpChain {
 };
 
 
-// A basic block that contains one or more comparisons
+// A basic block that contains one or more comparisons.
 class MultBCECmpBlock {
  public:
   MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
       : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
 
-  // // Returns true if each comparison in this basic block is being merged.
-  // // Necessary because otherwise would leave basic block in invalid state.
-  // bool hasAllCmpsMerged() const;
-
   // Returns true if the block does other works besides comparison.
   bool doesOtherWork() const;
 
@@ -329,24 +325,20 @@ class MultBCECmpBlock {
     return Cmps;
   }
 
-  // // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
-  // // instructions in the block.
-  // bool canSplit(AliasAnalysis &AA) const;
+  // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
+  // instructions in the block.
+  bool canSplit(AliasAnalysis &AA) const;
 
-  // // Return true if this all the relevant instructions in the BCE-cmp-block can
-  // // be sunk below this instruction. By doing this, we know we can separate the
-  // // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
-  // // block.
-  // bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
+  // Returns true if all the relevant instructions in the BCE-cmp-block can
+  // be sunk below this instruction. By doing this, we know we can separate the
+  // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
+  // block.
+  bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
 
   // The basic block where this comparison happens.
   BasicBlock *BB;
   // Instructions relating to the BCECmp and branch.
   InstructionSet BlockInsts;
-  // The block requires splitting.
-  bool RequireSplit = false;
-  // Original order of this block in the chain.
-  unsigned OrigOrder = 0;
 
 private:
   std::vector<Comparison*> Cmps;
@@ -359,8 +351,11 @@ class MultBCECmpBlock {
 // (see canSplit()).
 class SingleBCECmpBlock {
  public:
-  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I)
-      : BB(M.BB), OrigOrder(M.OrigOrder), Cmp(M.getCmps()[I]) {}
+  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
+      : BB(M.BB), OrigOrder(OrigOrder), Cmp(M.getCmps()[I]) {}
+
+  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
+      : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts)  {}
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
   const Comparison* getCmp() const {
@@ -374,67 +369,60 @@ class SingleBCECmpBlock {
   // We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
   // instructions. Split the old block and move all non-BCE-cmp-insts into the
   // new parent block.
-  // void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
+  void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
 
   // The basic block where this comparison happens.
   BasicBlock *BB;
   // Original order of this block in the chain.
   unsigned OrigOrder = 0;
+  // The block requires splitting.
+  bool RequireSplit = false;
 
 private:
   Comparison* Cmp;
+  llvm::SmallVector<Instruction *, 4> SplitInsts;
 };
 
-// bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
-//                                     AliasAnalysis &AA) const {
-//   // If this instruction may clobber the loads and is in middle of the BCE cmp
-//   // block instructions, then bail for now.
-//   if (Inst->mayWriteToMemory()) {
-//     auto MayClobber = [&](LoadInst *LI) {
-//       // If a potentially clobbering instruction comes before the load,
-//       // we can still safely sink the load.
-//       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
-//              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
-//     };
-//     for (auto* Cmp : Cmps.getCmpChain()) {
-//       auto [Lhs,Rhs] = Cmp->getLoads();
-//       if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
-//         return false;
-//     }
-//   }
-//   // Make sure this instruction does not use any of the BCE cmp block
-//   // instructions as operand.
-//   return llvm::none_of(Inst->operands(), [&](const Value *Op) {
-//     const Instruction *OpI = dyn_cast<Instruction>(Op);
-//     return OpI && BlockInsts.contains(OpI);
-//   });
-// }
+bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
+                                    AliasAnalysis &AA) const {
+  // If this instruction may clobber the loads and is in the middle of the BCE cmp
+  // block instructions, then bail for now.
+  if (Inst->mayWriteToMemory()) {
+    auto MayClobber = [&](LoadInst *LI) {
+      // If a potentially clobbering instruction comes before the load,
+      // we can still safely sink the load.
+      return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
+             isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
+    };
+    for (auto* Cmp : Cmps) {
+      auto [Lhs,Rhs] = Cmp->getLoads();
+      if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
+        return false;
+    }
+  }
+  // Make sure this instruction does not use any of the BCE cmp block
+  // instructions as operand.
+  return llvm::none_of(Inst->operands(), [&](const Value *Op) {
+    const Instruction *OpI = dyn_cast<Instruction>(Op);
+    return OpI && BlockInsts.contains(OpI);
+  });
+}
 
-// void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
-//   llvm::SmallVector<Instruction *, 4> OtherInsts;
-//   for (Instruction &Inst : *BB) {
-//     if (BlockInsts.count(&Inst))
-//       continue;
-//     assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
-//     // This is a non-BCE-cmp-block instruction. And it can be separated
-//     // from the BCE-cmp-block instruction.
-//     OtherInsts.push_back(&Inst);
-//   }
-
-//   // Do the actual splitting.
-//   for (Instruction *Inst : reverse(OtherInsts))
-//     Inst->moveBeforePreserving(*NewParent, NewParent->begin());
-// }
+void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
+  // Do the actual splitting.
+  for (Instruction *Inst : reverse(SplitInsts))
+    Inst->moveBeforePreserving(*NewParent, NewParent->begin());
+}
 
-// bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
-//   for (Instruction &Inst : *BB) {
-//     if (!BlockInsts.count(&Inst)) {
-//       if (!canSinkBCECmpInst(&Inst, AA))
-//         return false;
-//     }
-//   }
-//   return true;
-// }
+bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
+  for (Instruction &Inst : *BB) {
+    if (!BlockInsts.count(&Inst)) {
+      if (!canSinkBCECmpInst(&Inst, AA))
+        return false;
+    }
+  }
+  return true;
+}
 
 bool MultBCECmpBlock::doesOtherWork() const {
   // TODO(courbet): Can we allow some other things ? This is very conservative.
@@ -592,11 +580,26 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
 // }
 
 static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
-                                MultBCECmpBlock &&CmpBlock) {
+                                MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
   // emitDebugInfo(Comparison);
-  CmpBlock.OrigOrder = Comparisons.size();
-  for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++)
-    Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i));
+  for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
+    unsigned OrigOrder = Comparisons.size();
+    if (!RequireSplit || i != 0) {
+      Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder));
+      continue;
+    }
+    // If the block must be split, record its non-BCE-cmp instructions on the first comparison.
+    llvm::SmallVector<Instruction *, 4> OtherInsts;
+    for (Instruction &Inst : *CmpBlock.BB) {
+      if (CmpBlock.BlockInsts.count(&Inst))
+        continue;
+      assert(CmpBlock.canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+      // This is a non-BCE-cmp-block instruction. And it can be separated
+      // from the BCE-cmp-block instruction.
+      OtherInsts.push_back(&Inst);
+    }
+    Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder, OtherInsts));
+  }
 }
 
 // A chain of comparisons.
@@ -683,33 +686,32 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
     if (CmpBlock->doesOtherWork()) {
       LLVM_DEBUG(dbgs() << "block '" << CmpBlock->BB->getName()
                         << "' does extra work besides compare\n");
-      // if (Comparisons.empty()) {
-      //   // This is the initial block in the chain, in case this block does other
-      //   // work, we can try to split the block and move the irrelevant
-      //   // instructions to the predecessor.
-      //   //
-      //   // If this is not the initial block in the chain, splitting it wont
-      //   // work.
-      //   //
-      //   // As once split, there will still be instructions before the BCE cmp
-      //   // instructions that do other work in program order, i.e. within the
-      //   // chain before sorting. Unless we can abort the chain at this point
-      //   // and start anew.
-      //   //
-      //   // NOTE: we only handle blocks a with single predecessor for now.
-      //   if (Comparison->canSplit(AA)) {
-      //     LLVM_DEBUG(dbgs()
-      //                << "Split initial block '" << Comparison->BB->getName()
-      //                << "' that does extra work besides compare\n");
-      //     Comparison->RequireSplit = true;
-      //     enqueueBlock(Comparisons, std::move(*Comparison));
-      //   } else {
-      //     LLVM_DEBUG(dbgs()
-      //                << "ignoring initial block '" << Comparison->BB->getName()
-      //                << "' that does extra work besides compare\n");
-      //   }
-      //   continue;
-      // }
+      if (Comparisons.empty()) {
+        // This is the initial block in the chain, in case this block does other
+        // work, we can try to split the block and move the irrelevant
+        // instructions to the predecessor.
+        //
+        // If this is not the initial block in the chain, splitting it won't
+        // work.
+        //
+        // As once split, there will still be instructions before the BCE cmp
+        // instructions that do other work in program order, i.e. within the
+        // chain before sorting. Unless we can abort the chain at this point
+        // and start anew.
+        //
+        // NOTE: we only handle blocks with a single predecessor for now.
+        if (CmpBlock->canSplit(AA)) {
+          LLVM_DEBUG(dbgs()
+                     << "Split initial block '" << CmpBlock->BB->getName()
+                     << "' that does extra work besides compare\n");
+          enqueueBlock(Comparisons, std::move(*CmpBlock), AA, true);
+        } else {
+          LLVM_DEBUG(dbgs()
+                     << "ignoring initial block '" << CmpBlock->BB->getName()
+                     << "' that does extra work besides compare\n");
+        }
+        continue;
+      }
       // TODO(courbet): Right now we abort the whole chain. We could be
       // merging only the blocks that don't do other work and resume the
       // chain from there. For example:
@@ -735,7 +737,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
       // We could still merge bb1 and bb2 though.
       return;
     }
-    enqueueBlock(Comparisons, std::move(*CmpBlock));
+    enqueueBlock(Comparisons, std::move(*CmpBlock), AA, false);
   }
   
   // It is possible we have no suitable comparison to merge.
@@ -862,12 +864,12 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
   // If there is one block that requires splitting, we do it now, i.e.
   // just before we know we will collapse the chain. The instructions
   // can be executed before any of the instructions in the chain.
-  // const auto ToSplit = llvm::find_if(
-  //     Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
-  // if (ToSplit != Comparisons.end()) {
-  //   LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
-  //   ToSplit->split(BB, AA);
-  // }
+  const auto ToSplit = llvm::find_if(
+      Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
+  if (ToSplit != Comparisons.end()) {
+    LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
+    ToSplit->split(BB, AA);
+  }
 
   if (Comparisons.size() == 1) {
     LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index ad3326cc4df90..74e7a9ce705de 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -2,8 +2,6 @@
 
 ; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
 
-%S = type { i32, i8, i8, i16, i32, i32, i32 }
-
 define dso_local noundef zeroext i1 @cmp_mixed(
     ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
     ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 0470a24b0ce6c..ec1c8660fde86 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -1,8 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
 
-%S = type { i32, i8, i8, i16, i32, i32, i32 }
-
 ; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
 ; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
 
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index c160647271fb7..cd409b0f007ee 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -92,19 +92,19 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
 ; REG-LABEL: @partial_merge_not_select(
 ; REG:       entry5:
-; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
 ; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
-; REG-NEXT:    br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
+; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT:    br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
 ; REG:       entry4:
-; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
 ; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
-; REG-NEXT:    br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END]]
+; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT:    br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END]]
 ; REG:       entry3:
-; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
 ; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
 ; REG-NEXT:    br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
 ; REG:       "land.lhs.true11+land.rhs":
 ; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
@@ -129,16 +129,16 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ;
 ; CFG-LABEL: @partial_merge_not_select(
 ; CFG:       entry5:
-; CFG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
 ; CFG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; CFG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
-; CFG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CFG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
 ; CFG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; CFG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
 ; CFG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
 ; CFG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; CFG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
 ; CFG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
 ; CFG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CFG:       "land.lhs.true11+land.rhs":
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 7cf05d5159b66..9fabe7fb2fc61 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -6,8 +6,6 @@
 
 ; Can merge part of a select block even if not entire block mergable.
 
-%S = type { i32, i8, i8, i16, i32, i32, i32, i8 }
-
 define zeroext i1 @cmp_partially_mergable_select(
     ptr nocapture readonly align 4 dereferenceable(24) %a,
     ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
@@ -114,75 +112,74 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 define dso_local zeroext i1 @cmp_partially_mergable_select_array(
     ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
 ; REG-LABEL: @cmp_partially_mergable_select_array(
+; REG: "entry+land.rhs":
+; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; REG-NEXT:   [[TMP0:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT:   store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
+; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT:   br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
 ; REG: entry5:
-; REG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
-; REG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
-; REG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; REG-NEXT:   br i1 %2, label [[ENTRY4:%.*]], label [[LAND_END:%.*]]
+; REG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; REG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT:   br i1 [[CMP1]], label [[ENTRY_4:%.*]], label [[LAND_END]]
 ; REG: entry4:
-; REG-NEXT:   [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
-; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
-; REG-NEXT:   br i1 [[CMP1]], label [[ENTRY_LAND:%.*]], label [[LAND_END]]
-; REG: "entry+land.rhs":
-; REG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
-; REG-NEXT:   [[TMP5:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT:   store [2 x i8] c"\FF\09", ptr %7, align 1
-; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
-; REG-NEXT:   [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
 ; REG-NEXT:   br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
 ; REG: land.lhs.true113:
-; REG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; REG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
-; REG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; REG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; REG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
 ; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
 ; REG: land.lhs.true162:
-; REG-NEXT:   [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
-; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; REG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
 ; REG-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
 ; REG: land.lhs.true211:
-; REG-NEXT:   [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT:   [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
-; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; REG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
 ; REG-NEXT:   br label [[LAND_END]]
 ; REG: land.end:
-; REG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_LAND]] ], [ false, [[ENTRY4]] ], [ false, %entry5 ]
+; REG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_4]] ], [ false, [[ENTRY_5]] ], [ false, %"entry+land.rhs" ]
 ; REG-NEXT:   ret i1 [[RES]]
 ;
 ;
 ; CFG-LABEL: @cmp_partially_mergable_select_array(
-; CFG:      entry5:
-; CFG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
-; CFG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
-; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; CFG-NEXT:   [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
-; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
-; CFG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT:   br i1 [[SEL0]], label [[ENTRY_LAND:%.*]], label [[LAND_END:%.*]]
 ; CFG:      "entry+land.rhs":
-; CFG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
-; CFG-NEXT:   [[TMP5:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT:   store [2 x i8] c"\FF\09", ptr %7, align 1
-; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
-; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; CFG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
-; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
-; CFG-NEXT:   [[SEL1:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
-; CFG-NEXT:   [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
-; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
-; CFG-NEXT:   [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP4]], i1 false
-; CFG-NEXT:   br i1 [[SEL2]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; CFG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; CFG-NEXT:   [[TMP0:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT:   store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
+; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CFG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CFG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CFG-NEXT:   [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CFG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CFG-NEXT:   [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP3]], i1 false
+; CFG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CFG-NEXT:   [[SEL3:%.*]] = select i1 [[SEL2]], i1 [[CMP4]], i1 false
+; CFG-NEXT:   br i1 [[SEL3]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
 ; CFG:      land.lhs.true211:
-; CFG-NEXT:   [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT:   [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
-; CFG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; CFG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CFG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
 ; CFG-NEXT:   br label [[LAND_END]]
 ; CFG:      land.end:
-; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[ENTRY_LAND]] ], [ false, %entry5 ]
+; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, %"entry+land.rhs" ]
 ; CFG-NEXT:   ret i1 [[RES]]
 ;
 entry:

>From a3005c017b1388c44ea3ec515b284ec5248e73bd Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 12:33:36 +0100
Subject: [PATCH 09/23] [MergeICmps] Added tests for splitting const and select
 blocks

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 17 ++--
 .../MergeICmps/X86/split-block-does-work.ll   | 87 +++++++++++++++++++
 2 files changed, 95 insertions(+), 9 deletions(-)
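
For readers of the new `unclobbered_select_cmp` test, here is a minimal source-level sketch of the equivalence the merge relies on. The function names `cmpFieldwise` and `cmpMerged` are hypothetical and only illustrate the test. The `@foo`/`@bar` calls are not modeled, since they are assumed not to touch the compared memory.

```cpp
#include <cassert>
#include <cstring>

// Field-by-field form, mirroring the input IR of the test:
// byte 4 == 200, byte 2 == 100, byte 3 == 3.
bool cmpFieldwise(const unsigned char *A) {
  return A[4] == 200 && A[2] == 100 && A[3] == 3;
}

// Merged form: a single memcmp over bytes [2, 5) against the constant
// bytes {100, 3, 200}, which is what c"d\03\C8" spells in the CHECK lines.
bool cmpMerged(const unsigned char *A) {
  static const unsigned char Expected[3] = {100, 3, 200};
  return std::memcmp(A + 2, Expected, sizeof(Expected)) == 0;
}
```

Both functions should agree on every 5-byte input. The merged form reads the three bytes in address order, not in the order the comparisons appear in the source.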

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2bf2eaaf3abcc..18ee7a877d985 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -318,13 +318,13 @@ class MultBCECmpBlock {
   MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
       : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
 
-  // Returns true if the block does other works besides comparison.
-  bool doesOtherWork() const;
-
   std::vector<Comparison*> getCmps() {
     return Cmps;
   }
 
+  // Returns true if the block does other work besides comparison.
+  bool doesOtherWork() const;
+
   // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
   // instructions in the block.
   bool canSplit(AliasAnalysis &AA) const;
@@ -358,9 +358,7 @@ class SingleBCECmpBlock {
       : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts)  {}
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
-  const Comparison* getCmp() const {
-    return Cmp;
-  }
+  const Comparison* getCmp() const { return Cmp; }
 
   bool operator<(const SingleBCECmpBlock &O) const {
     return *Cmp < *O.Cmp;
@@ -579,7 +577,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
 //   LLVM_DEBUG(dbgs() << "\n");
 // }
 
-static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
+// Enqueues a single comparison and if it's the first comparison of the first block then adds the `OtherInsts` to the block too. To split it.
+static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
                                 MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
   // emitDebugInfo(Comparison);
   for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
@@ -704,7 +703,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
           LLVM_DEBUG(dbgs()
                      << "Split initial block '" << CmpBlock->BB->getName()
                      << "' that does extra work besides compare\n");
-          enqueueBlock(Comparisons, std::move(*CmpBlock), AA, true);
+          enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, true);
         } else {
           LLVM_DEBUG(dbgs()
                      << "ignoring initial block '" << CmpBlock->BB->getName()
@@ -737,7 +736,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
       // We could still merge bb1 and bb2 though.
       return;
     }
-    enqueueBlock(Comparisons, std::move(*CmpBlock), AA, false);
+    enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, false);
   }
   
   // It is possible we have no suitable comparison to merge.
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index c53d86d76ff3b..61304694548a2 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -4,6 +4,7 @@
 %S = type { i32, i32, i32, i32 }
 
 declare void @foo(...)
+declare void @bar(...)
 
 ; We can split %entry and create a memcmp(16 bytes).
 define zeroext i1 @opeq1(
@@ -240,3 +241,89 @@ opeq1.exit:
   %8 = phi i1 [ false, %entry ], [ false, %land.rhs.i] , [ false, %land.rhs.i.2 ], [ %cmp4.i, %land.rhs.i.3 ]
   ret i1 %8
 }
+
+; Call instruction mixed in with select block but doesn't clobber memory, so can safely sink and merge all comparisons.
+; Make sure that call order stays the same.
+define dso_local noundef zeroext i1 @unclobbered_select_cmp(
+; X86-LABEL: @unclobbered_select_cmp(
+; X86-NEXT:       "entry+land.rhs":
+; X86-NEXT:    call void (...) @foo() #[[ATTR2]]
+; X86-NEXT:    call void (...) @bar() #[[ATTR2]]
+; X86-NEXT:    [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
+; X86-NEXT:    [[TMP0:%.*]] = alloca [3 x i8], align 1
+; X86-NEXT:    store [3 x i8] c"d\03\C8", ptr [[TMP0]], align 1
+; X86-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
+; X86-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; X86-NEXT:    br label [[LAND_END:%.*]]
+; X86:       land.end:
+; X86-NEXT:    ret i1 [[TMP1]]
+;
+  ptr nocapture readonly dereferenceable(5) %a) local_unnamed_addr nofree nosync {
+entry:
+  %q = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %0 = load i8, ptr %q, align 1
+  call void (...) @foo() inaccessiblememonly
+  %cmp = icmp eq i8 %0, 200
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 2
+  %1 = load i8, ptr %c, align 1
+  %cmp2 = icmp eq i8 %1, 100
+  call void (...) @bar() inaccessiblememonly
+  %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+  br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %entry
+  %b3 = getelementptr inbounds nuw i8, ptr %a, i64 3
+  %2 = load i8, ptr %b3, align 1
+  %cmp5 = icmp eq i8 %2, 3
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %entry
+  %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+  ret i1 %3
+}
+
+
+; Can only split first block. If subsequent block contains a clobber instruction then don't merge.
+define dso_local noundef zeroext i1 @not_split_sec_block(
+; X86-LABEL: @not_split_sec_block(
+; X86-NEXT:  entry:
+; X86-NEXT:    [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1
+; X86-NEXT:    call void (...) @foo() #[[ATTR2]]
+; X86-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; X86-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 2
+; X86-NEXT:    [[TMP2:%.*]] = load i8, ptr [[TMP1]], align 1
+; X86-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP2]], 100
+; X86-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; X86-NEXT:    br i1 [[SEL0]], label [[LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; X86:       land.rhs:
+; X86-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 1
+; X86-NEXT:    [[TMP4:%.*]] = load i8, ptr [[TMP3]], align 1
+; X86-NEXT:    call void (...) @bar() #[[ATTR2]]
+; X86-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP4]], 3
+; X86-NEXT:    br label [[LAND_END]]
+; X86:       land.end:
+; X86-NEXT:    [[RES:%.*]] = phi i1 [ false, %entry ], [ [[CMP2]], [[LAND_RHS]] ]
+; X86-NEXT:    ret i1 [[RES]]
+;
+  ptr nocapture readonly dereferenceable(3) %a) local_unnamed_addr nofree nosync {
+entry:
+  %0 = load i8, ptr %a, align 1
+  call void (...) @foo() inaccessiblememonly
+  %cmp = icmp eq i8 %0, 200
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 2
+  %1 = load i8, ptr %c, align 1
+  %cmp2 = icmp eq i8 %1, 100
+  %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+  br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %entry
+  %b3 = getelementptr inbounds nuw i8, ptr %a, i64 1
+  %2 = load i8, ptr %b3, align 1
+; Even though this call doesn't clobber any memory, can only sink instructions from first block.
+  call void (...) @bar() inaccessiblememonly
+  %cmp5 = icmp eq i8 %2, 3
+  br label %land.end
+land.end:
+  %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+  ret i1 %3
+}

>From e3caafc42d23bbc0ec9ebfe9943b10af9b7410bc Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 17:06:06 +0100
Subject: [PATCH 10/23] [MergeICmps] Can build const-cmp-chains of different
 types using llvm.structs

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 26 +++---
 .../X86/mixed-type-const-comparisons.ll       | 79 +++++++++++++++++++
 2 files changed, 93 insertions(+), 12 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
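
The contiguity rule that decides which const-comparisons end up in one struct can be sketched as follows. This is a simplified illustration, not the pass's actual data structure: `ConstCmpSketch` and `contiguousRunSizes` are hypothetical names, and only offset and size relative to a common base pointer are modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one const-comparison: a load of `Size` bytes
// at `Offset` from a shared base pointer.
struct ConstCmpSketch {
  uint64_t Offset;
  uint64_t Size;
};

// Sorts the comparisons by offset and returns the byte length of each run
// of directly adjacent loads. A run covering more than one comparison is a
// candidate for a single memcmp against a packed constant.
std::vector<uint64_t> contiguousRunSizes(std::vector<ConstCmpSketch> Cmps) {
  std::sort(Cmps.begin(), Cmps.end(),
            [](const ConstCmpSketch &A, const ConstCmpSketch &B) {
              return A.Offset < B.Offset;
            });
  std::vector<uint64_t> Runs;
  uint64_t End = 0; // One past the last byte of the current run.
  for (const ConstCmpSketch &C : Cmps) {
    if (!Runs.empty() && C.Offset == End)
      Runs.back() += C.Size; // Directly adjacent: extend the current run.
    else
      Runs.push_back(C.Size); // Gap (or first element): start a new run.
    End = C.Offset + C.Size;
  }
  return Runs;
}
```

With the loads from `is_all_ones_struct` (i32 at 8, i8 at 4, i32 at 0) this gives one 5-byte run and one separate 4-byte load, matching the `memcmp(..., i64 5)` in the CHECK lines. With the loads from `is_all_ones_struct_select_block` (i32 at 0, i8 at 5, i8 at 4) it gives a single 6-byte run.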

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 18ee7a877d985..e75836e895175 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -266,6 +266,7 @@ struct BCECmp : public Comparison {
   }
 };
 
+// TODO: this can be improved to take alignment into account.
 bool Comparison::areContiguous(const Comparison& Other) const {
   assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
   if (isa<BCEConstCmp>(this)) {
@@ -549,10 +550,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
 
   InstructionSet BlockInsts;
   std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
-  if (!Result) {
-    dbgs() << "invalid result\n";
+  if (!Result)
     return std::nullopt;
-  }
 
   for (auto* Cmp : Result->getCmpChain()) {
     auto CmpInsts = Cmp->getInsts();
@@ -577,7 +576,7 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
 //   LLVM_DEBUG(dbgs() << "\n");
 // }
 
-// Enqueues a single comparison and if it's the first comparison of the first block then adds the `OtherInsts` to the block too. To split it.
+// Enqueues a single comparison and, if it belongs to the first comparison block, also adds the `OtherInsts` to the block so that it can be split.
 static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
                                 MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
   // emitDebugInfo(Comparison);
@@ -837,18 +836,21 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
     Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
   else
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
-  // Build constant-array to compare to
-  if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
+  // Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
+  if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
     if (Comparisons.size() > 1) {
-      auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
-      auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
       std::vector<Constant*> Constants;
+      std::vector<Type*> Types;
       for (const auto& BceBlock : Comparisons) {
-        Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+        auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+        Constants.emplace_back(ConstCmp->Const);
+        Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
       }
-      auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
-      Builder.CreateStore(ArrayConstant,ArrayAlloca);
-      Rhs = ArrayAlloca;
+      auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+      auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+      auto *StructConstant = ConstantStruct::get(StructType, Constants);
+      Builder.CreateStore(StructConstant, StructAlloca);
+      Rhs = StructAlloca;
     }
   } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
     if (FirstBceCmp->Rhs.GEP)
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
new file mode 100644
index 0000000000000..05aa99d31c5a1
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -0,0 +1,79 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Tests if a const-cmp-chain of different types can still be merged.
+; This is usually the case when comparing different struct fields to constants.
+
+; Can only merge gep 0 with gep 4: the i8 at gep 4 ends at offset 5, so the 4-byte-aligned i32 at gep 8 is not directly adjacent to it.
+define dso_local zeroext i1 @is_all_ones_struct(
+; CHECK-LABEL: @is_all_ones_struct(
+; CHECK:      entry1:
+; CHECK-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 8
+; CHECK-NEXT:   [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
+; CHECK-NEXT:   br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
+; CHECK:      "land.rhs+land.lhs.true":
+; CHECK-NEXT:   [[TMP2:%.*]] = alloca { i32, i8 }
+; CHECK-NEXT:   store { i32, i8 } { i32 3, i8 100 }, ptr [[TMP2]]
+; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
+; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:   br label [[LAND_END]]
+; CHECK:      land.end:
+; CHECK-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP1]], [[MERGED]] ], [ false, %entry1 ]
+; CHECK-NEXT:   ret i1 [[RES]]
+;
+  ptr noundef nonnull readonly align 4 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+entry:
+  %c = getelementptr inbounds nuw i8, ptr %p, i64 8
+  %0 = load i32, ptr %c, align 4
+  %cmp = icmp eq i32 %0, 200
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %b = getelementptr inbounds nuw i8, ptr %p, i64 4
+  %1 = load i8, ptr %b, align 4
+  %cmp1 = icmp eq i8 %1, 100
+  br i1 %cmp1, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true
+  %2 = load i32, ptr %p, align 4
+  %cmp3 = icmp eq i32 %2, 3
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true, %entry
+  %3 = phi i1 [ false, %land.lhs.true ], [ false, %entry ], [ %cmp3, %land.rhs ]
+  ret i1 %3
+}
+
+
+; Can also still merge select blocks with different types.
+define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
+; CHECK-LABEL: @is_all_ones_struct_select_block(
+; CHECK:      "entry+land.rhs":
+; CHECK-NEXT:   [[TMP0:%.*]] = alloca { i32, i8, i8 }
+; CHECK-NEXT:   store { i32, i8, i8 } { i32 200, i8 3, i8 100 }, ptr [[TMP0]]
+; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
+; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:   br label [[LAND_END]]
+; CHECK:      land.end:
+; CHECK-NEXT:   ret i1 [[CMP1]]
+;
+  ptr noundef nonnull readonly align 4 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+entry:
+  %0 = load i32, ptr %p, align 4
+  %cmp = icmp eq i32 %0, 200
+  %c = getelementptr inbounds nuw i8, ptr %p, i64 5
+  %1 = load i8, ptr %c, align 1
+  %cmp2 = icmp eq i8 %1, 100
+  %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+  br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %entry
+  %b3 = getelementptr inbounds nuw i8, ptr %p, i64 4
+  %2 = load i8, ptr %b3, align 4
+  %cmp5 = icmp eq i8 %2, 3
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %entry
+  %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+  ret i1 %3
+}

>From 9f18a021fe589306898a66e7b74c4cc17d615770 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 18:38:39 +0100
Subject: [PATCH 11/23] [MergeICmps] Changed tests to allocate structs instead
 of arrays for const-cmp

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp                 | 1 +
 llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll       | 4 ++--
 .../Transforms/MergeICmps/X86/many-const-cmp-select.ll    | 8 ++++----
 .../test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll | 4 ++--
 llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll  | 4 ++--
 .../MergeICmps/X86/not-split-unmerged-select.ll           | 8 ++++----
 .../Transforms/MergeICmps/X86/partial-select-merge.ll     | 8 ++++----
 .../Transforms/MergeICmps/X86/split-block-does-work.ll    | 4 ++--
 8 files changed, 21 insertions(+), 20 deletions(-)
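
As a sanity check on the array-to-struct change in the test expectations, the sketch below builds the bytes of a packed struct by hand. It assumes the struct really is packed, as the `true` argument to `StructType::get` requests. `appendPacked`, `packedI8Triple` and `packedMixed` are hypothetical helper names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends the raw bytes of Value to Buf without any padding, mimicking
// how fields are laid out in a packed struct.
template <typename T>
void appendPacked(std::vector<unsigned char> &Buf, T Value) {
  unsigned char Raw[sizeof(T)];
  std::memcpy(Raw, &Value, sizeof(T));
  Buf.insert(Buf.end(), Raw, Raw + sizeof(T));
}

// Bytes of a packed { i8 -1, i8 -56, i8 -66 }.
std::vector<unsigned char> packedI8Triple() {
  std::vector<unsigned char> Buf;
  appendPacked<int8_t>(Buf, -1);
  appendPacked<int8_t>(Buf, -56);
  appendPacked<int8_t>(Buf, -66);
  return Buf;
}

// Bytes of a packed { i32 3, i8 100 }: 5 bytes, no tail padding.
std::vector<unsigned char> packedMixed() {
  std::vector<unsigned char> Buf;
  appendPacked<int32_t>(Buf, 3);
  appendPacked<int8_t>(Buf, 100);
  return Buf;
}
```

For all-`i8` chains the packed struct has the same bytes as the old `[3 x i8] c"\FF\C8\BE"` array, so the memcmp operand is unchanged. The struct form only matters for mixed-width chains such as the 5-byte `{ i32, i8 }` case.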

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index e75836e895175..690ad4d26d8ef 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -846,6 +846,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
         Constants.emplace_back(ConstCmp->Const);
         Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
       }
+      // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performant.
       auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
       auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
       auto *StructConstant = ConstantStruct::get(StructType, Constants);
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index f05422fd9aea1..fd9faf2d343f9 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -6,8 +6,8 @@
 define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca [3 x i8], align 1
-; CHECK-NEXT:    store [3 x i8] c"\FF\C8\BE", ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8 }, align 8
+; CHECK-NEXT:    store { i8, i8, i8 } { i8 -1, i8 -56, i8 -66 }, ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index ce8de31134e0f..aa0e0e1763c3d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -5,15 +5,15 @@
 define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
 ; CHECK-LABEL: @is_all_ones_many(
 ; CHECK-NEXT:  "entry+land.lhs.true11":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca [4 x i8], align 1
-; CHECK-NEXT:    store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8, i8 }
+; CHECK-NEXT:    store { i8, i8, i8, i8 } { i8 -1, i8 -56, i8 -66, i8 1 }, ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "land.lhs.true16+land.lhs.true21":
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
-; CHECK-NEXT:    store [2 x i8] c"\02\07", ptr [[TMP3]], align 1
+; CHECK-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
+; CHECK-NEXT:    store { i8, i8 } { i8 2, i8 7 }, ptr [[TMP3]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
 ; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 74e7a9ce705de..55b1587bb7651 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -12,8 +12,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "entry+land.rhs+land.lhs.true4":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca [3 x i32], align 4
-; CHECK-NEXT:    store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca { i32, i32, i32 }
+; CHECK-NEXT:    store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
 ; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index ec1c8660fde86..1e8f307c2a4df 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -14,8 +14,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
 ; This is the new BCE to constant comparison block
 ; CHECK:  "entry+land.rhs+land.lhs.true8":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca [3 x i32], align 4
-; CHECK-NEXT:    store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca { i32, i32, i32 }
+; CHECK-NEXT:    store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
 ; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index cd409b0f007ee..a9fc2ff64205e 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -108,8 +108,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ; REG-NEXT:    br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
 ; REG:       "land.lhs.true11+land.rhs":
 ; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT:    store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; REG-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
+; REG-NEXT:    store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
 ; REG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
 ; REG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
@@ -143,8 +143,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ; CFG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CFG:       "land.lhs.true11+land.rhs":
 ; CFG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT:    [[TMP3:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT:    store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; CFG-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
+; CFG-NEXT:    store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
 ; CFG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
 ; CFG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CFG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 9fabe7fb2fc61..55562a47153e1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -114,8 +114,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
 ; REG-LABEL: @cmp_partially_mergable_select_array(
 ; REG: "entry+land.rhs":
 ; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT:   [[TMP0:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT:   store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; REG-NEXT:   [[TMP0:%.*]] = alloca { i8, i8 }
+; REG-NEXT:   store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
 ; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
 ; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; REG-NEXT:   br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
@@ -152,8 +152,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
 ; CFG-LABEL: @cmp_partially_mergable_select_array(
 ; CFG:      "entry+land.rhs":
 ; CFG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT:   [[TMP0:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT:   store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; CFG-NEXT:   [[TMP0:%.*]] = alloca { i8, i8 }
+; CFG-NEXT:   store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
 ; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
 ; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CFG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 61304694548a2..c496740bfc7cf 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -250,8 +250,8 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
 ; X86-NEXT:    call void (...) @foo() #[[ATTR2]]
 ; X86-NEXT:    call void (...) @bar() #[[ATTR2]]
 ; X86-NEXT:    [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT:    [[TMP0:%.*]] = alloca [3 x i8], align 1
-; X86-NEXT:    store [3 x i8] c"d\03\C8", ptr [[TMP0]], align 1
+; X86-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8 }
+; X86-NEXT:    store { i8, i8, i8 } { i8 100, i8 3, i8 -56 }, ptr [[TMP0]], align 1
 ; X86-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
 ; X86-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; X86-NEXT:    br label [[LAND_END:%.*]]

>From a4d9733ca0f7f6c61f76b85fa580f39c10387573 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 19 Mar 2025 20:41:52 +0100
Subject: [PATCH 12/23] [MergeICmps] Changed tests to use packed structs

---
 llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll       | 4 ++--
 .../Transforms/MergeICmps/X86/many-const-cmp-select.ll    | 8 ++++----
 .../test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll | 4 ++--
 llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll  | 4 ++--
 .../MergeICmps/X86/mixed-type-const-comparisons.ll        | 8 ++++----
 .../MergeICmps/X86/not-split-unmerged-select.ll           | 8 ++++----
 .../Transforms/MergeICmps/X86/partial-select-merge.ll     | 8 ++++----
 llvm/test/Transforms/MergeICmps/X86/single-block.ll       | 4 ++--
 .../Transforms/MergeICmps/X86/split-block-does-work.ll    | 4 ++--
 9 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index fd9faf2d343f9..51c3c27583602 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -6,8 +6,8 @@
 define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8 }, align 8
-; CHECK-NEXT:    store { i8, i8, i8 } { i8 -1, i8 -56, i8 -66 }, ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>, align 8
+; CHECK-NEXT:    store <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>, ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index aa0e0e1763c3d..0ca0f671d98a4 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -5,15 +5,15 @@
 define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
 ; CHECK-LABEL: @is_all_ones_many(
 ; CHECK-NEXT:  "entry+land.lhs.true11":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8, i8 }
-; CHECK-NEXT:    store { i8, i8, i8, i8 } { i8 -1, i8 -56, i8 -66, i8 1 }, ptr [[TMP0]], align 1
+; CHECK-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8, i8 }>
+; CHECK-NEXT:    store <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>, ptr [[TMP0]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "land.lhs.true16+land.lhs.true21":
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
-; CHECK-NEXT:    store { i8, i8 } { i8 2, i8 7 }, ptr [[TMP3]], align 1
+; CHECK-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; CHECK-NEXT:    store <{ i8, i8 }> <{ i8 2, i8 7 }>, ptr [[TMP3]], align 1
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
 ; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 55b1587bb7651..dfe57e6ef930a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -12,8 +12,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "entry+land.rhs+land.lhs.true4":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca { i32, i32, i32 }
-; CHECK-NEXT:    store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
 ; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 1e8f307c2a4df..d88d7d824b5ed 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -14,8 +14,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
 ; This is the new BCE to constant comparison block
 ; CHECK:  "entry+land.rhs+land.lhs.true8":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca { i32, i32, i32 }
-; CHECK-NEXT:    store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
 ; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
index 05aa99d31c5a1..15c5a382d1f46 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -12,8 +12,8 @@ define dso_local zeroext i1 @is_all_ones_struct(
 ; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
 ; CHECK-NEXT:   br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
 ; CHECK:      "land.rhs+land.lhs.true":
-; CHECK-NEXT:   [[TMP2:%.*]] = alloca { i32, i8 }
-; CHECK-NEXT:   store { i32, i8 } { i32 3, i8 100 }, ptr [[TMP2]]
+; CHECK-NEXT:   [[TMP2:%.*]] = alloca <{ i32, i8 }>
+; CHECK-NEXT:   store <{ i32, i8 }> <{ i32 3, i8 100 }>, ptr [[TMP2]]
 ; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
 ; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:   br label [[LAND_END]]
@@ -49,8 +49,8 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
 ; CHECK-LABEL: @is_all_ones_struct_select_block(
 ; CHECK:      "entry+land.rhs":
-; CHECK-NEXT:   [[TMP0:%.*]] = alloca { i32, i8, i8 }
-; CHECK-NEXT:   store { i32, i8, i8 } { i32 200, i8 3, i8 100 }, ptr [[TMP0]]
+; CHECK-NEXT:   [[TMP0:%.*]] = alloca <{ i32, i8, i8 }>
+; CHECK-NEXT:   store <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>, ptr [[TMP0]]
 ; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
 ; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:   br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index a9fc2ff64205e..874ea22e75106 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -108,8 +108,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ; REG-NEXT:    br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
 ; REG:       "land.lhs.true11+land.rhs":
 ; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
-; REG-NEXT:    store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
+; REG-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; REG-NEXT:    store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
 ; REG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
 ; REG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
@@ -143,8 +143,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ; CFG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CFG:       "land.lhs.true11+land.rhs":
 ; CFG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT:    [[TMP3:%.*]] = alloca { i8, i8 }
-; CFG-NEXT:    store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
+; CFG-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; CFG-NEXT:    store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
 ; CFG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
 ; CFG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CFG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 55562a47153e1..20a3faa854836 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -114,8 +114,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
 ; REG-LABEL: @cmp_partially_mergable_select_array(
 ; REG: "entry+land.rhs":
 ; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT:   [[TMP0:%.*]] = alloca { i8, i8 }
-; REG-NEXT:   store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
+; REG-NEXT:   [[TMP0:%.*]] = alloca <{ i8, i8 }>
+; REG-NEXT:   store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
 ; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
 ; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; REG-NEXT:   br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
@@ -152,8 +152,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
 ; CFG-LABEL: @cmp_partially_mergable_select_array(
 ; CFG:      "entry+land.rhs":
 ; CFG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT:   [[TMP0:%.*]] = alloca { i8, i8 }
-; CFG-NEXT:   store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
+; CFG-NEXT:   [[TMP0:%.*]] = alloca <{ i8, i8 }>
+; CFG-NEXT:   store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
 ; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
 ; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CFG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/single-block.ll b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
index b5735c73ced4c..cd321f435d1f3 100644
--- a/llvm/test/Transforms/MergeICmps/X86/single-block.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
@@ -6,8 +6,8 @@ define i1 @merge_single(ptr nocapture noundef readonly dereferenceable(2) %p) {
 ; CHECK-LABEL: @merge_single(
 ; CHECK:       entry:
 ; CHECK-NEXT:   [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 1
-; CHECK-NEXT:   [[TMP1:%.*]] = alloca [2 x i8], align 1
-; CHECK-NEXT:   store [2 x i8] c"\FF\FF", ptr [[TMP1]], align 1
+; CHECK-NEXT:   [[TMP1:%.*]] = alloca <{ i8, i8 }>, align 1
+; CHECK-NEXT:   store <{ i8, i8 }> <{ i8 -1, i8 -1 }>, ptr [[TMP1]], align 1
 ; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP1]], i64 2)
 ; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:   ret i1 [[CMP0]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index c496740bfc7cf..5381d88ed7f52 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -250,8 +250,8 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
 ; X86-NEXT:    call void (...) @foo() #[[ATTR2]]
 ; X86-NEXT:    call void (...) @bar() #[[ATTR2]]
 ; X86-NEXT:    [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT:    [[TMP0:%.*]] = alloca { i8, i8, i8 }
-; X86-NEXT:    store { i8, i8, i8 } { i8 100, i8 3, i8 -56 }, ptr [[TMP0]], align 1
+; X86-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>
+; X86-NEXT:    store <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>, ptr [[TMP0]], align 1
 ; X86-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
 ; X86-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; X86-NEXT:    br label [[LAND_END:%.*]]
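
A note on the switch to packed structs in this patch. The commit message does not state the reason, so what follows is an assumption: a non-packed LLVM struct such as `{ i8, i32 }` may contain padding, whereas the packed form `<{ i8, i32 }>` has none, so its bytes line up with the contiguous offsets that the merged `memcmp` reads. The C++ sketch below shows the same layout effect with two hypothetical structs (`Padded`, `Packed`) and a hypothetical helper (`matchesPacked`); none of these names come from the patch. `#pragma pack` is a compiler extension (accepted by Clang, GCC and MSVC), not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a non-packed constant struct: the compiler is
// free to insert padding between B and A.
struct Padded {
  std::uint8_t B;
  std::uint32_t A;
};

// Hypothetical stand-in for a packed constant struct: no padding at all.
#pragma pack(push, 1)
struct Packed {
  std::uint8_t B;
  std::uint32_t A;
};
#pragma pack(pop)

// Compares sizeof(Packed) bytes at Ptr against the packed constant, which is
// roughly what the merged memcmp call does with its constant operand.
inline bool matchesPacked(const unsigned char *Ptr, const Packed &Expected) {
  assert(Ptr && "need a valid buffer");
  return std::memcmp(Ptr, &Expected, sizeof(Packed)) == 0;
}
```

With the packed layout, `sizeof(Packed)` equals the sum of the field sizes, so the `memcmp` length and the constant's size agree byte for byte.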

>From d15a2ce0e6f6d850f67a205eee6ffbfa7f63c50b Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 19:19:36 +0100
Subject: [PATCH 13/23] [MergeICmps] Refactored how cmp-instructions are stored
 per block

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 34 +++++++----------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 690ad4d26d8ef..1626d35fb65d2 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -132,10 +132,14 @@ class BaseIdentifier {
   DenseMap<const Value*, int> BaseToIndex;
 };
 
+
+// All Instructions related to a comparison.
+typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
+
 // If this value is a load from a constant offset w.r.t. a base address, and
 // there are no other users of the load or address, returns the base address and
 // the offset.
-BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
+BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, InstructionSet* BlockInsts) {
   auto *const LoadI = dyn_cast<LoadInst>(Val);
   if (!LoadI)
     return {};
@@ -174,11 +178,12 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
     if (!GEP->accumulateConstantOffset(DL, Offset))
       return {};
     Base = GEP->getPointerOperand();
+    BlockInsts->insert(GEP);
   }
+  BlockInsts->insert(LoadI);
   return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
 }
 
-typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
 
 struct Comparison {
 public:
@@ -200,7 +205,6 @@ struct Comparison {
 
   virtual ~Comparison() = default;
   virtual LoadOperands getLoads() = 0;
-  virtual InstructionSet getInsts() = 0;
   bool areContiguous(const Comparison& Other) const;
   bool operator<(const Comparison &Other) const;
 };
@@ -226,13 +230,6 @@ struct BCEConstCmp : public Comparison {
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,std::nullopt);
   }
-  InstructionSet getInsts() override {
-    InstructionSet BlockInsts{CmpI,Lhs.LoadI};
-    if (Lhs.GEP)
-      BlockInsts.insert(Lhs.GEP);
-    return BlockInsts;
-  }
-
 };
 
 // A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -256,14 +253,6 @@ struct BCECmp : public Comparison {
   Comparison::LoadOperands getLoads() override {
     return std::make_pair(&Lhs,&Rhs);
   }
-  InstructionSet getInsts() override {
-    InstructionSet BlockInsts{CmpI, Lhs.LoadI, Rhs.LoadI};
-    if (Lhs.GEP)
-      BlockInsts.insert(Lhs.GEP);
-    if (Rhs.GEP)
-      BlockInsts.insert(Rhs.GEP);
-    return BlockInsts;
-  }
 };
 
 // TODO: this can be improved to take alignment into account.
@@ -455,7 +444,7 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
                     << (ExpectedPredicate == ICmpInst::ICMP_EQ ? "eq" : "ne")
                     << "\n");
   // First operand is always a load
-  auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId);
+  auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId, BlockInsts);
   if (!Lhs.BaseId)
     return std::nullopt;
 
@@ -465,10 +454,11 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
   const auto &DL = CmpI->getDataLayout();
   int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
 
+  BlockInsts->insert(CmpI);
   if (auto const& Const = dyn_cast<Constant>(RhsOperand))
     return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
 
-  auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId);
+  auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
   if (!Rhs.BaseId)
     return std::nullopt;
   return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
@@ -553,10 +543,6 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
   if (!Result)
     return std::nullopt;
 
-  for (auto* Cmp : Result->getCmpChain()) {
-    auto CmpInsts = Cmp->getInsts();
-    BlockInsts.insert(CmpInsts.begin(), CmpInsts.end());
-  }
   BlockInsts.insert(BranchI);
   return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
 }
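
The refactoring in this patch replaces the per-comparison `getInsts()` query with a set that is filled while operands are visited. A minimal sketch of that pattern follows, using plain standard-library types; `Inst`, `Atom` and `visitLoad` are hypothetical stand-ins and not the LLVM classes or functions touched by the diff.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical stand-in for llvm::Instruction; only identity matters here.
struct Inst {
  int Id;
};
using InstSet = std::set<const Inst *>;

// Simplified atom: an optional address computation plus the load itself.
struct Atom {
  const Inst *Gep = nullptr;
  const Inst *Load = nullptr;
};

// Registers the instructions it accepts in BlockInsts while visiting, so the
// caller no longer has to ask each comparison for its instructions afterwards.
std::optional<Atom> visitLoad(const Inst *Gep, const Inst *Load,
                              InstSet *BlockInsts) {
  assert(BlockInsts && "caller must supply a set to fill");
  if (!Load)
    return std::nullopt; // rejected: nothing is recorded
  if (Gep)
    BlockInsts->insert(Gep);
  BlockInsts->insert(Load);
  return Atom{Gep, Load};
}
```

The trade-off is that the set is now an out-parameter threaded through every visitor, in exchange for dropping a virtual method and one extra pass over the chain.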

>From e5d3e57bfe487e45d4c3feaaf43c9c05e05ae516 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 21:57:31 +0100
Subject: [PATCH 14/23] [MergeICmps] Use shared-ptr to avoid leaking memory

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 37 +++++++++++++----------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 1626d35fb65d2..67def6b0f09da 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -288,15 +288,16 @@ bool Comparison::operator<(const Comparison& Other) const {
 // Represents multiple comparisons inside of a single basic block.
 // This happens if multiple basic blocks have previously been merged into a single using a select node.
 class IntraCmpChain {
-  std::vector<Comparison*> CmpChain;
+  // TODO: This could probably be a std::unique_ptr, but the current implementation relies on some copies.
+  std::vector<std::shared_ptr<Comparison>> CmpChain;
 
 public:
-  IntraCmpChain(Comparison* C) : CmpChain{C} {}
+  IntraCmpChain(std::shared_ptr<Comparison> C) : CmpChain{C} {}
   IntraCmpChain combine(const IntraCmpChain OtherChain) {
     CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
     return *this;
   }
-  std::vector<Comparison*> getCmpChain() const {
+  std::vector<std::shared_ptr<Comparison>> getCmpChain() const {
     return CmpChain;
   }
 };
@@ -305,10 +306,10 @@ class IntraCmpChain {
 // A basic block that contains one or more comparisons.
 class MultBCECmpBlock {
  public:
-  MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+  MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
       : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
 
-  std::vector<Comparison*> getCmps() {
+  std::vector<std::shared_ptr<Comparison>> getCmps() {
     return Cmps;
   }
 
@@ -331,7 +332,7 @@ class MultBCECmpBlock {
   InstructionSet BlockInsts;
 
 private:
-  std::vector<Comparison*> Cmps;
+  std::vector<std::shared_ptr<Comparison>> Cmps;
 };
 
 // A basic block with single a comparison between two BCE atoms.
@@ -342,13 +343,13 @@ class MultBCECmpBlock {
 class SingleBCECmpBlock {
  public:
   SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
-      : BB(M.BB), OrigOrder(OrigOrder), Cmp(M.getCmps()[I]) {}
+      : BB(M.BB), OrigOrder(OrigOrder), Cmp(std::move(M.getCmps()[I])) {}
 
   SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
-      : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts)  {}
+      : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(M.getCmps()[I])), SplitInsts(SplitInsts)  {}
 
   const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
-  const Comparison* getCmp() const { return Cmp; }
+  const Comparison* getCmp() const { return Cmp.get(); }
 
   bool operator<(const SingleBCECmpBlock &O) const {
     return *Cmp < *O.Cmp;
@@ -367,7 +368,7 @@ class SingleBCECmpBlock {
   bool RequireSplit = false;
 
 private:
-  Comparison* Cmp;
+  std::shared_ptr<Comparison> Cmp;
   llvm::SmallVector<Instruction *, 4> SplitInsts;
 };
 
@@ -382,7 +383,7 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
     };
-    for (auto* Cmp : Cmps) {
+    for (auto& Cmp : Cmps) {
       auto [Lhs,Rhs] = Cmp->getLoads();
       if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
         return false;
@@ -426,7 +427,7 @@ bool MultBCECmpBlock::doesOtherWork() const {
 
 // Visit the given comparison. If this is a comparison between two valid
 // BCE atoms, or between a BCE atom and a constant, returns the comparison.
-std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
+std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
                                 const ICmpInst::Predicate ExpectedPredicate,
                                 BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
   // The comparison can only be used once:
@@ -456,12 +457,12 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
 
   BlockInsts->insert(CmpI);
   if (auto const& Const = dyn_cast<Constant>(RhsOperand))
-    return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
+    return std::make_shared<BCEConstCmp>(std::move(Lhs), Const, SizeBits, CmpI);
 
   auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
   if (!Rhs.BaseId)
     return std::nullopt;
-  return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
+  return std::make_shared<BCECmp>(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
 }
 
 // Chain of comparisons inside a single basic block connected using `select` nodes.
@@ -494,8 +495,12 @@ std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
 
 std::optional<IntraCmpChain> visitComparison(Value *Cond,
             ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
-  if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
-    return visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+  if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) {
+    auto CmpVisit = visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+    if (!CmpVisit)
+      return std::nullopt;
+    return IntraCmpChain(*CmpVisit);
+  }
   if (auto *SelectI = dyn_cast<SelectInst>(Cond))
     return visitSelect(SelectI, ExpectedPredicate, BaseId, BlockInsts);
 
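The ownership change in this patch can be sketched in isolation as follows. This is a simplified illustration under assumed names (`Cmp`, `ConstCmp`, `LoadCmp`, `visit`, `buildChain` are hypothetical and do not appear in the pass): a visitor returns `std::optional<std::shared_ptr<Base>>`, the derived object is constructed in place by `std::make_shared`, and copies of the chain share ownership instead of leaking raw pointers.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for the pass's comparison hierarchy.
struct Cmp {
  virtual ~Cmp() = default;
  virtual bool isConst() const = 0;
};
struct ConstCmp : Cmp {
  int Value;
  explicit ConstCmp(int V) : Value(V) {}
  bool isConst() const override { return true; }
};
struct LoadCmp : Cmp {
  bool isConst() const override { return false; }
};

// Returns an owning pointer, or std::nullopt when the input is rejected.
// Negative inputs are rejected; even inputs become constant comparisons.
std::optional<std::shared_ptr<Cmp>> visit(int V) {
  if (V < 0)
    return std::nullopt;
  if (V % 2 == 0)
    return std::make_shared<ConstCmp>(V); // constructed in place
  return std::make_shared<LoadCmp>();
}

// Collects accepted comparisons; copying the result only bumps refcounts.
std::vector<std::shared_ptr<Cmp>> buildChain(const std::vector<int> &Vals) {
  std::vector<std::shared_ptr<Cmp>> Chain;
  for (int V : Vals)
    if (auto C = visit(V))
      Chain.push_back(std::move(*C));
  return Chain;
}
```

Passing the constructor arguments straight to `std::make_shared` avoids building a temporary object that is then moved into the shared allocation.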

>From b5b557c735d7b602d2b7c9309fbf46daf4636725 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 23:01:38 +0100
Subject: [PATCH 15/23] [MergeICmps] Reduced copies for mergeBlocks

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 41 +++++++++--------------
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 67def6b0f09da..447c909f84aec 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -625,36 +625,34 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
 
 /// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
 /// ranges that can be merged together into a single comparison.
-static std::vector<BCECmpChain::ContiguousBlocks>
-mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
-  std::vector<BCECmpChain::ContiguousBlocks> MergedBlocks;
-
+template<class RandomIt>
+static void mergeBlocks(RandomIt First, RandomIt Last,
+                        std::vector<BCECmpChain::ContiguousBlocks>* MergedBlocks) {
   // Sort to detect continuous offsets.
-  llvm::sort(Blocks,
+  llvm::sort(First, Last,
              [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
               return LhsBlock < RhsBlock;
              });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
-  for (SingleBCECmpBlock &Block : Blocks) {
-    if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
-      MergedBlocks.emplace_back();
-      LastMergedBlock = &MergedBlocks.back();
+  int Offset = MergedBlocks->size();
+  for (auto BlockIt = First; BlockIt != Last; ++BlockIt) {
+    if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
+      MergedBlocks->emplace_back();
+      LastMergedBlock = &MergedBlocks->back();
     } else {
-      LLVM_DEBUG(dbgs() << "Merging block " << Block.BB->getName() << " into "
+      LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName() << " into "
                         << LastMergedBlock->back().BB->getName() << "\n");
     }
-    LastMergedBlock->push_back(std::move(Block));
+    LastMergedBlock->push_back(std::move(*BlockIt));
   }
 
   // While we allow reordering for merging, do not reorder unmerged comparisons.
   // Doing so may introduce branch on poison.
-  llvm::sort(MergedBlocks, [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
+  llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(), [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
                               const BCECmpChain::ContiguousBlocks &RhsBlocks) {
     return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
   });
-
-  return MergedBlocks;
 }
 
 BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
@@ -737,19 +735,12 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
 
   EntryBlock_ = Comparisons[0].BB;
 
-  std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
   auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
-  // TODO: too many copies here
-  std::partition_copy(Comparisons.begin(), Comparisons.end(), 
-                      std::back_inserter(ConstComparisons), 
-                      std::back_inserter(BceComparisons),
-                      isConstCmp);
-
-  auto MergedConstCmpBlocks = mergeBlocks(std::move(ConstComparisons));
-  auto MergedBCECmpBlocks = mergeBlocks(std::move(BceComparisons));
+  auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
 
-  MergedBlocks_.insert(MergedBlocks_.end(),MergedBCECmpBlocks.begin(),MergedBCECmpBlocks.end());
-  MergedBlocks_.insert(MergedBlocks_.end(),MergedConstCmpBlocks.begin(),MergedConstCmpBlocks.end());
+  // This orders the merged BCE-comparisons before the merged BCE-const-comparisons.
+  mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+  mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
 }
 
 namespace {
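
The in-place partitioning introduced by this patch can be illustrated with a small standalone sketch. It assumes a hypothetical `Item` type (a flag, an offset and a size) in place of `SingleBCECmpBlock`, uses hypothetical helper names (`mergeRange`, `groupAll`), and leaves out the final re-sort by original order that the real `mergeBlocks` performs. The point is only the shape of the algorithm: one `std::partition`, then one grouping pass per iterator range, both appending to the same output vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for SingleBCECmpBlock: one comparison of Size bytes
// at byte offset Offset, either against a constant or against another load.
struct Item {
  bool IsConst;
  int Offset;
  int Size;
};
using Group = std::vector<Item>;

// Sorts [First, Last) by offset and appends runs of contiguous items to Out.
template <class RandomIt>
void mergeRange(RandomIt First, RandomIt Last, std::vector<Group> *Out) {
  assert(Out && "caller must supply an output vector");
  std::sort(First, Last,
            [](const Item &L, const Item &R) { return L.Offset < R.Offset; });
  Group *LastGroup = nullptr;
  for (auto It = First; It != Last; ++It) {
    const bool Contiguous =
        LastGroup &&
        LastGroup->back().Offset + LastGroup->back().Size == It->Offset;
    if (!Contiguous) {
      Out->emplace_back();
      LastGroup = &Out->back();
    }
    LastGroup->push_back(*It);
  }
}

// Partitions in place (constant items first), then groups each half.
std::vector<Group> groupAll(std::vector<Item> Items) {
  std::vector<Group> Out;
  auto Mid = std::partition(Items.begin(), Items.end(),
                            [](const Item &I) { return I.IsConst; });
  mergeRange(Mid, Items.end(), &Out);   // non-constant groups come first
  mergeRange(Items.begin(), Mid, &Out); // then the constant groups
  return Out;
}
```

Compared with `std::partition_copy` into two temporary vectors, this touches a single container and lets both passes write into one result.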

>From 31ea42eb04db0ed0d2f346b364d0517ddcdbd997 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 17:42:11 +0100
Subject: [PATCH 16/23] [MergeICmps] Don't split up select blocks if they
 aren't merged in the cmp-chain

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 200 +++++++++++-------
 .../MergeICmps/X86/entry-block-shuffled.ll    |  16 +-
 .../X86/not-split-unmerged-select.ll          |  53 +----
 .../MergeICmps/X86/partial-select-merge.ll    | 165 +++++----------
 4 files changed, 203 insertions(+), 231 deletions(-)

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 447c909f84aec..5943d717276c4 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -602,10 +602,13 @@ class BCECmpChain {
   bool simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
                 DomTreeUpdater &DTU);
 
+  bool multBlockOnlyPartiallyMerged();
+
   bool atLeastOneMerged() const {
     return any_of(MergedBlocks_,
                   [](const auto &Blocks) { return Blocks.size() > 1; });
-  }
+  };
+
 
 private:
   PHINode &Phi_;
@@ -616,6 +619,25 @@ class BCECmpChain {
   BasicBlock *EntryBlock_;
 };
 
+
+// Returns true if a merge in the chain depends on a basic block where not every comparison is merged.
+// NOTE: This is pretty restrictive and could potentially be handled using an improved tradeoff heuristic.
+bool BCECmpChain::multBlockOnlyPartiallyMerged() {
+  llvm::SmallDenseSet<const BasicBlock*, 8> UnmergedBlocks, MergedBB;
+
+  for (auto& Merged : MergedBlocks_) {
+    if (Merged.size() == 1) {
+      UnmergedBlocks.insert(Merged[0].BB);
+      continue;
+    }
+    for (auto& C : Merged)
+      MergedBB.insert(C.BB);
+  }
+  return llvm::any_of(MergedBB, [&](const BasicBlock* BB){
+    return UnmergedBlocks.contains(BB);
+  });
+}
+
 static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
   unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
   for (const SingleBCECmpBlock &Block : Blocks)
@@ -655,6 +677,7 @@ static void mergeBlocks(RandomIt First, RandomIt Last,
   });
 }
 
+
 BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
                          AliasAnalysis &AA)
     : Phi_(Phi) {
@@ -796,14 +819,41 @@ class MergedBlockName {
 };
 } // namespace
 
+
+void updateBranching(Value* CondResult,
+                     IRBuilder<>& Builder,
+                     BasicBlock *BB,
+                     BasicBlock *const NextCmpBlock,
+                     PHINode &Phi,
+                     LLVMContext &Context,
+                     const TargetLibraryInfo &TLI,
+                     AliasAnalysis &AA, DomTreeUpdater &DTU) {
+  BasicBlock *const PhiBB = Phi.getParent();
+  // Add a branch to the next basic block in the chain.
+  if (NextCmpBlock == PhiBB) {
+    // Continue to phi, passing it the comparison result.
+    Builder.CreateBr(PhiBB);
+    Phi.addIncoming(CondResult, BB);
+    DTU.applyUpdates({{DominatorTree::Insert, BB, PhiBB}});
+  } else {
+    // Continue to next block if equal, exit to phi else.
+    Builder.CreateCondBr(CondResult, NextCmpBlock, PhiBB);
+    Phi.addIncoming(ConstantInt::getFalse(Context), BB);
+    DTU.applyUpdates({{DominatorTree::Insert, BB, NextCmpBlock},
+                      {DominatorTree::Insert, BB, PhiBB}});
+  }
+}
+
+
 // Merges the given contiguous comparison blocks into one memcmp block.
 static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
                                     BasicBlock *const InsertBefore,
                                     BasicBlock *const NextCmpBlock,
-                                    PHINode &Phi, const TargetLibraryInfo &TLI,
+                                    PHINode &Phi,
+                                    LLVMContext &Context,
+                                    const TargetLibraryInfo &TLI,
                                     AliasAnalysis &AA, DomTreeUpdater &DTU) {
-  assert(!Comparisons.empty() && "merging zero comparisons");
-  LLVMContext &Context = NextCmpBlock->getContext();
+  assert(Comparisons.size() > 1 && "merging multiple comparisons");
   const SingleBCECmpBlock &FirstCmp = Comparisons[0];
 
   // Create a new cmp block before next cmp block.
@@ -818,92 +868,81 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
     Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
   else
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
+
   // Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
   if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
-    if (Comparisons.size() > 1) {
-      std::vector<Constant*> Constants;
-      std::vector<Type*> Types;
-      for (const auto& BceBlock : Comparisons) {
-        auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
-        Constants.emplace_back(ConstCmp->Const);
-        Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
-      }
-      // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
-      auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
-      auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
-      auto *StructConstant = ConstantStruct::get(StructType, Constants);
-      Builder.CreateStore(StructConstant, StructAlloca);
-      Rhs = StructAlloca;
+    std::vector<Constant*> Constants;
+    std::vector<Type*> Types;
+    for (const auto& BceBlock : Comparisons) {
+      auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+      Constants.emplace_back(ConstCmp->Const);
+      Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
     }
-  } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+    // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performant.
+    auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+    auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+    auto *StructConstant = ConstantStruct::get(StructType, Constants);
+    Builder.CreateStore(StructConstant, StructAlloca);
+    Rhs = StructAlloca;
+  } else {
+    auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
     if (FirstBceCmp->Rhs.GEP)
       Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
     else
       Rhs = FirstBceCmp->Rhs.LoadI->getPointerOperand();
   }
-  Value *IsEqual = nullptr;
   LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
                     << BB->getName() << "\n");
 
   // If there is one block that requires splitting, we do it now, i.e.
   // just before we know we will collapse the chain. The instructions
   // can be executed before any of the instructions in the chain.
-  const auto ToSplit = llvm::find_if(
+  const auto* ToSplit = llvm::find_if(
       Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
   if (ToSplit != Comparisons.end()) {
     LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
     ToSplit->split(BB, AA);
   }
 
-  if (Comparisons.size() == 1) {
-    LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
-    // Use clone to keep the metadata
-    Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs()->LoadI->clone());
-    LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
-    // There are no blocks to merge, just do the comparison.
-    if (auto* ConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp()))
-      IsEqual = Builder.CreateICmpEQ(LhsLoad, ConstCmp->Const);
-    else if (const auto& BceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
-      Instruction *const RhsLoad = Builder.Insert(BceCmp->Rhs.LoadI->clone());
-      RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
-      IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
-    }
-  } else {
-    // memcmp expects a 'size_t' argument and returns 'int'.
-    unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
-    unsigned IntBits = TLI.getIntSize();
-    const unsigned TotalSizeBits = std::accumulate(
-        Comparisons.begin(), Comparisons.end(), 0u,
-        [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
-
-
-    // Create memcmp() == 0.
-    const auto &DL = Phi.getDataLayout();
-    Value *const MemCmpCall = emitMemCmp(
-        Lhs, Rhs,
-        ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
-        Builder, DL, &TLI);
-    IsEqual = Builder.CreateICmpEQ(
-        MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
-  }
-
-  BasicBlock *const PhiBB = Phi.getParent();
-  // Add a branch to the next basic block in the chain.
-  if (NextCmpBlock == PhiBB) {
-    // Continue to phi, passing it the comparison result.
-    Builder.CreateBr(PhiBB);
-    Phi.addIncoming(IsEqual, BB);
-    DTU.applyUpdates({{DominatorTree::Insert, BB, PhiBB}});
-  } else {
-    // Continue to next block if equal, exit to phi else.
-    Builder.CreateCondBr(IsEqual, NextCmpBlock, PhiBB);
-    Phi.addIncoming(ConstantInt::getFalse(Context), BB);
-    DTU.applyUpdates({{DominatorTree::Insert, BB, NextCmpBlock},
-                      {DominatorTree::Insert, BB, PhiBB}});
-  }
+  // memcmp expects a 'size_t' argument and returns 'int'.
+  unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+  unsigned IntBits = TLI.getIntSize();
+  const unsigned TotalSizeBits = std::accumulate(
+      Comparisons.begin(), Comparisons.end(), 0u,
+      [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+
+  // Create memcmp() == 0.
+  const auto &DL = Phi.getDataLayout();
+  Value *const MemCmpCall = emitMemCmp(
+      Lhs, Rhs,
+      ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
+      Builder, DL, &TLI);
+  Value* IsEqual = Builder.CreateICmpEQ(
+      MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
+
+  updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
   return BB;
 }
 
+// Keep existing block if it isn't merged. Only change the branches.
+// Also handles not splitting mult-blocks that use select instructions.
+static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
+                                    BasicBlock *const InsertBefore,
+                                    BasicBlock *const NextCmpBlock,
+                                    PHINode &Phi,
+                                    LLVMContext &Context,
+                                    const TargetLibraryInfo &TLI,
+                                    AliasAnalysis &AA, DomTreeUpdater &DTU) {
+  BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
+                         NextCmpBlock->getParent(), InsertBefore);
+  // Transfer all instructions except the branching terminator to the new block.
+  MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
+  Value* CondResult = cast<Value>(&MultBB->back());
+  IRBuilder<> Builder(MultBB);
+  updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+  return MultBB;
+}
+
 bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
                            DomTreeUpdater &DTU) {
   assert(atLeastOneMerged() && "simplifying trivial BCECmpChain");
@@ -914,9 +953,23 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   // so that the next block is always available to branch to.
   BasicBlock *InsertBefore = EntryBlock_;
   BasicBlock *NextCmpBlock = Phi_.getParent();
-  for (const auto &Blocks : reverse(MergedBlocks_)) {
-    InsertBefore = NextCmpBlock = mergeComparisons(
-        Blocks, InsertBefore, NextCmpBlock, Phi_, TLI, AA, DTU);
+  SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
+  LLVMContext &Context = NextCmpBlock->getContext();
+  for (const auto &Cmps : reverse(MergedBlocks_)) {
+    // TODO: Check if single comparisons should also be split!
+    // If there is only a single comparison, nothing is merged and the original block can be reused.
+    if (Cmps.size() == 1) {
+      // If a comparison from a mult-block is already handled then don't emit the same block again.
+      BasicBlock *const BB = Cmps[0].BB;
+      if (ExistingBlocksToKeep.contains(BB))
+        continue;
+      ExistingBlocksToKeep.insert(BB);
+      InsertBefore = NextCmpBlock = updateOriginalBlock(
+        BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+    } else {
+      InsertBefore = NextCmpBlock = mergeComparisons(
+          Cmps, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+    }
   }
 
   // Replace the original cmp chain with the new cmp chain by pointing all
@@ -947,7 +1000,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   SmallVector<BasicBlock *, 16> DeadBlocks;
   for (const auto &Blocks : MergedBlocks_) {
     for (const SingleBCECmpBlock &Block : Blocks) {
-      // Many single blocks can refer to the same multblock coming from an select instruction
+      // Many single blocks can refer to the same mult-block coming from a select instruction.
       // TODO: preferrably use a set instead
       if (llvm::is_contained(DeadBlocks, Block.BB))
         continue;
@@ -1069,6 +1122,11 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
     return false;
   }
 
+  if (CmpChain.multBlockOnlyPartiallyMerged()) {
+    LLVM_DEBUG(dbgs() << "chain uses not fully merged basic block, no merge\n");
+    return false;
+  }
+
   return CmpChain.simplify(TLI, AA, DTU);
 }
 
diff --git a/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll b/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
index bc6beefb2caee..65156697f1892 100644
--- a/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
@@ -11,10 +11,10 @@ define zeroext i1 @opeq1(
 ; CHECK-LABEL: @opeq1(
 ; CHECK-NEXT:  entry2:
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds [[S:%.*]], ptr [[A:%.*]], i64 0, i32 3
-; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds [[S]], ptr [[B:%.*]], i64 0, i32 2
-; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[TMP0]], align 4
-; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[TMP1]], align 4
-; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[TMP2]], [[TMP3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds [[S]], ptr [[B:%.*]], i64 0, i32 2
+; CHECK-NEXT:    [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[TMP1]], [[TMP3]]
 ; CHECK-NEXT:    br i1 [[TMP4]], label %"land.rhs.i+land.rhs.i.2", label [[OPEQ1_EXIT:%.*]]
 ; CHECK:       "land.rhs.i+land.rhs.i.2":
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A]], ptr [[B]], i64 8)
@@ -22,10 +22,10 @@ define zeroext i1 @opeq1(
 ; CHECK-NEXT:    br i1 [[TMP5]], label [[LAND_RHS_I_31:%.*]], label [[OPEQ1_EXIT]]
 ; CHECK:       land.rhs.i.31:
 ; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds [[S]], ptr [[A]], i64 0, i32 3
-; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds [[S]], ptr [[B]], i64 0, i32 3
-; CHECK-NEXT:    [[TMP8:%.*]] = load i32, ptr [[TMP6]], align 4
-; CHECK-NEXT:    [[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4
-; CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i32 [[TMP8]], [[TMP9]]
+; CHECK-NEXT:    [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds [[S]], ptr [[B]], i64 0, i32 3
+; CHECK-NEXT:    [[TMP9:%.*]] = load i32, ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[TMP10:%.*]] = icmp eq i32 [[TMP7]], [[TMP9]]
 ; CHECK-NEXT:    br label [[OPEQ1_EXIT]]
 ; CHECK:       opeq1.exit:
 ; CHECK-NEXT:    [[TMP11:%.*]] = phi i1 [ [[TMP10]], [[LAND_RHS_I_31]] ], [ false, %"land.rhs.i+land.rhs.i.2" ], [ false, [[ENTRY2:%.*]] ]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index 874ea22e75106..582b57d8c60ce 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,5 +1,4 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
 
 ; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
 
@@ -86,26 +85,23 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
   ret i1 %7
 }
 
-; p[12] and p[13] mergable, select blocks are split even though they aren't merged. simplifycfg merges them back.
-; NOTE: Ideally wouldn't always split and thus not rely on simplifycfg.
+; p[12] and p[13] are mergeable. The select mult-block is part of the chain but isn't merged, so it won't get split up into its single comparisons.
 
 define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
 ; REG-LABEL: @partial_merge_not_select(
-; REG:       entry5:
+; REG:       entry3:
 ; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
 ; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT:    br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END]]
-; REG:       entry4:
 ; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
 ; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT:    br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END:%.*]]
-; REG:       entry3:
 ; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
 ; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
 ; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT:    br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
+; REG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; REG:       "land.lhs.true11+land.rhs":
 ; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
 ; REG-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
@@ -124,42 +120,9 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
 ; REG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
 ; REG-NEXT:    br label [[LAND_END]]
 ; REG:  land.end:
-; REG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, [[ENTRY_3]] ], [ false, [[ENTRY_4]] ], [ false, %entry5 ]
+; REG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
 ; REG-NEXT:    ret i1 [[RES]]
 ;
-; CFG-LABEL: @partial_merge_not_select(
-; CFG:       entry5:
-; CFG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; CFG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; CFG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; CFG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; CFG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; CFG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; CFG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; CFG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
-; CFG:       "land.lhs.true11+land.rhs":
-; CFG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; CFG-NEXT:    store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
-; CFG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
-; CFG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; CFG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; CFG-NEXT:    [[SEL2:%.*]] = select i1 [[CMP3]], i1 [[CMP4]], i1 false
-; CFG-NEXT:    br i1 [[SEL2]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; CFG:       land.lhs.true211:
-; CFG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; CFG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; CFG-NEXT:    br label [[LAND_END]]
-; CFG:  land.end:
-; CFG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry5 ]
-; CFG-NEXT:    ret i1 [[RES]]
 entry:
   %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
   %0 = load i8, ptr %arrayidx, align 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 20a3faa854836..317a3a1464536 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -1,64 +1,49 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
 
-; REG checks the IR when only mergeicmps is run.
-; CFG checks the IR when simplifycfg is run afterwards to merge distinct blocks back together.
-
-; Can merge part of a select block even if not entire block mergable.
+; Cannot merge only part of a select block if the entire block is not mergeable.
 
 define zeroext i1 @cmp_partially_mergable_select(
     ptr nocapture readonly align 4 dereferenceable(24) %a,
     ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
 ; REG-LABEL: @cmp_partially_mergable_select(
-; REG:      "land.lhs.true+land.rhs+land.lhs.true4":
-; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
-; REG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
-; REG:      land.lhs.true103:
-; REG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; REG-NEXT:   [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; REG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
-; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
-; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
-; REG-NEXT:   br i1 [[CMP2]], label [[ENTRY2:%.*]], label [[LAND_END]]
-; REG:      entry2:
-; REG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; REG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
-; REG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
-; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
-; REG:      land.lhs.true41:
-; REG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; REG-NEXT:   [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
-; REG-NEXT:   [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
-; REG-NEXT:   br label %land.end
+; REG:      entry:
+; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; REG-NEXT:   [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
+; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
+; REG-NEXT:   br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; REG:      land.lhs.true:
+; REG-NEXT:   [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; REG-NEXT:   [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
+; REG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
+; REG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
+; REG:      land.lhs.true4:
+; REG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
+; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
+; REG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
+; REG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; REG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
+; REG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
+; REG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; REG-NEXT:   br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
+; REG:      land.lhs.true10:
+; REG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; REG-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
+; REG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; REG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
+; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
+; REG-NEXT:   br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG:      land.rhs:
+; REG-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
+; REG-NEXT:   [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
+; REG-NEXT:   [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
+; REG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
+; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
+; REG-NEXT:   br label [[LAND_END]]
 ; REG:      land.end:
-; REG-NEXT:   [[TMP8:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[ENTRY2]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
-; REG-NEXT:   ret i1 [[TMP8]]
-;
-; CFG-LABEL: @cmp_partially_mergable_select(
-; CFG:      "land.lhs.true+land.rhs+land.lhs.true4":
-; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
-; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
-; CFG:      land.lhs.true103:
-; CFG-NEXT:   [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; CFG-NEXT:   [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; CFG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
-; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
-; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
-; CFG-NEXT:   [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CFG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
-; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
-; CFG-NEXT:   [[SEL:%.*]] = select i1 %5, i1 %8, i1 false
-; CFG-NEXT:   br i1 [[SEL]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
-; CFG:      land.lhs.true41:
-; CFG-NEXT:   [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; CFG-NEXT:   [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
-; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
-; CFG-NEXT:   br label %land.end
-; CFG:      land.end:
-; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
-; CFG-NEXT:   ret i1 [[RES]]
+; REG-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
+; REG-NEXT:   ret i1 [[RES]]
 ;
 entry:
   %e = getelementptr inbounds nuw i8, ptr %a, i64 8
@@ -106,82 +91,48 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 }
 
 
-; p[12] and p[13] are mergable. p[12] is inside of a select block which will be split up.
-; MergeICmps always splits up matching select blocks. The following simplifycfg pass merges them back together.
+; p[12] and p[13] are mergeable. p[12] is inside a select block that will not be split up, so the pass shouldn't merge them.
 
 define dso_local zeroext i1 @cmp_partially_mergable_select_array(
     ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
 ; REG-LABEL: @cmp_partially_mergable_select_array(
-; REG: "entry+land.rhs":
+; REG:       entry:
 ; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT:   [[TMP0:%.*]] = alloca <{ i8, i8 }>
-; REG-NEXT:   store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
-; REG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
-; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT:   br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
-; REG: entry5:
+; REG-NEXT:   [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
 ; REG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
 ; REG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT:   br i1 [[CMP1]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
-; REG: entry4:
 ; REG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
 ; REG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
 ; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT:   br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
-; REG: land.lhs.true113:
+; REG-NEXT:   [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT:   br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
+; REG:       land.lhs.true11:
 ; REG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
 ; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
 ; REG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
-; REG: land.lhs.true162:
+; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true16:
 ; REG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
 ; REG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
 ; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
-; REG: land.lhs.true211:
+; REG-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
+; REG:       land.lhs.true21:
 ; REG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
 ; REG-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
 ; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT:   br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG:       land.rhs:
+; REG-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
+; REG-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; REG-NEXT:   [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
 ; REG-NEXT:   br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_4]] ], [ false, [[ENTRY_5]] ], [ false, %"entry+land.rhs" ]
+; REG:       land.end:
+; REG-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
 ; REG-NEXT:   ret i1 [[RES]]
 ;
-;
-; CFG-LABEL: @cmp_partially_mergable_select_array(
-; CFG:      "entry+land.rhs":
-; CFG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT:   [[TMP0:%.*]] = alloca <{ i8, i8 }>
-; CFG-NEXT:   store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
-; CFG-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
-; CFG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; CFG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; CFG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; CFG-NEXT:   [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; CFG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; CFG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; CFG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; CFG-NEXT:   [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP3]], i1 false
-; CFG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; CFG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; CFG-NEXT:   [[SEL3:%.*]] = select i1 [[SEL2]], i1 [[CMP4]], i1 false
-; CFG-NEXT:   br i1 [[SEL3]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
-; CFG:      land.lhs.true211:
-; CFG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; CFG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; CFG-NEXT:   br label [[LAND_END]]
-; CFG:      land.end:
-; CFG-NEXT:   [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, %"entry+land.rhs" ]
-; CFG-NEXT:   ret i1 [[RES]]
-;
 entry:
   %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
   %0 = load i8, ptr %arrayidx, align 1

>From ebf207543d463f2875ecddb032fa9d24a87a8345 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 21:03:44 +0100
Subject: [PATCH 17/23] [MergeICmps] Ensure cmp-chains that require splitting
 come first

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     |  22 ++-
 .../MergeICmps/X86/mixed-cmp-split.ll         | 175 ++++++++++++++++++
 2 files changed, 190 insertions(+), 7 deletions(-)
 create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
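
The ordering rule in this patch can be sketched in isolation. This is a minimal, LLVM-free illustration: `Cmp` and `chainOrder` are hypothetical stand-ins for `SingleBCECmpBlock` and the two `mergeBlocks()` calls, and they are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SingleBCECmpBlock: only the two properties the
// ordering logic looks at.
struct Cmp {
  std::string Name;
  bool IsConst;      // stands in for isa<BCEConstCmp>(C.getCmp())
  bool RequireSplit; // block carries extra instructions that must run first
};

// Returns the names in the order the two groups would be handed to the
// merge step: the const group goes first only if it contains a block that
// requires splitting, otherwise the BCE group goes first.
std::vector<std::string> chainOrder(std::vector<Cmp> Cmps) {
  // Const-comparisons end up in [begin, BceIt), BCE-comparisons in [BceIt, end).
  auto BceIt = std::partition(Cmps.begin(), Cmps.end(),
                              [](const Cmp &C) { return C.IsConst; });
  bool ConstNeedsSplit = std::any_of(
      Cmps.begin(), BceIt, [](const Cmp &C) { return C.RequireSplit; });

  std::vector<std::string> Order;
  auto Emit = [&Order](auto First, auto Last) {
    for (; First != Last; ++First)
      Order.push_back(First->Name);
  };
  if (ConstNeedsSplit) {
    Emit(Cmps.begin(), BceIt);
    Emit(BceIt, Cmps.end());
  } else {
    Emit(BceIt, Cmps.end());
    Emit(Cmps.begin(), BceIt);
  }
  return Order;
}
```

Note that `std::partition` does not preserve relative order inside each group, so the sketch only models which group comes first.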

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 5943d717276c4..a5d2895c9f0e1 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -320,7 +320,7 @@ class MultBCECmpBlock {
   // instructions in the block.
   bool canSplit(AliasAnalysis &AA) const;
 
-  // Return true if this all the relevant instructions in the BCE-cmp-block can
+  // Return true if all the relevant instructions in the BCE-cmp-block can
   // be sunk below this instruction. By doing this, we know we can separate the
   // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
   // block.
@@ -761,9 +761,16 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
   auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
   auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
 
-  // this will order the merged BCE-comparisons before the BCE-const-comparisons
-  mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
-  mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+  // The chain that requires splitting should always be first.
+  // If no chain requires splitting then defaults to BCE-comparisons coming first.
+  if (std::any_of(Comparisons.begin(), BceIt,
+                   [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
+    mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+    mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+  } else {
+    mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+    mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+  }
 }
 
 namespace {
@@ -956,10 +963,11 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
   LLVMContext &Context = NextCmpBlock->getContext();
   for (const auto &Cmps : reverse(MergedBlocks_)) {
-    // TODO: Check if single comparisons should also be split!
-    // If there is only a single comparison then nothing should be merged and can use original block.
+    // If there is only a single comparison then nothing should
+    // be merged and can use original block.
     if (Cmps.size() == 1) {
-      // If a comparison from a mult-block is already handled then don't emit same block again.
+      // If a comparison from a mult-block is already handled
+      // then don't emit same block again.
       BasicBlock *const BB = Cmps[0].BB;
       if (ExistingBlocksToKeep.contains(BB))
         continue;
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
new file mode 100644
index 0000000000000..61fdd2b7e17e9
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -0,0 +1,175 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; define dso_local noundef zeroext i1 @cmp_mixed_split(ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %a, ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %b) local_unnamed_addr {
+; entry:
+;   %0 = load i32, ptr %a, align 4
+;   %1 = load i32, ptr %b, align 4
+;   %cmp = icmp eq i32 %0, %1
+;   br i1 %cmp, label %land.lhs.true, label %land.end
+; 
+; land.lhs.true:                                    ; preds = %entry
+;   %e = getelementptr inbounds nuw i8, ptr %a, i64 20
+;   %2 = load i32, ptr %e, align 4
+;   %a3 = getelementptr inbounds nuw i8, ptr %b, i64 4
+;   %3 = load i32, ptr %a3, align 4
+;   %b2 = getelementptr inbounds nuw i8, ptr %a, i64 8
+;   %4 = load i32, ptr %b2, align 4
+;   %c = getelementptr inbounds nuw i8, ptr %a, i64 12
+;   %5 = load i8, ptr %c, align 4
+;   %a1 = getelementptr inbounds nuw i8, ptr %a, i64 4
+;   %6 = load i32, ptr %a1, align 4
+;   %d = getelementptr inbounds nuw i8, ptr %a, i64 16
+;   %7 = load i32, ptr %d, align 4
+;   %cmp5 = icmp eq i32 %6, %3
+;   %cmp7 = icmp eq i8 %5, 43
+;   %or.cond = select i1 %cmp5, i1 %cmp7, i1 false
+;   %cmp9 = icmp eq i32 %4, 1
+;   %or.cond13 = select i1 %or.cond, i1 %cmp9, i1 false
+;   %cmp11 = icmp eq i32 %7, 12
+;   %or.cond14 = select i1 %or.cond13, i1 %cmp11, i1 false
+;   %cmp12 = icmp eq i32 %2, 3
+;   %spec.select = select i1 %or.cond14, i1 %cmp12, i1 false
+;   br label %land.end
+; 
+; land.end:                                         ; preds = %land.lhs.true, %entry
+;   %8 = phi i1 [ false, %entry ], [ %spec.select, %land.lhs.true ]
+;   ret i1 %8
+; }
+
+
+
+
+declare void @foo(...)
+
+; Tests that if both const-cmp and bce-cmp chains can be merged that the splitted block is still at the beginning.
+
+define dso_local noundef zeroext i1 @cmp_mixed_const_first(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed_const_first(
+; This merged-block should come first as it should be split.
+; CHECK:  "entry+land.rhs+land.lhs.true8":
+; CHECK-NEXT:    call void (...) @foo() #[[ATTR2:[0-9]+]]
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
+; CHECK-NEXT:    [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[CMP0:%.*]] = icmp eq i32 [[MEMCMP0]], 0
+; CHECK-NEXT:    br i1 [[CMP0]], label [[LAND_LHS_TRUE10:%.*]], label [[LAND_END:%.*]]
+; CHECK:   "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT:    [[MEMCMP1:%.*]] = call i32 @memcmp(ptr [[A]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i32 [[MEMCMP1]], 0
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP1]], [[LAND_LHS_TRUE10]] ], [ false, [[ENTRY_LAND_RHS:%.*]] ]
+; CHECK-NEXT:    ret i1 [[RES]]
+;
+entry:
+  %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %0 = load i32, ptr %e, align 4
+  %cmp = icmp eq i32 %0, 255
+  call void (...) @foo() inaccessiblememonly
+  br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true:                                    ; preds = %entry
+  %1 = load i32, ptr %a, align 4
+  %2 = load i32, ptr %b, align 4
+  %cmp3 = icmp eq i32 %1, %2
+  br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4:                                   ; preds = %land.lhs.true
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+  %3 = load i8, ptr %c, align 1
+  %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+  %4 = load i8, ptr %c5, align 1
+  %cmp7 = icmp eq i8 %3, %4
+  br i1 %cmp7, label %land.lhs.true8, label %land.end
+
+land.lhs.true8:                                   ; preds = %land.lhs.true4
+  %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+  %5 = load i32, ptr %g, align 4
+  %cmp9 = icmp eq i32 %5, 100
+  br i1 %cmp9, label %land.lhs.true10, label %land.end
+
+land.lhs.true10:                                  ; preds = %land.lhs.true8
+  %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %6 = load i8, ptr %b11, align 4
+  %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %7 = load i8, ptr %b13, align 4
+  %cmp15 = icmp eq i8 %6, %7
+  br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true10
+  %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+  %8 = load i32, ptr %f, align 4
+  %cmp16 = icmp eq i32 %8, 200
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true8, %land.lhs.true4, %land.lhs.true, %entry
+  %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true8 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+  ret i1 %9
+}
+
+; If the block to split is in the BCE-comparison chain, then that block should be first.
+
+define dso_local noundef zeroext i1 @cmp_mixed_bce_first(
+    ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
+    ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed_bce_first(
+; CHECK:   "entry+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT:    call void (...) @foo() #[[ATTR2:[0-9]+]]
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[CMP1]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; CHECK:  "land.lhs.true+land.rhs+land.lhs.true4":
+; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:    [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[LAND_LHS_TRUE]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret i1 [[TMP4]]
+;
+entry:
+  %0 = load i32, ptr %a, align 4
+  %1 = load i32, ptr %b, align 4
+  call void (...) @foo() inaccessiblememonly
+  %cmp3 = icmp eq i32 %0, %1
+  br i1 %cmp3, label %land.lhs.true, label %land.end
+
+land.lhs.true:
+  %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+  %2 = load i32, ptr %e, align 4
+  %cmp = icmp eq i32 %2, 255
+  br i1 %cmp, label %land.lhs.true4, label %land.end
+
+land.lhs.true4:                                   ; preds = %land.lhs.true
+  %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+  %3 = load i8, ptr %c, align 1
+  %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+  %4 = load i8, ptr %c5, align 1
+  %cmp7 = icmp eq i8 %3, %4
+  %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+  %5 = load i32, ptr %g, align 4
+  %cmp9 = icmp eq i32 %5, 100
+  %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+  br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10:                                  ; preds = %land.lhs.true4
+  %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+  %6 = load i8, ptr %b11, align 4
+  %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+  %7 = load i8, ptr %b13, align 4
+  %cmp15 = icmp eq i8 %6, %7
+  br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs:                                         ; preds = %land.lhs.true10
+  %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+  %8 = load i32, ptr %f, align 4
+  %cmp16 = icmp eq i32 %8, 200
+  br label %land.end
+
+land.end:                                         ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+  %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+  ret i1 %9
+}

>From b57565426049fe2cc63435b435ea75bc1c81cb56 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 21:21:20 +0100
Subject: [PATCH 18/23] [MergeICmps] Made instruction splicing more robust by
 not assuming second to last inst holds cond-result

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
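
A rough model of the selection this patch introduces, written without LLVM types. `Block`, `Phi` and `condResult` are hypothetical names for illustration only: a conditional branch supplies its own condition, while a block ending in an unconditional branch takes its result from the value it feeds into the phi.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical miniature IR: a block ends in a branch that is either
// conditional (carries the name of its condition value) or unconditional.
struct Block {
  std::string Name;
  std::optional<std::string> BranchCond; // empty => unconditional branch
};

// Stand-in for the phi in the exit block: predecessor name -> incoming value.
using Phi = std::map<std::string, std::string>;

// Picks the value that holds the block's comparison result without assuming
// anything about where that value sits inside the block.
std::string condResult(const Block &BB, const Phi &P) {
  if (BB.BranchCond)
    return *BB.BranchCond;
  auto It = P.find(BB.Name);
  assert(It != P.end() && "unconditional block must feed the phi");
  return It->second;
}
```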

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index a5d2895c9f0e1..30852a846f3d7 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -942,9 +942,14 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
                                     AliasAnalysis &AA, DomTreeUpdater &DTU) {
   BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
                          NextCmpBlock->getParent(), InsertBefore);
+  auto *const BranchI = cast<BranchInst>(BB->getTerminator());
+  Value* CondResult = nullptr;
+  if (BranchI->isUnconditional())
+    CondResult = Phi.getIncomingValueForBlock(BB);
+  else
+    CondResult = cast<Value>(BranchI->getCondition());
   // Transfer all instructions except the branching terminator to the new block.
   MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
-  Value* CondResult = cast<Value>(&MultBB->back());
   IRBuilder<> Builder(MultBB);
   updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
   return MultBB;

>From cb53e53e790d0dffbace8d8bd86c4d48a8752bbe Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sun, 23 Mar 2025 20:42:10 +0100
Subject: [PATCH 19/23] [MergeICmps] Cleaned up code and added new debug info

---
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     | 170 ++++++++++--------
 .../MergeICmps/X86/mixed-cmp-split.ll         |  39 ----
 2 files changed, 93 insertions(+), 116 deletions(-)
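
The cleanup replaces the virtual `getLoads()` with dispatch on the kind tag. The sketch below shows that pattern in a self-contained form; `dynCast`, `Atom`, `ConstCmp`, `BceCmp` and `lhsOf` are simplified, hypothetical stand-ins (the patch itself uses `llvm::dyn_cast`/`llvm::cast` on `BCEConstCmp` and `BCECmp`).

```cpp
#include <cassert>

struct Atom {
  int Offset;
};

struct Comparison {
  enum CompKind { CK_ConstCmp, CK_BceCmp };
  explicit Comparison(CompKind K) : Kind(K) {}
  virtual ~Comparison() = default;
  CompKind getKind() const { return Kind; }

private:
  CompKind Kind;
};

// Comparison of one atom against a constant.
struct ConstCmp : Comparison {
  Atom Lhs;
  explicit ConstCmp(Atom L) : Comparison(CK_ConstCmp), Lhs(L) {}
  static bool classof(const Comparison *C) {
    return C->getKind() == CK_ConstCmp;
  }
};

// Comparison of two atoms.
struct BceCmp : Comparison {
  Atom Lhs, Rhs;
  BceCmp(Atom L, Atom R) : Comparison(CK_BceCmp), Lhs(L), Rhs(R) {}
  static bool classof(const Comparison *C) {
    return C->getKind() == CK_BceCmp;
  }
};

// Hand-rolled equivalent of a checked downcast, for this sketch only.
template <typename To> const To *dynCast(const Comparison *C) {
  return To::classof(C) ? static_cast<const To *>(C) : nullptr;
}

// Both kinds have a left-hand atom; pick it based on the kind tag.
const Atom *lhsOf(const Comparison *C) {
  if (const auto *CC = dynCast<ConstCmp>(C))
    return &CC->Lhs;
  const auto *BC = dynCast<BceCmp>(C);
  assert(BC && "unknown comparison kind");
  return &BC->Lhs;
}
```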

diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 30852a846f3d7..21b97a0b45faf 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -185,6 +185,9 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, Instructi
 }
 
 
+// An abstract parent class that can either be a comparison of
+// two BCEAtoms with the same offsets to a base pointer (BCECmp)
+// or a comparison of a single BCEAtom with a constant (BCEConstCmp).
 struct Comparison {
 public:
   enum CompKind {
@@ -197,14 +200,11 @@ struct Comparison {
   int SizeBits;
   const ICmpInst *CmpI;
 
-  using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
-
   Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
         : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
   CompKind getKind() const { return Kind; }
 
   virtual ~Comparison() = default;
-  virtual LoadOperands getLoads() = 0;
   bool areContiguous(const Comparison& Other) const;
   bool operator<(const Comparison &Other) const;
 };
@@ -226,10 +226,6 @@ struct BCEConstCmp : public Comparison {
   static bool classof(const Comparison* C) {
     return C->getKind() == CK_ConstCmp;
   }
-  
-  Comparison::LoadOperands getLoads() override {
-    return std::make_pair(&Lhs,std::nullopt);
-  }
 };
 
 // A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -249,10 +245,6 @@ struct BCECmp : public Comparison {
   static bool classof(const Comparison* C) {
     return C->getKind() == CK_BceCmp;
   }
-
-  Comparison::LoadOperands getLoads() override {
-    return std::make_pair(&Lhs,&Rhs);
-  }
 };
 
 // TODO: this can be improved to take alignment into account.
@@ -286,7 +278,7 @@ bool Comparison::operator<(const Comparison& Other) const {
 }
 
 // Represents multiple comparisons inside of a single basic block.
-// This happens if multiple basic blocks have previously been merged into a single using a select node.
+// This happens if multiple basic blocks have previously been merged into a single block using a select node.
 class IntraCmpChain {
   // TODO: this could probably be a unique-ptr but current impl relies on some copies
   std::vector<std::shared_ptr<Comparison>> CmpChain;
@@ -302,7 +294,6 @@ class IntraCmpChain {
   }
 };
 
-
 // A basic block that contains one or more comparisons.
 class MultBCECmpBlock {
  public:
@@ -326,6 +317,9 @@ class MultBCECmpBlock {
   // block.
   bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
 
+  // Returns all instructions that should be split off of the comparison chain.
+  llvm::SmallVector<Instruction *, 4> getAllSplitInsts(AliasAnalysis &AA) const;
+
   // The basic block where this comparison happens.
   BasicBlock *BB;
   // Instructions relating to the BCECmp and branch.
@@ -342,15 +336,20 @@ class MultBCECmpBlock {
 // (see canSplit()).
 class SingleBCECmpBlock {
  public:
-  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
-      : BB(M.BB), OrigOrder(OrigOrder), Cmp(std::move(M.getCmps()[I])) {}
-
-  SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
-      : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(M.getCmps()[I])), SplitInsts(SplitInsts)  {}
-
-  const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
+  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder)
+      : BB(BB), OrigOrder(OrigOrder), Cmp(std::move(Cmp)) {}
+
+  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder,
+                    llvm::SmallVector<Instruction *, 4> SplitInsts)
+      : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)), SplitInsts(SplitInsts) {}
+
+  const BCEAtom* Lhs() const {
+    if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
+      return &BceConstCmp->Lhs;
+    auto *const BceCmp = cast<BCECmp>(Cmp.get());
+    return &BceCmp->Lhs;
+  }
   const Comparison* getCmp() const { return Cmp.get(); }
-
   bool operator<(const SingleBCECmpBlock &O) const {
     return *Cmp < *O.Cmp;
   }
@@ -383,11 +382,14 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
     };
-    for (auto& Cmp : Cmps) {
-      auto [Lhs,Rhs] = Cmp->getLoads();
-      if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
-        return false;
-    }
+    auto CmpLoadsAreClobbered = [&](const auto& Cmp) {
+      if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
+        return MayClobber(BceConstCmp->Lhs.LoadI);
+      auto *const BceCmp = cast<BCECmp>(Cmp.get());
+      return MayClobber(BceCmp->Lhs.LoadI) || MayClobber(BceCmp->Rhs.LoadI);
+    };
+    if (llvm::any_of(Cmps, CmpLoadsAreClobbered))
+      return false;
   }
   // Make sure this instruction does not use any of the BCE cmp block
   // instructions as operand.
@@ -425,6 +427,20 @@ bool MultBCECmpBlock::doesOtherWork() const {
   return false;
 }
 
+llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
+  llvm::SmallVector<Instruction *, 4> SplitInsts;
+  for (Instruction& Inst : *BB) {
+    if (BlockInsts.count(&Inst))
+      continue;
+    assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+    // This is a non-BCE-cmp-block instruction. And it can be separated
+    // from the BCE-cmp-block instructions.
+    SplitInsts.push_back(&Inst);
+  }
+  return SplitInsts;
+}
+
+
 // Visit the given comparison. If this is a comparison between two valid
 // BCE atoms, or between a BCE atom and a constant, returns the comparison.
 std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
@@ -552,42 +568,37 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
   return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
 }
 
-// void emitDebugInfo(BCECmpBlock &&Comparison) {
-//   LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-//                     << "': Found constant-cmp of " << Comparison.getCmp().SizeBits
-//                     << " bits including " << Comparison.getCmp()->Lhs.BaseId << " + "
-//                     << Comparison.getCmp().Lhs.Offset << "\n");
-
-//   LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-//                     << "': Found cmp of " << Comparison.getCmp().SizeBits
-//                     << " bits between " << Comparison.getCmp().Lhs.BaseId << " + "
-//                     << Comparison.Lhs.Offset << " and "
-//                     << Comparison.Rhs.BaseId << " + "
-//                     << Comparison.Rhs.Offset << "\n");
-//   LLVM_DEBUG(dbgs() << "\n");
-// }
-
-// Enqueues a single comparison and if it's the first comparison block then adds the `OtherInsts` to the block too to split it.
-static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
+void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock* BB) {
+  LLVM_DEBUG(dbgs() << "Block '" << BB->getName());
+  if (auto* ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
+    LLVM_DEBUG(dbgs() << "': Found constant-cmp of " << Cmp->SizeBits
+    << " bits including " << ConstCmp->Lhs.BaseId << " + "
+    << ConstCmp->Lhs.Offset << "\n");
+    return;
+  }
+  auto* BceCmp = cast<BCECmp>(Cmp.get());
+  LLVM_DEBUG(dbgs() << "': Found cmp of " << BceCmp->SizeBits
+  << " bits between " << BceCmp->Lhs.BaseId << " + "
+  << BceCmp->Lhs.Offset << " and "
+  << BceCmp->Rhs.BaseId << " + "
+  << BceCmp->Rhs.Offset << "\n");
+}
+
+// Enqueues all comparisons of a mult-block.
+// If the block requires splitting then adds `OtherInsts` to the block too.
+static inline void enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
                                 MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
-  // emitDebugInfo(Comparison);
-  for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
+  bool hasAlreadySplit = false;
+  for (auto& Cmp : CmpBlock.getCmps()) {
+    emitDebugInfo(Cmp, CmpBlock.BB);
     unsigned OrigOrder = Comparisons.size();
-    if (!RequireSplit || i != 0) {
-      Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder));
+    if (RequireSplit && !hasAlreadySplit) {
+      hasAlreadySplit = true;
+      auto SplitInsts = CmpBlock.getAllSplitInsts(AA);
+      Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
       continue;
     }
-    // If should split mult block then put all instructions at the beginning of the first block
-    llvm::SmallVector<Instruction *, 4> OtherInsts;
-    for (Instruction &Inst : *CmpBlock.BB) {
-      if (CmpBlock.BlockInsts.count(&Inst))
-        continue;
-      assert(CmpBlock.canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
-      // This is a non-BCE-cmp-block instruction. And it can be separated
-      // from the BCE-cmp-block instruction.
-      OtherInsts.push_back(&Inst);
-    }
-    Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder, OtherInsts));
+    Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder));
   }
 }
 
@@ -609,7 +620,6 @@ class BCECmpChain {
                   [](const auto &Blocks) { return Blocks.size() > 1; });
   };
 
-
 private:
   PHINode &Phi_;
   // The list of all blocks in the chain, grouped by contiguity.
@@ -714,7 +724,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
           LLVM_DEBUG(dbgs()
                      << "Split initial block '" << CmpBlock->BB->getName()
                      << "' that does extra work besides compare\n");
-          enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, true);
+          enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, true);
         } else {
           LLVM_DEBUG(dbgs()
                      << "ignoring initial block '" << CmpBlock->BB->getName()
@@ -747,7 +757,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
       // We could still merge bb1 and bb2 though.
       return;
     }
-    enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, false);
+    enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, false);
   }
   
   // It is possible we have no suitable comparison to merge.
@@ -827,6 +837,7 @@ class MergedBlockName {
 } // namespace
 
 
+// Add a branch to the next basic block in the chain.
 void updateBranching(Value* CondResult,
                      IRBuilder<>& Builder,
                      BasicBlock *BB,
@@ -836,7 +847,6 @@ void updateBranching(Value* CondResult,
                      const TargetLibraryInfo &TLI,
                      AliasAnalysis &AA, DomTreeUpdater &DTU) {
   BasicBlock *const PhiBB = Phi.getParent();
-  // Add a branch to the next basic block in the chain.
   if (NextCmpBlock == PhiBB) {
     // Continue to phi, passing it the comparison result.
     Builder.CreateBr(PhiBB);
@@ -851,6 +861,25 @@ void updateBranching(Value* CondResult,
   }
 }
 
+// Builds constant-struct to compare pointer to during memcmp(). Has to be a chain of const-comparisons.
+AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context) {
+  std::vector<Constant*> Constants;
+  std::vector<Type*> Types;
+
+  for (const auto& BceBlock : Comparisons) {
+    assert(isa<BCEConstCmp>(BceBlock.getCmp()) && "Const-cmp-chain can only contain const comparisons");
+    auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+    Constants.emplace_back(ConstCmp->Const);
+    Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
+  }
+  // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
+  auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+  auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+  auto *StructConstant = ConstantStruct::get(StructType, Constants);
+  Builder.CreateStore(StructConstant, StructAlloca);
+
+  return StructAlloca;
+}
 
 // Merges the given contiguous comparison blocks into one memcmp block.
 static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
@@ -870,27 +899,13 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
   IRBuilder<> Builder(BB);
   // Add the GEPs from the first BCECmpBlock.
   Value *Lhs, *Rhs;
-
   if (FirstCmp.Lhs()->GEP)
     Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
   else
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
 
-  // Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
   if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
-    std::vector<Constant*> Constants;
-    std::vector<Type*> Types;
-    for (const auto& BceBlock : Comparisons) {
-      auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
-      Constants.emplace_back(ConstCmp->Const);
-      Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
-    }
-    // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
-    auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
-    auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
-    auto *StructConstant = ConstantStruct::get(StructType, Constants);
-    Builder.CreateStore(StructConstant, StructAlloca);
-    Rhs = StructAlloca;
+    Rhs = buildStruct(Comparisons, Builder, Context);
   } else {
     auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
     if (FirstBceCmp->Rhs.GEP)
@@ -952,6 +967,7 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
   MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
   IRBuilder<> Builder(MultBB);
   updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+
   return MultBB;
 }
 
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
index 61fdd2b7e17e9..1bdd4fef67136 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -1,44 +1,5 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
 
-; define dso_local noundef zeroext i1 @cmp_mixed_split(ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %a, ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %b) local_unnamed_addr {
-; entry:
-;   %0 = load i32, ptr %a, align 4
-;   %1 = load i32, ptr %b, align 4
-;   %cmp = icmp eq i32 %0, %1
-;   br i1 %cmp, label %land.lhs.true, label %land.end
-; 
-; land.lhs.true:                                    ; preds = %entry
-;   %e = getelementptr inbounds nuw i8, ptr %a, i64 20
-;   %2 = load i32, ptr %e, align 4
-;   %a3 = getelementptr inbounds nuw i8, ptr %b, i64 4
-;   %3 = load i32, ptr %a3, align 4
-;   %b2 = getelementptr inbounds nuw i8, ptr %a, i64 8
-;   %4 = load i32, ptr %b2, align 4
-;   %c = getelementptr inbounds nuw i8, ptr %a, i64 12
-;   %5 = load i8, ptr %c, align 4
-;   %a1 = getelementptr inbounds nuw i8, ptr %a, i64 4
-;   %6 = load i32, ptr %a1, align 4
-;   %d = getelementptr inbounds nuw i8, ptr %a, i64 16
-;   %7 = load i32, ptr %d, align 4
-;   %cmp5 = icmp eq i32 %6, %3
-;   %cmp7 = icmp eq i8 %5, 43
-;   %or.cond = select i1 %cmp5, i1 %cmp7, i1 false
-;   %cmp9 = icmp eq i32 %4, 1
-;   %or.cond13 = select i1 %or.cond, i1 %cmp9, i1 false
-;   %cmp11 = icmp eq i32 %7, 12
-;   %or.cond14 = select i1 %or.cond13, i1 %cmp11, i1 false
-;   %cmp12 = icmp eq i32 %2, 3
-;   %spec.select = select i1 %or.cond14, i1 %cmp12, i1 false
-;   br label %land.end
-; 
-; land.end:                                         ; preds = %land.lhs.true, %entry
-;   %8 = phi i1 [ false, %entry ], [ %spec.select, %land.lhs.true ]
-;   ret i1 %8
-; }
-
-
-
-
 declare void @foo(...)
 
 ; Tests that if both const-cmp and bce-cmp chains can be merged that the splitted block is still at the beginning.

>From dd4cd885616b23b1a327c686c3085572f0ddac93 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 24 Mar 2025 20:50:48 +0100
Subject: [PATCH 20/23] [MergeICmps] Use GlobalConstant instead of local alloca
 for const-cmp; Is deleted when folded during expand-memcmp pass

---
 llvm/lib/CodeGen/ExpandMemCmp.cpp             |  19 +++
 llvm/lib/Transforms/Scalar/MergeICmps.cpp     |  12 +-
 .../Transforms/MergeICmps/X86/const-cmp-bb.ll |   5 +-
 .../MergeICmps/X86/many-const-cmp-select.ll   |  31 ++--
 .../MergeICmps/X86/mixed-cmp-bb-select.ll     |   6 +-
 .../MergeICmps/X86/mixed-cmp-split.ll         |  11 +-
 .../MergeICmps/X86/mixed-comparisons.ll       |   6 +-
 .../X86/mixed-type-const-comparisons.ll       |  11 +-
 .../X86/not-split-unmerged-select.ll          | 144 ++++++++--------
 .../MergeICmps/X86/partial-select-merge.ll    | 154 +++++++++---------
 .../MergeICmps/X86/split-block-does-work.ll   |   7 +-
 11 files changed, 209 insertions(+), 197 deletions(-)
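
For intuition, the effect of comparing against a packed constant struct can be reproduced in plain C++. This is only a sketch under assumptions: `S` mirrors the struct from the PR description, `appendPacked` and `efgMatch` are hypothetical helpers, and the byte buffer plays the role of the `<{ i32, i32, i32 }>` constant seen in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends the raw bytes of V to Buf with no padding, like a packed struct field.
template <typename T>
void appendPacked(std::vector<unsigned char> &Buf, T V) {
  unsigned char Tmp[sizeof(T)];
  std::memcpy(Tmp, &V, sizeof(T));
  Buf.insert(Buf.end(), Tmp, Tmp + sizeof(T));
}

struct S {
  int32_t a;
  uint8_t b;
  uint8_t c;
  uint16_t d;
  int32_t e;
  int32_t f;
  int32_t g;
};

// Equivalent of `V.e == 255 && V.f == 200 && V.g == 100` as one memcmp over
// the three contiguous fields.
bool efgMatch(const S &V) {
  std::vector<unsigned char> Expected;
  appendPacked<int32_t>(Expected, 255);
  appendPacked<int32_t>(Expected, 200);
  appendPacked<int32_t>(Expected, 100);
  const auto *Base = reinterpret_cast<const unsigned char *>(&V);
  return std::memcmp(Base + offsetof(S, e), Expected.data(),
                     Expected.size()) == 0;
}
```

The constants are laid out in increasing offset order (`e`, `f`, `g`), which matches the `i32 255, i32 200, i32 100` order in the test expectations.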

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 74f93e1979532..e32cb2db1c954 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -879,8 +879,27 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
 
   if (Value *Res = Expansion.getMemCmpExpansion()) {
     // Replace call with result of expansion and erase call.
+    auto *GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
+
+    // If the mergeicmps pass used a global constant to merge comparisons and
+    // the global constants were folded, the variable can be deleted since it isn't used anymore.
+    if (GV) {
+      // NOTE: There is still a use lingering around but that use itself isn't
+      // used so it is fine to erase this instruction.
+      static bool (*hasActiveUses)(Value*) = [](Value* V) {
+        for (User* U: V->users()){
+          if (hasActiveUses(U))
+            return true;
+        }
+        return false;
+      };
+      if (!hasActiveUses(GV)) {
+        LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName() << " that was introduced by the previous mergeicmps pass\n");
+        GV->eraseFromParent();
+      }
+    }
   }
 
   return true;
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 21b97a0b45faf..2eb1c9761d32e 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -861,8 +861,9 @@ void updateBranching(Value* CondResult,
   }
 }
 
-// Builds constant-struct to compare pointer to during memcmp(). Has to be a chain of const-comparisons.
-AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context) {
+// Builds a global constant struct that the pointer is compared against during memcmp().
+// It has to be global so that the expand-memcmp pass can fold the constants.
+GlobalVariable* buildConstantStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context, Module& M) {
   std::vector<Constant*> Constants;
   std::vector<Type*> Types;
 
@@ -872,13 +873,10 @@ AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& B
     Constants.emplace_back(ConstCmp->Const);
     Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
   }
-  // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
   auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
-  auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
   auto *StructConstant = ConstantStruct::get(StructType, Constants);
-  Builder.CreateStore(StructConstant, StructAlloca);
 
-  return StructAlloca;
+  return new GlobalVariable(M, StructType, /*isConstant=*/true, GlobalVariable::PrivateLinkage, StructConstant, "memcmp_const_op");
 }
 
 // Merges the given contiguous comparison blocks into one memcmp block.
@@ -905,7 +903,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
     Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
 
   if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
-    Rhs = buildStruct(Comparisons, Builder, Context);
+    Rhs = buildConstantStruct(Comparisons, Builder, Context, *Phi.getModule());
   } else {
     auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
     if (FirstBceCmp->Rhs.GEP)
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 51c3c27583602..c39d586d2f174 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -4,11 +4,10 @@
 ; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
 
 define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
+; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>, align 8
-; CHECK-NEXT:    store <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>, ptr [[TMP0]], align 1
-; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[MEMCMP_OP]], i64 3)
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:    br label [[LAND_END5:%.*]]
 ; CHECK:       land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index 0ca0f671d98a4..bca4dacbefbfa 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -2,29 +2,28 @@
 
 ; Can merge contiguous const-comparison basic blocks that include a select statement.
 
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
+
 define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
 ; CHECK-LABEL: @is_all_ones_many(
 ; CHECK-NEXT:  "entry+land.lhs.true11":
-; CHECK-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8, i8 }>
-; CHECK-NEXT:    store <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>, ptr [[TMP0]], align 1
-; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
-; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT:    br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[MEMCMP_OP1]], i64 4)
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[TMP0]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "land.lhs.true16+land.lhs.true21":
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; CHECK-NEXT:    store <{ i8, i8 }> <{ i8 2, i8 7 }>, ptr [[TMP3]], align 1
-; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
-; CHECK-NEXT:    [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT:    br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[TMP2]], label [[LAST_CMP:%.*]], label [[LAND_END]]
 ; CHECK:  land.rhs1:
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
-; CHECK-NEXT:    [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
-; CHECK-NEXT:    [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
+; CHECK-NEXT:    [[TMP4:%.*]] = load i8, ptr [[TMP3]], align 1
+; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 9
 ; CHECK-NEXT:    br label [[LAND_END]]
 ; CHECK:       land.end:
-; CHECK-NEXT:    [[TMP8:%.*]] = phi i1 [ [[TMP7]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    ret i1 [[TMP8]]
+; CHECK-NEXT:    [[TMP6:%.*]] = phi i1 [ [[TMP5]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT:    ret i1 [[TMP6]]
 ;
 entry:
   %0 = load i8, ptr %p, align 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index dfe57e6ef930a..3990af69d6c83 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -2,6 +2,8 @@
 
 ; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
 
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
 define dso_local noundef zeroext i1 @cmp_mixed(
     ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
     ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
@@ -12,9 +14,7 @@ define dso_local noundef zeroext i1 @cmp_mixed(
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "entry+land.rhs+land.lhs.true4":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
 ; CHECK:       land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
index 1bdd4fef67136..3e4e4c3eaf6be 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -4,15 +4,16 @@ declare void @foo(...)
 
 ; Tests that if both const-cmp and bce-cmp chains can be merged that the splitted block is still at the beginning.
 
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
 define dso_local noundef zeroext i1 @cmp_mixed_const_first(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
 ; CHECK-LABEL: @cmp_mixed_const_first(
 ; This merged-block should come first as it should be split.
 ; CHECK:  "entry+land.rhs+land.lhs.true8":
 ; CHECK-NEXT:    call void (...) @foo() #[[ATTR2:[0-9]+]]
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT:    [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
 ; CHECK-NEXT:    [[CMP0:%.*]] = icmp eq i32 [[MEMCMP0]], 0
 ; CHECK-NEXT:    br i1 [[CMP0]], label [[LAND_LHS_TRUE10:%.*]], label [[LAND_END:%.*]]
 ; CHECK:   "land.lhs.true+land.lhs.true10+land.lhs.true4":
@@ -82,9 +83,7 @@ define dso_local noundef zeroext i1 @cmp_mixed_bce_first(
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "land.lhs.true+land.rhs+land.lhs.true4":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP1]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
 ; CHECK:       land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index d88d7d824b5ed..b5e85d3a09dfb 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -4,6 +4,8 @@
 ; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
 ; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
 
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
 define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
 ; CHECK-LABEL: @cmp_mixed(
 ; This is the classic BCE comparison block
@@ -14,9 +16,7 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
 ; This is the new BCE to constant comparison block
 ; CHECK:  "entry+land.rhs+land.lhs.true8":
 ; CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT:    [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT:    store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT:    [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
 ; CHECK-NEXT:    br label [[LAND_END]]
 ; CHECK:       land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
index 15c5a382d1f46..3a5bf5585d46a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -3,6 +3,9 @@
 ; Tests if a const-cmp-chain of different types can still be merged.
 ; This is usually the case when comparing different struct fields to constants.
 
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i8 }> <{ i32 3, i8 100 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>
+
 ; Can only merge gep 0 with gep 4 due to alignment since gep 8 is not directly adjacent to gep 4.
 define dso_local zeroext i1 @is_all_ones_struct(
 ; CHECK-LABEL: @is_all_ones_struct(
@@ -12,9 +15,7 @@ define dso_local zeroext i1 @is_all_ones_struct(
 ; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
 ; CHECK-NEXT:   br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
 ; CHECK:      "land.rhs+land.lhs.true":
-; CHECK-NEXT:   [[TMP2:%.*]] = alloca <{ i32, i8 }>
-; CHECK-NEXT:   store <{ i32, i8 }> <{ i32 3, i8 100 }>, ptr [[TMP2]]
-; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
+; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[MEMCMP_OP0]], i64 5)
 ; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:   br label [[LAND_END]]
 ; CHECK:      land.end:
@@ -49,9 +50,7 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
 ; CHECK-LABEL: @is_all_ones_struct_select_block(
 ; CHECK:      "entry+land.rhs":
-; CHECK-NEXT:   [[TMP0:%.*]] = alloca <{ i32, i8, i8 }>
-; CHECK-NEXT:   store <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>, ptr [[TMP0]]
-; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
+; CHECK-NEXT:   [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[MEMCMP_OP1]], i64 6)
 ; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; CHECK-NEXT:   br label [[LAND_END]]
 ; CHECK:      land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index 582b57d8c60ce..d3e882a226ac7 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,46 +1,48 @@
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
 
 ; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
 
+; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
+
 define dso_local noundef zeroext i1 @unmergable_select(
     ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @unmergable_select(
-; REG:       entry:
-; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
-; REG:       land.lhs.true11:
-; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT:    [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; REG-NEXT:    [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true16:
-; REG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true21:
-; REG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT:    br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG:       land.rhs:
-; REG-NEXT:    [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
-; REG-NEXT:    [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
-; REG-NEXT:    [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
-; REG-NEXT:    br label [[LAND_END]]
-; REG:  land.end:
-; REG-NEXT:    [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ %cmp28, [[LAND_RHS]] ]
-; REG-NEXT:    ret i1 [[RES]]
+; CHECK-LABEL: @unmergable_select(
+; CHECK:       entry:
+; CHECK-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
+; CHECK:       land.lhs.true11:
+; CHECK-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CHECK-NEXT:    [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CHECK-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true16:
+; CHECK-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true21:
+; CHECK-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT:    br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK:       land.rhs:
+; CHECK-NEXT:    [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
+; CHECK-NEXT:    [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; CHECK-NEXT:    [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:  land.end:
+; CHECK-NEXT:    [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ %cmp28, [[LAND_RHS]] ]
+; CHECK-NEXT:    ret i1 [[RES]]
 ;
 entry:
   %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
@@ -88,40 +90,38 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 ; p[12] and p[13] mergable, select mult-block is part of the chain but isn't merged and won't get split up into its single comparisons.
 
 define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @partial_merge_not_select(
-; REG:       entry3:
-; REG-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; REG-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
-; REG:       "land.lhs.true11+land.rhs":
-; REG-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT:    [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; REG-NEXT:    store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
-; REG-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
-; REG-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true162:
-; REG-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true211:
-; REG-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT:    br label [[LAND_END]]
-; REG:  land.end:
-; REG-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
-; REG-NEXT:    ret i1 [[RES]]
+; CHECK-LABEL: @partial_merge_not_select(
+; CHECK:       entry3:
+; CHECK-NEXT:    [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; CHECK-NEXT:    [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT:    [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT:    [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT:    [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT:    [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT:    [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT:    [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT:    br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CHECK:       "land.lhs.true11+land.rhs":
+; CHECK-NEXT:    [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[MEMCMP_OP]], i64 2)
+; CHECK-NEXT:    [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true162:
+; CHECK-NEXT:    [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT:    [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT:    [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT:    br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true211:
+; CHECK-NEXT:    [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT:    [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT:    [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT:    br label [[LAND_END]]
+; CHECK:  land.end:
+; CHECK-NEXT:    [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
+; CHECK-NEXT:    ret i1 [[RES]]
 ;
 entry:
   %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 317a3a1464536..f67743ed6fcc1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -1,49 +1,49 @@
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
 
 ; Cannot merge only part of a select block if not entire block mergable.
 
 define zeroext i1 @cmp_partially_mergable_select(
     ptr nocapture readonly align 4 dereferenceable(24) %a,
     ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
-; REG-LABEL: @cmp_partially_mergable_select(
-; REG:      entry:
-; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
-; REG-NEXT:   [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
-; REG-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
-; REG-NEXT:   br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
-; REG:      land.lhs.true:
-; REG-NEXT:   [[TMP1:%.*]] = load i32, ptr [[A]], align 4
-; REG-NEXT:   [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
-; REG-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
-; REG-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
-; REG:      land.lhs.true4:
-; REG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
-; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
-; REG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
-; REG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; REG-NEXT:   [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
-; REG-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
-; REG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
-; REG-NEXT:   br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
-; REG:      land.lhs.true10:
-; REG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; REG-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
-; REG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; REG-NEXT:   [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
-; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
-; REG-NEXT:   br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG:      land.rhs:
-; REG-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
-; REG-NEXT:   [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
-; REG-NEXT:   [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
-; REG-NEXT:   [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
-; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
-; REG-NEXT:   br label [[LAND_END]]
-; REG:      land.end:
-; REG-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
-; REG-NEXT:   ret i1 [[RES]]
+; CHECK-LABEL: @cmp_partially_mergable_select(
+; CHECK:      entry:
+; CHECK-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; CHECK-NEXT:   [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
+; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
+; CHECK-NEXT:   br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; CHECK:      land.lhs.true:
+; CHECK-NEXT:   [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT:   [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
+; CHECK-NEXT:   br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
+; CHECK:      land.lhs.true4:
+; CHECK-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
+; CHECK-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
+; CHECK-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
+; CHECK-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; CHECK-NEXT:   [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
+; CHECK-NEXT:   [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
+; CHECK-NEXT:   [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CHECK-NEXT:   br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
+; CHECK:      land.lhs.true10:
+; CHECK-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; CHECK-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
+; CHECK-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; CHECK-NEXT:   [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
+; CHECK-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
+; CHECK-NEXT:   br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK:      land.rhs:
+; CHECK-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
+; CHECK-NEXT:   [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
+; CHECK-NEXT:   [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
+; CHECK-NEXT:   [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
+; CHECK-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
+; CHECK-NEXT:   br label [[LAND_END]]
+; CHECK:      land.end:
+; CHECK-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
+; CHECK-NEXT:   ret i1 [[RES]]
 ;
 entry:
   %e = getelementptr inbounds nuw i8, ptr %a, i64 8
@@ -95,43 +95,43 @@ land.end:                                         ; preds = %land.rhs, %land.lhs
 
 define dso_local zeroext i1 @cmp_partially_mergable_select_array(
     ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @cmp_partially_mergable_select_array(
-; REG:       entry:
-; REG-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT:   [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT:   [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT:   br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
-; REG:       land.lhs.true11:
-; REG-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; REG-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; REG-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true16:
-; REG-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
-; REG:       land.lhs.true21:
-; REG-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT:   br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG:       land.rhs:
-; REG-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
-; REG-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
-; REG-NEXT:   [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
-; REG-NEXT:   br label [[LAND_END]]
-; REG:       land.end:
-; REG-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
-; REG-NEXT:   ret i1 [[RES]]
+; CHECK-LABEL: @cmp_partially_mergable_select_array(
+; CHECK:       entry:
+; CHECK-NEXT:   [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; CHECK-NEXT:   [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT:   [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT:   [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT:   [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT:   [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT:   [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT:   [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT:   [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT:   [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT:   [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT:   br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
+; CHECK:       land.lhs.true11:
+; CHECK-NEXT:   [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CHECK-NEXT:   [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CHECK-NEXT:   [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CHECK-NEXT:   br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true16:
+; CHECK-NEXT:   [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT:   [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT:   [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT:   br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
+; CHECK:       land.lhs.true21:
+; CHECK-NEXT:   [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT:   [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT:   [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT:   br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK:       land.rhs:
+; CHECK-NEXT:   [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
+; CHECK-NEXT:   [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; CHECK-NEXT:   [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT:   br label [[LAND_END]]
+; CHECK:       land.end:
+; CHECK-NEXT:   [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
+; CHECK-NEXT:   ret i1 [[RES]]
 ;
 entry:
   %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 5381d88ed7f52..00306a2b5f22c 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -1,4 +1,3 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -passes=mergeicmps -verify-dom-info -mtriple=x86_64-unknown-unknown -S | FileCheck %s --check-prefix=X86
 
 %S = type { i32, i32, i32, i32 }
@@ -6,6 +5,8 @@
 declare void @foo(...)
 declare void @bar(...)
 
+; X86: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
+
 ; We can split %entry and create a memcmp(16 bytes).
 define zeroext i1 @opeq1(
 ; X86-LABEL: @opeq1(
@@ -250,9 +251,7 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
 ; X86-NEXT:    call void (...) @foo() #[[ATTR2]]
 ; X86-NEXT:    call void (...) @bar() #[[ATTR2]]
 ; X86-NEXT:    [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT:    [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>
-; X86-NEXT:    store <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>, ptr [[TMP0]], align 1
-; X86-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
+; X86-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[MEMCMP_OP]], i64 3)
 ; X86-NEXT:    [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
 ; X86-NEXT:    br label [[LAND_END:%.*]]
 ; X86:       land.end:

>From e029bebe32b4d36d62dda9903e89f23538621852 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 18:17:09 +0200
Subject: [PATCH 21/23] [MergeICmps] Tested that global constant is properly
 removed in expand-memcmp

---
 llvm/lib/CodeGen/ExpandMemCmp.cpp             |  4 +--
 .../Transforms/MergeICmps/X86/const-cmp-bb.ll | 25 +++++++++++++++++--
 .../MergeICmps/X86/many-const-cmp-select.ll   |  8 ++++--
 .../MergeICmps/X86/mixed-cmp-bb-select.ll     |  2 ++
 .../X86/not-split-unmerged-select.ll          |  2 ++
 .../MergeICmps/X86/split-block-does-work.ll   |  2 ++
 6 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index e32cb2db1c954..41b43a131932a 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -878,14 +878,14 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   NumMemCmpInlined++;
 
   if (Value *Res = Expansion.getMemCmpExpansion()) {
-    // Replace call with result of expansion and erase call.
     auto* GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1)); 
+    // Replace call with result of expansion and erase call.
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
 
     // If the mergeicmps pass used a global constant to merge comparisons and
     // the the global constants were folded then the variable can be deleted since it isn't used anymore.
-    if (GV) {
+    if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
       // NOTE: There is still a use lingering around but that use itself isn't
       // used so it is fine to erase this instruction.
       static bool (*hasActiveUses)(Value*) = [](Value* V) {
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index c39d586d2f174..3956c62579986 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -1,10 +1,14 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --force-update
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
 
 ; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
 
-define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
 ; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
+
+; Global should be removed once its constant has been folded.
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
+
+define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT:  "entry+land.lhs.true+land.rhs":
 ; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[MEMCMP_OP]], i64 3)
@@ -13,6 +17,23 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
 ; CHECK:       land.end:
 ; CHECK-NEXT:    ret i1 [[TMP1]]
 ;
+; EXPANDED-LABEL: define zeroext i1 @test(
+; EXPANDED-SAME: ptr nocapture noundef nonnull dereferenceable(3) [[P:%.*]]) local_unnamed_addr {
+; EXPANDED-NEXT:  "entry+land.lhs.true+land.rhs":
+; EXPANDED-NEXT:    [[TMP0:%.*]] = load i16, ptr [[P]], align 1
+; EXPANDED-NEXT:    [[TMP8:%.*]] = xor i16 [[TMP0]], -14081
+; EXPANDED-NEXT:    [[TMP2:%.*]] = getelementptr i8, ptr [[P]], i64 2
+; EXPANDED-NEXT:    [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; EXPANDED-NEXT:    [[TMP4:%.*]] = zext i8 [[TMP3]] to i16
+; EXPANDED-NEXT:    [[TMP5:%.*]] = xor i16 [[TMP4]], 190
+; EXPANDED-NEXT:    [[TMP6:%.*]] = or i16 [[TMP8]], [[TMP5]]
+; EXPANDED-NEXT:    [[TMP7:%.*]] = icmp ne i16 [[TMP6]], 0
+; EXPANDED-NEXT:    [[CMP:%.*]] = zext i1 [[TMP7]] to i32
+; EXPANDED-NEXT:    [[RES:%.*]] = icmp eq i32 [[CMP]], 0
+; EXPANDED-NEXT:    br label %[[LAND_END:.*]]
+; EXPANDED:       [[LAND_END]]:
+; EXPANDED-NEXT:    ret i1 [[RES]]
+;
 entry:
   %0 = load i8, ptr %p, align 1
   %cmp = icmp eq i8 %0, -1
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index bca4dacbefbfa..c4c2fe7e6a222 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -1,10 +1,14 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
 
 ; Can merge contiguous const-comparison basic blocks that include a select statement.
 
 ; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
 ; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
 
+; EXPANDED-NOT: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
+; EXPANDED-NOT: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
+
 define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
 ; CHECK-LABEL: @is_all_ones_many(
 ; CHECK-NEXT:  "entry+land.lhs.true11":
@@ -13,8 +17,8 @@ define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dere
 ; CHECK-NEXT:    br i1 [[TMP0]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
 ; CHECK:  "land.lhs.true16+land.lhs.true21":
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT:    [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
-; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT:    [[MEMCMP1:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
+; CHECK-NEXT:    [[TMP2:%.*]] = icmp eq i32 [[MEMCMP1]], 0
 ; CHECK-NEXT:    br i1 [[TMP2]], label [[LAST_CMP:%.*]], label [[LAND_END]]
 ; CHECK:  land.rhs1:
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 3990af69d6c83..d81aecc76ea4a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -1,8 +1,10 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
 
 ; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
 
 ; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+; EXPANDED-NOT: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
 
 define dso_local noundef zeroext i1 @cmp_mixed(
     ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index d3e882a226ac7..d059609afe292 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,8 +1,10 @@
 ; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
 
 ; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
 
 ; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
 
 define dso_local noundef zeroext i1 @unmergable_select(
     ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 00306a2b5f22c..442d11f9c77fa 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -1,4 +1,5 @@
 ; RUN: opt < %s -passes=mergeicmps -verify-dom-info -mtriple=x86_64-unknown-unknown -S | FileCheck %s --check-prefix=X86
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
 
 %S = type { i32, i32, i32, i32 }
 
@@ -6,6 +7,7 @@ declare void @foo(...)
 declare void @bar(...)
 
 ; X86: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
 
 ; We can split %entry and create a memcmp(16 bytes).
 define zeroext i1 @opeq1(

>From e1fe5286bb4e3cbc045c1b5ecee020451a601993 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 19:27:40 +0200
Subject: [PATCH 22/23] [MergeICmps] Formatted

---
 llvm/lib/CodeGen/ExpandMemCmp.cpp         |  13 +-
 llvm/lib/Transforms/Scalar/MergeICmps.cpp | 363 ++++++++++++----------
 2 files changed, 205 insertions(+), 171 deletions(-)

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 41b43a131932a..323d34f838b27 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -878,25 +878,28 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
   NumMemCmpInlined++;
 
   if (Value *Res = Expansion.getMemCmpExpansion()) {
-    auto* GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1)); 
+    auto *GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
     // Replace call with result of expansion and erase call.
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
 
     // If the mergeicmps pass used a global constant to merge comparisons and
-    // the the global constants were folded then the variable can be deleted since it isn't used anymore.
+    // the global constants were folded then the variable can be deleted
+    // since it isn't used anymore.
     if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
       // NOTE: There is still a use lingering around but that use itself isn't
       // used so it is fine to erase this instruction.
-      static bool (*hasActiveUses)(Value*) = [](Value* V) {
-        for (User* U: V->users()){
+      static bool (*hasActiveUses)(Value *) = [](Value *V) {
+        for (User *U : V->users()) {
           if (hasActiveUses(U))
             return true;
         }
         return false;
       };
       if (!hasActiveUses(GV)) {
-        LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName() << " that was introduced by the previous mergeicmps pass\n");
+        LLVM_DEBUG(
+            dbgs() << "Removing global constant " << GV->getName()
+                   << " that was introduced by the previous mergeicmps pass\n");
         GV->eraseFromParent();
       }
     }
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2eb1c9761d32e..0167fdddf7f7f 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -52,8 +52,8 @@
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GlobalValue.h"
-#include "llvm/IR/Instruction.h"
 #include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
 #include "llvm/IR/ValueMap.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
@@ -132,14 +132,14 @@ class BaseIdentifier {
   DenseMap<const Value*, int> BaseToIndex;
 };
 
-
 // All Instructions related to a comparison.
 typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
 
 // If this value is a load from a constant offset w.r.t. a base address, and
 // there are no other users of the load or address, returns the base address and
 // the offset.
-BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, InstructionSet* BlockInsts) {
+BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId,
+                             InstructionSet *BlockInsts) {
   auto *const LoadI = dyn_cast<LoadInst>(Val);
   if (!LoadI)
     return {};
@@ -184,7 +184,6 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, Instructi
   return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
 }
 
-
 // An abstract parent class that can either be a comparison of
 // two BCEAtoms with the same offsets to a base pointer (BCECmp)
 // or a comparison of a single BCEAtom with a constant (BCEConstCmp).
@@ -194,23 +193,26 @@ struct Comparison {
     CK_ConstCmp,
     CK_BceCmp,
   };
+
 private:
   const CompKind Kind;
+
 public:
   int SizeBits;
   const ICmpInst *CmpI;
 
   Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
-        : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
+      : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
   CompKind getKind() const { return Kind; }
 
   virtual ~Comparison() = default;
-  bool areContiguous(const Comparison& Other) const;
+  bool areContiguous(const Comparison &Other) const;
   bool operator<(const Comparison &Other) const;
 };
 
 // A comparison between a BCE atom and an integer constant.
-// If these BCE atoms are chained and access adjacent memory then they too can be merged, e.g.
+// If these BCE atoms are chained and access adjacent memory then they too can
+// be merged, e.g.
 // ```
 // int *p = ...;
 // int a = p[0];
@@ -219,11 +221,12 @@ struct Comparison {
 // ```
 struct BCEConstCmp : public Comparison {
   BCEAtom Lhs;
-  Constant* Const;
+  Constant *Const;
 
-  BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
-      : Comparison(CK_ConstCmp, SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
-  static bool classof(const Comparison* C) {
+  BCEConstCmp(BCEAtom L, Constant *Const, int SizeBits, const ICmpInst *CmpI)
+      : Comparison(CK_ConstCmp, SizeBits, CmpI), Lhs(std::move(L)),
+        Const(Const) {}
+  static bool classof(const Comparison *C) {
     return C->getKind() == CK_ConstCmp;
   }
 };
@@ -239,54 +242,58 @@ struct BCECmp : public Comparison {
   BCEAtom Rhs;
 
   BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
-      : Comparison(CK_BceCmp, SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R))  {
+      : Comparison(CK_BceCmp, SizeBits, CmpI), Lhs(std::move(L)),
+        Rhs(std::move(R)) {
     if (Rhs < Lhs) std::swap(Rhs, Lhs);
   }
-  static bool classof(const Comparison* C) {
-    return C->getKind() == CK_BceCmp;
-  }
+  static bool classof(const Comparison *C) { return C->getKind() == CK_BceCmp; }
 };
 
 // TODO: this can be improved to take alignment into account.
-bool Comparison::areContiguous(const Comparison& Other) const {
-  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+bool Comparison::areContiguous(const Comparison &Other) const {
+  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) &&
+         "Comparisons are of same kind");
   if (isa<BCEConstCmp>(this)) {
-    const auto& First = cast<BCEConstCmp>(this);
-    const auto& Second = cast<BCEConstCmp>(Other);
+    const auto &First = cast<BCEConstCmp>(this);
+    const auto &Second = cast<BCEConstCmp>(Other);
 
     return First->Lhs.BaseId == Second.Lhs.BaseId &&
            First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset;
   }
-  const auto& First = cast<BCECmp>(this);
-  const auto& Second = cast<BCECmp>(Other);
+  const auto &First = cast<BCECmp>(this);
+  const auto &Second = cast<BCECmp>(Other);
 
   return First->Lhs.BaseId == Second.Lhs.BaseId &&
          First->Rhs.BaseId == Second.Rhs.BaseId &&
          First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset &&
          First->Rhs.Offset + First->SizeBits / 8 == Second.Rhs.Offset;
 }
-bool Comparison::operator<(const Comparison& Other) const {
-  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+bool Comparison::operator<(const Comparison &Other) const {
+  assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) &&
+         "Comparisons are of same kind");
   if (isa<BCEConstCmp>(this)) {
-    const auto& First = cast<BCEConstCmp>(this);
-    const auto& Second = cast<BCEConstCmp>(Other);
+    const auto &First = cast<BCEConstCmp>(this);
+    const auto &Second = cast<BCEConstCmp>(Other);
     return First->Lhs < Second.Lhs;
   }
-  const auto& First = cast<BCECmp>(this);
-  const auto& Second = cast<BCECmp>(Other);
-  return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
+  const auto &First = cast<BCECmp>(this);
+  const auto &Second = cast<BCECmp>(Other);
+  return std::tie(First->Lhs, First->Rhs) < std::tie(Second.Lhs, Second.Rhs);
 }
 
 // Represents multiple comparisons inside of a single basic block.
-// This happens if multiple basic blocks have previously been merged into a single block using a select node.
+// This happens if multiple basic blocks have previously been merged into a
+// single block using a select node.
 class IntraCmpChain {
-  // TODO: this could probably be a unique-ptr but current impl relies on some copies
+  // TODO: this could probably be a unique-ptr but current impl relies on some
+  // copies
   std::vector<std::shared_ptr<Comparison>> CmpChain;
 
 public:
   IntraCmpChain(std::shared_ptr<Comparison> C) : CmpChain{C} {}
   IntraCmpChain combine(const IntraCmpChain OtherChain) {
-    CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+    CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(),
+                    OtherChain.CmpChain.end());
     return *this;
   }
   std::vector<std::shared_ptr<Comparison>> getCmpChain() const {
@@ -296,13 +303,12 @@ class IntraCmpChain {
 
 // A basic block that contains one or more comparisons.
 class MultBCECmpBlock {
- public:
-  MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+public:
+  MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB,
+                  InstructionSet BlockInsts)
       : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
 
-  std::vector<std::shared_ptr<Comparison>> getCmps() {
-    return Cmps;
-  }
+  std::vector<std::shared_ptr<Comparison>> getCmps() { return Cmps; }
 
   // Returns true if the block does other works besides comparison.
   bool doesOtherWork() const;
@@ -335,24 +341,25 @@ class MultBCECmpBlock {
 // split into the atom comparison part and the "other work" part
 // (see canSplit()).
 class SingleBCECmpBlock {
- public:
-  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder)
+public:
+  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock *BB,
+                    unsigned OrigOrder)
       : BB(BB), OrigOrder(OrigOrder), Cmp(std::move(Cmp)) {}
 
-  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder,
+  SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock *BB,
+                    unsigned OrigOrder,
                     llvm::SmallVector<Instruction *, 4> SplitInsts)
-      : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)), SplitInsts(SplitInsts) {}
+      : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)),
+        SplitInsts(SplitInsts) {}
 
-  const BCEAtom* Lhs() const {
+  const BCEAtom *Lhs() const {
     if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
       return &BceConstCmp->Lhs;
     auto *const BceCmp = cast<BCECmp>(Cmp.get());
     return &BceCmp->Lhs;
   }
-  const Comparison* getCmp() const { return Cmp.get(); }
-  bool operator<(const SingleBCECmpBlock &O) const {
-    return *Cmp < *O.Cmp;
-  }
+  const Comparison *getCmp() const { return Cmp.get(); }
+  bool operator<(const SingleBCECmpBlock &O) const { return *Cmp < *O.Cmp; }
 
   // We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
   // instructions. Split the old block and move all non-BCE-cmp-insts into the
@@ -372,7 +379,7 @@ class SingleBCECmpBlock {
 };
 
 bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
-                                    AliasAnalysis &AA) const {
+                                        AliasAnalysis &AA) const {
   // If this instruction may clobber the loads and is in middle of the BCE cmp
   // block instructions, then bail for now.
   if (Inst->mayWriteToMemory()) {
@@ -382,7 +389,7 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
       return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
              isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
     };
-    auto CmpLoadsAreClobbered = [&](const auto& Cmp) {
+    auto CmpLoadsAreClobbered = [&](const auto &Cmp) {
       if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
         return MayClobber(BceConstCmp->Lhs.LoadI);
       auto *const BceCmp = cast<BCECmp>(Cmp.get());
@@ -427,9 +434,10 @@ bool MultBCECmpBlock::doesOtherWork() const {
   return false;
 }
 
-llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
+llvm::SmallVector<Instruction *, 4>
+MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
   llvm::SmallVector<Instruction *, 4> SplitInsts;
-  for (Instruction& Inst : *BB) {
+  for (Instruction &Inst : *BB) {
     if (BlockInsts.count(&Inst))
       continue;
     assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
@@ -440,12 +448,12 @@ llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnaly
   return SplitInsts;
 }
 
-
 // Visit the given comparison. If this is a comparison between two valid
 // BCE atoms, or between a BCE atom and a constant, returns the comparison.
-std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
-                                const ICmpInst::Predicate ExpectedPredicate,
-                                BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+std::optional<std::shared_ptr<Comparison>>
+visitICmp(const ICmpInst *const CmpI,
+          const ICmpInst::Predicate ExpectedPredicate, BaseIdentifier &BaseId,
+          InstructionSet *BlockInsts) {
   // The comparison can only be used once:
   //  - For intermediate blocks, as a branch condition.
   //  - For the final block, as an incoming value for the Phi.
@@ -465,43 +473,51 @@ std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
   if (!Lhs.BaseId)
     return std::nullopt;
 
-  // Second operand can either be load if doing compare between two BCE atoms or 
+  // Second operand can either be load if doing compare between two BCE atoms or
   // can be constant if comparing adjacent memory to constant
-  auto* RhsOperand = CmpI->getOperand(1);
+  auto *RhsOperand = CmpI->getOperand(1);
   const auto &DL = CmpI->getDataLayout();
   int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
 
   BlockInsts->insert(CmpI);
-  if (auto const& Const = dyn_cast<Constant>(RhsOperand))
-    return std::make_shared<BCEConstCmp>(BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI));
+  if (auto const &Const = dyn_cast<Constant>(RhsOperand))
+    return std::make_shared<BCEConstCmp>(
+        BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI));
 
   auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
   if (!Rhs.BaseId)
     return std::nullopt;
-  return std::make_shared<BCECmp>(BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI));
+  return std::make_shared<BCECmp>(
+      BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI));
 }
 
-// Chain of comparisons inside a single basic block connected using `select` nodes.
-std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&, InstructionSet*);
+// Chain of comparisons inside a single basic block connected using `select`
+// nodes.
+std::optional<IntraCmpChain> visitComparison(Value *, ICmpInst::Predicate,
+                                             BaseIdentifier &,
+                                             InstructionSet *);
 
 std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
-                                  ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId, InstructionSet *BlockInsts) {
+                                         ICmpInst::Predicate ExpectedPredicate,
+                                         BaseIdentifier &BaseId,
+                                         InstructionSet *BlockInsts) {
   if (!SelectI->hasOneUse()) {
     LLVM_DEBUG(dbgs() << "select has several uses\n");
     return std::nullopt;
   }
-  auto* Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-  auto* Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
-  auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-  auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+  auto *Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+  auto *Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
+  auto const &Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+  auto const &ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
 
   if (!(Cmp1 || Sel1) || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
     return std::nullopt;
 
-  auto Lhs = visitComparison(SelectI->getOperand(0),ExpectedPredicate,BaseId,BlockInsts);
+  auto Lhs = visitComparison(SelectI->getOperand(0), ExpectedPredicate, BaseId,
+                             BlockInsts);
   if (!Lhs)
     return std::nullopt;
-  auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId,BlockInsts);
+  auto Rhs = visitComparison(Cmp2, ExpectedPredicate, BaseId, BlockInsts);
   if (!Rhs)
     return std::nullopt;
 
@@ -509,8 +525,9 @@ std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
   return Lhs->combine(std::move(*Rhs));
 }
 
-std::optional<IntraCmpChain> visitComparison(Value *Cond,
-            ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+std::optional<IntraCmpChain>
+visitComparison(Value *Cond, ICmpInst::Predicate ExpectedPredicate,
+                BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
   if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) {
     auto CmpVisit = visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
     if (!CmpVisit)
@@ -526,9 +543,9 @@ std::optional<IntraCmpChain> visitComparison(Value *Cond,
 // Visit the given comparison block. If this is a comparison between two valid
 // BCE atoms, returns the comparison.
 std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
-                                         BasicBlock *const Block,
-                                         const BasicBlock *const PhiBlock,
-                                         BaseIdentifier &BaseId) {
+                                             BasicBlock *const Block,
+                                             const BasicBlock *const PhiBlock,
+                                             BaseIdentifier &BaseId) {
   if (Block->empty())
     return std::nullopt;
   auto *const BranchI = dyn_cast<BranchInst>(Block->getTerminator());
@@ -560,7 +577,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
   }
 
   InstructionSet BlockInsts;
-  std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
+  std::optional<IntraCmpChain> Result =
+      visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
   if (!Result)
     return std::nullopt;
 
@@ -568,34 +586,36 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
   return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
 }
 
-void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock* BB) {
+void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock *BB) {
   LLVM_DEBUG(dbgs() << "Block '" << BB->getName());
-  if (auto* ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
+  if (auto *ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
     LLVM_DEBUG(dbgs() << "': Found constant-cmp of " << Cmp->SizeBits
-    << " bits including " << ConstCmp->Lhs.BaseId << " + "
-    << ConstCmp->Lhs.Offset << "\n");
+                      << " bits including " << ConstCmp->Lhs.BaseId << " + "
+                      << ConstCmp->Lhs.Offset << "\n");
     return;
   }
-  auto* BceCmp = cast<BCECmp>(Cmp.get());
+  auto *BceCmp = cast<BCECmp>(Cmp.get());
   LLVM_DEBUG(dbgs() << "': Found cmp of " << BceCmp->SizeBits
-  << " bits between " << BceCmp->Lhs.BaseId << " + "
-  << BceCmp->Lhs.Offset << " and "
-  << BceCmp->Rhs.BaseId << " + "
-  << BceCmp->Rhs.Offset << "\n");
+                    << " bits between " << BceCmp->Lhs.BaseId << " + "
+                    << BceCmp->Lhs.Offset << " and " << BceCmp->Rhs.BaseId
+                    << " + " << BceCmp->Rhs.Offset << "\n");
 }
 
 // Enqueues all comparisons of a mult-block.
 // If the block requires splitting then adds `OtherInsts` to the block too.
-static inline void enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
-                                MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
+static inline void
+enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
+                  MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA,
+                  bool RequireSplit) {
   bool hasAlreadySplit = false;
-  for (auto& Cmp : CmpBlock.getCmps()) {
+  for (auto &Cmp : CmpBlock.getCmps()) {
     emitDebugInfo(Cmp, CmpBlock.BB);
     unsigned OrigOrder = Comparisons.size();
     if (RequireSplit && !hasAlreadySplit) {
       hasAlreadySplit = true;
       auto SplitInsts = CmpBlock.getAllSplitInsts(AA);
-      Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
+      Comparisons.push_back(
+          SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
       continue;
     }
     Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder));
@@ -629,21 +649,21 @@ class BCECmpChain {
   BasicBlock *EntryBlock_;
 };
 
-
-// Returns true if a merge in the chain depends on a basic block where not every comparison is merged.
-// NOTE: This is pretty restrictive and could potentially be handled using an improved tradeoff heuristic.
+// Returns true if a merge in the chain depends on a basic block where not every
+// comparison is merged. NOTE: This is pretty restrictive and could potentially
+// be handled using an improved tradeoff heuristic.
 bool BCECmpChain::multBlockOnlyPartiallyMerged() {
-  llvm::SmallDenseSet<const BasicBlock*, 8> UnmergedBlocks, MergedBB;
+  llvm::SmallDenseSet<const BasicBlock *, 8> UnmergedBlocks, MergedBB;
 
-  for (auto& Merged : MergedBlocks_) {
+  for (auto &Merged : MergedBlocks_) {
     if (Merged.size() == 1) {
       UnmergedBlocks.insert(Merged[0].BB);
       continue;
     }
-    for (auto& C : Merged)
+    for (auto &C : Merged)
       MergedBB.insert(C.BB);
   }
-  return llvm::any_of(MergedBB, [&](const BasicBlock* BB){
+  return llvm::any_of(MergedBB, [&](const BasicBlock *BB) {
     return UnmergedBlocks.contains(BB);
   });
 }
@@ -655,39 +675,43 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
   return MinOrigOrder;
 }
 
-/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
-/// ranges that can be merged together into a single comparison.
-template<class RandomIt>
-static void mergeBlocks(RandomIt First, RandomIt Last,
-                        std::vector<BCECmpChain::ContiguousBlocks>* MergedBlocks) {
+/// Given a chain of comparison blocks (of the same kind), groups the blocks
+/// into contiguous ranges that can be merged together into a single comparison.
+template <class RandomIt>
+static void
+mergeBlocks(RandomIt First, RandomIt Last,
+            std::vector<BCECmpChain::ContiguousBlocks> *MergedBlocks) {
   // Sort to detect continuous offsets.
-  llvm::sort(First, Last,
-             [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
-              return LhsBlock < RhsBlock;
-             });
+  llvm::sort(
+      First, Last,
+      [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
+        return LhsBlock < RhsBlock;
+      });
 
   BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
   int Offset = MergedBlocks->size();
-  for (auto& BlockIt = First; BlockIt != Last; ++BlockIt) {
-    if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
+  for (auto &BlockIt = First; BlockIt != Last; ++BlockIt) {
+    if (!LastMergedBlock ||
+        !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
       MergedBlocks->emplace_back();
       LastMergedBlock = &MergedBlocks->back();
     } else {
-      LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName() << " into "
-                        << LastMergedBlock->back().BB->getName() << "\n");
+      LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName()
+                        << " into " << LastMergedBlock->back().BB->getName()
+                        << "\n");
     }
     LastMergedBlock->push_back(std::move(*BlockIt));
   }
 
   // While we allow reordering for merging, do not reorder unmerged comparisons.
   // Doing so may introduce branch on poison.
-  llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(), [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
-                              const BCECmpChain::ContiguousBlocks &RhsBlocks) {
-    return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
-  });
+  llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(),
+             [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
+                const BCECmpChain::ContiguousBlocks &RhsBlocks) {
+               return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
+             });
 }
 
-
 BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
                          AliasAnalysis &AA)
     : Phi_(Phi) {
@@ -759,7 +783,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
     }
     enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, false);
   }
-  
+
   // It is possible we have no suitable comparison to merge.
   if (Comparisons.empty()) {
     LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
@@ -768,13 +792,17 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
 
   EntryBlock_ = Comparisons[0].BB;
 
-  auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
-  auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
+  auto isConstCmp = [](SingleBCECmpBlock &C) {
+    return isa<BCEConstCmp>(C.getCmp());
+  };
+  auto BceIt =
+      std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
 
   // The chain that requires splitting should always be first.
-  // If no chain requires splitting then defaults to BCE-comparisons coming first.
+  // If no chain requires splitting then defaults to BCE-comparisons coming
+  // first.
   if (std::any_of(Comparisons.begin(), BceIt,
-                   [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
+                  [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
     mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
     mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
   } else {
@@ -805,12 +833,11 @@ class MergedBlockName {
     // Since multiple comparisons can come from the same basic block
     // (when using select inst) don't want to repeat same name twice
     UniqueVector<StringRef> UniqueNames;
-    for (const auto& B : Comparisons)
+    for (const auto &B : Comparisons)
       UniqueNames.insert(B.BB->getName());
-    const int size = std::accumulate(UniqueNames.begin(), UniqueNames.end(), 0,
-                                     [](int i, const StringRef &Name) {
-                                       return i + Name.size();
-                                     });
+    const int size = std::accumulate(
+        UniqueNames.begin(), UniqueNames.end(), 0,
+        [](int i, const StringRef &Name) { return i + Name.size(); });
     if (size == 0)
       return StringRef("", 0);
 
@@ -836,15 +863,10 @@ class MergedBlockName {
 };
 } // namespace
 
-
 // Add a branch to the next basic block in the chain.
-void updateBranching(Value* CondResult,
-                     IRBuilder<>& Builder,
-                     BasicBlock *BB,
-                     BasicBlock *const NextCmpBlock,
-                     PHINode &Phi,
-                     LLVMContext &Context,
-                     const TargetLibraryInfo &TLI,
+void updateBranching(Value *CondResult, IRBuilder<> &Builder, BasicBlock *BB,
+                     BasicBlock *const NextCmpBlock, PHINode &Phi,
+                     LLVMContext &Context, const TargetLibraryInfo &TLI,
                      AliasAnalysis &AA, DomTreeUpdater &DTU) {
   BasicBlock *const PhiBB = Phi.getParent();
   if (NextCmpBlock == PhiBB) {
@@ -862,29 +884,34 @@ void updateBranching(Value* CondResult,
 }
 
 // Builds global constant-struct to compare to pointer during memcmp().
-// Has to be global in order for expand-memcmp pass to be able to fold constants.
-GlobalVariable* buildConstantStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context, Module& M) {
-  std::vector<Constant*> Constants;
-  std::vector<Type*> Types;
-
-  for (const auto& BceBlock : Comparisons) {
-    assert(isa<BCEConstCmp>(BceBlock.getCmp()) && "Const-cmp-chain can only contain const comparisons");
-    auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+// Has to be global in order for expand-memcmp pass to be able to fold
+// constants.
+GlobalVariable *buildConstantStruct(ArrayRef<SingleBCECmpBlock> &Comparisons,
+                                    IRBuilder<> &Builder, LLVMContext &Context,
+                                    Module &M) {
+  std::vector<Constant *> Constants;
+  std::vector<Type *> Types;
+
+  for (const auto &BceBlock : Comparisons) {
+    assert(isa<BCEConstCmp>(BceBlock.getCmp()) &&
+           "Const-cmp-chain can only contain const comparisons");
+    auto *ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
     Constants.emplace_back(ConstCmp->Const);
     Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
   }
-  auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+  auto *StructType = StructType::get(
+      Context, Types, /* currently only matches packed offsets */ true);
   auto *StructConstant = ConstantStruct::get(StructType, Constants);
 
-  return new GlobalVariable(M, StructType, true, GlobalVariable::PrivateLinkage, StructConstant, "memcmp_const_op");
+  return new GlobalVariable(M, StructType, true, GlobalVariable::PrivateLinkage,
+                            StructConstant, "memcmp_const_op");
 }
 
 // Merges the given contiguous comparison blocks into one memcmp block.
 static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
                                     BasicBlock *const InsertBefore,
                                     BasicBlock *const NextCmpBlock,
-                                    PHINode &Phi,
-                                    LLVMContext &Context,
+                                    PHINode &Phi, LLVMContext &Context,
                                     const TargetLibraryInfo &TLI,
                                     AliasAnalysis &AA, DomTreeUpdater &DTU) {
   assert(Comparisons.size() > 1 && "merging multiple comparisons");
@@ -905,7 +932,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
   if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
     Rhs = buildConstantStruct(Comparisons, Builder, Context, *Phi.getModule());
   } else {
-    auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
+    auto *FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
     if (FirstBceCmp->Rhs.GEP)
       Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
     else
@@ -917,7 +944,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
   // If there is one block that requires splitting, we do it now, i.e.
   // just before we know we will collapse the chain. The instructions
   // can be executed before any of the instructions in the chain.
-  const auto* ToSplit = llvm::find_if(
+  const auto *ToSplit = llvm::find_if(
       Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
   if (ToSplit != Comparisons.end()) {
     LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
@@ -927,9 +954,11 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
   // memcmp expects a 'size_t' argument and returns 'int'.
   unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
   unsigned IntBits = TLI.getIntSize();
-  const unsigned TotalSizeBits = std::accumulate(
-      Comparisons.begin(), Comparisons.end(), 0u,
-      [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+  const unsigned TotalSizeBits =
+      std::accumulate(Comparisons.begin(), Comparisons.end(), 0u,
+                      [](int Size, const SingleBCECmpBlock &C) {
+                        return Size + C.getCmp()->SizeBits;
+                      });
 
   // Create memcmp() == 0.
   const auto &DL = Phi.getDataLayout();
@@ -937,26 +966,26 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
       Lhs, Rhs,
       ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
       Builder, DL, &TLI);
-  Value* IsEqual = Builder.CreateICmpEQ(
+  Value *IsEqual = Builder.CreateICmpEQ(
       MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
 
-  updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+  updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA,
+                  DTU);
   return BB;
 }
 
 // Keep existing block if it isn't merged. Only change the branches.
 // Also handles not splitting mult-blocks that use select instructions.
 static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
-                                    BasicBlock *const InsertBefore,
-                                    BasicBlock *const NextCmpBlock,
-                                    PHINode &Phi,
-                                    LLVMContext &Context,
-                                    const TargetLibraryInfo &TLI,
-                                    AliasAnalysis &AA, DomTreeUpdater &DTU) {
-  BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
-                         NextCmpBlock->getParent(), InsertBefore);
+                                       BasicBlock *const InsertBefore,
+                                       BasicBlock *const NextCmpBlock,
+                                       PHINode &Phi, LLVMContext &Context,
+                                       const TargetLibraryInfo &TLI,
+                                       AliasAnalysis &AA, DomTreeUpdater &DTU) {
+  BasicBlock *MultBB = BasicBlock::Create(
+      Context, BB->getName(), NextCmpBlock->getParent(), InsertBefore);
   auto *const BranchI = cast<BranchInst>(BB->getTerminator());
-  Value* CondResult = nullptr;
+  Value *CondResult = nullptr;
   if (BranchI->isUnconditional())
     CondResult = Phi.getIncomingValueForBlock(BB);
   else
@@ -964,7 +993,8 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
   // Transfer all instructions except the branching terminator to the new block.
   MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
   IRBuilder<> Builder(MultBB);
-  updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+  updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI,
+                  AA, DTU);
 
   return MultBB;
 }
@@ -979,7 +1009,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   // so that the next block is always available to branch to.
   BasicBlock *InsertBefore = EntryBlock_;
   BasicBlock *NextCmpBlock = Phi_.getParent();
-  SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
+  SmallDenseSet<const BasicBlock *, 8> ExistingBlocksToKeep;
   LLVMContext &Context = NextCmpBlock->getContext();
   for (const auto &Cmps : reverse(MergedBlocks_)) {
     // If there is only a single comparison then nothing should
@@ -992,7 +1022,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
         continue;
       ExistingBlocksToKeep.insert(BB);
       InsertBefore = NextCmpBlock = updateOriginalBlock(
-        BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+          BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
     } else {
       InsertBefore = NextCmpBlock = mergeComparisons(
           Cmps, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
@@ -1027,7 +1057,8 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
   SmallVector<BasicBlock *, 16> DeadBlocks;
   for (const auto &Blocks : MergedBlocks_) {
     for (const SingleBCECmpBlock &Block : Blocks) {
-      // Many single blocks can refer to the same multblock coming from an select instruction.
+      // Many single blocks can refer to the same multblock coming from a
+      // select instruction.
       // TODO: preferably use a set instead
       if (llvm::is_contained(DeadBlocks, Block.BB))
         continue;
@@ -1077,11 +1108,10 @@ std::vector<BasicBlock *> getOrderedBlocks(PHINode &Phi,
   return Blocks;
 }
 
-template<typename T>
-bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
-  auto* IncomingValue = Phi.getIncomingValue(I);
+template <typename T> bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
+  auto *IncomingValue = Phi.getIncomingValue(I);
   return !isa<T>(IncomingValue) ||
-    cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
+         cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
 }
 
 bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
@@ -1115,7 +1145,8 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
       LLVM_DEBUG(dbgs() << "skip: several non-constant values\n");
       return false;
     }
-    if (isInvalidPrevBlock<ICmpInst>(Phi,I) && isInvalidPrevBlock<SelectInst>(Phi,I)) {
+    if (isInvalidPrevBlock<ICmpInst>(Phi, I) &&
+        isInvalidPrevBlock<SelectInst>(Phi, I)) {
       // Non-constant incoming value is not from a cmp instruction or not
       // produced by the last block. We could end up processing the value
       // producing block more than once.

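The grouping step that `mergeBlocks` performs in the patch above (sort by offset, collect contiguous runs, then restore the original order of the runs) can be sketched outside of LLVM. This is only an illustration under simplifying assumptions: `Cmp`, `Group`, `minOrigOrder` and `groupContiguous` are hypothetical stand-ins rather than names from the patch, and contiguity is reduced to "the previous byte range ends where the next one starts".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for SingleBCECmpBlock: one comparison covering the
// bytes [Offset, Offset + Size), tagged with its position in the source chain.
struct Cmp {
  unsigned Offset;
  unsigned Size;
  unsigned OrigOrder;
};

using Group = std::vector<Cmp>;

// Smallest original position of any comparison in the group.
static unsigned minOrigOrder(const Group &G) {
  assert(!G.empty() && "groups are never empty");
  unsigned Min = G.front().OrigOrder;
  for (const Cmp &C : G)
    Min = std::min(Min, C.OrigOrder);
  return Min;
}

// Sorts by offset, groups comparisons whose byte ranges touch, and then
// orders the groups by the earliest original position they contain, so
// that groups are not reordered more than merging requires.
std::vector<Group> groupContiguous(std::vector<Cmp> Cmps) {
  std::sort(Cmps.begin(), Cmps.end(),
            [](const Cmp &A, const Cmp &B) { return A.Offset < B.Offset; });

  std::vector<Group> Groups;
  for (const Cmp &C : Cmps) {
    bool Contiguous =
        !Groups.empty() &&
        Groups.back().back().Offset + Groups.back().back().Size == C.Offset;
    if (!Contiguous)
      Groups.emplace_back(); // start a new run
    Groups.back().push_back(C);
  }

  std::sort(Groups.begin(), Groups.end(), [](const Group &A, const Group &B) {
    return minOrigOrder(A) < minOrigOrder(B);
  });
  return Groups;
}
```

For four 4-byte comparisons at offsets 16, 0, 4 and 20 (in that source order), this yields two groups: first {16, 20}, then {0, 4}. The group containing the earliest original comparison stays first, which mirrors the "do not reorder unmerged comparisons" comment in the patch.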
>From 221fe3506eae132dea2b93954a9515fb84c1f2db Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 23:49:31 +0200
Subject: [PATCH 23/23] [MergeICmps] Fixed global var removal for failing
 memcmp codegen tests

---
 llvm/lib/CodeGen/ExpandMemCmp.cpp | 27 +++++++++------------------
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 323d34f838b27..c6f7f850c29fb 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -883,25 +883,16 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
     CI->replaceAllUsesWith(Res);
     CI->eraseFromParent();
 
-    // If the mergeicmps pass used a global constant to merge comparisons and
-    // the the global constants were folded then the variable can be deleted
+    // If the memcmp call compared against a global constant and that
+    // constant was folded away, then the variable can be deleted
+    // since it isn't used anymore.
-    if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
-      // NOTE: There is still a use lingering around but that use itself isn't
-      // used so it is fine to erase this instruction.
-      static bool (*hasActiveUses)(Value *) = [](Value *V) {
-        for (User *U : V->users()) {
-          if (hasActiveUses(U))
-            return true;
-        }
-        return false;
-      };
-      if (!hasActiveUses(GV)) {
-        LLVM_DEBUG(
-            dbgs() << "Removing global constant " << GV->getName()
-                   << " that was introduced by the previous mergeicmps pass\n");
-        GV->eraseFromParent();
-      }
+    // This is mostly done when mergeicmps used a global constant to merge
+    // constant comparisons.
+    if (GV && GV->hasPrivateLinkage() && GV->isConstant() &&
+        !GV->isConstantUsed()) {
+      LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName()
+                        << " that was used by the dead memcmp() call\n");
+      GV->eraseFromParent();
     }
   }
 

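A note on why the check changed: as written, the removed `hasActiveUses` helper can never return true, because its only `return true` sits behind a recursive call that would itself have to return true first. The replacement relies on `GV->isConstantUsed()`. The sketch below models the kind of check that, to my understanding, `isConstantUsed()` performs: a constant counts as used if some user is a non-constant, or a constant that is itself used. `Node` and `isUsed` are hypothetical names for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of a value in a use graph.
struct Node {
  bool IsConstant;                 // constant (expression) vs. instruction
  std::vector<const Node *> Users; // values that use this node
};

// A node counts as used if any user is a non-constant (for example an
// instruction), or a constant that is itself used. Dangling constant
// expressions with no users of their own do not keep the node alive.
bool isUsed(const Node &N) {
  for (const Node *U : N.Users) {
    if (!U->IsConstant || isUsed(*U))
      return true;
  }
  return false;
}
```

In this model, a global referenced only by a dead constant expression is reported as unused and may be erased, while a global reached from an instruction through a constant expression is reported as used.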

