[llvm] [MergeICmps] Merge adjacent comparisons to constants (PR #133817)
Philipp Rados via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 31 16:01:37 PDT 2025
https://github.com/PhilippRados created https://github.com/llvm/llvm-project/pull/133817
This pull request aims to fix #117853.
### General idea
It extends the existing `MergeICmps` pass to merge not only comparisons between two objects, such as `a.a == b.a && a.b == b.b`, but also comparisons against arbitrary constants, such as `a.a == 245 && a.b == -1`.
### Changes
The original pass assumed that each basic block contains at most one comparison. This had to change to allow multiple comparisons in a single basic block, because constant comparisons get flattened into a single block using a `select` instruction before the `MergeICmps` pass is run.
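To illustrate the flattening, here is a minimal C++ sketch. The `Pair` struct and both function names are made up for this example and do not appear in the patch. The second function loads both fields unconditionally and combines the results, which is roughly the shape of `select i1 %c1, i1 %c2, i1 false` that the pass now has to recognise inside a single block.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cstdint>

// Hypothetical two-field struct, used only for illustration.
struct Pair {
  int32_t a;
  int32_t b;
};

// Short-circuit form as written in the source: `p.b` is only
// inspected when `p.a` already matched.
bool cmpShortCircuit(const Pair &p) {
  return p.a == 245 && p.b == -1;
}

// Flattened form: both comparisons are evaluated up front and
// combined with a select-like ternary in one basic block.
bool cmpFlattened(const Pair &p) {
  const bool c1 = p.a == 245;
  const bool c2 = p.b == -1;
  return c1 ? c2 : false;
}
```

Both functions should agree for every input, e.g. `assert(cmpShortCircuit(p) == cmpFlattened(p));`. The flattened form is only valid because both fields can be loaded safely, which ties in with the dereferenceability requirement mentioned further down.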
### How it works
Whenever a matching comparison is encountered, it is added to the cmp-chain. Once all comparisons have been found, the chain is sorted so that comparisons of the same kind are grouped together: all const-comparisons followed by all bce-comparisons, or the other way around, depending on which kind came first. The pass then goes through all comparisons in the chain and merges the ones adjacent to each other. Comparisons inside a flattened select-block can only be merged if every comparison in that block is merged (this is a rather defensive approach).
The const merging works by building a global constant struct for every merge. It needs to be a global constant so that the expand-memcmp pass can constant-fold it, after which the global is also removed.
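The sort-then-merge step can be modelled outside of LLVM roughly as follows. This is a simplified sketch, not the code from the patch: `Cmp`, `MergedRange` and `mergeAdjacent` are hypothetical names, BCE comparisons are assumed to use the same offset on both sides, and BCE comparisons are arbitrarily sorted before const-comparisons.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one comparison in the cmp-chain.
struct Cmp {
  bool isConst;        // true: field == constant, false: field == other.field
  uint64_t offset;     // byte offset from the common base pointer
  uint64_t sizeBytes;  // width of the compared value in bytes
};

// One merged run of adjacent comparisons, i.e. one future memcmp.
struct MergedRange {
  bool isConst;
  uint64_t offset;
  uint64_t sizeBytes;
  unsigned numCmps;
};

std::vector<MergedRange> mergeAdjacent(std::vector<Cmp> chain) {
  // Group by kind first, then order by offset within each kind.
  std::stable_sort(chain.begin(), chain.end(),
                   [](const Cmp &l, const Cmp &r) {
                     if (l.isConst != r.isConst)
                       return l.isConst < r.isConst;
                     return l.offset < r.offset;
                   });

  std::vector<MergedRange> merged;
  for (const Cmp &c : chain) {
    assert(c.sizeBytes > 0 && "a comparison must cover at least one byte");
    // Extend the current run if `c` has the same kind and starts exactly
    // where the previous comparison ended; otherwise start a new run.
    if (!merged.empty() && merged.back().isConst == c.isConst &&
        merged.back().offset + merged.back().sizeBytes == c.offset) {
      merged.back().sizeBytes += c.sizeBytes;
      ++merged.back().numCmps;
    } else {
      merged.push_back({c.isConst, c.offset, c.sizeBytes, 1u});
    }
  }
  return merged;
}
```

Each resulting `MergedRange` with `numCmps > 1` corresponds to one `memcmp`; for const runs the right-hand side would be the global constant described above.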
### Example
A single comparison chain can now be made up of both BCE-comparisons (two offsets to the same base) and const-comparisons (contiguous offsets to the same base with a constant).
This means that the expression:
```
struct S {
  int a;
  unsigned char b;
  unsigned char c;
  uint16_t d;
  int e;
  int f;
  int g;
};
bool cmp(S& a, S& b) {
  return a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
}
```
can be turned into:
```
// simplified representation, for exact implementation see llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@memcmp_const_op = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
define @cmp(...) {
BB1:
memcmp(a,b,6);
br BB2, BB_end
BB2:
offset = gep ptr %a, 8
memcmp(offset, memcmp_const_op, 12)
br BB_end
...
}
```
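For readers who prefer C++ over IR, the transformation above is roughly equivalent to the following hand-written version. This is a sketch under stated assumptions: `cmpFieldwise`, `cmpMerged` and `kConstOp` are names invented here, `int` is assumed to be 32 bits wide, and the `static_assert`s pin down the struct layout that the byte counts 6 and 12 rely on.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <cstring>

struct S {
  int a;
  unsigned char b;
  unsigned char c;
  uint16_t d;
  int e;
  int f;
  int g;
};

// Layout assumptions; they hold on common x86-64 ABIs.
static_assert(sizeof(int) == 4, "sketch assumes 32-bit int");
static_assert(offsetof(S, b) == 4 && offsetof(S, c) == 5, "a, b, c are contiguous");
static_assert(offsetof(S, e) == 8 && offsetof(S, f) == 12 && offsetof(S, g) == 16,
              "e, f, g are contiguous");

// Stand-in for the private global @memcmp_const_op.
static const int32_t kConstOp[3] = {255, 200, 100};

// The original expression from the example.
bool cmpFieldwise(const S &a, const S &b) {
  return a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 &&
         a.b == b.b && a.f == 200;
}

// Hand-merged form: one memcmp for the BCE run (a, b, c) and one
// memcmp against the constant block for the const run (e, f, g).
bool cmpMerged(const S &a, const S &b) {
  const auto *pa = reinterpret_cast<const unsigned char *>(&a);
  const auto *pb = reinterpret_cast<const unsigned char *>(&b);
  if (std::memcmp(pa, pb, 6) != 0)
    return false;
  return std::memcmp(pa + offsetof(S, e), kConstOp, sizeof(kConstOp)) == 0;
}
```

For any two objects `x` and `y`, `assert(cmpMerged(x, y) == cmpFieldwise(x, y));` is expected to hold. Note that field `d` is never compared, so it may differ between the two objects without affecting the result.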
### Issues in the current implementation, waiting for feedback on this
- This implementation currently doesn't handle a single select block like the one mentioned in the issue above. My only question is when to launch it: handling this case would require traversing all instructions again, which is a slowdown for every function compiled at -O3. I think a good tradeoff would be to only check whether the branch condition is a select and start the optimization from there, instead of checking every single instruction.
- This pattern is pretty strict: it only works when the function returns a bool and the parameters are dereferenceable. These restrictions basically make this optimization inapplicable in C. Otherwise this implementation could also merge vectors/arrays.
- Some testcases fail alive2 when the memory is accessed differently. I think these are known issues though (https://github.com/llvm/llvm-project/issues/62459 and https://github.com/llvm/llvm-project/issues/51187)
From 77578ba1f64f633776a2bcc49765f9fd9a08f1ff Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Thu, 30 Jan 2025 17:25:07 +0100
Subject: [PATCH 01/23] [MergeICmps] First implementation of merging
comparisons that compare adjacent memory blocks with constants
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 142 ++++++++++++++++++++++
1 file changed, 142 insertions(+)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 4291f3aee0cd1..d9b2456d40b8e 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -52,6 +52,7 @@
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/ValueMap.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Transforms/Scalar.h"
@@ -842,6 +843,119 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
return CmpChain.simplify(TLI, AA, DTU);
}
+void removeUnusedOperands(SmallVector<Value *, 8> toCheck) {
+ while (!toCheck.empty()) {
+ Value *V = toCheck.pop_back_val();
+
+ // Only process instructions (skip constants, globals, etc.)
+ if (Instruction *OpI = dyn_cast<Instruction>(V)) {
+ if (OpI->use_empty()) {
+ toCheck.append(OpI->operands().begin(),OpI->operands().end());
+ OpI->eraseFromParent();
+ }
+ }
+ }
+}
+
+struct CommonCmp {
+ ICmpInst* CmpI;
+ unsigned Offset;
+};
+
+void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
+ auto First = AdjacentMem[0];
+ IRBuilder<> Builder(SelectI);
+ LLVMContext &Context = First.CmpI->getContext();
+ const auto &DL = First.CmpI->getDataLayout();
+
+ auto *CmpType = First.CmpI->getOperand(0)->getType();
+ auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
+ auto ArraySize = DL.getTypeAllocSize(ArrayType);
+ // TODO: check for alignment
+ auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+
+ std::vector<Constant*> Constants;
+ for (const auto& CI : AdjacentMem) {
+ // safe since we checked before that second operand is constantint
+ Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
+ }
+ auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+ Builder.CreateStore(ArrayConstant,ArrayAlloca);
+
+ // TODO: adjust base-ptr to point to start of load-offset
+ // TODO: also have to handle !=
+ Value *const MemCmpCall = emitMemCmp(
+ Base, ArrayAlloca,
+ ConstantInt::get(Type::getInt64Ty(Context), ArraySize),
+ Builder, DL, &TLI);
+ auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
+
+ BasicBlock::iterator ii(SelectI);
+ SmallVector<Value *, 8> deadOperands(SelectI->operands());
+ ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
+ removeUnusedOperands(deadOperands);
+
+ dbgs() << "DONE merging";
+}
+
+// Combines Icmp instructions if they operate on adjacent memory
+// TODO: check that base address' memory isn't modified between comparisons
+bool tryMergeIcmps(SelectInst* SelectI, Value* Base, std::vector<CommonCmp> &Icmps,const TargetLibraryInfo &TLI) {
+ assert(!Icmps.empty() && "if entry exists then has at least one cmp");
+ bool hasMerged = false;
+
+ std::vector<CommonCmp> AdjacentMem{Icmps[0]};
+ auto Prev = Icmps[0];
+ for (auto& Cmp : llvm::drop_begin(Icmps)) {
+ if (Cmp.Offset == (Prev.Offset + 1)) {
+ AdjacentMem.emplace_back(Cmp);
+ } else if (AdjacentMem.size() > 1) {
+ mergeAdjacentComparisons(SelectI,Base, AdjacentMem,TLI);
+ hasMerged = true;
+ AdjacentMem.clear();
+ AdjacentMem.emplace_back(Cmp);
+ }
+ Prev = Cmp;
+ }
+
+ if (AdjacentMem.size() > 1) {
+ mergeAdjacentComparisons(SelectI, Base, AdjacentMem,TLI);
+ hasMerged = true;
+ }
+
+ return hasMerged;
+}
+
+// Given an operand from a load, return the original base pointer and
+// if operand is GEP also it's offset from base pointer
+// but only if offset is known at compile time
+std::tuple<Value*, std::optional<unsigned>> findPtrAndOffset(Value* V, unsigned Offset) {
+ if (const auto& GepI = dyn_cast<GetElementPtrInst>(V)){
+ if (const auto& Index = dyn_cast<ConstantInt>(GepI->getOperand(1))) {
+ if (Index->getBitWidth() <= 64) {
+ return findPtrAndOffset(GepI->getPointerOperand(), Offset + Index->getZExtValue());
+ }
+ }
+ return {V,std::nullopt};
+ }
+
+ return {V,Offset};
+}
+
+
+std::optional<Value*> constantCmp(ICmpInst* CmpI,std::vector<CommonCmp>* cmps) {
+ auto const& LoadI = dyn_cast<LoadInst>(CmpI->getOperand(0));
+ auto const& ConstantI = dyn_cast<ConstantInt>(CmpI->getOperand(1));
+ if (!LoadI || !ConstantI)
+ return std::nullopt;
+
+ auto [BasePtr, Offset] = findPtrAndOffset(LoadI->getOperand(0),0);
+ if (Offset)
+ cmps->emplace_back(CommonCmp {CmpI, *Offset});
+
+ return BasePtr;
+}
+
static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
const TargetTransformInfo &TTI, AliasAnalysis &AA,
DominatorTree *DT) {
@@ -867,6 +981,34 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
MadeChange |= processPhi(*Phi, TLI, AA, DTU);
}
+ // merge cmps that load from same address and compare with constant
+ for (BasicBlock &BB : F) {
+ // from bottom up to find the root result of all comparisons
+ for (Instruction &I : llvm::reverse(BB)) {
+ if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
+ auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+ auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+ auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+ if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
+ continue;
+
+ Value* BasePtr;
+ std::vector<CommonCmp> cmps;
+ if (auto bp = constantCmp(Cmp1,&cmps))
+ BasePtr = *bp;
+ if (auto bp = constantCmp(Cmp2,&cmps)) {
+ if (BasePtr != bp) continue;
+ }
+
+ MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
+ break;
+ }
+ }
+ }
+
+ F.dump();
+
return MadeChange;
}
From 23042079fd84c1edbea65d4fdd09e9194fa1dfb4 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Fri, 14 Feb 2025 20:43:55 +0100
Subject: [PATCH 02/23] [MergeICmps] SelectCmp checkpoint
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 204 ++++++++++++++++++----
1 file changed, 167 insertions(+), 37 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index d9b2456d40b8e..2194c4a925162 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -50,6 +50,7 @@
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
+#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/ValueMap.h"
@@ -176,24 +177,58 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
}
+struct Comparison {
+ int SizeBits;
+ const ICmpInst *CmpI;
+
+ using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
+
+ Comparison(int SizeBits, const ICmpInst *CmpI);
+ virtual ~Comparison() {};
+ virtual LoadOperands getLoads() = 0;
+};
+
+// A comparison between a BCE atom and an integer constant.
+// If these BCE atoms are chained and access adjacent memory then they too can be merged, e.g.
+// ```
+// int *p = ...;
+// int a = p[0];
+// int b = p[1];
+// return a == 100 && b == 2;
+// ```
+struct BCEConstCmp : public Comparison {
+ BCEAtom Lhs;
+ Constant* Const;
+
+ BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
+ : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+
+ Comparison::LoadOperands getLoads() override {
+ return std::make_pair(&Lhs,std::nullopt);
+ }
+};
+
// A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
// top.
// Note: the terminology is misleading: the comparison is symmetric, so there
// is no real {l/r}hs. What we want though is to have the same base on the
// left (resp. right), so that we can detect consecutive loads. To ensure this
// we put the smallest atom on the left.
-struct BCECmp {
+struct BCECmp : public Comparison {
BCEAtom Lhs;
BCEAtom Rhs;
- int SizeBits;
- const ICmpInst *CmpI;
BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
- : Lhs(std::move(L)), Rhs(std::move(R)), SizeBits(SizeBits), CmpI(CmpI) {
+ : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R)) {
if (Rhs < Lhs) std::swap(Rhs, Lhs);
}
+
+ Comparison::LoadOperands getLoads() override {
+ return std::make_pair(&Lhs,&Rhs);
+ }
};
+
// A basic block with a comparison between two BCE atoms.
// The block might do extra work besides the atom comparison, in which case
// doesOtherWork() returns true. Under some conditions, the block can be
@@ -203,12 +238,12 @@ class BCECmpBlock {
public:
typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
- BCECmpBlock(BCECmp Cmp, BasicBlock *BB, InstructionSet BlockInsts)
- : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+ BCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+ : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
- const BCEAtom &Lhs() const { return Cmp.Lhs; }
- const BCEAtom &Rhs() const { return Cmp.Rhs; }
- int SizeBits() const { return Cmp.SizeBits; }
+ // const BCEAtom &Lhs() const { return Cmp.Lhs; }
+ // const BCEAtom &Rhs() const { return Cmp.Rhs; }
+ // int SizeBits() const { return Cmp.SizeBits; }
// Returns true if the block does other works besides comparison.
bool doesOtherWork() const;
@@ -238,7 +273,7 @@ class BCECmpBlock {
unsigned OrigOrder = 0;
private:
- BCECmp Cmp;
+ std::vector<Comparison*> Cmps;
};
bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
@@ -301,9 +336,50 @@ bool BCECmpBlock::doesOtherWork() const {
return false;
}
+class IntraCmpChain {
+ std::vector<Comparison*> CmpChain;
+
+public:
+ IntraCmpChain(Comparison* C) : CmpChain{C} {}
+ IntraCmpChain concat(const IntraCmpChain OtherChain) {
+ CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+ return *this;
+ }
+ BCECmpBlock::InstructionSet getAllInsts() {
+ BCECmpBlock::InstructionSet Insts;
+ for (auto Cmp : CmpChain) {
+ // TODO: this mess should be able to get OOP'd
+ if (auto* BceCmpI = dyn_cast<BCECmp>(&Cmp)) {
+ Insts.insert(BceCmpI->Lhs.LoadI);
+ Insts.insert(BceCmpI->Rhs.LoadI);
+ Insts.insert(BceCmpI->CmpI);
+ if (BceCmpI->Lhs.GEP)
+ Insts.insert(BceCmpI->Lhs.GEP);
+ if (BceCmpI->Rhs.GEP)
+ Insts.insert(BceCmpI->Rhs.GEP);
+ } else if (auto* BceConstCmpI = dyn_cast<BCEConstCmp>(&Cmp)) {
+ Insts.insert(BceCmpI->Lhs.LoadI);
+ Insts.insert(BceCmpI->CmpI);
+ if (BceCmpI->Lhs.GEP)
+ Insts.insert(BceCmpI->Lhs.GEP);
+ }
+ }
+ return Insts;
+ }
+ std::vector<Comparison*> getCmpChain() const {
+ return CmpChain;
+ }
+
+ // Determines if all comparisons in the comparison chain are all either `BCECmp` or all `BCEConstCmp`
+ bool isAllSameCmp() {
+ return llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCECmp>(c);}) ||
+ llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCEConstCmp>(c);});
+ }
+};
+
// Visit the given comparison. If this is a comparison between two valid
-// BCE atoms, returns the comparison.
-std::optional<BCECmp> visitICmp(const ICmpInst *const CmpI,
+// BCE atoms, or between a BCE atom and a constant, returns the comparison.
+std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
const ICmpInst::Predicate ExpectedPredicate,
BaseIdentifier &BaseId) {
// The comparison can only be used once:
@@ -320,17 +396,63 @@ std::optional<BCECmp> visitICmp(const ICmpInst *const CmpI,
LLVM_DEBUG(dbgs() << "cmp "
<< (ExpectedPredicate == ICmpInst::ICMP_EQ ? "eq" : "ne")
<< "\n");
+ // First operand is always a load
auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId);
if (!Lhs.BaseId)
return std::nullopt;
- auto Rhs = visitICmpLoadOperand(CmpI->getOperand(1), BaseId);
+
+ // Second operand can either be load if doing compare between two BCE atoms or
+ // can be constant if comparing adjacent memory to constant
+ auto* RhsOperand = CmpI->getOperand(1);
+ const auto &DL = CmpI->getDataLayout();
+ int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
+
+ if (auto const& Const = dyn_cast<Constant>(RhsOperand))
+ return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
+
+ auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId);
if (!Rhs.BaseId)
return std::nullopt;
- const auto &DL = CmpI->getDataLayout();
- return BCECmp(std::move(Lhs), std::move(Rhs),
- DL.getTypeSizeInBits(CmpI->getOperand(0)->getType()), CmpI);
+ return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
}
+// Chain of comparisons inside a single basic block connected using `select` nodes.
+std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+
+std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+ ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
+ if (!SelectI->hasOneUse()) {
+ LLVM_DEBUG(dbgs() << "select has several uses\n");
+ return std::nullopt;
+ }
+ auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+ auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+ auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+ if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+ return std::nullopt;
+
+ auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
+ if (!Lhs)
+ return std::nullopt;
+ auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
+ if (!Rhs)
+ return std::nullopt;
+
+ return Lhs->concat(*Rhs);
+}
+
+std::optional<IntraCmpChain> visitComparison(Value *Cond,
+ ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
+ if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+ return visitICmp(CmpI, ExpectedPredicate, BaseId);
+ if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+ return visitSelect(SelectI, ExpectedPredicate, BaseId);
+
+ return std::nullopt;
+}
+
+
// Visit the given comparison block. If this is a comparison between two valid
// BCE atoms, returns the comparison.
std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
@@ -367,22 +489,21 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
}
- auto *CmpI = dyn_cast<ICmpInst>(Cond);
- if (!CmpI)
+ std::optional<IntraCmpChain> CmpChain = visitComparison(Cond, ExpectedPredicate, BaseId);
+ if (!CmpChain)
return std::nullopt;
- LLVM_DEBUG(dbgs() << "icmp\n");
- std::optional<BCECmp> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
- if (!Result)
+ if (!CmpChain->isAllSameCmp())
return std::nullopt;
- BCECmpBlock::InstructionSet BlockInsts(
- {Result->Lhs.LoadI, Result->Rhs.LoadI, Result->CmpI, BranchI});
- if (Result->Lhs.GEP)
- BlockInsts.insert(Result->Lhs.GEP);
- if (Result->Rhs.GEP)
- BlockInsts.insert(Result->Rhs.GEP);
- return BCECmpBlock(std::move(*Result), Block, BlockInsts);
+ std::vector<Comparison*> SortedCmpChain(CmpChain->getCmpChain());
+ llvm::sort(SortedCmpChain, [](Comparison* l, Comparison* r) {
+ return l->getLoads() < r->getLoads();
+ });
+
+ BCECmpBlock::InstructionSet BlockInsts(CmpChain->getAllInsts());
+ BlockInsts.insert(BranchI);
+ return BCECmpBlock(SortedCmpChain, Block, BlockInsts);
}
static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
@@ -832,6 +953,7 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
const auto Blocks =
getOrderedBlocks(Phi, LastBlock, Phi.getNumIncomingValues());
+
if (Blocks.empty()) return false;
BCECmpChain CmpChain(Blocks, Phi, AA);
@@ -863,16 +985,18 @@ struct CommonCmp {
};
void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
- auto First = AdjacentMem[0];
IRBuilder<> Builder(SelectI);
- LLVMContext &Context = First.CmpI->getContext();
- const auto &DL = First.CmpI->getDataLayout();
+ auto* M = SelectI->getModule();
+ LLVMContext &Context = SelectI->getContext();
+ const auto &DL = SelectI->getDataLayout();
+ auto First = AdjacentMem[0];
auto *CmpType = First.CmpI->getOperand(0)->getType();
auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
- auto ArraySize = DL.getTypeAllocSize(ArrayType);
+ auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
// TODO: check for alignment
- auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+ // auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+ // Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
std::vector<Constant*> Constants;
for (const auto& CI : AdjacentMem) {
@@ -880,14 +1004,20 @@ void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<Common
Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
}
auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
- Builder.CreateStore(ArrayConstant,ArrayAlloca);
+M->getOrInsertGlobal("globalKey", ArrayType);
+ GlobalVariable* gVar = M->getNamedGlobal("globalKey");
+ gVar->setLinkage(GlobalValue::PrivateLinkage);
+ gVar->setInitializer(ArrayConstant);
+ gVar->setConstant(true);
+ // Builder.CreateStore(ArrayConstant,ArrayAlloca);
// TODO: adjust base-ptr to point to start of load-offset
// TODO: also have to handle !=
Value *const MemCmpCall = emitMemCmp(
- Base, ArrayAlloca,
- ConstantInt::get(Type::getInt64Ty(Context), ArraySize),
+ Base, gVar,
+ ArraySize,
Builder, DL, &TLI);
+ // Builder.CreateLifetimeEnd(ArrayAlloca,ArraySize);
auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
BasicBlock::iterator ii(SelectI);
@@ -981,7 +1111,7 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
MadeChange |= processPhi(*Phi, TLI, AA, DTU);
}
- // merge cmps that load from same address and compare with constant
+ // Try to merge remaining select nodes that haven't been merged from phi-node merging
for (BasicBlock &BB : F) {
// from bottom up to find the root result of all comparisons
for (Instruction &I : llvm::reverse(BB)) {
From 845569139580b005f3446895ec915ff3e1d5c25a Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Sun, 16 Feb 2025 20:53:58 +0100
Subject: [PATCH 03/23] [MergeICmps] Implemented merge with constant across
basic blocks
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 270 +++++++++++-------
.../Transforms/MergeICmps/X86/const-cmp-bb.ll | 37 +++
2 files changed, 202 insertions(+), 105 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2194c4a925162..93ecfc5d780e4 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -183,9 +183,19 @@ struct Comparison {
using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
- Comparison(int SizeBits, const ICmpInst *CmpI);
- virtual ~Comparison() {};
+ Comparison(int SizeBits, const ICmpInst *CmpI) : SizeBits(SizeBits), CmpI(CmpI) {}
+ virtual ~Comparison() = default;
virtual LoadOperands getLoads() = 0;
+ virtual std::optional<Constant*> getConstant() = 0;
+ virtual bool isConstCmp()const = 0;
+ bool operator<(Comparison &O) {
+ auto [Lhs,Rhs] = getLoads();
+ auto [OtherLhs,OtherRhs] = O.getLoads();
+
+ if (!isConstCmp())
+ return std::tie(*Lhs,**Rhs) < std::tie(*OtherLhs,**OtherRhs);
+ return *Lhs < *OtherLhs;
+ }
};
// A comparison between a BCE atom and an integer constant.
@@ -206,6 +216,12 @@ struct BCEConstCmp : public Comparison {
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,std::nullopt);
}
+ std::optional<Constant*> getConstant() override {
+ return Const;
+ }
+ bool isConstCmp() const override {
+ return true;
+ }
};
// A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -226,6 +242,12 @@ struct BCECmp : public Comparison {
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,&Rhs);
}
+ std::optional<Constant*> getConstant() override {
+ return std::nullopt;
+ }
+ bool isConstCmp() const override {
+ return false;
+ }
};
@@ -238,12 +260,25 @@ class BCECmpBlock {
public:
typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
- BCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
- : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
+ BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
+ : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+
+ const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
+ const std::optional<BCEAtom*> Rhs() const { return Cmp->getLoads().second; }
+ std::optional<Constant*> getConstant() const {
+ return Cmp->getConstant();
+ }
+ bool isConstCmp() const {
+ return Cmp->isConstCmp();
+ }
+ Comparison* getCmp() const {
+ return Cmp;
+ }
+ bool operator<(const BCECmpBlock &O) const {
+ return *Cmp < *O.getCmp();
+ }
- // const BCEAtom &Lhs() const { return Cmp.Lhs; }
- // const BCEAtom &Rhs() const { return Cmp.Rhs; }
- // int SizeBits() const { return Cmp.SizeBits; }
+ int SizeBits() const { return Cmp->SizeBits; }
// Returns true if the block does other works besides comparison.
bool doesOtherWork() const;
@@ -273,7 +308,7 @@ class BCECmpBlock {
unsigned OrigOrder = 0;
private:
- std::vector<Comparison*> Cmps;
+ Comparison* Cmp;
};
bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
@@ -287,7 +322,8 @@ bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
};
- if (MayClobber(Cmp.Lhs.LoadI) || MayClobber(Cmp.Rhs.LoadI))
+ auto [Lhs,Rhs] = Cmp->getLoads();
+ if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
return false;
}
// Make sure this instruction does not use any of the BCE cmp block
@@ -345,36 +381,9 @@ class IntraCmpChain {
CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
return *this;
}
- BCECmpBlock::InstructionSet getAllInsts() {
- BCECmpBlock::InstructionSet Insts;
- for (auto Cmp : CmpChain) {
- // TODO: this mess should be able to get OOP'd
- if (auto* BceCmpI = dyn_cast<BCECmp>(&Cmp)) {
- Insts.insert(BceCmpI->Lhs.LoadI);
- Insts.insert(BceCmpI->Rhs.LoadI);
- Insts.insert(BceCmpI->CmpI);
- if (BceCmpI->Lhs.GEP)
- Insts.insert(BceCmpI->Lhs.GEP);
- if (BceCmpI->Rhs.GEP)
- Insts.insert(BceCmpI->Rhs.GEP);
- } else if (auto* BceConstCmpI = dyn_cast<BCEConstCmp>(&Cmp)) {
- Insts.insert(BceCmpI->Lhs.LoadI);
- Insts.insert(BceCmpI->CmpI);
- if (BceCmpI->Lhs.GEP)
- Insts.insert(BceCmpI->Lhs.GEP);
- }
- }
- return Insts;
- }
std::vector<Comparison*> getCmpChain() const {
return CmpChain;
}
-
- // Determines if all comparisons in the comparison chain are all either `BCECmp` or all `BCEConstCmp`
- bool isAllSameCmp() {
- return llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCECmp>(c);}) ||
- llvm::all_of(CmpChain, [](Comparison& c) {return isa<BCEConstCmp>(c);});
- }
};
// Visit the given comparison. If this is a comparison between two valid
@@ -489,32 +498,39 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
}
- std::optional<IntraCmpChain> CmpChain = visitComparison(Cond, ExpectedPredicate, BaseId);
- if (!CmpChain)
+ auto* CmpI = dyn_cast<ICmpInst>(Cond);
+ if (!CmpI)
return std::nullopt;
+ LLVM_DEBUG(dbgs() << "icmp\n");
- if (!CmpChain->isAllSameCmp())
+ std::optional<Comparison*> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
+ if (!Result)
return std::nullopt;
- std::vector<Comparison*> SortedCmpChain(CmpChain->getCmpChain());
- llvm::sort(SortedCmpChain, [](Comparison* l, Comparison* r) {
- return l->getLoads() < r->getLoads();
- });
-
- BCECmpBlock::InstructionSet BlockInsts(CmpChain->getAllInsts());
+ BCECmpBlock::InstructionSet BlockInsts;
+ auto [Lhs,Rhs] = (*Result)->getLoads();
+ BlockInsts.insert(Lhs->LoadI);
+ if (Lhs->GEP)
+ BlockInsts.insert(Lhs->GEP);
+ if (Rhs) {
+ BlockInsts.insert((*Rhs)->LoadI);
+ if ((*Rhs)->GEP)
+ BlockInsts.insert((*Rhs)->GEP);
+ }
+ BlockInsts.insert((*Result)->CmpI);
BlockInsts.insert(BranchI);
- return BCECmpBlock(SortedCmpChain, Block, BlockInsts);
+ return BCECmpBlock(std::move(*Result), Block, BlockInsts);
}
static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
BCECmpBlock &&Comparison) {
- LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
- << "': Found cmp of " << Comparison.SizeBits()
- << " bits between " << Comparison.Lhs().BaseId << " + "
- << Comparison.Lhs().Offset << " and "
- << Comparison.Rhs().BaseId << " + "
- << Comparison.Rhs().Offset << "\n");
- LLVM_DEBUG(dbgs() << "\n");
+ // LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+ // << "': Found cmp of " << Comparison.SizeBits()
+ // << " bits between " << Comparison.Lhs().BaseId << " + "
+ // << Comparison.Lhs().Offset << " and "
+ // << Comparison.Rhs().BaseId << " + "
+ // << Comparison.Rhs().Offset << "\n");
+ // LLVM_DEBUG(dbgs() << "\n");
Comparison.OrigOrder = Comparisons.size();
Comparisons.push_back(std::move(Comparison));
}
@@ -544,10 +560,16 @@ class BCECmpChain {
};
static bool areContiguous(const BCECmpBlock &First, const BCECmpBlock &Second) {
- return First.Lhs().BaseId == Second.Lhs().BaseId &&
- First.Rhs().BaseId == Second.Rhs().BaseId &&
- First.Lhs().Offset + First.SizeBits() / 8 == Second.Lhs().Offset &&
- First.Rhs().Offset + First.SizeBits() / 8 == Second.Rhs().Offset;
+ bool HasContigLhs = First.Lhs()->BaseId == Second.Lhs()->BaseId &&
+ First.Lhs()->Offset + First.SizeBits() / 8 == Second.Lhs()->Offset;
+ bool HasContigRhs = true;
+ auto FirstRhs = First.Rhs();
+ auto SecondRhs = Second.Rhs();
+ if (FirstRhs && SecondRhs)
+ HasContigRhs = (*FirstRhs)->BaseId == (*SecondRhs)->BaseId &&
+ (*FirstRhs)->Offset + First.SizeBits() / 8 == (*SecondRhs)->Offset;
+
+ return HasContigLhs && HasContigRhs;
}
static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
@@ -566,8 +588,7 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
// Sort to detect continuous offsets.
llvm::sort(Blocks,
[](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
- return std::tie(LhsBlock.Lhs(), LhsBlock.Rhs()) <
- std::tie(RhsBlock.Lhs(), RhsBlock.Rhs());
+ return LhsBlock < RhsBlock;
});
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
@@ -592,6 +613,26 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
return MergedBlocks;
}
+// A valid comparison chain means that all comparisons are of the same kind (either all `BCECmp` or all `BCEConstCmp`).
+// Additionally if all comparisons are `BCEConstCmp` they all need to have the same type to build a valid LLVM constant array.
+// TODO: Could even build a memory chain of different types using seperate allocations
+bool isValidCmpChain(std::vector<BCECmpBlock> Comparisons) {
+ BCECmpBlock* PrevCmp = nullptr;
+ for (BCECmpBlock BceCmpBlock : Comparisons) {
+ if (PrevCmp) {
+ if (PrevCmp->isConstCmp() != BceCmpBlock.isConstCmp())
+ return false;
+ if (PrevCmp->isConstCmp()){
+ if (PrevCmp->Lhs()->LoadI->getType() != BceCmpBlock.Lhs()->LoadI->getType())
+ return false;
+ }
+ }
+
+ PrevCmp = &BceCmpBlock;
+ }
+ return true;
+}
+
BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
AliasAnalysis &AA)
: Phi_(Phi) {
@@ -670,6 +711,12 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
return;
}
+
+ if(!isValidCmpChain(Comparisons)) {
+ LLVM_DEBUG(dbgs() << "invalid comparison chain");
+ return;
+ }
+
EntryBlock_ = Comparisons[0].BB;
MergedBlocks_ = mergeBlocks(std::move(Comparisons));
}
@@ -738,14 +785,34 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
IRBuilder<> Builder(BB);
// Add the GEPs from the first BCECmpBlock.
Value *Lhs, *Rhs;
- if (FirstCmp.Lhs().GEP)
- Lhs = Builder.Insert(FirstCmp.Lhs().GEP->clone());
+
+ // memcmp expects a 'size_t' argument and returns 'int'.
+ unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+ unsigned IntBits = TLI.getIntSize();
+ const unsigned TotalSizeBits = std::accumulate(
+ Comparisons.begin(), Comparisons.end(), 0u,
+ [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
+
+ if (FirstCmp.Lhs()->GEP)
+ Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
else
- Lhs = FirstCmp.Lhs().LoadI->getPointerOperand();
- if (FirstCmp.Rhs().GEP)
- Rhs = Builder.Insert(FirstCmp.Rhs().GEP->clone());
+ Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
+ // Build constant-array to compare to
+ if (FirstCmp.isConstCmp()) {
+ auto* ArrayType = ArrayType::get(FirstCmp.Lhs()->LoadI->getType(),TotalSizeBits / 8);
+ auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+ std::vector<Constant*> Constants;
+ for (const auto& BceBlock : Comparisons) {
+ // safe since we checked before that second operand is constant-int
+ Constants.emplace_back(*BceBlock.getConstant());
+ }
+ auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+ Builder.CreateStore(ArrayConstant,ArrayAlloca);
+ Rhs = ArrayAlloca;
+ } else if ((*FirstCmp.Rhs())->GEP)
+ Rhs = Builder.Insert((*FirstCmp.Rhs())->GEP->clone());
else
- Rhs = FirstCmp.Rhs().LoadI->getPointerOperand();
+ Rhs = (*FirstCmp.Rhs())->LoadI->getPointerOperand();
Value *IsEqual = nullptr;
LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
@@ -764,21 +831,17 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
if (Comparisons.size() == 1) {
LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
// Use clone to keep the metadata
- Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs().LoadI->clone());
- Instruction *const RhsLoad = Builder.Insert(FirstCmp.Rhs().LoadI->clone());
+ Instruction *const LhsLoad = Builder.Insert((*FirstCmp.Lhs()).LoadI->clone());
LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
- RhsLoad->replaceUsesOfWith(RhsLoad->getOperand(0), Rhs);
// There are no blocks to merge, just do the comparison.
- IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
+ if (FirstCmp.isConstCmp())
+ IsEqual = Builder.CreateICmpEQ(LhsLoad, *FirstCmp.getConstant());
+ else {
+ Instruction *const RhsLoad = Builder.Insert((*FirstCmp.Rhs())->LoadI->clone());
+ RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
+ IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
+ }
} else {
- const unsigned TotalSizeBits = std::accumulate(
- Comparisons.begin(), Comparisons.end(), 0u,
- [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
-
- // memcmp expects a 'size_t' argument and returns 'int'.
- unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
- unsigned IntBits = TLI.getIntSize();
-
// Create memcmp() == 0.
const auto &DL = Phi.getDataLayout();
Value *const MemCmpCall = emitMemCmp(
@@ -995,7 +1058,6 @@ void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<Common
auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
// TODO: check for alignment
- // auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
// Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
std::vector<Constant*> Constants;
@@ -1025,7 +1087,7 @@ M->getOrInsertGlobal("globalKey", ArrayType);
ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
removeUnusedOperands(deadOperands);
- dbgs() << "DONE merging";
+ // dbgs() << "DONE merging";
}
// Combines Icmp instructions if they operate on adjacent memory
@@ -1112,32 +1174,30 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
}
// Try to merge remaining select nodes that haven't been merged from phi-node merging
- for (BasicBlock &BB : F) {
- // from bottom up to find the root result of all comparisons
- for (Instruction &I : llvm::reverse(BB)) {
- if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
- auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
- auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
- auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
- if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
- continue;
-
- Value* BasePtr;
- std::vector<CommonCmp> cmps;
- if (auto bp = constantCmp(Cmp1,&cmps))
- BasePtr = *bp;
- if (auto bp = constantCmp(Cmp2,&cmps)) {
- if (BasePtr != bp) continue;
- }
-
- MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
- break;
- }
- }
- }
-
- F.dump();
+ // for (BasicBlock &BB : F) {
+ // // from bottom up to find the root result of all comparisons
+ // for (Instruction &I : llvm::reverse(BB)) {
+ // if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
+ // auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+ // auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+ // auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+ // if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
+ // continue;
+
+ // Value* BasePtr;
+ // std::vector<CommonCmp> cmps;
+ // if (auto bp = constantCmp(Cmp1,&cmps))
+ // BasePtr = *bp;
+ // if (auto bp = constantCmp(Cmp2,&cmps)) {
+ // if (BasePtr != bp) continue;
+ // }
+
+ // MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
+ // break;
+ // }
+ // }
+ // }
return MadeChange;
}
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
new file mode 100644
index 0000000000000..92c1d187aa08f
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --force-update
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+
+; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
+
+define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
+; CHECK-LABEL: @test(
+; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
+; CHECK-NEXT: [[TMP0:%.*]] = alloca [3 x i8], align 1
+; CHECK-NEXT: store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
+; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br label [[IF_END5:%.*]]
+; CHECK: land.end:
+; CHECK-NEXT: ret i1 [[TMP1]]
+;
+entry:
+ %0 = load i8, ptr %p, align 1
+ %cmp = icmp eq i8 %0, -1
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %cmp5 = icmp eq i8 %1, -56
+ br i1 %cmp5, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true
+ %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 2
+ %2 = load i8, ptr %arrayidx2, align 1
+ %cmp8 = icmp eq i8 %2, -66
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true, %entry
+ %3 = phi i1 [ false, %land.lhs.true ], [ false, %entry ], [ %cmp8, %land.rhs ]
+ ret i1 %3
+}
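
Read at source level, the `const-cmp-bb.ll` test above checks the equivalence sketched below. This is a minimal illustration, not part of the patch: three adjacent byte comparisons against constants give the same result as a single `memcmp` against the constants laid out contiguously. The function names `cmpChain` and `cmpMerged` are hypothetical; the bytes `0xFF`, `0xC8`, `0xBE` are the `i8` constants `-1`, `-56`, `-66` from the test.

```cpp
#include <cassert>
#include <cstring>

// Original form: three short-circuiting byte comparisons, one per basic block.
bool cmpChain(const unsigned char *P) {
  return P[0] == 0xFF && P[1] == 0xC8 && P[2] == 0xBE;
}

// Merged form: one memcmp against the expected bytes stored contiguously.
// Unlike the chain above, this always reads all three bytes, so it assumes
// the pointer is valid for 3 bytes (the test marks %p as dereferenceable(3)).
bool cmpMerged(const unsigned char *P) {
  assert(P && "pointer must be non-null and dereferenceable for 3 bytes");
  static const unsigned char Expected[3] = {0xFF, 0xC8, 0xBE};
  return std::memcmp(P, Expected, sizeof(Expected)) == 0;
}
```

Both functions agree on every 3-byte input; only the number of loads and branches differs.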
>From 79b1565a8546b8272944b0da996e76ac096a3c3b Mon Sep 17 00:00:00 2001
From: PhilippR <phil.black at gmx.net>
Date: Thu, 20 Feb 2025 18:54:07 +0100
Subject: [PATCH 04/23] [MergeICmps] Use RTTI; Can merge mixed comparison
chains
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 319 +++++++++---------
.../Transforms/MergeICmps/X86/const-cmp-bb.ll | 2 +-
.../MergeICmps/X86/mixed-comparisons.ll | 71 ++++
3 files changed, 236 insertions(+), 156 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 93ecfc5d780e4..df00fff3194c2 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -177,25 +177,31 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
}
+typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
+
struct Comparison {
+public:
+ enum CompKind {
+ CK_ConstCmp,
+ CK_BceCmp,
+ };
+private:
+ const CompKind Kind;
+public:
int SizeBits;
const ICmpInst *CmpI;
using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
- Comparison(int SizeBits, const ICmpInst *CmpI) : SizeBits(SizeBits), CmpI(CmpI) {}
+ Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
+ : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
+ CompKind getKind() const { return Kind; }
+
virtual ~Comparison() = default;
virtual LoadOperands getLoads() = 0;
- virtual std::optional<Constant*> getConstant() = 0;
- virtual bool isConstCmp()const = 0;
- bool operator<(Comparison &O) {
- auto [Lhs,Rhs] = getLoads();
- auto [OtherLhs,OtherRhs] = O.getLoads();
-
- if (!isConstCmp())
- return std::tie(*Lhs,**Rhs) < std::tie(*OtherLhs,**OtherRhs);
- return *Lhs < *OtherLhs;
- }
+ virtual InstructionSet getInsts() = 0;
+ bool areContiguous(const Comparison& Other) const;
+ bool operator<(const Comparison &Other) const;
};
// A comparison between a BCE atom and an integer constant.
@@ -211,17 +217,21 @@ struct BCEConstCmp : public Comparison {
Constant* Const;
BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
- : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+ : Comparison(CK_ConstCmp, SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
+ static bool classof(const Comparison* C) {
+ return C->getKind() == CK_ConstCmp;
+ }
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,std::nullopt);
}
- std::optional<Constant*> getConstant() override {
- return Const;
- }
- bool isConstCmp() const override {
- return true;
+ InstructionSet getInsts() override {
+ InstructionSet BlockInsts{CmpI,Lhs.LoadI};
+ if (Lhs.GEP)
+ BlockInsts.insert(Lhs.GEP);
+ return BlockInsts;
}
+
};
// A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -235,21 +245,55 @@ struct BCECmp : public Comparison {
BCEAtom Rhs;
BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
- : Comparison(SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R)) {
+ : Comparison(CK_BceCmp, SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R)) {
if (Rhs < Lhs) std::swap(Rhs, Lhs);
}
+ static bool classof(const Comparison* C) {
+ return C->getKind() == CK_BceCmp;
+ }
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,&Rhs);
}
- std::optional<Constant*> getConstant() override {
- return std::nullopt;
- }
- bool isConstCmp() const override {
- return false;
+ InstructionSet getInsts() override {
+ InstructionSet BlockInsts{CmpI, Lhs.LoadI, Rhs.LoadI};
+ if (Lhs.GEP)
+ BlockInsts.insert(Lhs.GEP);
+ if (Rhs.GEP)
+ BlockInsts.insert(Rhs.GEP);
+ return BlockInsts;
}
};
+bool Comparison::areContiguous(const Comparison& Other) const {
+ assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+ if (isa<BCEConstCmp>(this)) {
+ const auto& First = cast<BCEConstCmp>(this);
+ const auto& Second = cast<BCEConstCmp>(Other);
+
+ return First->Lhs.BaseId == Second.Lhs.BaseId &&
+ First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset;
+ }
+ const auto& First = cast<BCECmp>(this);
+ const auto& Second = cast<BCECmp>(Other);
+
+ return First->Lhs.BaseId == Second.Lhs.BaseId &&
+ First->Rhs.BaseId == Second.Rhs.BaseId &&
+ First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset &&
+ First->Rhs.Offset + First->SizeBits / 8 == Second.Rhs.Offset;
+}
+bool Comparison::operator<(const Comparison& Other) const {
+ assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+ if (isa<BCEConstCmp>(this)) {
+ const auto& First = cast<BCEConstCmp>(this);
+ const auto& Second = cast<BCEConstCmp>(Other);
+ return First->Lhs < Second.Lhs;
+ }
+ const auto& First = cast<BCECmp>(this);
+ const auto& Second = cast<BCECmp>(Other);
+ return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
+}
+
// A basic block with a comparison between two BCE atoms.
// The block might do extra work besides the atom comparison, in which case
@@ -258,27 +302,13 @@ struct BCECmp : public Comparison {
// (see canSplit()).
class BCECmpBlock {
public:
- typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
-
BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
: BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
- const std::optional<BCEAtom*> Rhs() const { return Cmp->getLoads().second; }
- std::optional<Constant*> getConstant() const {
- return Cmp->getConstant();
- }
- bool isConstCmp() const {
- return Cmp->isConstCmp();
- }
- Comparison* getCmp() const {
+ const Comparison* getCmp() const {
return Cmp;
}
- bool operator<(const BCECmpBlock &O) const {
- return *Cmp < *O.getCmp();
- }
-
- int SizeBits() const { return Cmp->SizeBits; }
// Returns true if the block does other works besides comparison.
bool doesOtherWork() const;
@@ -426,41 +456,40 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
}
// Chain of comparisons inside a single basic block connected using `select` nodes.
-std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
-
-std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
- ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
- if (!SelectI->hasOneUse()) {
- LLVM_DEBUG(dbgs() << "select has several uses\n");
- return std::nullopt;
- }
- auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
- auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
- auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
- if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
- return std::nullopt;
-
- auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
- if (!Lhs)
- return std::nullopt;
- auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
- if (!Rhs)
- return std::nullopt;
-
- return Lhs->concat(*Rhs);
-}
-
-std::optional<IntraCmpChain> visitComparison(Value *Cond,
- ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
- if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
- return visitICmp(CmpI, ExpectedPredicate, BaseId);
- if (auto *SelectI = dyn_cast<SelectInst>(Cond))
- return visitSelect(SelectI, ExpectedPredicate, BaseId);
-
- return std::nullopt;
-}
-
+// std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+
+// std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+// ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
+// if (!SelectI->hasOneUse()) {
+// LLVM_DEBUG(dbgs() << "select has several uses\n");
+// return std::nullopt;
+// }
+// auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+// auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+// auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+
+// if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+// return std::nullopt;
+
+// auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
+// if (!Lhs)
+// return std::nullopt;
+// auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
+// if (!Rhs)
+// return std::nullopt;
+
+// return Lhs->concat(*Rhs);
+// }
+
+// std::optional<IntraCmpChain> visitComparison(Value *Cond,
+// ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
+// if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+// return visitICmp(CmpI, ExpectedPredicate, BaseId);
+// if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+// return visitSelect(SelectI, ExpectedPredicate, BaseId);
+
+// return std::nullopt;
+// }
// Visit the given comparison block. If this is a comparison between two valid
// BCE atoms, returns the comparison.
@@ -507,30 +536,29 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
if (!Result)
return std::nullopt;
- BCECmpBlock::InstructionSet BlockInsts;
- auto [Lhs,Rhs] = (*Result)->getLoads();
- BlockInsts.insert(Lhs->LoadI);
- if (Lhs->GEP)
- BlockInsts.insert(Lhs->GEP);
- if (Rhs) {
- BlockInsts.insert((*Rhs)->LoadI);
- if ((*Rhs)->GEP)
- BlockInsts.insert((*Rhs)->GEP);
- }
- BlockInsts.insert((*Result)->CmpI);
+ InstructionSet BlockInsts((*Result)->getInsts());
BlockInsts.insert(BranchI);
return BCECmpBlock(std::move(*Result), Block, BlockInsts);
}
+// void emitDebugInfo(BCECmpBlock &&Comparison) {
+// LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+// << "': Found constant-cmp of " << Comparison.getCmp().SizeBits
+// << " bits including " << Comparison.getCmp()->Lhs.BaseId << " + "
+// << Comparison.getCmp().Lhs.Offset << "\n");
+
+// LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
+// << "': Found cmp of " << Comparison.getCmp().SizeBits
+// << " bits between " << Comparison.getCmp().Lhs.BaseId << " + "
+// << Comparison.Lhs.Offset << " and "
+// << Comparison.Rhs.BaseId << " + "
+// << Comparison.Rhs.Offset << "\n");
+// LLVM_DEBUG(dbgs() << "\n");
+// }
+
static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
BCECmpBlock &&Comparison) {
- // LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
- // << "': Found cmp of " << Comparison.SizeBits()
- // << " bits between " << Comparison.Lhs().BaseId << " + "
- // << Comparison.Lhs().Offset << " and "
- // << Comparison.Rhs().BaseId << " + "
- // << Comparison.Rhs().Offset << "\n");
- // LLVM_DEBUG(dbgs() << "\n");
+ // emitDebugInfo(Comparison);
Comparison.OrigOrder = Comparisons.size();
Comparisons.push_back(std::move(Comparison));
}
@@ -554,24 +582,12 @@ class BCECmpChain {
private:
PHINode &Phi_;
// The list of all blocks in the chain, grouped by contiguity.
+ // First all BCE comparisons then all BCE-Const comparisons.
std::vector<ContiguousBlocks> MergedBlocks_;
// The original entry block (before sorting);
BasicBlock *EntryBlock_;
};
-static bool areContiguous(const BCECmpBlock &First, const BCECmpBlock &Second) {
- bool HasContigLhs = First.Lhs()->BaseId == Second.Lhs()->BaseId &&
- First.Lhs()->Offset + First.SizeBits() / 8 == Second.Lhs()->Offset;
- bool HasContigRhs = true;
- auto FirstRhs = First.Rhs();
- auto SecondRhs = Second.Rhs();
- if (FirstRhs && SecondRhs)
- HasContigRhs = (*FirstRhs)->BaseId == (*SecondRhs)->BaseId &&
- (*FirstRhs)->Offset + First.SizeBits() / 8 == (*SecondRhs)->Offset;
-
- return HasContigLhs && HasContigRhs;
-}
-
static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
for (const BCECmpBlock &Block : Blocks)
@@ -579,7 +595,7 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
return MinOrigOrder;
}
-/// Given a chain of comparison blocks, groups the blocks into contiguous
+/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
/// ranges that can be merged together into a single comparison.
static std::vector<BCECmpChain::ContiguousBlocks>
mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
@@ -588,12 +604,12 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
// Sort to detect continuous offsets.
llvm::sort(Blocks,
[](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
- return LhsBlock < RhsBlock;
+ return *LhsBlock.getCmp() < *RhsBlock.getCmp();
});
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
for (BCECmpBlock &Block : Blocks) {
- if (!LastMergedBlock || !areContiguous(LastMergedBlock->back(), Block)) {
+ if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
MergedBlocks.emplace_back();
LastMergedBlock = &MergedBlocks.back();
} else {
@@ -613,26 +629,6 @@ mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
return MergedBlocks;
}
-// A valid comparison chain means that all comparisons are of the same kind (either all `BCECmp` or all `BCEConstCmp`).
-// Additionally if all comparisons are `BCEConstCmp` they all need to have the same type to build a valid LLVM constant array.
-// TODO: Could even build a memory chain of different types using seperate allocations
-bool isValidCmpChain(std::vector<BCECmpBlock> Comparisons) {
- BCECmpBlock* PrevCmp = nullptr;
- for (BCECmpBlock BceCmpBlock : Comparisons) {
- if (PrevCmp) {
- if (PrevCmp->isConstCmp() != BceCmpBlock.isConstCmp())
- return false;
- if (PrevCmp->isConstCmp()){
- if (PrevCmp->Lhs()->LoadI->getType() != BceCmpBlock.Lhs()->LoadI->getType())
- return false;
- }
- }
-
- PrevCmp = &BceCmpBlock;
- }
- return true;
-}
-
BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
AliasAnalysis &AA)
: Phi_(Phi) {
@@ -705,20 +701,28 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
}
enqueueBlock(Comparisons, std::move(*Comparison));
}
-
+
// It is possible we have no suitable comparison to merge.
if (Comparisons.empty()) {
LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
return;
}
- if(!isValidCmpChain(Comparisons)) {
- LLVM_DEBUG(dbgs() << "invalid comparison chain");
- return;
- }
-
EntryBlock_ = Comparisons[0].BB;
- MergedBlocks_ = mergeBlocks(std::move(Comparisons));
+
+ std::vector<BCECmpBlock> ConstComparisons, BceComparisons;
+ auto isConstCmp = [](BCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
+ // TODO: too many copies here
+ std::partition_copy(Comparisons.begin(), Comparisons.end(),
+ std::back_inserter(ConstComparisons),
+ std::back_inserter(BceComparisons),
+ isConstCmp);
+
+ auto MergedConstCmpBlocks = mergeBlocks(std::move(ConstComparisons));
+ auto MergedBCECmpBlocks = mergeBlocks(std::move(BceComparisons));
+
+ MergedBlocks_.insert(MergedBlocks_.end(),MergedBCECmpBlocks.begin(),MergedBCECmpBlocks.end());
+ MergedBlocks_.insert(MergedBlocks_.end(),MergedConstCmpBlocks.begin(),MergedConstCmpBlocks.end());
}
namespace {
@@ -786,34 +790,27 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
// Add the GEPs from the first BCECmpBlock.
Value *Lhs, *Rhs;
- // memcmp expects a 'size_t' argument and returns 'int'.
- unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
- unsigned IntBits = TLI.getIntSize();
- const unsigned TotalSizeBits = std::accumulate(
- Comparisons.begin(), Comparisons.end(), 0u,
- [](int Size, const BCECmpBlock &C) { return Size + C.SizeBits(); });
-
if (FirstCmp.Lhs()->GEP)
Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
else
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
// Build constant-array to compare to
- if (FirstCmp.isConstCmp()) {
- auto* ArrayType = ArrayType::get(FirstCmp.Lhs()->LoadI->getType(),TotalSizeBits / 8);
+ if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
+ auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
std::vector<Constant*> Constants;
for (const auto& BceBlock : Comparisons) {
- // safe since we checked before that second operand is constant-int
- Constants.emplace_back(*BceBlock.getConstant());
+ Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
}
auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
Builder.CreateStore(ArrayConstant,ArrayAlloca);
Rhs = ArrayAlloca;
- } else if ((*FirstCmp.Rhs())->GEP)
- Rhs = Builder.Insert((*FirstCmp.Rhs())->GEP->clone());
- else
- Rhs = (*FirstCmp.Rhs())->LoadI->getPointerOperand();
-
+ } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+ if (FirstBceCmp->Rhs.GEP)
+ Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
+ else
+ Rhs = FirstBceCmp->Rhs.LoadI->getPointerOperand();
+ }
Value *IsEqual = nullptr;
LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
<< BB->getName() << "\n");
@@ -831,17 +828,25 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
if (Comparisons.size() == 1) {
LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
// Use clone to keep the metadata
- Instruction *const LhsLoad = Builder.Insert((*FirstCmp.Lhs()).LoadI->clone());
+ Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs()->LoadI->clone());
LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
// There are no blocks to merge, just do the comparison.
- if (FirstCmp.isConstCmp())
- IsEqual = Builder.CreateICmpEQ(LhsLoad, *FirstCmp.getConstant());
- else {
- Instruction *const RhsLoad = Builder.Insert((*FirstCmp.Rhs())->LoadI->clone());
+ if (auto* ConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp()))
+ IsEqual = Builder.CreateICmpEQ(LhsLoad, ConstCmp->Const);
+ else if (const auto& BceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+ Instruction *const RhsLoad = Builder.Insert(BceCmp->Rhs.LoadI->clone());
RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
}
} else {
+ // memcmp expects a 'size_t' argument and returns 'int'.
+ unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+ unsigned IntBits = TLI.getIntSize();
+ const unsigned TotalSizeBits = std::accumulate(
+ Comparisons.begin(), Comparisons.end(), 0u,
+ [](int Size, const BCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+
+
// Create memcmp() == 0.
const auto &DL = Phi.getDataLayout();
Value *const MemCmpCall = emitMemCmp(
@@ -1153,6 +1158,10 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
DominatorTree *DT) {
LLVM_DEBUG(dbgs() << "MergeICmpsLegacyPass: " << F.getName() << "\n");
+
+ dbgs() << "after target\n";
+ dbgs() << TTI.enableMemCmpExpansion(F.hasOptSize(), true);
+
// We only try merging comparisons if the target wants to expand memcmp later.
// The rationale is to avoid turning small chains into memcmp calls.
if (!TTI.enableMemCmpExpansion(F.hasOptSize(), true))
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 92c1d187aa08f..24cbceae9173d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -10,7 +10,7 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
; CHECK-NEXT: store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT: br label [[IF_END5:%.*]]
+; CHECK-NEXT: br label [[LAND_END5:%.*]]
; CHECK: land.end:
; CHECK-NEXT: ret i1 [[TMP1]]
;
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
new file mode 100644
index 0000000000000..150a0300de947
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -0,0 +1,71 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+%S = type { i32, i1, i1, i16, i32, i32, i32 }
+
+; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
+; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
+
+define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed(
+; This is the classic BCE comparison block
+; CHECK: "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; This is the new BCE to constant comparison block
+; CHECK: "entry+land.rhs+land.lhs.true8":
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = alloca [3 x i32], align 4
+; CHECK-NEXT: store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[ENTRY_LAND_RHS]] ], [ false, [[LAND_LHS_TRUE10:%.*]] ]
+; CHECK-NEXT: ret i1 [[TMP4]]
+;
+entry:
+ %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %0 = load i32, ptr %e, align 4
+ %cmp = icmp eq i32 %0, 255
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr %a, align 4
+ %2 = load i32, ptr %b, align 4
+ %cmp3 = icmp eq i32 %1, %2
+ br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4: ; preds = %land.lhs.true
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+ %3 = load i8, ptr %c, align 1
+ %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+ %4 = load i8, ptr %c5, align 1
+ %cmp7 = icmp eq i8 %3, %4
+ br i1 %cmp7, label %land.lhs.true8, label %land.end
+
+land.lhs.true8: ; preds = %land.lhs.true4
+ %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+ %5 = load i32, ptr %g, align 4
+ %cmp9 = icmp eq i32 %5, 100
+ br i1 %cmp9, label %land.lhs.true10, label %land.end
+
+land.lhs.true10: ; preds = %land.lhs.true8
+ %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %6 = load i8, ptr %b11, align 4
+ %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+ %7 = load i8, ptr %b13, align 4
+ %cmp15 = icmp eq i8 %6, %7
+ br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true10
+ %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+ %8 = load i32, ptr %f, align 4
+ %cmp16 = icmp eq i32 %8, 200
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true8, %land.lhs.true4, %land.lhs.true, %entry
+ %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true8 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+ ret i1 %9
+}
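
The way this commit handles a mixed chain (split by kind, sort each kind by offset, group contiguous ranges, emit the load-vs-load groups before the constant groups) can be sketched without any LLVM types. The following is a simplified model under stated assumptions, not the pass itself: `Cmp`, `groupContiguous`, and `mergeMixedChain` are hypothetical names, every comparison is assumed to use the same base pointer(s), and any reordering of groups by original block order is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one comparison in the chain.
struct Cmp {
  bool IsConst;  // true: load == constant, false: load == load
  int Offset;    // byte offset from the shared base pointer
  int SizeBytes; // width of the compared value in bytes
};

using Group = std::vector<Cmp>;

// Sort comparisons of one kind by offset, then start a new group whenever
// the next comparison does not begin exactly where the previous one ended.
static std::vector<Group> groupContiguous(std::vector<Cmp> Cmps) {
  std::sort(Cmps.begin(), Cmps.end(),
            [](const Cmp &A, const Cmp &B) { return A.Offset < B.Offset; });
  std::vector<Group> Groups;
  for (const Cmp &C : Cmps) {
    assert(C.SizeBytes > 0 && "comparison must cover at least one byte");
    bool Contiguous =
        !Groups.empty() &&
        Groups.back().back().Offset + Groups.back().back().SizeBytes ==
            C.Offset;
    if (!Contiguous)
      Groups.emplace_back();
    Groups.back().push_back(C);
  }
  return Groups;
}

// Split the chain by kind, group each kind on its own, and return the
// load-vs-load groups first, followed by the constant groups.
std::vector<Group> mergeMixedChain(const std::vector<Cmp> &Chain) {
  std::vector<Cmp> ConstCmps, BceCmps;
  std::partition_copy(Chain.begin(), Chain.end(),
                      std::back_inserter(ConstCmps),
                      std::back_inserter(BceCmps),
                      [](const Cmp &C) { return C.IsConst; });
  std::vector<Group> Result = groupContiguous(std::move(BceCmps));
  std::vector<Group> ConstGroups = groupContiguous(std::move(ConstCmps));
  Result.insert(Result.end(), ConstGroups.begin(), ConstGroups.end());
  return Result;
}
```

Fed the six comparisons from `mixed-comparisons.ll` (constants at offsets 8, 16, 12 with 4 bytes each; load-vs-load at offsets 0, 5, 4 with 4, 1, 1 bytes), the sketch yields one load-vs-load group covering 6 bytes and one constant group covering 12 bytes, which lines up with the two `memcmp` sizes in the CHECK lines.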
>From fdc482fecf42018596ee99f6b4e0b339d32bfbc9 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 26 Feb 2025 20:14:33 +0100
Subject: [PATCH 05/23] [MergeIcmps] Supports basic blocks using select insts
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 567 +++++++-----------
.../Transforms/MergeICmps/X86/const-cmp-bb.ll | 2 +-
.../MergeICmps/X86/many-const-cmp-select.ll | 69 +++
3 files changed, 300 insertions(+), 338 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index df00fff3194c2..4456fbfb9a60a 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -294,46 +294,88 @@ bool Comparison::operator<(const Comparison& Other) const {
return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
}
+// Represents multiple comparisons inside of a single basic block.
+// This happens if multiple basic blocks have previously been merged into a single block using a select node.
+class IntraCmpChain {
+ std::vector<Comparison*> CmpChain;
+
+public:
+ IntraCmpChain(Comparison* C) : CmpChain{C} {}
+ IntraCmpChain combine(const IntraCmpChain OtherChain) {
+ CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+ return *this;
+ }
+ std::vector<Comparison*> getCmpChain() const {
+ return CmpChain;
+ }
+};
+
+
+// A basic block that contains one or more comparisons
+class MultBCECmpBlock {
+ public:
+ MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+ : BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
+
+ // // Returns true if each comparison in this basic block is being merged.
+ // // Necessary because otherwise would leave basic block in invalid state.
+ // bool hasAllCmpsMerged() const;
+
+ // Returns true if the block does other works besides comparison.
+ bool doesOtherWork() const;
+
+ std::vector<Comparison*> getCmps() {
+ return Cmps;
+ }
+
+ // // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
+ // // instructions in the block.
+ // bool canSplit(AliasAnalysis &AA) const;
+
+ // // Return true if this all the relevant instructions in the BCE-cmp-block can
+ // // be sunk below this instruction. By doing this, we know we can separate the
+ // // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
+ // // block.
+ // bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
+
+ // The basic block where this comparison happens.
+ BasicBlock *BB;
+ // Instructions relating to the BCECmp and branch.
+ InstructionSet BlockInsts;
+ // The block requires splitting.
+ bool RequireSplit = false;
+ // Original order of this block in the chain.
+ unsigned OrigOrder = 0;
+
+private:
+ std::vector<Comparison*> Cmps;
+};
-// A basic block with a comparison between two BCE atoms.
+// A basic block with a single comparison between two BCE atoms.
// The block might do extra work besides the atom comparison, in which case
// doesOtherWork() returns true. Under some conditions, the block can be
// split into the atom comparison part and the "other work" part
// (see canSplit()).
-class BCECmpBlock {
+class SingleBCECmpBlock {
public:
- BCECmpBlock(Comparison* Cmp, BasicBlock *BB, InstructionSet BlockInsts)
- : BB(BB), BlockInsts(std::move(BlockInsts)), Cmp(std::move(Cmp)) {}
+ SingleBCECmpBlock(MultBCECmpBlock M, unsigned i) {
+ BB = M.BB;
+ Cmp = M.getCmps()[i];
+ OrigOrder = M.OrigOrder;
+ }
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
const Comparison* getCmp() const {
return Cmp;
}
- // Returns true if the block does other works besides comparison.
- bool doesOtherWork() const;
-
- // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
- // instructions in the block.
- bool canSplit(AliasAnalysis &AA) const;
-
- // Return true if this all the relevant instructions in the BCE-cmp-block can
- // be sunk below this instruction. By doing this, we know we can separate the
- // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
- // block.
- bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
-
// We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
// instructions. Split the old block and move all non-BCE-cmp-insts into the
// new parent block.
- void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
+ // void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
// The basic block where this comparison happens.
BasicBlock *BB;
- // Instructions relating to the BCECmp and branch.
- InstructionSet BlockInsts;
- // The block requires splitting.
- bool RequireSplit = false;
// Original order of this block in the chain.
unsigned OrigOrder = 0;
@@ -341,56 +383,58 @@ class BCECmpBlock {
Comparison* Cmp;
};
-bool BCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
- AliasAnalysis &AA) const {
- // If this instruction may clobber the loads and is in middle of the BCE cmp
- // block instructions, then bail for now.
- if (Inst->mayWriteToMemory()) {
- auto MayClobber = [&](LoadInst *LI) {
- // If a potentially clobbering instruction comes before the load,
- // we can still safely sink the load.
- return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
- isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
- };
- auto [Lhs,Rhs] = Cmp->getLoads();
- if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
- return false;
- }
- // Make sure this instruction does not use any of the BCE cmp block
- // instructions as operand.
- return llvm::none_of(Inst->operands(), [&](const Value *Op) {
- const Instruction *OpI = dyn_cast<Instruction>(Op);
- return OpI && BlockInsts.contains(OpI);
- });
-}
+// bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
+// AliasAnalysis &AA) const {
+// // If this instruction may clobber the loads and is in middle of the BCE cmp
+// // block instructions, then bail for now.
+// if (Inst->mayWriteToMemory()) {
+// auto MayClobber = [&](LoadInst *LI) {
+// // If a potentially clobbering instruction comes before the load,
+// // we can still safely sink the load.
+// return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
+// isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
+// };
+// for (auto* Cmp : Cmps.getCmpChain()) {
+// auto [Lhs,Rhs] = Cmp->getLoads();
+// if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
+// return false;
+// }
+// }
+// // Make sure this instruction does not use any of the BCE cmp block
+// // instructions as operand.
+// return llvm::none_of(Inst->operands(), [&](const Value *Op) {
+// const Instruction *OpI = dyn_cast<Instruction>(Op);
+// return OpI && BlockInsts.contains(OpI);
+// });
+// }
-void BCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
- llvm::SmallVector<Instruction *, 4> OtherInsts;
- for (Instruction &Inst : *BB) {
- if (BlockInsts.count(&Inst))
- continue;
- assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
- // This is a non-BCE-cmp-block instruction. And it can be separated
- // from the BCE-cmp-block instruction.
- OtherInsts.push_back(&Inst);
- }
+// void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
+// llvm::SmallVector<Instruction *, 4> OtherInsts;
+// for (Instruction &Inst : *BB) {
+// if (BlockInsts.count(&Inst))
+// continue;
+// assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+// // This is a non-BCE-cmp-block instruction. And it can be separated
+// // from the BCE-cmp-block instruction.
+// OtherInsts.push_back(&Inst);
+// }
- // Do the actual spliting.
- for (Instruction *Inst : reverse(OtherInsts))
- Inst->moveBeforePreserving(*NewParent, NewParent->begin());
-}
+// // Do the actual splitting.
+// for (Instruction *Inst : reverse(OtherInsts))
+// Inst->moveBeforePreserving(*NewParent, NewParent->begin());
+// }
-bool BCECmpBlock::canSplit(AliasAnalysis &AA) const {
- for (Instruction &Inst : *BB) {
- if (!BlockInsts.count(&Inst)) {
- if (!canSinkBCECmpInst(&Inst, AA))
- return false;
- }
- }
- return true;
-}
+// bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
+// for (Instruction &Inst : *BB) {
+// if (!BlockInsts.count(&Inst)) {
+// if (!canSinkBCECmpInst(&Inst, AA))
+// return false;
+// }
+// }
+// return true;
+// }
-bool BCECmpBlock::doesOtherWork() const {
+bool MultBCECmpBlock::doesOtherWork() const {
// TODO(courbet): Can we allow some other things ? This is very conservative.
// We might be able to get away with anything does not have any side
// effects outside of the basic block.
@@ -402,25 +446,11 @@ bool BCECmpBlock::doesOtherWork() const {
return false;
}
-class IntraCmpChain {
- std::vector<Comparison*> CmpChain;
-
-public:
- IntraCmpChain(Comparison* C) : CmpChain{C} {}
- IntraCmpChain concat(const IntraCmpChain OtherChain) {
- CmpChain.insert(CmpChain.end(),OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
- return *this;
- }
- std::vector<Comparison*> getCmpChain() const {
- return CmpChain;
- }
-};
-
// Visit the given comparison. If this is a comparison between two valid
// BCE atoms, or between a BCE atom and a constant, returns the comparison.
std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
const ICmpInst::Predicate ExpectedPredicate,
- BaseIdentifier &BaseId) {
+ BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
// The comparison can only be used once:
// - For intermediate blocks, as a branch condition.
// - For the final block, as an incoming value for the Phi.
@@ -456,44 +486,46 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
}
// Chain of comparisons inside a single basic block connected using `select` nodes.
-// std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&);
+std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&, InstructionSet*);
-// std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
-// ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId) {
-// if (!SelectI->hasOneUse()) {
-// LLVM_DEBUG(dbgs() << "select has several uses\n");
-// return std::nullopt;
-// }
-// auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
-// auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
-// auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
+ ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId, InstructionSet *BlockInsts) {
+ if (!SelectI->hasOneUse()) {
+ LLVM_DEBUG(dbgs() << "select has several uses\n");
+ return std::nullopt;
+ }
+ auto* Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+ auto* Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
+ auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+ auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-// if (!Cmp1 || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
-// return std::nullopt;
+ if (!(Cmp1 || Sel1) || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
+ return std::nullopt;
-// auto Lhs = visitComparison(Cmp1,ExpectedPredicate,BaseId);
-// if (!Lhs)
-// return std::nullopt;
-// auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId);
-// if (!Rhs)
-// return std::nullopt;
+ auto Lhs = visitComparison(SelectI->getOperand(0),ExpectedPredicate,BaseId,BlockInsts);
+ if (!Lhs)
+ return std::nullopt;
+ auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId,BlockInsts);
+ if (!Rhs)
+ return std::nullopt;
-// return Lhs->concat(*Rhs);
-// }
+ BlockInsts->insert(SelectI);
+ return Lhs->combine(std::move(*Rhs));
+}
-// std::optional<IntraCmpChain> visitComparison(Value *Cond,
-// ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId) {
-// if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
-// return visitICmp(CmpI, ExpectedPredicate, BaseId);
-// if (auto *SelectI = dyn_cast<SelectInst>(Cond))
-// return visitSelect(SelectI, ExpectedPredicate, BaseId);
+std::optional<IntraCmpChain> visitComparison(Value *Cond,
+ ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+ if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
+ return visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+ if (auto *SelectI = dyn_cast<SelectInst>(Cond))
+ return visitSelect(SelectI, ExpectedPredicate, BaseId, BlockInsts);
-// return std::nullopt;
-// }
+ return std::nullopt;
+}
// Visit the given comparison block. If this is a comparison between two valid
// BCE atoms, returns the comparison.
-std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
+std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
BasicBlock *const Block,
const BasicBlock *const PhiBlock,
BaseIdentifier &BaseId) {
@@ -527,18 +559,19 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
FalseBlock == PhiBlock ? ICmpInst::ICMP_EQ : ICmpInst::ICMP_NE;
}
- auto* CmpI = dyn_cast<ICmpInst>(Cond);
- if (!CmpI)
- return std::nullopt;
- LLVM_DEBUG(dbgs() << "icmp\n");
-
- std::optional<Comparison*> Result = visitICmp(CmpI, ExpectedPredicate, BaseId);
- if (!Result)
+ InstructionSet BlockInsts;
+ std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
+ if (!Result) {
+ dbgs() << "invalid result\n";
return std::nullopt;
+ }
- InstructionSet BlockInsts((*Result)->getInsts());
+ for (auto* Cmp : Result->getCmpChain()) {
+ auto CmpInsts = Cmp->getInsts();
+ BlockInsts.insert(CmpInsts.begin(), CmpInsts.end());
+ }
BlockInsts.insert(BranchI);
- return BCECmpBlock(std::move(*Result), Block, BlockInsts);
+ return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
}
// void emitDebugInfo(BCECmpBlock &&Comparison) {
@@ -556,17 +589,18 @@ std::optional<BCECmpBlock> visitCmpBlock(Value *const Val,
// LLVM_DEBUG(dbgs() << "\n");
// }
-static inline void enqueueBlock(std::vector<BCECmpBlock> &Comparisons,
- BCECmpBlock &&Comparison) {
+static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
+ MultBCECmpBlock &&CmpBlock) {
// emitDebugInfo(Comparison);
- Comparison.OrigOrder = Comparisons.size();
- Comparisons.push_back(std::move(Comparison));
+ CmpBlock.OrigOrder = Comparisons.size();
+ for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++)
+ Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i));
}
// A chain of comparisons.
class BCECmpChain {
public:
- using ContiguousBlocks = std::vector<BCECmpBlock>;
+ using ContiguousBlocks = std::vector<SingleBCECmpBlock>;
BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
AliasAnalysis &AA);
@@ -582,7 +616,7 @@ class BCECmpChain {
private:
PHINode &Phi_;
// The list of all blocks in the chain, grouped by contiguity.
- // First all BCE comparisons then all BCE-Const comparisons.
+ // First all BCE comparisons followed by all BCE-Const comparisons.
std::vector<ContiguousBlocks> MergedBlocks_;
// The original entry block (before sorting);
BasicBlock *EntryBlock_;
@@ -590,7 +624,7 @@ class BCECmpChain {
static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
- for (const BCECmpBlock &Block : Blocks)
+ for (const SingleBCECmpBlock &Block : Blocks)
MinOrigOrder = std::min(MinOrigOrder, Block.OrigOrder);
return MinOrigOrder;
}
@@ -598,17 +632,17 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
/// ranges that can be merged together into a single comparison.
static std::vector<BCECmpChain::ContiguousBlocks>
-mergeBlocks(std::vector<BCECmpBlock> &&Blocks) {
+mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
std::vector<BCECmpChain::ContiguousBlocks> MergedBlocks;
// Sort to detect continuous offsets.
llvm::sort(Blocks,
- [](const BCECmpBlock &LhsBlock, const BCECmpBlock &RhsBlock) {
+ [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
return *LhsBlock.getCmp() < *RhsBlock.getCmp();
});
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
- for (BCECmpBlock &Block : Blocks) {
+ for (SingleBCECmpBlock &Block : Blocks) {
if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
MergedBlocks.emplace_back();
LastMergedBlock = &MergedBlocks.back();
@@ -634,46 +668,46 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
: Phi_(Phi) {
assert(!Blocks.empty() && "a chain should have at least one block");
// Now look inside blocks to check for BCE comparisons.
- std::vector<BCECmpBlock> Comparisons;
+ std::vector<SingleBCECmpBlock> Comparisons;
BaseIdentifier BaseId;
for (BasicBlock *const Block : Blocks) {
assert(Block && "invalid block");
- std::optional<BCECmpBlock> Comparison = visitCmpBlock(
+ std::optional<MultBCECmpBlock> CmpBlock = visitCmpBlock(
Phi.getIncomingValueForBlock(Block), Block, Phi.getParent(), BaseId);
- if (!Comparison) {
+ if (!CmpBlock) {
LLVM_DEBUG(dbgs() << "chain with invalid BCECmpBlock, no merge.\n");
return;
}
- if (Comparison->doesOtherWork()) {
- LLVM_DEBUG(dbgs() << "block '" << Comparison->BB->getName()
+ if (CmpBlock->doesOtherWork()) {
+ LLVM_DEBUG(dbgs() << "block '" << CmpBlock->BB->getName()
<< "' does extra work besides compare\n");
- if (Comparisons.empty()) {
- // This is the initial block in the chain, in case this block does other
- // work, we can try to split the block and move the irrelevant
- // instructions to the predecessor.
- //
- // If this is not the initial block in the chain, splitting it wont
- // work.
- //
- // As once split, there will still be instructions before the BCE cmp
- // instructions that do other work in program order, i.e. within the
- // chain before sorting. Unless we can abort the chain at this point
- // and start anew.
- //
- // NOTE: we only handle blocks a with single predecessor for now.
- if (Comparison->canSplit(AA)) {
- LLVM_DEBUG(dbgs()
- << "Split initial block '" << Comparison->BB->getName()
- << "' that does extra work besides compare\n");
- Comparison->RequireSplit = true;
- enqueueBlock(Comparisons, std::move(*Comparison));
- } else {
- LLVM_DEBUG(dbgs()
- << "ignoring initial block '" << Comparison->BB->getName()
- << "' that does extra work besides compare\n");
- }
- continue;
- }
+ // if (Comparisons.empty()) {
+ // // This is the initial block in the chain, in case this block does other
+ // // work, we can try to split the block and move the irrelevant
+ // // instructions to the predecessor.
+ // //
+ // // If this is not the initial block in the chain, splitting it wont
+ // // work.
+ // //
+ // // As once split, there will still be instructions before the BCE cmp
+ // // instructions that do other work in program order, i.e. within the
+ // // chain before sorting. Unless we can abort the chain at this point
+ // // and start anew.
+ // //
+ // // NOTE: we only handle blocks a with single predecessor for now.
+ // if (Comparison->canSplit(AA)) {
+ // LLVM_DEBUG(dbgs()
+ // << "Split initial block '" << Comparison->BB->getName()
+ // << "' that does extra work besides compare\n");
+ // Comparison->RequireSplit = true;
+ // enqueueBlock(Comparisons, std::move(*Comparison));
+ // } else {
+ // LLVM_DEBUG(dbgs()
+ // << "ignoring initial block '" << Comparison->BB->getName()
+ // << "' that does extra work besides compare\n");
+ // }
+ // continue;
+ // }
// TODO(courbet): Right now we abort the whole chain. We could be
// merging only the blocks that don't do other work and resume the
// chain from there. For example:
@@ -699,7 +733,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
// We could still merge bb1 and bb2 though.
return;
}
- enqueueBlock(Comparisons, std::move(*Comparison));
+ enqueueBlock(Comparisons, std::move(*CmpBlock));
}
// It is possible we have no suitable comparison to merge.
@@ -710,8 +744,11 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
EntryBlock_ = Comparisons[0].BB;
- std::vector<BCECmpBlock> ConstComparisons, BceComparisons;
- auto isConstCmp = [](BCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
+ // TODO: check for contiguous comparisons across all blocks, and if all cmps in a
+ // bb are part of a contiguous range then split that block into multiple blocks
+
+ std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
+ auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
// TODO: too many copies here
std::partition_copy(Comparisons.begin(), Comparisons.end(),
std::back_inserter(ConstComparisons),
@@ -734,18 +771,18 @@ class MergedBlockName {
SmallString<16> Scratch;
public:
- explicit MergedBlockName(ArrayRef<BCECmpBlock> Comparisons)
+ explicit MergedBlockName(ArrayRef<SingleBCECmpBlock> Comparisons)
: Name(makeName(Comparisons)) {}
const StringRef Name;
private:
- StringRef makeName(ArrayRef<BCECmpBlock> Comparisons) {
+ StringRef makeName(ArrayRef<SingleBCECmpBlock> Comparisons) {
assert(!Comparisons.empty() && "no basic block");
// Fast path: only one block, or no names at all.
if (Comparisons.size() == 1)
return Comparisons[0].BB->getName();
const int size = std::accumulate(Comparisons.begin(), Comparisons.end(), 0,
- [](int i, const BCECmpBlock &Cmp) {
+ [](int i, const SingleBCECmpBlock &Cmp) {
return i + Cmp.BB->getName().size();
});
if (size == 0)
@@ -773,14 +810,14 @@ class MergedBlockName {
} // namespace
// Merges the given contiguous comparison blocks into one memcmp block.
-static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
+static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
BasicBlock *const InsertBefore,
BasicBlock *const NextCmpBlock,
PHINode &Phi, const TargetLibraryInfo &TLI,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
assert(!Comparisons.empty() && "merging zero comparisons");
LLVMContext &Context = NextCmpBlock->getContext();
- const BCECmpBlock &FirstCmp = Comparisons[0];
+ const SingleBCECmpBlock &FirstCmp = Comparisons[0];
// Create a new cmp block before next cmp block.
BasicBlock *const BB =
@@ -796,15 +833,17 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
// Build constant-array to compare to
if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
- auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
- auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
- std::vector<Constant*> Constants;
- for (const auto& BceBlock : Comparisons) {
- Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+ if (Comparisons.size() > 1) {
+ auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
+ auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
+ std::vector<Constant*> Constants;
+ for (const auto& BceBlock : Comparisons) {
+ Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+ }
+ auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
+ Builder.CreateStore(ArrayConstant,ArrayAlloca);
+ Rhs = ArrayAlloca;
}
- auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
- Builder.CreateStore(ArrayConstant,ArrayAlloca);
- Rhs = ArrayAlloca;
} else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
if (FirstBceCmp->Rhs.GEP)
Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
@@ -818,12 +857,12 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
// If there is one block that requires splitting, we do it now, i.e.
// just before we know we will collapse the chain. The instructions
// can be executed before any of the instructions in the chain.
- const auto ToSplit = llvm::find_if(
- Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
- if (ToSplit != Comparisons.end()) {
- LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
- ToSplit->split(BB, AA);
- }
+ // const auto ToSplit = llvm::find_if(
+ // Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
+ // if (ToSplit != Comparisons.end()) {
+ // LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
+ // ToSplit->split(BB, AA);
+ // }
if (Comparisons.size() == 1) {
LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
@@ -844,7 +883,7 @@ static BasicBlock *mergeComparisons(ArrayRef<BCECmpBlock> Comparisons,
unsigned IntBits = TLI.getIntSize();
const unsigned TotalSizeBits = std::accumulate(
Comparisons.begin(), Comparisons.end(), 0u,
- [](int Size, const BCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+ [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
// Create memcmp() == 0.
@@ -916,7 +955,11 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
// Delete merged blocks. This also removes incoming values in phi.
SmallVector<BasicBlock *, 16> DeadBlocks;
for (const auto &Blocks : MergedBlocks_) {
- for (const BCECmpBlock &Block : Blocks) {
+ for (const SingleBCECmpBlock &Block : Blocks) {
+ // Many single blocks can refer to the same multblock coming from a select instruction
+ // TODO: preferably use a set instead
+ if (llvm::is_contained(DeadBlocks, Block.BB))
+ continue;
LLVM_DEBUG(dbgs() << "Deleting merged block " << Block.BB->getName()
<< "\n");
DeadBlocks.push_back(Block.BB);
@@ -1033,135 +1076,11 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
return CmpChain.simplify(TLI, AA, DTU);
}
-void removeUnusedOperands(SmallVector<Value *, 8> toCheck) {
- while (!toCheck.empty()) {
- Value *V = toCheck.pop_back_val();
-
- // Only process instructions (skip constants, globals, etc.)
- if (Instruction *OpI = dyn_cast<Instruction>(V)) {
- if (OpI->use_empty()) {
- toCheck.append(OpI->operands().begin(),OpI->operands().end());
- OpI->eraseFromParent();
- }
- }
- }
-}
-
-struct CommonCmp {
- ICmpInst* CmpI;
- unsigned Offset;
-};
-
-void mergeAdjacentComparisons(SelectInst* SelectI,Value* Base,std::vector<CommonCmp> AdjacentMem,const TargetLibraryInfo &TLI) {
- IRBuilder<> Builder(SelectI);
- auto* M = SelectI->getModule();
- LLVMContext &Context = SelectI->getContext();
- const auto &DL = SelectI->getDataLayout();
-
- auto First = AdjacentMem[0];
- auto *CmpType = First.CmpI->getOperand(0)->getType();
- auto* ArrayType = ArrayType::get(CmpType,AdjacentMem.size());
- auto* ArraySize = ConstantInt::get(Type::getInt64Ty(Context), DL.getTypeAllocSize(ArrayType));
- // TODO: check for alignment
- // Builder.CreateLifetimeStart(ArrayAlloca,ArraySize);
-
- std::vector<Constant*> Constants;
- for (const auto& CI : AdjacentMem) {
- // safe since we checked before that second operand is constantint
- Constants.emplace_back(cast<Constant>(CI.CmpI->getOperand(1)));
- }
- auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
-M->getOrInsertGlobal("globalKey", ArrayType);
- GlobalVariable* gVar = M->getNamedGlobal("globalKey");
- gVar->setLinkage(GlobalValue::PrivateLinkage);
- gVar->setInitializer(ArrayConstant);
- gVar->setConstant(true);
- // Builder.CreateStore(ArrayConstant,ArrayAlloca);
-
- // TODO: adjust base-ptr to point to start of load-offset
- // TODO: also have to handle !=
- Value *const MemCmpCall = emitMemCmp(
- Base, gVar,
- ArraySize,
- Builder, DL, &TLI);
- // Builder.CreateLifetimeEnd(ArrayAlloca,ArraySize);
- auto *MergedCmp = new ICmpInst(ICmpInst::ICMP_EQ,MemCmpCall, ConstantInt::get(Type::getInt32Ty(Context), 0));
-
- BasicBlock::iterator ii(SelectI);
- SmallVector<Value *, 8> deadOperands(SelectI->operands());
- ReplaceInstWithInst(SelectI->getParent(),ii,MergedCmp);
- removeUnusedOperands(deadOperands);
-
- // dbgs() << "DONE merging";
-}
-
-// Combines Icmp instructions if they operate on adjacent memory
-// TODO: check that base address' memory isn't modified between comparisons
-bool tryMergeIcmps(SelectInst* SelectI, Value* Base, std::vector<CommonCmp> &Icmps,const TargetLibraryInfo &TLI) {
- assert(!Icmps.empty() && "if entry exists then has at least one cmp");
- bool hasMerged = false;
-
- std::vector<CommonCmp> AdjacentMem{Icmps[0]};
- auto Prev = Icmps[0];
- for (auto& Cmp : llvm::drop_begin(Icmps)) {
- if (Cmp.Offset == (Prev.Offset + 1)) {
- AdjacentMem.emplace_back(Cmp);
- } else if (AdjacentMem.size() > 1) {
- mergeAdjacentComparisons(SelectI,Base, AdjacentMem,TLI);
- hasMerged = true;
- AdjacentMem.clear();
- AdjacentMem.emplace_back(Cmp);
- }
- Prev = Cmp;
- }
-
- if (AdjacentMem.size() > 1) {
- mergeAdjacentComparisons(SelectI, Base, AdjacentMem,TLI);
- hasMerged = true;
- }
-
- return hasMerged;
-}
-
-// Given an operand from a load, return the original base pointer and
-// if operand is GEP also it's offset from base pointer
-// but only if offset is known at compile time
-std::tuple<Value*, std::optional<unsigned>> findPtrAndOffset(Value* V, unsigned Offset) {
- if (const auto& GepI = dyn_cast<GetElementPtrInst>(V)){
- if (const auto& Index = dyn_cast<ConstantInt>(GepI->getOperand(1))) {
- if (Index->getBitWidth() <= 64) {
- return findPtrAndOffset(GepI->getPointerOperand(), Offset + Index->getZExtValue());
- }
- }
- return {V,std::nullopt};
- }
-
- return {V,Offset};
-}
-
-
-std::optional<Value*> constantCmp(ICmpInst* CmpI,std::vector<CommonCmp>* cmps) {
- auto const& LoadI = dyn_cast<LoadInst>(CmpI->getOperand(0));
- auto const& ConstantI = dyn_cast<ConstantInt>(CmpI->getOperand(1));
- if (!LoadI || !ConstantI)
- return std::nullopt;
-
- auto [BasePtr, Offset] = findPtrAndOffset(LoadI->getOperand(0),0);
- if (Offset)
- cmps->emplace_back(CommonCmp {CmpI, *Offset});
-
- return BasePtr;
-}
-
static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
const TargetTransformInfo &TTI, AliasAnalysis &AA,
DominatorTree *DT) {
LLVM_DEBUG(dbgs() << "MergeICmpsLegacyPass: " << F.getName() << "\n");
-
- dbgs() << "after target\n";
- dbgs() << TTI.enableMemCmpExpansion(F.hasOptSize(), true);
-
// We only try merging comparisons if the target wants to expand memcmp later.
// The rationale is to avoid turning small chains into memcmp calls.
if (!TTI.enableMemCmpExpansion(F.hasOptSize(), true))
@@ -1182,32 +1101,6 @@ static bool runImpl(Function &F, const TargetLibraryInfo &TLI,
MadeChange |= processPhi(*Phi, TLI, AA, DTU);
}
- // Try to merge remaining select nodes that haven't been merged from phi-node merging
- // for (BasicBlock &BB : F) {
- // // from bottom up to find the root result of all comparisons
- // for (Instruction &I : llvm::reverse(BB)) {
- // if (auto const &SelectI = dyn_cast<SelectInst>(&I)) {
- // auto const& Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
- // auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
- // auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
-
- // if (!Cmp1 || !Cmp2 ||!ConstantI ||!ConstantI->isZeroValue())
- // continue;
-
- // Value* BasePtr;
- // std::vector<CommonCmp> cmps;
- // if (auto bp = constantCmp(Cmp1,&cmps))
- // BasePtr = *bp;
- // if (auto bp = constantCmp(Cmp2,&cmps)) {
- // if (BasePtr != bp) continue;
- // }
-
- // MadeChange |= tryMergeIcmps(SelectI,BasePtr,cmps,TLI);
- // break;
- // }
- // }
- // }
-
return MadeChange;
}
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 24cbceae9173d..f05422fd9aea1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -7,7 +7,7 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
; CHECK-LABEL: @test(
; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
; CHECK-NEXT: [[TMP0:%.*]] = alloca [3 x i8], align 1
-; CHECK-NEXT: store [3 x i8] c"\FF\C8\BE", ptr [[O1:%.*]], align 1
+; CHECK-NEXT: store [3 x i8] c"\FF\C8\BE", ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
new file mode 100644
index 0000000000000..4a91947b0086b
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -0,0 +1,69 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+
+; Can merge contiguous const-comparison basic blocks that include a select instruction.
+
+define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
+; CHECK-LABEL: @is_all_ones_many(
+; CHECK-NEXT: "entry+entry+entry+land.lhs.true11":
+; CHECK-NEXT: [[TMP0:%.*]] = alloca [4 x i8], align 1
+; CHECK-NEXT: store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
+; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
+; CHECK: "land.lhs.true16+land.lhs.true21":
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
+; CHECK-NEXT: store [2 x i8] c"\02\07", ptr [[TMP3]], align 1
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
+; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
+; CHECK: land.rhs1:
+; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
+; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
+; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[TMP8:%.*]] = phi i1 [ [[TMP7]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret i1 [[TMP8]]
+;
+entry:
+ %0 = load i8, ptr %p, align 1
+ %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 2
+ %2 = load i8, ptr %arrayidx2, align 1
+ %cmp = icmp eq i8 %0, -1
+ %cmp5 = icmp eq i8 %1, -56
+ %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+ %cmp9 = icmp eq i8 %2, -66
+ %or.cond28 = select i1 %or.cond, i1 %cmp9, i1 false
+ br i1 %or.cond28, label %land.lhs.true11, label %land.end
+
+land.lhs.true11: ; preds = %entry
+ %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 3
+ %3 = load i8, ptr %arrayidx12, align 1
+ %cmp14 = icmp eq i8 %3, 1
+ br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16: ; preds = %land.lhs.true11
+ %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+ %4 = load i8, ptr %arrayidx17, align 1
+ %cmp19 = icmp eq i8 %4, 2
+ br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21: ; preds = %land.lhs.true16
+ %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 7
+ %5 = load i8, ptr %arrayidx22, align 1
+ %cmp24 = icmp eq i8 %5, 7
+ br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true21
+ %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 9
+ %6 = load i8, ptr %arrayidx26, align 1
+ %cmp28 = icmp eq i8 %6, 9
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+ %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+ ret i1 %7
+}
>From 95ccfccf83ee7631de38e01d24754987edf6c86d Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 26 Feb 2025 20:52:10 +0100
Subject: [PATCH 06/23] [MergeIcmps] Only print merged bb-name once
---
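For readers skimming the series, the name de-duplication in this patch can be sketched outside of LLVM. This is a minimal illustration, not the patch's code: `mergedBlockName` is a hypothetical helper, and `std::string`, `std::vector` and `std::unordered_set` stand in for `StringRef`, `SmallString` and `UniqueVector`. Unlike the patch, the sketch also skips an empty first name instead of appending it unconditionally.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for MergedBlockName::makeName(); plain std types
// replace the LLVM containers used in the actual patch.
std::string mergedBlockName(const std::vector<std::string> &BlockNames) {
  // Keep each block name once, in first-seen order. Several comparisons
  // can originate from the same flattened select block.
  std::vector<std::string> Unique;
  std::unordered_set<std::string> Seen;
  for (const std::string &Name : BlockNames)
    if (Seen.insert(Name).second)
      Unique.push_back(Name);

  // Join the non-empty names with '+' as the separator.
  std::string Result;
  for (const std::string &Name : Unique) {
    if (Name.empty())
      continue;
    if (!Result.empty())
      Result += '+';
    Result += Name;
  }
  return Result;
}
```

With the block names from many-const-cmp-select.ll (three comparisons from `entry` and one from `land.lhs.true11`), the sketch yields the label the updated CHECK line expects.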
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 41 +++++++++++--------
.../MergeICmps/X86/many-const-cmp-select.ll | 2 +-
2 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 4456fbfb9a60a..f60c3aabd7547 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -43,6 +43,7 @@
#include "llvm/Transforms/Scalar/MergeICmps.h"
#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/UniqueVector.h"
#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"
@@ -358,17 +359,18 @@ class MultBCECmpBlock {
// (see canSplit()).
class SingleBCECmpBlock {
public:
- SingleBCECmpBlock(MultBCECmpBlock M, unsigned i) {
- BB = M.BB;
- Cmp = M.getCmps()[i];
- OrigOrder = M.OrigOrder;
- }
+ SingleBCECmpBlock(MultBCECmpBlock M, unsigned I)
+ : BB(M.BB), OrigOrder(M.OrigOrder), Cmp(M.getCmps()[I]) {}
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
const Comparison* getCmp() const {
return Cmp;
}
+ bool operator<(const SingleBCECmpBlock &O) const {
+ return *Cmp < *O.Cmp;
+ }
+
// We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
// instructions. Split the old block and move all non-BCE-cmp-insts into the
// new parent block.
@@ -638,7 +640,7 @@ mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
// Sort to detect continuous offsets.
llvm::sort(Blocks,
[](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
- return *LhsBlock.getCmp() < *RhsBlock.getCmp();
+ return LhsBlock < RhsBlock;
});
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
@@ -744,9 +746,6 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
EntryBlock_ = Comparisons[0].BB;
- // TODO: check for contiguous comparisons across all blocks and if all cmps in a
- // bb are part of contiguous then split that block inato multiple
-
std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
// TODO: too many copies here
@@ -781,9 +780,14 @@ class MergedBlockName {
// Fast path: only one block, or no names at all.
if (Comparisons.size() == 1)
return Comparisons[0].BB->getName();
- const int size = std::accumulate(Comparisons.begin(), Comparisons.end(), 0,
- [](int i, const SingleBCECmpBlock &Cmp) {
- return i + Cmp.BB->getName().size();
+ // Multiple comparisons can come from the same basic block (a flattened
+ // select block), so make sure each block name is only appended once.
+ UniqueVector<StringRef> UniqueNames;
+ for (const auto& B : Comparisons)
+ UniqueNames.insert(B.BB->getName());
+ const int size = std::accumulate(UniqueNames.begin(), UniqueNames.end(), 0,
+ [](int i, const StringRef &Name) {
+ return i + Name.size();
});
if (size == 0)
return StringRef("", 0);
@@ -792,16 +796,17 @@ class MergedBlockName {
Scratch.clear();
// We'll have `size` bytes for name and `Comparisons.size() - 1` bytes for
// separators.
- Scratch.reserve(size + Comparisons.size() - 1);
+ Scratch.reserve(size + UniqueNames.size() - 1);
const auto append = [this](StringRef str) {
Scratch.append(str.begin(), str.end());
};
- append(Comparisons[0].BB->getName());
- for (int I = 1, E = Comparisons.size(); I < E; ++I) {
- const BasicBlock *const BB = Comparisons[I].BB;
- if (!BB->getName().empty()) {
+ // UniqueVector's index starts at 1
+ append(UniqueNames[1]);
+ for (int I = 2, E = UniqueNames.size(); I <= E; ++I) {
+ StringRef BBName = UniqueNames[I];
+ if (!BBName.empty()) {
append("+");
- append(BB->getName());
+ append(BBName);
}
}
return Scratch.str();
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index 4a91947b0086b..ce8de31134e0f 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -4,7 +4,7 @@
define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
; CHECK-LABEL: @is_all_ones_many(
-; CHECK-NEXT: "entry+entry+entry+land.lhs.true11":
+; CHECK-NEXT: "entry+land.lhs.true11":
; CHECK-NEXT: [[TMP0:%.*]] = alloca [4 x i8], align 1
; CHECK-NEXT: store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
>From 9c2c3869a9941dc3e27ebc4aad919a3e52e7317e Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 28 Feb 2025 17:10:34 +0100
Subject: [PATCH 07/23] [MergeIcmps] Added tests for merging
const-/bce-comparisons using select blocks
---
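As a reading aid for mixed-cmp-bb-select.ll, the following C++ sketch mirrors what the expected IR computes: one `memcmp` over the first 6 bytes of both structs and one `memcmp` of bytes 8..19 against the constants 255, 200 and 100. The names `cmpFieldwise` and `cmpMerged` are hypothetical, and the sketch assumes the C++ struct is laid out like `%S` in the test (20 bytes, no padding), which the `static_assert`s check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed to match %S = type { i32, i8, i8, i16, i32, i32, i32 }.
struct S {
  int32_t a;  // offset 0
  uint8_t b;  // offset 4
  uint8_t c;  // offset 5
  uint16_t d; // offset 6, never compared
  int32_t e;  // offset 8
  int32_t f;  // offset 12
  int32_t g;  // offset 16
};
static_assert(sizeof(S) == 20, "sketch assumes no padding");
static_assert(offsetof(S, e) == 8, "sketch assumes e at offset 8");

// The comparison chain as written in the source.
bool cmpFieldwise(const S &X, const S &Y) {
  return X.e == 255 && X.a == Y.a && X.c == Y.c && X.g == 100 &&
         X.b == Y.b && X.f == 200;
}

// Hand-written equivalent of the two merged memcmp calls.
bool cmpMerged(const S &X, const S &Y) {
  // Constants for e, f, g in memory order (native endianness).
  static const int32_t Expected[3] = {255, 200, 100};
  const unsigned char *PX = reinterpret_cast<const unsigned char *>(&X);
  return std::memcmp(&X, &Y, 6) == 0 &&         // a, b, c against Y
         std::memcmp(PX + 8, Expected, 12) == 0; // e, f, g against constants
}
```

Both functions should agree for any pair of inputs, since the individual comparisons have no side effects and can be reordered freely.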
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 11 +-
.../MergeICmps/X86/mixed-cmp-bb-select.ll | 67 +++++
.../MergeICmps/X86/mixed-comparisons.ll | 2 +-
.../X86/not-split-unmerged-select.ll | 204 ++++++++++++++++
.../MergeICmps/X86/partial-select-merge.ll | 230 ++++++++++++++++++
.../Transforms/MergeICmps/X86/single-block.ll | 23 ++
6 files changed, 533 insertions(+), 4 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
create mode 100644 llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
create mode 100644 llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
create mode 100644 llvm/test/Transforms/MergeICmps/X86/single-block.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index f60c3aabd7547..779e9325a311a 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -1011,6 +1011,13 @@ std::vector<BasicBlock *> getOrderedBlocks(PHINode &Phi,
return Blocks;
}
+template<typename T>
+bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
+ auto* IncomingValue = Phi.getIncomingValue(I);
+ return !isa<T>(IncomingValue) ||
+ cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
+}
+
bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
DomTreeUpdater &DTU) {
LLVM_DEBUG(dbgs() << "processPhi()\n");
@@ -1042,9 +1049,7 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
LLVM_DEBUG(dbgs() << "skip: several non-constant values\n");
return false;
}
- if (!isa<ICmpInst>(Phi.getIncomingValue(I)) ||
- cast<ICmpInst>(Phi.getIncomingValue(I))->getParent() !=
- Phi.getIncomingBlock(I)) {
+ if (isInvalidPrevBlock<ICmpInst>(Phi, I) && isInvalidPrevBlock<SelectInst>(Phi, I)) {
// Non-constant incoming value is not from a cmp instruction or not
// produced by the last block. We could end up processing the value
// producing block more than once.
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
new file mode 100644
index 0000000000000..ad3326cc4df90
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -0,0 +1,67 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
+
+%S = type { i32, i8, i8, i16, i32, i32, i32 }
+
+define dso_local noundef zeroext i1 @cmp_mixed(
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed(
+; CHECK: "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CHECK: "entry+land.rhs+land.lhs.true4":
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = alloca [3 x i32], align 4
+; CHECK-NEXT: store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[ENTRY_LAND_RHS]] ], [ false, [[LAND_LHS_TRUE10:%.*]] ]
+; CHECK-NEXT: ret i1 [[TMP4]]
+;
+entry:
+ %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %0 = load i32, ptr %e, align 4
+ %cmp = icmp eq i32 %0, 255
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr %a, align 4
+ %2 = load i32, ptr %b, align 4
+ %cmp3 = icmp eq i32 %1, %2
+ br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4: ; preds = %land.lhs.true
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+ %3 = load i8, ptr %c, align 1
+ %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+ %4 = load i8, ptr %c5, align 1
+ %cmp7 = icmp eq i8 %3, %4
+ %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+ %5 = load i32, ptr %g, align 4
+ %cmp9 = icmp eq i32 %5, 100
+ %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+ br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10: ; preds = %land.lhs.true4
+ %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %6 = load i8, ptr %b11, align 4
+ %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+ %7 = load i8, ptr %b13, align 4
+ %cmp15 = icmp eq i8 %6, %7
+ br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true10
+ %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+ %8 = load i32, ptr %f, align 4
+ %cmp16 = icmp eq i32 %8, 200
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+ %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+ ret i1 %9
+}
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 150a0300de947..0470a24b0ce6c 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -1,7 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
-%S = type { i32, i1, i1, i16, i32, i32, i32 }
+%S = type { i32, i8, i8, i16, i32, i32, i32 }
; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
new file mode 100644
index 0000000000000..c160647271fb7
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -0,0 +1,204 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
+
+; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
+
+define dso_local noundef zeroext i1 @unmergable_select(
+ ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @unmergable_select(
+; REG: entry:
+; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
+; REG: land.lhs.true11:
+; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; REG: land.lhs.true16:
+; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; REG: land.lhs.true21:
+; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG: land.rhs:
+; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
+; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; REG-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; REG-NEXT: br label [[LAND_END]]
+; REG: land.end:
+; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
+; REG-NEXT: ret i1 [[RES]]
+;
+entry:
+ %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
+ %0 = load i8, ptr %arrayidx, align 1
+ %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+ %2 = load i8, ptr %arrayidx2, align 1
+ %cmp = icmp eq i8 %0, -1
+ %cmp5 = icmp eq i8 %1, -56
+ %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+ %cmp9 = icmp eq i8 %2, -66
+ %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+ br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11: ; preds = %entry
+ %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 12
+ %3 = load i8, ptr %arrayidx12, align 1
+ %cmp14 = icmp eq i8 %3, 1
+ br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16: ; preds = %land.lhs.true11
+ %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+ %4 = load i8, ptr %arrayidx17, align 1
+ %cmp19 = icmp eq i8 %4, 2
+ br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21: ; preds = %land.lhs.true16
+ %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+ %5 = load i8, ptr %arrayidx22, align 1
+ %cmp24 = icmp eq i8 %5, 7
+ br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true21
+ %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 14
+ %6 = load i8, ptr %arrayidx26, align 1
+ %cmp28 = icmp eq i8 %6, 9
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+ %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+ ret i1 %7
+}
+
+; p[12] and p[13] are mergeable. The select block is split even though none of its comparisons are merged; simplifycfg merges it back.
+; NOTE: Ideally the pass would only split when necessary and thus not rely on simplifycfg.
+
+define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @partial_merge_not_select(
+; REG: entry5:
+; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
+; REG: entry4:
+; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END]]
+; REG: entry3:
+; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
+; REG: "land.lhs.true11+land.rhs":
+; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; REG-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT: store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; REG: land.lhs.true162:
+; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; REG: land.lhs.true211:
+; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT: br label [[LAND_END]]
+; REG: land.end:
+; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, [[ENTRY_3]] ], [ false, [[ENTRY_4]] ], [ false, %entry5 ]
+; REG-NEXT: ret i1 [[RES]]
+;
+; CFG-LABEL: @partial_merge_not_select(
+; CFG: entry5:
+; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; CFG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CFG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CFG: "land.lhs.true11+land.rhs":
+; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CFG-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT: store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
+; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CFG-NEXT: [[SEL2:%.*]] = select i1 [[CMP3]], i1 [[CMP4]], i1 false
+; CFG-NEXT: br i1 [[SEL2]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CFG: land.lhs.true211:
+; CFG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CFG-NEXT: br label [[LAND_END]]
+; CFG: land.end:
+; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry5 ]
+; CFG-NEXT: ret i1 [[RES]]
+entry:
+ %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
+ %0 = load i8, ptr %arrayidx, align 1
+ %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+ %2 = load i8, ptr %arrayidx2, align 1
+ %cmp = icmp eq i8 %0, -1
+ %cmp5 = icmp eq i8 %1, -56
+ %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+ %cmp9 = icmp eq i8 %2, -66
+ %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+ br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11: ; preds = %entry
+ %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 12
+ %3 = load i8, ptr %arrayidx12, align 1
+ %cmp14 = icmp eq i8 %3, 1
+ br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16: ; preds = %land.lhs.true11
+ %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+ %4 = load i8, ptr %arrayidx17, align 1
+ %cmp19 = icmp eq i8 %4, 2
+ br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21: ; preds = %land.lhs.true16
+ %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+ %5 = load i8, ptr %arrayidx22, align 1
+ %cmp24 = icmp eq i8 %5, 7
+ br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true21
+ %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 13
+ %6 = load i8, ptr %arrayidx26, align 1
+ %cmp28 = icmp eq i8 %6, 9
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true21, %land.lhs.true16, %land.lhs.true11, %entry
+ %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+ ret i1 %7
+}
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
new file mode 100644
index 0000000000000..7cf05d5159b66
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -0,0 +1,230 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
+
+; REG checks the IR when only mergeicmps is run.
+; CFG checks the IR when simplifycfg is run afterwards to merge distinct blocks back together.
+
+; Can merge part of a select block even if the entire block is not mergeable.
+
+%S = type { i32, i8, i8, i16, i32, i32, i32, i8 }
+
+define zeroext i1 @cmp_partially_mergable_select(
+ ptr nocapture readonly align 4 dereferenceable(24) %a,
+ ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
+; REG-LABEL: @cmp_partially_mergable_select(
+; REG: "land.lhs.true+land.rhs+land.lhs.true4":
+; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
+; REG: land.lhs.true103:
+; REG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; REG-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
+; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
+; REG-NEXT: br i1 [[CMP2]], label [[ENTRY2:%.*]], label [[LAND_END]]
+; REG: entry2:
+; REG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; REG-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
+; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
+; REG: land.lhs.true41:
+; REG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; REG-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
+; REG-NEXT: br label %land.end
+; REG: land.end:
+; REG-NEXT: [[TMP8:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[ENTRY2]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
+; REG-NEXT: ret i1 [[TMP8]]
+;
+; CFG-LABEL: @cmp_partially_mergable_select(
+; CFG: "land.lhs.true+land.rhs+land.lhs.true4":
+; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CFG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
+; CFG: land.lhs.true103:
+; CFG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; CFG-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
+; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
+; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
+; CFG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CFG-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
+; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
+; CFG-NEXT: [[SEL:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CFG-NEXT: br i1 [[SEL]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
+; CFG: land.lhs.true41:
+; CFG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; CFG-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; CFG-NEXT: [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
+; CFG-NEXT: br label %land.end
+; CFG: land.end:
+; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
+; CFG-NEXT: ret i1 [[RES]]
+;
+entry:
+ %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %0 = load i32, ptr %e, align 4
+ %cmp = icmp eq i32 %0, 255
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr %a, align 4
+ %2 = load i32, ptr %b, align 4
+ %cmp3 = icmp eq i32 %1, %2
+ br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4: ; preds = %land.lhs.true
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+ %3 = load i8, ptr %c, align 1
+ %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+ %4 = load i8, ptr %c5, align 1
+ %cmp7 = icmp eq i8 %3, %4
+ %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+ %5 = load i32, ptr %g, align 4
+ %cmp9 = icmp eq i32 %5, 100
+ %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+ br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10: ; preds = %land.lhs.true4
+ %h = getelementptr inbounds nuw i8, ptr %a, i64 20
+ %6 = load i8, ptr %h, align 4
+ %h12 = getelementptr inbounds nuw i8, ptr %b, i64 20
+ %7 = load i8, ptr %h12, align 4
+ %cmp14 = icmp eq i8 %6, %7
+ br i1 %cmp14, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true10
+ %b15 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %8 = load i8, ptr %b15, align 4
+ %b17 = getelementptr inbounds nuw i8, ptr %b, i64 4
+ %9 = load i8, ptr %b17, align 4
+ %cmp19 = icmp eq i8 %8, %9
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+ %10 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp19, %land.rhs ]
+ ret i1 %10
+}
+
+
+; p[12] and p[13] are mergable. p[12] is inside of a select block which will be split up.
+; MergeICmps always splits up matching select blocks. The following simplifycfg pass merges them back together.
+
+define dso_local zeroext i1 @cmp_partially_mergable_select_array(
+ ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
+; REG-LABEL: @cmp_partially_mergable_select_array(
+; REG: entry5:
+; REG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: br i1 [[CMP0]], label [[ENTRY4:%.*]], label [[LAND_END:%.*]]
+; REG: entry4:
+; REG-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
+; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND:%.*]], label [[LAND_END]]
+; REG: "entry+land.rhs":
+; REG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; REG-NEXT: [[TMP5:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP5]], align 1
+; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
+; REG: land.lhs.true113:
+; REG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
+; REG: land.lhs.true162:
+; REG-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; REG: land.lhs.true211:
+; REG-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
+; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; REG-NEXT: br label [[LAND_END]]
+; REG: land.end:
+; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_LAND]] ], [ false, [[ENTRY4]] ], [ false, %entry5 ]
+; REG-NEXT: ret i1 [[RES]]
+;
+;
+; CFG-LABEL: @cmp_partially_mergable_select_array(
+; CFG: entry5:
+; CFG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
+; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP1]], -56
+; CFG-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
+; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT: br i1 [[SEL0]], label [[ENTRY_LAND:%.*]], label [[LAND_END:%.*]]
+; CFG: "entry+land.rhs":
+; CFG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CFG-NEXT: [[TMP5:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP5]], align 1
+; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
+; CFG-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
+; CFG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; CFG-NEXT: [[SEL1:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CFG-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
+; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; CFG-NEXT: [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP4]], i1 false
+; CFG-NEXT: br i1 [[SEL2]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; CFG: land.lhs.true211:
+; CFG-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
+; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; CFG-NEXT: br label [[LAND_END]]
+; CFG: land.end:
+; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[ENTRY_LAND]] ], [ false, %entry5 ]
+; CFG-NEXT: ret i1 [[RES]]
+;
+entry:
+ %arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
+ %0 = load i8, ptr %arrayidx, align 1
+ %arrayidx1 = getelementptr inbounds nuw i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %arrayidx2 = getelementptr inbounds nuw i8, ptr %p, i64 3
+ %2 = load i8, ptr %arrayidx2, align 1
+ %cmp = icmp eq i8 %0, -1
+ %cmp5 = icmp eq i8 %1, -56
+ %or.cond = select i1 %cmp, i1 %cmp5, i1 false
+ %cmp9 = icmp eq i8 %2, -66
+ %or.cond30 = select i1 %or.cond, i1 %cmp9, i1 false
+ br i1 %or.cond30, label %land.lhs.true11, label %land.end
+
+land.lhs.true11:
+ %arrayidx12 = getelementptr inbounds nuw i8, ptr %p, i64 10
+ %3 = load i8, ptr %arrayidx12, align 1
+ %cmp14 = icmp eq i8 %3, 1
+ br i1 %cmp14, label %land.lhs.true16, label %land.end
+
+land.lhs.true16:
+ %arrayidx17 = getelementptr inbounds nuw i8, ptr %p, i64 6
+ %4 = load i8, ptr %arrayidx17, align 1
+ %cmp19 = icmp eq i8 %4, 2
+ br i1 %cmp19, label %land.lhs.true21, label %land.end
+
+land.lhs.true21:
+ %arrayidx22 = getelementptr inbounds nuw i8, ptr %p, i64 8
+ %5 = load i8, ptr %arrayidx22, align 1
+ %cmp24 = icmp eq i8 %5, 7
+ br i1 %cmp24, label %land.rhs, label %land.end
+
+land.rhs:
+ %arrayidx26 = getelementptr inbounds nuw i8, ptr %p, i64 13
+ %6 = load i8, ptr %arrayidx26, align 1
+ %cmp28 = icmp eq i8 %6, 9
+ br label %land.end
+
+land.end:
+ %7 = phi i1 [ false, %land.lhs.true21 ], [ false, %land.lhs.true16 ], [ false, %land.lhs.true11 ], [ false, %entry ], [ %cmp28, %land.rhs ]
+ ret i1 %7
+}
+
diff --git a/llvm/test/Transforms/MergeICmps/X86/single-block.ll b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
new file mode 100644
index 0000000000000..b5735c73ced4c
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
@@ -0,0 +1,23 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Merges adjacent comparisons with constants even if they all sit in a single basic block.
+
+define i1 @merge_single(ptr nocapture noundef readonly dereferenceable(2) %p) {
+; CHECK-LABEL: @merge_single(
+; CHECK: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 1
+; CHECK-NEXT: [[TMP1:%.*]] = alloca [2 x i8], align 1
+; CHECK-NEXT: store [2 x i8] c"\FF\FF", ptr [[TMP1]], align 1
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP1]], i64 2)
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: ret i1 [[CMP0]]
+;
+entry:
+ %0 = load i8, ptr %p, align 1
+ %arrayidx1 = getelementptr inbounds i8, ptr %p, i64 1
+ %1 = load i8, ptr %arrayidx1, align 1
+ %cmp = icmp eq i8 %0, -1
+ %cmp3 = icmp eq i8 %1, -1
+ %2 = select i1 %cmp, i1 %cmp3, i1 false
+ ret i1 %2
+}
>From 52e03dfc88705f20c4b985fcfae776644a00f729 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 28 Feb 2025 19:25:46 +0100
Subject: [PATCH 08/23] [MergeIcmps] Reimplemented block-splitting for
multbceblocks; fixed block reordering
---
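Note on the reworked `split()`: it walks the stored `SplitInsts` in reverse and always moves each instruction to the beginning of the new parent block. A minimal standalone sketch of why that keeps the original instruction order is given here. `Block` and `splitInto` are hypothetical stand-ins operating on instruction names, not LLVM types or APIs.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: an ordered list of instruction
// names. This is NOT llvm::BasicBlock.
using Block = std::deque<std::string>;

// Moves every entry of SplitInsts to the front of NewParent.
// Iterating in reverse while always inserting at the front means the last
// instruction is placed first and every earlier one lands before it, so the
// moved instructions end up in their original relative order.
void splitInto(Block &NewParent, const std::vector<std::string> &SplitInsts) {
  for (auto It = SplitInsts.rbegin(); It != SplitInsts.rend(); ++It)
    NewParent.push_front(*It);
}
```

Iterating forward with the same front insertion would reverse the moved instructions, which is presumably what the "call order stays the same" test later in this series is meant to guard against.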
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 214 +++++++++---------
.../MergeICmps/X86/mixed-cmp-bb-select.ll | 2 -
.../MergeICmps/X86/mixed-comparisons.ll | 2 -
.../X86/not-split-unmerged-select.ll | 28 +--
.../MergeICmps/X86/partial-select-merge.ll | 103 ++++-----
5 files changed, 172 insertions(+), 177 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 779e9325a311a..2bf2eaaf3abcc 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -312,16 +312,12 @@ class IntraCmpChain {
};
-// A basic block that contains one or more comparisons
+// A basic block that contains one or more comparisons.
class MultBCECmpBlock {
public:
MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
: BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
- // // Returns true if each comparison in this basic block is being merged.
- // // Necessary because otherwise would leave basic block in invalid state.
- // bool hasAllCmpsMerged() const;
-
// Returns true if the block does other works besides comparison.
bool doesOtherWork() const;
@@ -329,24 +325,20 @@ class MultBCECmpBlock {
return Cmps;
}
- // // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
- // // instructions in the block.
- // bool canSplit(AliasAnalysis &AA) const;
+ // Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
+ // instructions in the block.
+ bool canSplit(AliasAnalysis &AA) const;
- // // Return true if this all the relevant instructions in the BCE-cmp-block can
- // // be sunk below this instruction. By doing this, we know we can separate the
- // // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
- // // block.
- // bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
+ // Returns true if all the relevant instructions in the BCE-cmp-block can
+ // be sunk below this instruction. By doing this, we know we can separate the
+ // BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
+ // block.
+ bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
// The basic block where this comparison happens.
BasicBlock *BB;
// Instructions relating to the BCECmp and branch.
InstructionSet BlockInsts;
- // The block requires splitting.
- bool RequireSplit = false;
- // Original order of this block in the chain.
- unsigned OrigOrder = 0;
private:
std::vector<Comparison*> Cmps;
@@ -359,8 +351,11 @@ class MultBCECmpBlock {
// (see canSplit()).
class SingleBCECmpBlock {
public:
- SingleBCECmpBlock(MultBCECmpBlock M, unsigned I)
- : BB(M.BB), OrigOrder(M.OrigOrder), Cmp(M.getCmps()[I]) {}
+ SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
+ : BB(M.BB), OrigOrder(OrigOrder), Cmp(M.getCmps()[I]) {}
+
+ SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
+ : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts) {}
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
const Comparison* getCmp() const {
@@ -374,67 +369,60 @@ class SingleBCECmpBlock {
// We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
// instructions. Split the old block and move all non-BCE-cmp-insts into the
// new parent block.
- // void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
+ void split(BasicBlock *NewParent, AliasAnalysis &AA) const;
// The basic block where this comparison happens.
BasicBlock *BB;
// Original order of this block in the chain.
unsigned OrigOrder = 0;
+ // The block requires splitting.
+ bool RequireSplit = false;
private:
Comparison* Cmp;
+ llvm::SmallVector<Instruction *, 4> SplitInsts;
};
-// bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
-// AliasAnalysis &AA) const {
-// // If this instruction may clobber the loads and is in middle of the BCE cmp
-// // block instructions, then bail for now.
-// if (Inst->mayWriteToMemory()) {
-// auto MayClobber = [&](LoadInst *LI) {
-// // If a potentially clobbering instruction comes before the load,
-// // we can still safely sink the load.
-// return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
-// isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
-// };
-// for (auto* Cmp : Cmps.getCmpChain()) {
-// auto [Lhs,Rhs] = Cmp->getLoads();
-// if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
-// return false;
-// }
-// }
-// // Make sure this instruction does not use any of the BCE cmp block
-// // instructions as operand.
-// return llvm::none_of(Inst->operands(), [&](const Value *Op) {
-// const Instruction *OpI = dyn_cast<Instruction>(Op);
-// return OpI && BlockInsts.contains(OpI);
-// });
-// }
+bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
+ AliasAnalysis &AA) const {
+ // If this instruction may clobber the loads and is in the middle of the BCE cmp
+ // block instructions, then bail for now.
+ if (Inst->mayWriteToMemory()) {
+ auto MayClobber = [&](LoadInst *LI) {
+ // If a potentially clobbering instruction comes before the load,
+ // we can still safely sink the load.
+ return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
+ isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
+ };
+ for (auto* Cmp : Cmps) {
+ auto [Lhs,Rhs] = Cmp->getLoads();
+ if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
+ return false;
+ }
+ }
+ // Make sure this instruction does not use any of the BCE cmp block
+ // instructions as operand.
+ return llvm::none_of(Inst->operands(), [&](const Value *Op) {
+ const Instruction *OpI = dyn_cast<Instruction>(Op);
+ return OpI && BlockInsts.contains(OpI);
+ });
+}
-// void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
-// llvm::SmallVector<Instruction *, 4> OtherInsts;
-// for (Instruction &Inst : *BB) {
-// if (BlockInsts.count(&Inst))
-// continue;
-// assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
-// // This is a non-BCE-cmp-block instruction. And it can be separated
-// // from the BCE-cmp-block instruction.
-// OtherInsts.push_back(&Inst);
-// }
-
-// // Do the actual splitting.
-// for (Instruction *Inst : reverse(OtherInsts))
-// Inst->moveBeforePreserving(*NewParent, NewParent->begin());
-// }
+void SingleBCECmpBlock::split(BasicBlock *NewParent, AliasAnalysis &AA) const {
+ // Do the actual splitting.
+ for (Instruction *Inst : reverse(SplitInsts))
+ Inst->moveBeforePreserving(*NewParent, NewParent->begin());
+}
-// bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
-// for (Instruction &Inst : *BB) {
-// if (!BlockInsts.count(&Inst)) {
-// if (!canSinkBCECmpInst(&Inst, AA))
-// return false;
-// }
-// }
-// return true;
-// }
+bool MultBCECmpBlock::canSplit(AliasAnalysis &AA) const {
+ for (Instruction &Inst : *BB) {
+ if (!BlockInsts.count(&Inst)) {
+ if (!canSinkBCECmpInst(&Inst, AA))
+ return false;
+ }
+ }
+ return true;
+}
bool MultBCECmpBlock::doesOtherWork() const {
// TODO(courbet): Can we allow some other things ? This is very conservative.
@@ -592,11 +580,26 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
// }
static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
- MultBCECmpBlock &&CmpBlock) {
+ MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
// emitDebugInfo(Comparison);
- CmpBlock.OrigOrder = Comparisons.size();
- for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++)
- Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i));
+ for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
+ unsigned OrigOrder = Comparisons.size();
+ if (!RequireSplit || i != 0) {
+ Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder));
+ continue;
+ }
+ // If the block must be split, collect its non-comparison instructions and attach them to the block's first comparison.
+ llvm::SmallVector<Instruction *, 4> OtherInsts;
+ for (Instruction &Inst : *CmpBlock.BB) {
+ if (CmpBlock.BlockInsts.count(&Inst))
+ continue;
+ assert(CmpBlock.canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+ // This is a non-BCE-cmp-block instruction. And it can be separated
+ // from the BCE-cmp-block instruction.
+ OtherInsts.push_back(&Inst);
+ }
+ Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder, OtherInsts));
+ }
}
// A chain of comparisons.
@@ -683,33 +686,32 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
if (CmpBlock->doesOtherWork()) {
LLVM_DEBUG(dbgs() << "block '" << CmpBlock->BB->getName()
<< "' does extra work besides compare\n");
- // if (Comparisons.empty()) {
- // // This is the initial block in the chain, in case this block does other
- // // work, we can try to split the block and move the irrelevant
- // // instructions to the predecessor.
- // //
- // // If this is not the initial block in the chain, splitting it wont
- // // work.
- // //
- // // As once split, there will still be instructions before the BCE cmp
- // // instructions that do other work in program order, i.e. within the
- // // chain before sorting. Unless we can abort the chain at this point
- // // and start anew.
- // //
- // // NOTE: we only handle blocks a with single predecessor for now.
- // if (Comparison->canSplit(AA)) {
- // LLVM_DEBUG(dbgs()
- // << "Split initial block '" << Comparison->BB->getName()
- // << "' that does extra work besides compare\n");
- // Comparison->RequireSplit = true;
- // enqueueBlock(Comparisons, std::move(*Comparison));
- // } else {
- // LLVM_DEBUG(dbgs()
- // << "ignoring initial block '" << Comparison->BB->getName()
- // << "' that does extra work besides compare\n");
- // }
- // continue;
- // }
+ if (Comparisons.empty()) {
+ // This is the initial block in the chain, in case this block does other
+ // work, we can try to split the block and move the irrelevant
+ // instructions to the predecessor.
+ //
+ // If this is not the initial block in the chain, splitting it won't
+ // work.
+ //
+ // As once split, there will still be instructions before the BCE cmp
+ // instructions that do other work in program order, i.e. within the
+ // chain before sorting. Unless we can abort the chain at this point
+ // and start anew.
+ //
+ // NOTE: we only handle blocks with a single predecessor for now.
+ if (CmpBlock->canSplit(AA)) {
+ LLVM_DEBUG(dbgs()
+ << "Split initial block '" << CmpBlock->BB->getName()
+ << "' that does extra work besides compare\n");
+ enqueueBlock(Comparisons, std::move(*CmpBlock), AA, true);
+ } else {
+ LLVM_DEBUG(dbgs()
+ << "ignoring initial block '" << CmpBlock->BB->getName()
+ << "' that does extra work besides compare\n");
+ }
+ continue;
+ }
// TODO(courbet): Right now we abort the whole chain. We could be
// merging only the blocks that don't do other work and resume the
// chain from there. For example:
@@ -735,7 +737,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
// We could still merge bb1 and bb2 though.
return;
}
- enqueueBlock(Comparisons, std::move(*CmpBlock));
+ enqueueBlock(Comparisons, std::move(*CmpBlock), AA, false);
}
// It is possible we have no suitable comparison to merge.
@@ -862,12 +864,12 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
// If there is one block that requires splitting, we do it now, i.e.
// just before we know we will collapse the chain. The instructions
// can be executed before any of the instructions in the chain.
- // const auto ToSplit = llvm::find_if(
- // Comparisons, [](const BCECmpBlock &B) { return B.RequireSplit; });
- // if (ToSplit != Comparisons.end()) {
- // LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
- // ToSplit->split(BB, AA);
- // }
+ const auto ToSplit = llvm::find_if(
+ Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
+ if (ToSplit != Comparisons.end()) {
+ LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
+ ToSplit->split(BB, AA);
+ }
if (Comparisons.size() == 1) {
LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index ad3326cc4df90..74e7a9ce705de 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -2,8 +2,6 @@
; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
-%S = type { i32, i8, i8, i16, i32, i32, i32 }
-
define dso_local noundef zeroext i1 @cmp_mixed(
ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 0470a24b0ce6c..ec1c8660fde86 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -1,8 +1,6 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
-%S = type { i32, i8, i8, i16, i32, i32, i32 }
-
; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index c160647271fb7..cd409b0f007ee 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -92,19 +92,19 @@ land.end: ; preds = %land.rhs, %land.lhs
define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
; REG-LABEL: @partial_merge_not_select(
; REG: entry5:
-; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
-; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END]]
; REG: entry4:
-; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
-; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END]]
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END:%.*]]
; REG: entry3:
-; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
; REG: "land.lhs.true11+land.rhs":
; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
@@ -129,16 +129,16 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
;
; CFG-LABEL: @partial_merge_not_select(
; CFG: entry5:
-; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
+; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
; CFG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
-; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -66
+; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -1
+; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
; CFG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
; CFG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CFG: "land.lhs.true11+land.rhs":
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 7cf05d5159b66..9fabe7fb2fc61 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -6,8 +6,6 @@
; Can merge part of a select block even if not entire block mergable.
-%S = type { i32, i8, i8, i16, i32, i32, i32, i8 }
-
define zeroext i1 @cmp_partially_mergable_select(
ptr nocapture readonly align 4 dereferenceable(24) %a,
ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
@@ -114,75 +112,74 @@ land.end: ; preds = %land.rhs, %land.lhs
define dso_local zeroext i1 @cmp_partially_mergable_select_array(
ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
; REG-LABEL: @cmp_partially_mergable_select_array(
+; REG: "entry+land.rhs":
+; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; REG-NEXT: [[TMP0:%.*]] = alloca [2 x i8], align 1
+; REG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
; REG: entry5:
-; REG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
-; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; REG-NEXT: br i1 %2, label [[ENTRY4:%.*]], label [[LAND_END:%.*]]
+; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
; REG: entry4:
-; REG-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
-; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND:%.*]], label [[LAND_END]]
-; REG: "entry+land.rhs":
-; REG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
-; REG-NEXT: [[TMP5:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT: store [2 x i8] c"\FF\09", ptr %7, align 1
-; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
; REG: land.lhs.true113:
-; REG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; REG-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
+; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
; REG: land.lhs.true162:
-; REG-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
+; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
; REG: land.lhs.true211:
-; REG-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
-; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
; REG-NEXT: br label [[LAND_END]]
; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_LAND]] ], [ false, [[ENTRY4]] ], [ false, %entry5 ]
+; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_4]] ], [ false, [[ENTRY_5]] ], [ false, %"entry+land.rhs" ]
; REG-NEXT: ret i1 [[RES]]
;
;
; CFG-LABEL: @cmp_partially_mergable_select_array(
-; CFG: entry5:
-; CFG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 1
-; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[TMP0]], align 1
-; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; CFG-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
-; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP3]], -66
-; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT: br i1 [[SEL0]], label [[ENTRY_LAND:%.*]], label [[LAND_END:%.*]]
; CFG: "entry+land.rhs":
-; CFG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr %p, i64 12
-; CFG-NEXT: [[TMP5:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT: store [2 x i8] c"\FF\09", ptr %7, align 1
-; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP4]], ptr [[TMP5]], i64 2)
-; CFG-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; CFG-NEXT: [[TMP7:%.*]] = load i8, ptr [[TMP6]], align 1
-; CFG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP7]], 1
-; CFG-NEXT: [[SEL1:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
-; CFG-NEXT: [[TMP8:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT: [[TMP9:%.*]] = load i8, ptr [[TMP8]], align 1
-; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP9]], 2
-; CFG-NEXT: [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP4]], i1 false
-; CFG-NEXT: br i1 [[SEL2]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
+; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; CFG-NEXT: [[TMP0:%.*]] = alloca [2 x i8], align 1
+; CFG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
+; CFG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
+; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CFG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CFG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CFG-NEXT: [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP3]], i1 false
+; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CFG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CFG-NEXT: [[SEL3:%.*]] = select i1 [[SEL2]], i1 [[CMP4]], i1 false
+; CFG-NEXT: br i1 [[SEL3]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
; CFG: land.lhs.true211:
-; CFG-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT: [[TMP11:%.*]] = load i8, ptr [[TMP10]], align 1
-; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP11]], 7
+; CFG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CFG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
; CFG-NEXT: br label [[LAND_END]]
; CFG: land.end:
-; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[ENTRY_LAND]] ], [ false, %entry5 ]
+; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, %"entry+land.rhs" ]
; CFG-NEXT: ret i1 [[RES]]
;
entry:
>From a3005c017b1388c44ea3ec515b284ec5248e73bd Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 12:33:36 +0100
Subject: [PATCH 09/23] [MergeICmps] Added tests for splitting const and select
blocks
---
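Note on `enqueueSingleCmp`: each comparison of a multi-comparison block is enqueued as its own entry with its own original order, and only the first comparison of a block that needs splitting carries the non-comparison instructions. A simplified model of that bookkeeping is sketched here. `SingleCmp` and `enqueueSingleCmps` are hypothetical names that use strings in place of LLVM instructions; they are not part of the patch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of one enqueued comparison.
struct SingleCmp {
  std::string Cmp;                     // name of the comparison
  unsigned OrigOrder;                  // position in the unsorted chain
  bool RequireSplit;                   // true only for the entry that carries SplitInsts
  std::vector<std::string> SplitInsts; // non-comparison instructions to move out
};

// Appends every comparison of one block to Comparisons. If RequireSplit is
// set, the first comparison of the block additionally records OtherInsts.
void enqueueSingleCmps(std::vector<SingleCmp> &Comparisons,
                       const std::vector<std::string> &BlockCmps,
                       const std::vector<std::string> &OtherInsts,
                       bool RequireSplit) {
  for (std::size_t I = 0; I < BlockCmps.size(); ++I) {
    // The original order is assigned per comparison, not per block.
    unsigned OrigOrder = static_cast<unsigned>(Comparisons.size());
    SingleCmp S{BlockCmps[I], OrigOrder, false, {}};
    if (RequireSplit && I == 0) {
      S.RequireSplit = true;
      S.SplitInsts = OtherInsts;
    }
    Comparisons.push_back(std::move(S));
  }
}
```

Keeping the split instructions on a single entry means the later search for an entry with `RequireSplit` set finds at most one candidate per chain, under the assumption (taken from the patch) that only the initial block may be split.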
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 17 ++--
.../MergeICmps/X86/split-block-does-work.ll | 87 +++++++++++++++++++
2 files changed, 95 insertions(+), 9 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2bf2eaaf3abcc..18ee7a877d985 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -318,13 +318,13 @@ class MultBCECmpBlock {
MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
: BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
- // Returns true if the block does other works besides comparison.
- bool doesOtherWork() const;
-
std::vector<Comparison*> getCmps() {
return Cmps;
}
+ // Returns true if the block does other works besides comparison.
+ bool doesOtherWork() const;
+
// Returns true if the non-BCE-cmp instructions can be separated from BCE-cmp
// instructions in the block.
bool canSplit(AliasAnalysis &AA) const;
@@ -358,9 +358,7 @@ class SingleBCECmpBlock {
: BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts) {}
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
- const Comparison* getCmp() const {
- return Cmp;
- }
+ const Comparison* getCmp() const { return Cmp; }
bool operator<(const SingleBCECmpBlock &O) const {
return *Cmp < *O.Cmp;
@@ -579,7 +577,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
// LLVM_DEBUG(dbgs() << "\n");
// }
-static inline void enqueueBlock(std::vector<SingleBCECmpBlock> &Comparisons,
+// Enqueues every comparison of `CmpBlock` as its own SingleBCECmpBlock. If `RequireSplit` is set, the first comparison also carries the block's `OtherInsts` so that they can be split off later.
+static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
// emitDebugInfo(Comparison);
for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
@@ -704,7 +703,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
LLVM_DEBUG(dbgs()
<< "Split initial block '" << CmpBlock->BB->getName()
<< "' that does extra work besides compare\n");
- enqueueBlock(Comparisons, std::move(*CmpBlock), AA, true);
+ enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, true);
} else {
LLVM_DEBUG(dbgs()
<< "ignoring initial block '" << CmpBlock->BB->getName()
@@ -737,7 +736,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
// We could still merge bb1 and bb2 though.
return;
}
- enqueueBlock(Comparisons, std::move(*CmpBlock), AA, false);
+ enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, false);
}
// It is possible we have no suitable comparison to merge.
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index c53d86d76ff3b..61304694548a2 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -4,6 +4,7 @@
%S = type { i32, i32, i32, i32 }
declare void @foo(...)
+declare void @bar(...)
; We can split %entry and create a memcmp(16 bytes).
define zeroext i1 @opeq1(
@@ -240,3 +241,89 @@ opeq1.exit:
%8 = phi i1 [ false, %entry ], [ false, %land.rhs.i] , [ false, %land.rhs.i.2 ], [ %cmp4.i, %land.rhs.i.3 ]
ret i1 %8
}
+
+; Call instruction mixed in with select block but doesn't clobber memory, so can safely sink and merge all comparisons.
+; Make sure that call order stays the same.
+define dso_local noundef zeroext i1 @unclobbered_select_cmp(
+; X86-LABEL: @unclobbered_select_cmp(
+; X86-NEXT: "entry+land.rhs":
+; X86-NEXT: call void (...) @foo() #[[ATTR2]]
+; X86-NEXT: call void (...) @bar() #[[ATTR2]]
+; X86-NEXT: [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
+; X86-NEXT: [[TMP0:%.*]] = alloca [3 x i8], align 1
+; X86-NEXT: store [3 x i8] c"d\03\C8", ptr [[TMP0]], align 1
+; X86-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
+; X86-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; X86-NEXT: br label [[LAND_END:%.*]]
+; X86: land.end:
+; X86-NEXT: ret i1 [[TMP1]]
+;
+ ptr nocapture readonly dereferenceable(5) %a) local_unnamed_addr nofree nosync {
+entry:
+ %q = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %0 = load i8, ptr %q, align 1
+ call void (...) @foo() inaccessiblememonly
+ %cmp = icmp eq i8 %0, 200
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 2
+ %1 = load i8, ptr %c, align 1
+ %cmp2 = icmp eq i8 %1, 100
+ call void (...) @bar() inaccessiblememonly
+ %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+ br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %entry
+ %b3 = getelementptr inbounds nuw i8, ptr %a, i64 3
+ %2 = load i8, ptr %b3, align 1
+ %cmp5 = icmp eq i8 %2, 3
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %entry
+ %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+ ret i1 %3
+}
+
+
+; Can only split first block. If subsequent block contains a clobber instruction then don't merge.
+define dso_local noundef zeroext i1 @not_split_sec_block(
+; X86-LABEL: @not_split_sec_block(
+; X86-NEXT: entry:
+; X86-NEXT: [[TMP0:%.*]] = load i8, ptr [[A:%.*]], align 1
+; X86-NEXT: call void (...) @foo() #[[ATTR2]]
+; X86-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -56
+; X86-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 2
+; X86-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP1]], align 1
+; X86-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP2]], 100
+; X86-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; X86-NEXT: br i1 [[SEL0]], label [[LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; X86: land.rhs:
+; X86-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 1
+; X86-NEXT: [[TMP4:%.*]] = load i8, ptr [[TMP3]], align 1
+; X86-NEXT: call void (...) @bar() #[[ATTR2]]
+; X86-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP4]], 3
+; X86-NEXT: br label [[LAND_END]]
+; X86: land.end:
+; X86-NEXT: [[RES:%.*]] = phi i1 [ false, %entry ], [ [[CMP2]], [[LAND_RHS]] ]
+; X86-NEXT: ret i1 [[RES]]
+;
+ ptr nocapture readonly dereferenceable(3) %a) local_unnamed_addr nofree nosync {
+entry:
+ %0 = load i8, ptr %a, align 1
+ call void (...) @foo() inaccessiblememonly
+ %cmp = icmp eq i8 %0, 200
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 2
+ %1 = load i8, ptr %c, align 1
+ %cmp2 = icmp eq i8 %1, 100
+ %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+ br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %entry
+ %b3 = getelementptr inbounds nuw i8, ptr %a, i64 1
+ %2 = load i8, ptr %b3, align 1
+; Even though this call doesn't clobber any memory, can only sink instructions from first block.
+ call void (...) @bar() inaccessiblememonly
+ %cmp5 = icmp eq i8 %2, 3
+ br label %land.end
+land.end:
+ %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+ ret i1 %3
+}
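
The `@unclobbered_select_cmp` test above expects the three `i8` comparisons at offsets 4, 2 and 3 to end up as one `memcmp` of 3 bytes starting at offset 2. A minimal sketch of that sort-then-merge step follows, written in plain C++ without any LLVM types. `ConstCmp`, `MergedRun` and `mergeAdjacent` are hypothetical names used only for illustration and are not taken from the patch; the sketch also assumes every comparison is a single byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one constant comparison: the byte loaded at
// `Offset` from the base pointer is compared against `Value`.
struct ConstCmp {
  uint64_t Offset;
  uint8_t Value;
};

// One merged run: a single memcmp over `Bytes.size()` bytes at `Offset`.
struct MergedRun {
  uint64_t Offset;
  std::vector<uint8_t> Bytes;
};

// Sorts the comparisons by offset and fuses those that are directly adjacent.
std::vector<MergedRun> mergeAdjacent(std::vector<ConstCmp> Cmps) {
  std::sort(Cmps.begin(), Cmps.end(),
            [](const ConstCmp &A, const ConstCmp &B) {
              return A.Offset < B.Offset;
            });
  std::vector<MergedRun> Runs;
  for (const ConstCmp &C : Cmps) {
    // This sketch does not handle overlapping comparisons.
    assert((Runs.empty() ||
            Runs.back().Offset + Runs.back().Bytes.size() <= C.Offset) &&
           "overlapping comparisons");
    if (!Runs.empty() &&
        Runs.back().Offset + Runs.back().Bytes.size() == C.Offset)
      Runs.back().Bytes.push_back(C.Value); // extends the current run
    else
      Runs.push_back({C.Offset, {C.Value}}); // starts a new run
  }
  return Runs;
}
```

Feeding it the offsets and constants from the test, `{4, 200}`, `{2, 100}`, `{3, 3}`, gives a single run at offset 2 holding the bytes 100, 3, 200, which corresponds to the `c"d\03\C8"` constant in the CHECK lines.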
>From e3caafc42d23bbc0ec9ebfe9943b10af9b7410bc Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 17:06:06 +0100
Subject: [PATCH 10/23] [MergeICmps] Can build const-cmp-chains of different
types using llvm.structs
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 26 +++---
.../X86/mixed-type-const-comparisons.ll | 79 +++++++++++++++++++
2 files changed, 93 insertions(+), 12 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 18ee7a877d985..e75836e895175 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -266,6 +266,7 @@ struct BCECmp : public Comparison {
}
};
+// TODO: this can be improved to take alignment into account.
bool Comparison::areContiguous(const Comparison& Other) const {
assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
if (isa<BCEConstCmp>(this)) {
@@ -549,10 +550,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
InstructionSet BlockInsts;
std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
- if (!Result) {
- dbgs() << "invalid result\n";
+ if (!Result)
return std::nullopt;
- }
for (auto* Cmp : Result->getCmpChain()) {
auto CmpInsts = Cmp->getInsts();
@@ -577,7 +576,7 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
// LLVM_DEBUG(dbgs() << "\n");
// }
-// Enqueues a single comparison and if it's the first comparison of the first block then adds the `OtherInsts` to the block too. To split it.
+// Enqueues a single comparison. If it belongs to the first comparison block, also adds the `OtherInsts` to the block so that it can be split.
static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
// emitDebugInfo(Comparison);
@@ -837,18 +836,21 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
else
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
- // Build constant-array to compare to
- if (auto* FirstConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp())) {
+ // Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
+ if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
if (Comparisons.size() > 1) {
- auto* ArrayType = ArrayType::get(FirstConstCmp->Lhs.LoadI->getType(),Comparisons.size());
- auto* ArrayAlloca = Builder.CreateAlloca(ArrayType,nullptr);
std::vector<Constant*> Constants;
+ std::vector<Type*> Types;
for (const auto& BceBlock : Comparisons) {
- Constants.emplace_back(cast<BCEConstCmp>(BceBlock.getCmp())->Const);
+ auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+ Constants.emplace_back(ConstCmp->Const);
+ Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
}
- auto *ArrayConstant = ConstantArray::get(ArrayType, Constants);
- Builder.CreateStore(ArrayConstant,ArrayAlloca);
- Rhs = ArrayAlloca;
+ auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+ auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+ auto *StructConstant = ConstantStruct::get(StructType, Constants);
+ Builder.CreateStore(StructConstant, StructAlloca);
+ Rhs = StructAlloca;
}
} else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
if (FirstBceCmp->Rhs.GEP)
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
new file mode 100644
index 0000000000000..05aa99d31c5a1
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -0,0 +1,79 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; Tests if a const-cmp-chain of different types can still be merged.
+; This is usually the case when comparing different struct fields to constants.
+
+; Can only merge gep 0 with gep 4: the i8 at gep 4 covers a single byte, so padding separates it from gep 8 and the two are not directly adjacent.
+define dso_local zeroext i1 @is_all_ones_struct(
+; CHECK-LABEL: @is_all_ones_struct(
+; CHECK: entry1:
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
+; CHECK-NEXT: br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
+; CHECK: "land.rhs+land.lhs.true":
+; CHECK-NEXT: [[TMP2:%.*]] = alloca { i32, i8 }
+; CHECK-NEXT: store { i32, i8 } { i32 3, i8 100 }, ptr [[TMP2]]
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ [[CMP1]], [[MERGED]] ], [ false, %entry1 ]
+; CHECK-NEXT: ret i1 [[RES]]
+;
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+entry:
+ %c = getelementptr inbounds nuw i8, ptr %p, i64 8
+ %0 = load i32, ptr %c, align 4
+ %cmp = icmp eq i32 %0, 200
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %b = getelementptr inbounds nuw i8, ptr %p, i64 4
+ %1 = load i8, ptr %b, align 4
+ %cmp1 = icmp eq i8 %1, 100
+ br i1 %cmp1, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true
+ %2 = load i32, ptr %p, align 4
+ %cmp3 = icmp eq i32 %2, 3
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true, %entry
+ %3 = phi i1 [ false, %land.lhs.true ], [ false, %entry ], [ %cmp3, %land.rhs ]
+ ret i1 %3
+}
+
+
+; Can also still merge select blocks with different types.
+define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
+; CHECK-LABEL: @is_all_ones_struct_select_block(
+; CHECK: "entry+land.rhs":
+; CHECK-NEXT: [[TMP0:%.*]] = alloca { i32, i8, i8 }
+; CHECK-NEXT: store { i32, i8, i8 } { i32 200, i8 3, i8 100 }, ptr [[TMP0]]
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: ret i1 [[CMP1]]
+;
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(24) %p) local_unnamed_addr {
+entry:
+ %0 = load i32, ptr %p, align 4
+ %cmp = icmp eq i32 %0, 200
+ %c = getelementptr inbounds nuw i8, ptr %p, i64 5
+ %1 = load i8, ptr %c, align 1
+ %cmp2 = icmp eq i8 %1, 100
+ %or.cond = select i1 %cmp, i1 %cmp2, i1 false
+ br i1 %or.cond, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %entry
+ %b3 = getelementptr inbounds nuw i8, ptr %p, i64 4
+ %2 = load i8, ptr %b3, align 4
+ %cmp5 = icmp eq i8 %2, 3
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %entry
+ %3 = phi i1 [ false, %entry ], [ %cmp5, %land.rhs ]
+ ret i1 %3
+}
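
To see why the struct constant has to be packed, the following sketch mirrors `@is_all_ones_struct` in plain C++. It assumes a typical ABI where `b` sits at offset 4 and `c` at offset 8. The struct `S` and the functions `cmpFields`, `cmpMerged` and `selfCheck` are hypothetical and exist only for illustration. The merged form compares the first 5 bytes against the byte image of `<{ i32 3, i8 100 }>`, while the comparison at offset 8 stays separate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical source struct matching the offsets used in the test.
struct S {
  int32_t a; // offset 0
  uint8_t b; // offset 4, followed by 3 bytes of padding
  int32_t c; // offset 8
};
static_assert(offsetof(S, b) == 4 && offsetof(S, c) == 8,
              "sketch assumes the usual 4-byte alignment of int32_t");

// Field-by-field form, in the order the test performs the comparisons.
bool cmpFields(const S &s) { return s.c == 200 && s.b == 100 && s.a == 3; }

// Merged form: `a` and `b` are adjacent, so they are checked by one memcmp.
bool cmpMerged(const S &s) {
  // Byte image of the packed constant: no padding between the i32 and the i8.
  unsigned char packed[5];
  const int32_t a = 3;
  const uint8_t b = 100;
  std::memcpy(packed, &a, sizeof a);
  std::memcpy(packed + sizeof a, &b, sizeof b);
  return s.c == 200 && std::memcmp(&s, packed, sizeof packed) == 0;
}

// Small self-check: both forms agree on a matching and a non-matching value.
inline void selfCheck() {
  assert(cmpFields(S{3, 100, 200}) && cmpMerged(S{3, 100, 200}));
  assert(!cmpFields(S{3, 101, 200}) && !cmpMerged(S{3, 101, 200}));
}
```

With a non-packed `{ i32, i8 }` the allocation would be padded to 8 bytes, but the `memcmp` length of 5 never reads that padding, so the packed flag mainly guarantees that the constant's fields sit at the same relative offsets as the loads being merged.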
>From 9f18a021fe589306898a66e7b74c4cc17d615770 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 12 Mar 2025 18:38:39 +0100
Subject: [PATCH 11/23] [MergeICmps] Changed tests to allocate structs instead
of arrays for const-cmp
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 1 +
llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll | 4 ++--
.../Transforms/MergeICmps/X86/many-const-cmp-select.ll | 8 ++++----
.../test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll | 4 ++--
llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll | 4 ++--
.../MergeICmps/X86/not-split-unmerged-select.ll | 8 ++++----
.../Transforms/MergeICmps/X86/partial-select-merge.ll | 8 ++++----
.../Transforms/MergeICmps/X86/split-block-does-work.ll | 4 ++--
8 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index e75836e895175..690ad4d26d8ef 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -846,6 +846,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
Constants.emplace_back(ConstCmp->Const);
Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
}
+ // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performant.
auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
auto *StructConstant = ConstantStruct::get(StructType, Constants);
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index f05422fd9aea1..fd9faf2d343f9 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -6,8 +6,8 @@
define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
; CHECK-LABEL: @test(
; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca [3 x i8], align 1
-; CHECK-NEXT: store [3 x i8] c"\FF\C8\BE", ptr [[TMP0]], align 1
+; CHECK-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8 }, align 8
+; CHECK-NEXT: store { i8, i8, i8 } { i8 -1, i8 -56, i8 -66 }, ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index ce8de31134e0f..aa0e0e1763c3d 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -5,15 +5,15 @@
define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
; CHECK-LABEL: @is_all_ones_many(
; CHECK-NEXT: "entry+land.lhs.true11":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca [4 x i8], align 1
-; CHECK-NEXT: store [4 x i8] c"\FF\C8\BE\01", ptr [[TMP0]], align 1
+; CHECK-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8, i8 }
+; CHECK-NEXT: store { i8, i8, i8, i8 } { i8 -1, i8 -56, i8 -66, i8 1 }, ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true16+land.lhs.true21":
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
-; CHECK-NEXT: store [2 x i8] c"\02\07", ptr [[TMP3]], align 1
+; CHECK-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
+; CHECK-NEXT: store { i8, i8 } { i8 2, i8 7 }, ptr [[TMP3]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 74e7a9ce705de..55b1587bb7651 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -12,8 +12,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(
; CHECK-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CHECK: "entry+land.rhs+land.lhs.true4":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca [3 x i32], align 4
-; CHECK-NEXT: store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = alloca { i32, i32, i32 }
+; CHECK-NEXT: store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index ec1c8660fde86..1e8f307c2a4df 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -14,8 +14,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
; This is the new BCE to constant comparison block
; CHECK: "entry+land.rhs+land.lhs.true8":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca [3 x i32], align 4
-; CHECK-NEXT: store [3 x i32] [i32 255, i32 200, i32 100], ptr [[TMP1]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = alloca { i32, i32, i32 }
+; CHECK-NEXT: store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index cd409b0f007ee..a9fc2ff64205e 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -108,8 +108,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
; REG: "land.lhs.true11+land.rhs":
; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT: store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; REG-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
+; REG-NEXT: store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
@@ -143,8 +143,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
; CFG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CFG: "land.lhs.true11+land.rhs":
; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT: [[TMP3:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT: store [2 x i8] c"\01\09", ptr [[TMP3]], align 1
+; CFG-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
+; CFG-NEXT: store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 9fabe7fb2fc61..55562a47153e1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -114,8 +114,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
; REG-LABEL: @cmp_partially_mergable_select_array(
; REG: "entry+land.rhs":
; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT: [[TMP0:%.*]] = alloca [2 x i8], align 1
-; REG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; REG-NEXT: [[TMP0:%.*]] = alloca { i8, i8 }
+; REG-NEXT: store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
@@ -152,8 +152,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
; CFG-LABEL: @cmp_partially_mergable_select_array(
; CFG: "entry+land.rhs":
; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT: [[TMP0:%.*]] = alloca [2 x i8], align 1
-; CFG-NEXT: store [2 x i8] c"\FF\09", ptr [[TMP0]], align 1
+; CFG-NEXT: [[TMP0:%.*]] = alloca { i8, i8 }
+; CFG-NEXT: store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
; CFG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 61304694548a2..c496740bfc7cf 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -250,8 +250,8 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
; X86-NEXT: call void (...) @foo() #[[ATTR2]]
; X86-NEXT: call void (...) @bar() #[[ATTR2]]
; X86-NEXT: [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT: [[TMP0:%.*]] = alloca [3 x i8], align 1
-; X86-NEXT: store [3 x i8] c"d\03\C8", ptr [[TMP0]], align 1
+; X86-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8 }
+; X86-NEXT: store { i8, i8, i8 } { i8 100, i8 3, i8 -56 }, ptr [[TMP0]], align 1
; X86-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
; X86-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; X86-NEXT: br label [[LAND_END:%.*]]
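
The CHECK-line changes in this patch switch from array constants such as `c"\FF\C8\BE"` to struct constants such as `{ i8 -1, i8 -56, i8 -66 }`. For equally sized `i8` elements the two spell the same bytes, only printed as signed values. A small C++ sketch of that equivalence follows; `Bytes3` and `sameByteImage` are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the struct constant { i8 -1, i8 -56, i8 -66 }.
struct Bytes3 {
  int8_t A, B, C;
};

// Returns true if the struct constant has the same byte image as the
// array constant c"\FF\C8\BE" from the old CHECK lines.
bool sameByteImage() {
  const unsigned char AsArray[3] = {0xFF, 0xC8, 0xBE};
  const Bytes3 AsStruct = {-1, -56, -66};
  static_assert(sizeof(Bytes3) == sizeof(AsArray),
                "no padding between i8 fields");
  return std::memcmp(&AsStruct, AsArray, sizeof AsArray) == 0;
}
```

This only shows that the constant's contents are unchanged for same-typed chains; whether an array would be cheaper than a struct in that case is left open by the NOTE in the patch.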
>From a4d9733ca0f7f6c61f76b85fa580f39c10387573 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Wed, 19 Mar 2025 20:41:52 +0100
Subject: [PATCH 12/23] [MergeICmps] Changed tests to use packed structs
---
llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll | 4 ++--
.../Transforms/MergeICmps/X86/many-const-cmp-select.ll | 8 ++++----
.../test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll | 4 ++--
llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll | 4 ++--
.../MergeICmps/X86/mixed-type-const-comparisons.ll | 8 ++++----
.../MergeICmps/X86/not-split-unmerged-select.ll | 8 ++++----
.../Transforms/MergeICmps/X86/partial-select-merge.ll | 8 ++++----
llvm/test/Transforms/MergeICmps/X86/single-block.ll | 4 ++--
.../Transforms/MergeICmps/X86/split-block-does-work.ll | 4 ++--
9 files changed, 26 insertions(+), 26 deletions(-)
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index fd9faf2d343f9..51c3c27583602 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -6,8 +6,8 @@
define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
; CHECK-LABEL: @test(
; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8 }, align 8
-; CHECK-NEXT: store { i8, i8, i8 } { i8 -1, i8 -56, i8 -66 }, ptr [[TMP0]], align 1
+; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>, align 8
+; CHECK-NEXT: store <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>, ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END5:%.*]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index aa0e0e1763c3d..0ca0f671d98a4 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -5,15 +5,15 @@
define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
; CHECK-LABEL: @is_all_ones_many(
; CHECK-NEXT: "entry+land.lhs.true11":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8, i8 }
-; CHECK-NEXT: store { i8, i8, i8, i8 } { i8 -1, i8 -56, i8 -66, i8 1 }, ptr [[TMP0]], align 1
+; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8, i8 }>
+; CHECK-NEXT: store <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>, ptr [[TMP0]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true16+land.lhs.true21":
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
-; CHECK-NEXT: store { i8, i8 } { i8 2, i8 7 }, ptr [[TMP3]], align 1
+; CHECK-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; CHECK-NEXT: store <{ i8, i8 }> <{ i8 2, i8 7 }>, ptr [[TMP3]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 55b1587bb7651..dfe57e6ef930a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -12,8 +12,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(
; CHECK-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CHECK: "entry+land.rhs+land.lhs.true4":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca { i32, i32, i32 }
-; CHECK-NEXT: store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index 1e8f307c2a4df..d88d7d824b5ed 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -14,8 +14,8 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
; This is the new BCE to constant comparison block
; CHECK: "entry+land.rhs+land.lhs.true8":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca { i32, i32, i32 }
-; CHECK-NEXT: store { i32, i32, i32 } { i32 255, i32 200, i32 100 }, ptr [[TMP1]], align 4
+; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
index 05aa99d31c5a1..15c5a382d1f46 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -12,8 +12,8 @@ define dso_local zeroext i1 @is_all_ones_struct(
; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
; CHECK-NEXT: br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.rhs+land.lhs.true":
-; CHECK-NEXT: [[TMP2:%.*]] = alloca { i32, i8 }
-; CHECK-NEXT: store { i32, i8 } { i32 3, i8 100 }, ptr [[TMP2]]
+; CHECK-NEXT: [[TMP2:%.*]] = alloca <{ i32, i8 }>
+; CHECK-NEXT: store <{ i32, i8 }> <{ i32 3, i8 100 }>, ptr [[TMP2]]
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END]]
@@ -49,8 +49,8 @@ land.end: ; preds = %land.rhs, %land.lhs
define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
; CHECK-LABEL: @is_all_ones_struct_select_block(
; CHECK: "entry+land.rhs":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca { i32, i8, i8 }
-; CHECK-NEXT: store { i32, i8, i8 } { i32 200, i8 3, i8 100 }, ptr [[TMP0]]
+; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i32, i8, i8 }>
+; CHECK-NEXT: store <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>, ptr [[TMP0]]
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index a9fc2ff64205e..874ea22e75106 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -108,8 +108,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
; REG: "land.lhs.true11+land.rhs":
; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
-; REG-NEXT: store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
+; REG-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; REG-NEXT: store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
@@ -143,8 +143,8 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
; CFG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CFG: "land.lhs.true11+land.rhs":
; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT: [[TMP3:%.*]] = alloca { i8, i8 }
-; CFG-NEXT: store { i8, i8 } { i8 1, i8 9 }, ptr [[TMP3]], align 1
+; CFG-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
+; CFG-NEXT: store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 55562a47153e1..20a3faa854836 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -114,8 +114,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
; REG-LABEL: @cmp_partially_mergable_select_array(
; REG: "entry+land.rhs":
; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT: [[TMP0:%.*]] = alloca { i8, i8 }
-; REG-NEXT: store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
+; REG-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8 }>
+; REG-NEXT: store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
@@ -152,8 +152,8 @@ define dso_local zeroext i1 @cmp_partially_mergable_select_array(
; CFG-LABEL: @cmp_partially_mergable_select_array(
; CFG: "entry+land.rhs":
; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT: [[TMP0:%.*]] = alloca { i8, i8 }
-; CFG-NEXT: store { i8, i8 } { i8 -1, i8 9 }, ptr [[TMP0]], align 1
+; CFG-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8 }>
+; CFG-NEXT: store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
; CFG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/single-block.ll b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
index b5735c73ced4c..cd321f435d1f3 100644
--- a/llvm/test/Transforms/MergeICmps/X86/single-block.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/single-block.ll
@@ -6,8 +6,8 @@ define i1 @merge_single(ptr nocapture noundef readonly dereferenceable(2) %p) {
; CHECK-LABEL: @merge_single(
; CHECK: entry:
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i8, ptr [[P:%.*]], i64 1
-; CHECK-NEXT: [[TMP1:%.*]] = alloca [2 x i8], align 1
-; CHECK-NEXT: store [2 x i8] c"\FF\FF", ptr [[TMP1]], align 1
+; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i8, i8 }>, align 1
+; CHECK-NEXT: store <{ i8, i8 }> <{ i8 -1, i8 -1 }>, ptr [[TMP1]], align 1
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP1]], i64 2)
; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: ret i1 [[CMP0]]
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index c496740bfc7cf..5381d88ed7f52 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -250,8 +250,8 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
; X86-NEXT: call void (...) @foo() #[[ATTR2]]
; X86-NEXT: call void (...) @bar() #[[ATTR2]]
; X86-NEXT: [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT: [[TMP0:%.*]] = alloca { i8, i8, i8 }
-; X86-NEXT: store { i8, i8, i8 } { i8 100, i8 3, i8 -56 }, ptr [[TMP0]], align 1
+; X86-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>
+; X86-NEXT: store <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>, ptr [[TMP0]], align 1
; X86-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
; X86-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; X86-NEXT: br label [[LAND_END:%.*]]
>From d15a2ce0e6f6d850f67a205eee6ffbfa7f63c50b Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 19:19:36 +0100
Subject: [PATCH 13/23] [MergeICmps] Refactored how cmp-instructions are stored
per block
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 34 +++++++----------------
1 file changed, 10 insertions(+), 24 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 690ad4d26d8ef..1626d35fb65d2 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -132,10 +132,14 @@ class BaseIdentifier {
DenseMap<const Value*, int> BaseToIndex;
};
+
+// All Instructions related to a comparison.
+typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
+
// If this value is a load from a constant offset w.r.t. a base address, and
// there are no other users of the load or address, returns the base address and
// the offset.
-BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
+BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, InstructionSet* BlockInsts) {
auto *const LoadI = dyn_cast<LoadInst>(Val);
if (!LoadI)
return {};
@@ -174,11 +178,12 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId) {
if (!GEP->accumulateConstantOffset(DL, Offset))
return {};
Base = GEP->getPointerOperand();
+ BlockInsts->insert(GEP);
}
+ BlockInsts->insert(LoadI);
return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
}
-typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
struct Comparison {
public:
@@ -200,7 +205,6 @@ struct Comparison {
virtual ~Comparison() = default;
virtual LoadOperands getLoads() = 0;
- virtual InstructionSet getInsts() = 0;
bool areContiguous(const Comparison& Other) const;
bool operator<(const Comparison &Other) const;
};
@@ -226,13 +230,6 @@ struct BCEConstCmp : public Comparison {
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,std::nullopt);
}
- InstructionSet getInsts() override {
- InstructionSet BlockInsts{CmpI,Lhs.LoadI};
- if (Lhs.GEP)
- BlockInsts.insert(Lhs.GEP);
- return BlockInsts;
- }
-
};
// A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -256,14 +253,6 @@ struct BCECmp : public Comparison {
Comparison::LoadOperands getLoads() override {
return std::make_pair(&Lhs,&Rhs);
}
- InstructionSet getInsts() override {
- InstructionSet BlockInsts{CmpI, Lhs.LoadI, Rhs.LoadI};
- if (Lhs.GEP)
- BlockInsts.insert(Lhs.GEP);
- if (Rhs.GEP)
- BlockInsts.insert(Rhs.GEP);
- return BlockInsts;
- }
};
// TODO: this can be improved to take alignment into account.
@@ -455,7 +444,7 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
<< (ExpectedPredicate == ICmpInst::ICMP_EQ ? "eq" : "ne")
<< "\n");
// First operand is always a load
- auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId);
+ auto Lhs = visitICmpLoadOperand(CmpI->getOperand(0), BaseId, BlockInsts);
if (!Lhs.BaseId)
return std::nullopt;
@@ -465,10 +454,11 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
const auto &DL = CmpI->getDataLayout();
int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
+ BlockInsts->insert(CmpI);
if (auto const& Const = dyn_cast<Constant>(RhsOperand))
return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
- auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId);
+ auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
if (!Rhs.BaseId)
return std::nullopt;
return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
@@ -553,10 +543,6 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
if (!Result)
return std::nullopt;
- for (auto* Cmp : Result->getCmpChain()) {
- auto CmpInsts = Cmp->getInsts();
- BlockInsts.insert(CmpInsts.begin(), CmpInsts.end());
- }
BlockInsts.insert(BranchI);
return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
}
>From e5d3e57bfe487e45d4c3feaaf43c9c05e05ae516 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 21:57:31 +0100
Subject: [PATCH 14/23] [MergeICmps] Use shared-ptr to avoid leaking memory
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 37 +++++++++++++----------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 1626d35fb65d2..67def6b0f09da 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -288,15 +288,16 @@ bool Comparison::operator<(const Comparison& Other) const {
// Represents multiple comparisons inside of a single basic block.
// This happens if multiple basic blocks have previously been merged into a single using a select node.
class IntraCmpChain {
- std::vector<Comparison*> CmpChain;
+ // TODO: this could probably be a unique-ptr but current impl relies on some copies
+ std::vector<std::shared_ptr<Comparison>> CmpChain;
public:
- IntraCmpChain(Comparison* C) : CmpChain{C} {}
+ IntraCmpChain(std::shared_ptr<Comparison> C) : CmpChain{C} {}
IntraCmpChain combine(const IntraCmpChain OtherChain) {
CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
return *this;
}
- std::vector<Comparison*> getCmpChain() const {
+ std::vector<std::shared_ptr<Comparison>> getCmpChain() const {
return CmpChain;
}
};
@@ -305,10 +306,10 @@ class IntraCmpChain {
// A basic block that contains one or more comparisons.
class MultBCECmpBlock {
public:
- MultBCECmpBlock(std::vector<Comparison*> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+ MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
: BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
- std::vector<Comparison*> getCmps() {
+ std::vector<std::shared_ptr<Comparison>> getCmps() {
return Cmps;
}
@@ -331,7 +332,7 @@ class MultBCECmpBlock {
InstructionSet BlockInsts;
private:
- std::vector<Comparison*> Cmps;
+ std::vector<std::shared_ptr<Comparison>> Cmps;
};
// A basic block with single a comparison between two BCE atoms.
@@ -342,13 +343,13 @@ class MultBCECmpBlock {
class SingleBCECmpBlock {
public:
SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
- : BB(M.BB), OrigOrder(OrigOrder), Cmp(M.getCmps()[I]) {}
+ : BB(M.BB), OrigOrder(OrigOrder), Cmp(std::move(M.getCmps()[I])) {}
SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
- : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(M.getCmps()[I]), SplitInsts(SplitInsts) {}
+ : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(M.getCmps()[I])), SplitInsts(SplitInsts) {}
const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
- const Comparison* getCmp() const { return Cmp; }
+ const Comparison* getCmp() const { return Cmp.get(); }
bool operator<(const SingleBCECmpBlock &O) const {
return *Cmp < *O.Cmp;
@@ -367,7 +368,7 @@ class SingleBCECmpBlock {
bool RequireSplit = false;
private:
- Comparison* Cmp;
+ std::shared_ptr<Comparison> Cmp;
llvm::SmallVector<Instruction *, 4> SplitInsts;
};
@@ -382,7 +383,7 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
};
- for (auto* Cmp : Cmps) {
+ for (auto& Cmp : Cmps) {
auto [Lhs,Rhs] = Cmp->getLoads();
if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
return false;
@@ -426,7 +427,7 @@ bool MultBCECmpBlock::doesOtherWork() const {
// Visit the given comparison. If this is a comparison between two valid
// BCE atoms, or between a BCE atom and a constant, returns the comparison.
-std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
+std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
const ICmpInst::Predicate ExpectedPredicate,
BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
// The comparison can only be used once:
@@ -456,12 +457,12 @@ std::optional<Comparison*> visitICmp(const ICmpInst *const CmpI,
BlockInsts->insert(CmpI);
if (auto const& Const = dyn_cast<Constant>(RhsOperand))
- return new BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI);
+ return std::make_shared<BCEConstCmp>(BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI));
auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
if (!Rhs.BaseId)
return std::nullopt;
- return new BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI);
+ return std::make_shared<BCECmp>(BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI));
}
// Chain of comparisons inside a single basic block connected using `select` nodes.
@@ -494,8 +495,12 @@ std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
std::optional<IntraCmpChain> visitComparison(Value *Cond,
ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
- if (auto *CmpI = dyn_cast<ICmpInst>(Cond))
- return visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+ if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) {
+ auto CmpVisit = visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
+ if (!CmpVisit)
+ return std::nullopt;
+ return IntraCmpChain(*CmpVisit);
+ }
if (auto *SelectI = dyn_cast<SelectInst>(Cond))
return visitSelect(SelectI, ExpectedPredicate, BaseId, BlockInsts);
>From b5b557c735d7b602d2b7c9309fbf46daf4636725 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Fri, 21 Mar 2025 23:01:38 +0100
Subject: [PATCH 15/23] [MergeICmps] Reduced copies for mergeBlocks
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 41 +++++++++--------------
1 file changed, 16 insertions(+), 25 deletions(-)
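
A minimal sketch of the in-place grouping this patch switches to, for readers who want the idea without the pass internals. This is not the pass's code: `Cmp`, `groupRuns`, and `groupByKind` are hypothetical names, and adjacency is simplified to `Offset + 1` rather than `areContiguous`.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for SingleBCECmpBlock: an offset and a kind flag.
struct Cmp {
  int Offset;
  bool IsConst;
};

// Sorts [First, Last) by offset and appends each run of adjacent offsets
// to *Out as one group. Earlier entries of *Out are left untouched.
template <class RandomIt>
void groupRuns(RandomIt First, RandomIt Last,
               std::vector<std::vector<Cmp>> *Out) {
  std::sort(First, Last,
            [](const Cmp &A, const Cmp &B) { return A.Offset < B.Offset; });
  std::vector<Cmp> *Run = nullptr;
  for (RandomIt It = First; It != Last; ++It) {
    if (!Run || Run->back().Offset + 1 != It->Offset) {
      Out->emplace_back();
      Run = &Out->back();
    }
    Run->push_back(*It);
  }
}

std::vector<std::vector<Cmp>> groupByKind(std::vector<Cmp> Cmps) {
  std::vector<std::vector<Cmp>> Out;
  // Const comparisons move to the front; Mid is the first non-const one.
  auto Mid = std::partition(Cmps.begin(), Cmps.end(),
                            [](const Cmp &C) { return C.IsConst; });
  groupRuns(Mid, Cmps.end(), &Out);   // non-const groups are appended first
  groupRuns(Cmps.begin(), Mid, &Out); // const groups follow
  return Out;
}
```

`std::partition` is not stable, which does not matter here because each half is sorted afterwards. The sketch omits the second sort by original order that the patch applies to the newly appended groups.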
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 67def6b0f09da..447c909f84aec 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -625,36 +625,34 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
/// ranges that can be merged together into a single comparison.
-static std::vector<BCECmpChain::ContiguousBlocks>
-mergeBlocks(std::vector<SingleBCECmpBlock> &&Blocks) {
- std::vector<BCECmpChain::ContiguousBlocks> MergedBlocks;
-
+template<class RandomIt>
+static void mergeBlocks(RandomIt First, RandomIt Last,
+ std::vector<BCECmpChain::ContiguousBlocks>* MergedBlocks) {
// Sort to detect continuous offsets.
- llvm::sort(Blocks,
+ llvm::sort(First, Last,
[](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
return LhsBlock < RhsBlock;
});
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
- for (SingleBCECmpBlock &Block : Blocks) {
- if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*Block.getCmp())) {
- MergedBlocks.emplace_back();
- LastMergedBlock = &MergedBlocks.back();
+ int Offset = MergedBlocks->size();
+ for (auto& BlockIt = First; BlockIt != Last; ++BlockIt) {
+ if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
+ MergedBlocks->emplace_back();
+ LastMergedBlock = &MergedBlocks->back();
} else {
- LLVM_DEBUG(dbgs() << "Merging block " << Block.BB->getName() << " into "
+ LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName() << " into "
<< LastMergedBlock->back().BB->getName() << "\n");
}
- LastMergedBlock->push_back(std::move(Block));
+ LastMergedBlock->push_back(std::move(*BlockIt));
}
// While we allow reordering for merging, do not reorder unmerged comparisons.
// Doing so may introduce branch on poison.
- llvm::sort(MergedBlocks, [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
+ llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(), [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
const BCECmpChain::ContiguousBlocks &RhsBlocks) {
return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
});
-
- return MergedBlocks;
}
BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
@@ -737,19 +735,12 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
EntryBlock_ = Comparisons[0].BB;
- std::vector<SingleBCECmpBlock> ConstComparisons, BceComparisons;
auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
- // TODO: too many copies here
- std::partition_copy(Comparisons.begin(), Comparisons.end(),
- std::back_inserter(ConstComparisons),
- std::back_inserter(BceComparisons),
- isConstCmp);
-
- auto MergedConstCmpBlocks = mergeBlocks(std::move(ConstComparisons));
- auto MergedBCECmpBlocks = mergeBlocks(std::move(BceComparisons));
+ auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
- MergedBlocks_.insert(MergedBlocks_.end(),MergedBCECmpBlocks.begin(),MergedBCECmpBlocks.end());
- MergedBlocks_.insert(MergedBlocks_.end(),MergedConstCmpBlocks.begin(),MergedConstCmpBlocks.end());
+ // this will order the merged BCE-comparisons before the BCE-const-comparisons
+ mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+ mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
}
namespace {
>From 31ea42eb04db0ed0d2f346b364d0517ddcdbd997 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 17:42:11 +0100
Subject: [PATCH 16/23] [MergeICmps] Don't split up select blocks if they
aren't merged in the cmp-chain
---
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 200 +++++++++++-------
.../MergeICmps/X86/entry-block-shuffled.ll | 16 +-
.../X86/not-split-unmerged-select.ll | 53 +----
.../MergeICmps/X86/partial-select-merge.ll | 165 +++++----------
4 files changed, 203 insertions(+), 231 deletions(-)
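
A minimal sketch of the check that `multBlockOnlyPartiallyMerged` performs, assuming plain integers as basic-block ids. `partiallyMerged` is a hypothetical name, and `std::set` stands in for `SmallDenseSet` so the example builds without LLVM headers.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Each inner vector is one merge group; each int is the id of the basic
// block the comparison came from (a select block can appear several times).
bool partiallyMerged(const std::vector<std::vector<int>> &Groups) {
  std::set<int> Unmerged, Merged;
  for (const auto &G : Groups) {
    if (G.size() == 1)
      Unmerged.insert(G[0]); // lone comparison: its block stays as it is
    else
      Merged.insert(G.begin(), G.end()); // blocks contributing to a memcmp
  }
  // True if some block feeds a merge but also has a comparison left over.
  return std::any_of(Merged.begin(), Merged.end(),
                     [&](int BB) { return Unmerged.count(BB) != 0; });
}
```

For example, groups `{{1, 1}, {1}}` describe a select block with three comparisons of which only two merge, so the chain is rejected; `{{1, 2}, {3}}` is accepted because block 3 is not part of any merge.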
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 447c909f84aec..5943d717276c4 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -602,10 +602,13 @@ class BCECmpChain {
bool simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
DomTreeUpdater &DTU);
+ bool multBlockOnlyPartiallyMerged();
+
bool atLeastOneMerged() const {
return any_of(MergedBlocks_,
[](const auto &Blocks) { return Blocks.size() > 1; });
- }
+ };
+
private:
PHINode &Phi_;
@@ -616,6 +619,25 @@ class BCECmpChain {
BasicBlock *EntryBlock_;
};
+
+// Returns true if a merge in the chain depends on a basic block where not every comparison is merged.
+// NOTE: This is pretty restrictive and could potentially be handled using an improved tradeoff heuristic.
+bool BCECmpChain::multBlockOnlyPartiallyMerged() {
+ llvm::SmallDenseSet<const BasicBlock*, 8> UnmergedBlocks, MergedBB;
+
+ for (auto& Merged : MergedBlocks_) {
+ if (Merged.size() == 1) {
+ UnmergedBlocks.insert(Merged[0].BB);
+ continue;
+ }
+ for (auto& C : Merged)
+ MergedBB.insert(C.BB);
+ }
+ return llvm::any_of(MergedBB, [&](const BasicBlock* BB){
+ return UnmergedBlocks.contains(BB);
+ });
+}
+
static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
unsigned MinOrigOrder = std::numeric_limits<unsigned>::max();
for (const SingleBCECmpBlock &Block : Blocks)
@@ -655,6 +677,7 @@ static void mergeBlocks(RandomIt First, RandomIt Last,
});
}
+
BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
AliasAnalysis &AA)
: Phi_(Phi) {
@@ -796,14 +819,41 @@ class MergedBlockName {
};
} // namespace
+
+void updateBranching(Value* CondResult,
+ IRBuilder<>& Builder,
+ BasicBlock *BB,
+ BasicBlock *const NextCmpBlock,
+ PHINode &Phi,
+ LLVMContext &Context,
+ const TargetLibraryInfo &TLI,
+ AliasAnalysis &AA, DomTreeUpdater &DTU) {
+ BasicBlock *const PhiBB = Phi.getParent();
+ // Add a branch to the next basic block in the chain.
+ if (NextCmpBlock == PhiBB) {
+ // Continue to phi, passing it the comparison result.
+ Builder.CreateBr(PhiBB);
+ Phi.addIncoming(CondResult, BB);
+ DTU.applyUpdates({{DominatorTree::Insert, BB, PhiBB}});
+ } else {
+ // Continue to next block if equal, exit to phi else.
+ Builder.CreateCondBr(CondResult, NextCmpBlock, PhiBB);
+ Phi.addIncoming(ConstantInt::getFalse(Context), BB);
+ DTU.applyUpdates({{DominatorTree::Insert, BB, NextCmpBlock},
+ {DominatorTree::Insert, BB, PhiBB}});
+ }
+}
+
+
// Merges the given contiguous comparison blocks into one memcmp block.
static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
BasicBlock *const InsertBefore,
BasicBlock *const NextCmpBlock,
- PHINode &Phi, const TargetLibraryInfo &TLI,
+ PHINode &Phi,
+ LLVMContext &Context,
+ const TargetLibraryInfo &TLI,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
- assert(!Comparisons.empty() && "merging zero comparisons");
- LLVMContext &Context = NextCmpBlock->getContext();
+ assert(Comparisons.size() > 1 && "merging multiple comparisons");
const SingleBCECmpBlock &FirstCmp = Comparisons[0];
// Create a new cmp block before next cmp block.
@@ -818,92 +868,81 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
else
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
+
// Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
- if (Comparisons.size() > 1) {
- std::vector<Constant*> Constants;
- std::vector<Type*> Types;
- for (const auto& BceBlock : Comparisons) {
- auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
- Constants.emplace_back(ConstCmp->Const);
- Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
- }
- // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
- auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
- auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
- auto *StructConstant = ConstantStruct::get(StructType, Constants);
- Builder.CreateStore(StructConstant, StructAlloca);
- Rhs = StructAlloca;
+ std::vector<Constant*> Constants;
+ std::vector<Type*> Types;
+ for (const auto& BceBlock : Comparisons) {
+ auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+ Constants.emplace_back(ConstCmp->Const);
+ Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
}
- } else if (auto* FirstBceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
+ // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performant.
+ auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+ auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+ auto *StructConstant = ConstantStruct::get(StructType, Constants);
+ Builder.CreateStore(StructConstant, StructAlloca);
+ Rhs = StructAlloca;
+ } else {
+ auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
if (FirstBceCmp->Rhs.GEP)
Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
else
Rhs = FirstBceCmp->Rhs.LoadI->getPointerOperand();
}
- Value *IsEqual = nullptr;
LLVM_DEBUG(dbgs() << "Merging " << Comparisons.size() << " comparisons -> "
<< BB->getName() << "\n");
// If there is one block that requires splitting, we do it now, i.e.
// just before we know we will collapse the chain. The instructions
// can be executed before any of the instructions in the chain.
- const auto ToSplit = llvm::find_if(
+ const auto* ToSplit = llvm::find_if(
Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
if (ToSplit != Comparisons.end()) {
LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
ToSplit->split(BB, AA);
}
- if (Comparisons.size() == 1) {
- LLVM_DEBUG(dbgs() << "Only one comparison, updating branches\n");
- // Use clone to keep the metadata
- Instruction *const LhsLoad = Builder.Insert(FirstCmp.Lhs()->LoadI->clone());
- LhsLoad->replaceUsesOfWith(LhsLoad->getOperand(0), Lhs);
- // There are no blocks to merge, just do the comparison.
- if (auto* ConstCmp = dyn_cast<BCEConstCmp>(FirstCmp.getCmp()))
- IsEqual = Builder.CreateICmpEQ(LhsLoad, ConstCmp->Const);
- else if (const auto& BceCmp = dyn_cast<BCECmp>(FirstCmp.getCmp())) {
- Instruction *const RhsLoad = Builder.Insert(BceCmp->Rhs.LoadI->clone());
- RhsLoad->replaceUsesOfWith(cast<Instruction>(RhsLoad)->getOperand(0), Rhs);
- IsEqual = Builder.CreateICmpEQ(LhsLoad, RhsLoad);
- }
- } else {
- // memcmp expects a 'size_t' argument and returns 'int'.
- unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
- unsigned IntBits = TLI.getIntSize();
- const unsigned TotalSizeBits = std::accumulate(
- Comparisons.begin(), Comparisons.end(), 0u,
- [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
-
-
- // Create memcmp() == 0.
- const auto &DL = Phi.getDataLayout();
- Value *const MemCmpCall = emitMemCmp(
- Lhs, Rhs,
- ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
- Builder, DL, &TLI);
- IsEqual = Builder.CreateICmpEQ(
- MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
- }
-
- BasicBlock *const PhiBB = Phi.getParent();
- // Add a branch to the next basic block in the chain.
- if (NextCmpBlock == PhiBB) {
- // Continue to phi, passing it the comparison result.
- Builder.CreateBr(PhiBB);
- Phi.addIncoming(IsEqual, BB);
- DTU.applyUpdates({{DominatorTree::Insert, BB, PhiBB}});
- } else {
- // Continue to next block if equal, exit to phi else.
- Builder.CreateCondBr(IsEqual, NextCmpBlock, PhiBB);
- Phi.addIncoming(ConstantInt::getFalse(Context), BB);
- DTU.applyUpdates({{DominatorTree::Insert, BB, NextCmpBlock},
- {DominatorTree::Insert, BB, PhiBB}});
- }
+ // memcmp expects a 'size_t' argument and returns 'int'.
+ unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
+ unsigned IntBits = TLI.getIntSize();
+ const unsigned TotalSizeBits = std::accumulate(
+ Comparisons.begin(), Comparisons.end(), 0u,
+ [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+
+ // Create memcmp() == 0.
+ const auto &DL = Phi.getDataLayout();
+ Value *const MemCmpCall = emitMemCmp(
+ Lhs, Rhs,
+ ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
+ Builder, DL, &TLI);
+ Value* IsEqual = Builder.CreateICmpEQ(
+ MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
+
+ updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
return BB;
}
+// Keep existing block if it isn't merged. Only change the branches.
+// Also handles not splitting mult-blocks that use select instructions.
+static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
+ BasicBlock *const InsertBefore,
+ BasicBlock *const NextCmpBlock,
+ PHINode &Phi,
+ LLVMContext &Context,
+ const TargetLibraryInfo &TLI,
+ AliasAnalysis &AA, DomTreeUpdater &DTU) {
+ BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
+ NextCmpBlock->getParent(), InsertBefore);
+ // Transfer all instructions except the branching terminator to the new block.
+ MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
+ Value* CondResult = cast<Value>(&MultBB->back());
+ IRBuilder<> Builder(MultBB);
+ updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+ return MultBB;
+}
+
bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
DomTreeUpdater &DTU) {
assert(atLeastOneMerged() && "simplifying trivial BCECmpChain");
@@ -914,9 +953,23 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
// so that the next block is always available to branch to.
BasicBlock *InsertBefore = EntryBlock_;
BasicBlock *NextCmpBlock = Phi_.getParent();
- for (const auto &Blocks : reverse(MergedBlocks_)) {
- InsertBefore = NextCmpBlock = mergeComparisons(
- Blocks, InsertBefore, NextCmpBlock, Phi_, TLI, AA, DTU);
+ SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
+ LLVMContext &Context = NextCmpBlock->getContext();
+ for (const auto &Cmps : reverse(MergedBlocks_)) {
+ // TODO: Check if single comparisons should also be split!
+ // If there is only a single comparison, nothing is merged and the original block can be reused.
+ if (Cmps.size() == 1) {
+ // If a comparison from a mult-block is already handled then don't emit same block again.
+ BasicBlock *const BB = Cmps[0].BB;
+ if (ExistingBlocksToKeep.contains(BB))
+ continue;
+ ExistingBlocksToKeep.insert(BB);
+ InsertBefore = NextCmpBlock = updateOriginalBlock(
+ BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+ } else {
+ InsertBefore = NextCmpBlock = mergeComparisons(
+ Cmps, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+ }
}
// Replace the original cmp chain with the new cmp chain by pointing all
@@ -947,7 +1000,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
SmallVector<BasicBlock *, 16> DeadBlocks;
for (const auto &Blocks : MergedBlocks_) {
for (const SingleBCECmpBlock &Block : Blocks) {
- // Many single blocks can refer to the same multblock coming from an select instruction
+ // Many single blocks can refer to the same multblock coming from a select instruction.
// TODO: preferrably use a set instead
if (llvm::is_contained(DeadBlocks, Block.BB))
continue;
@@ -1069,6 +1122,11 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
return false;
}
+ if (CmpChain.multBlockOnlyPartiallyMerged()) {
+ LLVM_DEBUG(dbgs() << "chain uses not fully merged basic block, no merge\n");
+ return false;
+ }
+
return CmpChain.simplify(TLI, AA, DTU);
}
diff --git a/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll b/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
index bc6beefb2caee..65156697f1892 100644
--- a/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/entry-block-shuffled.ll
@@ -11,10 +11,10 @@ define zeroext i1 @opeq1(
; CHECK-LABEL: @opeq1(
; CHECK-NEXT: entry2:
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[S:%.*]], ptr [[A:%.*]], i64 0, i32 3
-; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[S]], ptr [[B:%.*]], i64 0, i32 2
-; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[TMP0]], align 4
-; CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP1]], align 4
-; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP2]], [[TMP3]]
+; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[TMP0]], align 4
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[S]], ptr [[B:%.*]], i64 0, i32 2
+; CHECK-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4
+; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP1]], [[TMP3]]
; CHECK-NEXT: br i1 [[TMP4]], label %"land.rhs.i+land.rhs.i.2", label [[OPEQ1_EXIT:%.*]]
; CHECK: "land.rhs.i+land.rhs.i.2":
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A]], ptr [[B]], i64 8)
@@ -22,10 +22,10 @@ define zeroext i1 @opeq1(
; CHECK-NEXT: br i1 [[TMP5]], label [[LAND_RHS_I_31:%.*]], label [[OPEQ1_EXIT]]
; CHECK: land.rhs.i.31:
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[S]], ptr [[A]], i64 0, i32 3
-; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[S]], ptr [[B]], i64 0, i32 3
-; CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr [[TMP6]], align 4
-; CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP7]], align 4
-; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[TMP8]], [[TMP9]]
+; CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
+; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[S]], ptr [[B]], i64 0, i32 3
+; CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP8]], align 4
+; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i32 [[TMP7]], [[TMP9]]
; CHECK-NEXT: br label [[OPEQ1_EXIT]]
; CHECK: opeq1.exit:
; CHECK-NEXT: [[TMP11:%.*]] = phi i1 [ [[TMP10]], [[LAND_RHS_I_31]] ], [ false, %"land.rhs.i+land.rhs.i.2" ], [ false, [[ENTRY2:%.*]] ]
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index 874ea22e75106..582b57d8c60ce 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,5 +1,4 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
@@ -86,26 +85,23 @@ land.end: ; preds = %land.rhs, %land.lhs
ret i1 %7
}
-; p[12] and p[13] mergable, select blocks are split even though they aren't merged. simplifycfg merges them back.
-; NOTE: Ideally wouldn't always split and thus not rely on simplifycfg.
+; p[12] and p[13] are mergeable. The select mult-block is part of the chain but isn't merged, so it is not split up into its single comparisons.
define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
; REG-LABEL: @partial_merge_not_select(
-; REG: entry5:
+; REG: entry3:
; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_4:%.*]], label [[LAND_END]]
-; REG: entry4:
; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_3:%.*]], label [[LAND_END:%.*]]
-; REG: entry3:
; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END]]
+; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; REG: "land.lhs.true11+land.rhs":
; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
; REG-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
@@ -124,42 +120,9 @@ define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnul
; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
; REG-NEXT: br label [[LAND_END]]
; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, [[ENTRY_3]] ], [ false, [[ENTRY_4]] ], [ false, %entry5 ]
+; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
; REG-NEXT: ret i1 [[RES]]
;
-; CFG-LABEL: @partial_merge_not_select(
-; CFG: entry5:
-; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; CFG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; CFG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; CFG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; CFG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
-; CFG: "land.lhs.true11+land.rhs":
-; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; CFG-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; CFG-NEXT: store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
-; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
-; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; CFG-NEXT: [[SEL2:%.*]] = select i1 [[CMP3]], i1 [[CMP4]], i1 false
-; CFG-NEXT: br i1 [[SEL2]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; CFG: land.lhs.true211:
-; CFG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; CFG-NEXT: br label [[LAND_END]]
-; CFG: land.end:
-; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry5 ]
-; CFG-NEXT: ret i1 [[RES]]
entry:
%arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
%0 = load i8, ptr %arrayidx, align 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 20a3faa854836..317a3a1464536 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -1,64 +1,49 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,simplifycfg' -verify-dom-info -S | FileCheck %s --check-prefix=CFG
-; REG checks the IR when only mergeicmps is run.
-; CFG checks the IR when simplifycfg is run afterwards to merge distinct blocks back together.
-
-; Can merge part of a select block even if not entire block mergable.
+; Cannot merge only part of a select block if the entire block is not mergeable.
define zeroext i1 @cmp_partially_mergable_select(
ptr nocapture readonly align 4 dereferenceable(24) %a,
ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
; REG-LABEL: @cmp_partially_mergable_select(
-; REG: "land.lhs.true+land.rhs+land.lhs.true4":
-; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
-; REG: land.lhs.true103:
-; REG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; REG-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
-; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
-; REG-NEXT: br i1 [[CMP2]], label [[ENTRY2:%.*]], label [[LAND_END]]
-; REG: entry2:
-; REG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; REG-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
-; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
-; REG: land.lhs.true41:
-; REG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; REG-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
-; REG-NEXT: br label %land.end
+; REG: entry:
+; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; REG-NEXT: [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
+; REG-NEXT: br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; REG: land.lhs.true:
+; REG-NEXT: [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; REG-NEXT: [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
+; REG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
+; REG: land.lhs.true4:
+; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
+; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
+; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
+; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
+; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; REG-NEXT: [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
+; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
+; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; REG-NEXT: br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
+; REG: land.lhs.true10:
+; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
+; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; REG-NEXT: [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
+; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
+; REG-NEXT: br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG: land.rhs:
+; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
+; REG-NEXT: [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
+; REG-NEXT: [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
+; REG-NEXT: [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
+; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
+; REG-NEXT: br label [[LAND_END]]
; REG: land.end:
-; REG-NEXT: [[TMP8:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[ENTRY2]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
-; REG-NEXT: ret i1 [[TMP8]]
-;
-; CFG-LABEL: @cmp_partially_mergable_select(
-; CFG: "land.lhs.true+land.rhs+land.lhs.true4":
-; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
-; CFG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_103:%.*]], label [[LAND_END:%.*]]
-; CFG: land.lhs.true103:
-; CFG-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; CFG-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[TMP0]], align 4
-; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP1]], align 4
-; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], [[TMP3]]
-; CFG-NEXT: [[TMP4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CFG-NEXT: [[TMP5:%.*]] = load i32, ptr [[TMP4]], align 4
-; CFG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 255
-; CFG-NEXT: [[SEL:%.*]] = select i1 %5, i1 %8, i1 false
-; CFG-NEXT: br i1 [[SEL]], label [[LAND_LHS_41:%.*]], label [[LAND_END]]
-; CFG: land.lhs.true41:
-; CFG-NEXT: [[TMP6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; CFG-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP6]], align 4
-; CFG-NEXT: [[CMP4:%.*]] = icmp eq i32 [[TMP7]], 100
-; CFG-NEXT: br label %land.end
-; CFG: land.end:
-; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP4]], [[LAND_LHS_41]] ], [ false, [[LAND_LHS_103]] ], [ false, %"land.lhs.true+land.rhs+land.lhs.true4" ]
-; CFG-NEXT: ret i1 [[RES]]
+; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
+; REG-NEXT: ret i1 [[RES]]
;
entry:
%e = getelementptr inbounds nuw i8, ptr %a, i64 8
@@ -106,82 +91,48 @@ land.end: ; preds = %land.rhs, %land.lhs
}
-; p[12] and p[13] are mergable. p[12] is inside of a select block which will be split up.
-; MergeICmps always splits up matching select blocks. The following simplifycfg pass merges them back together.
+; p[12] and p[13] are mergeable. p[12] is inside a select block which will not be split up, so they shouldn't be merged.
define dso_local zeroext i1 @cmp_partially_mergable_select_array(
ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
; REG-LABEL: @cmp_partially_mergable_select_array(
-; REG: "entry+land.rhs":
+; REG: entry:
; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8 }>
-; REG-NEXT: store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
-; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT: br i1 [[CMP0]], label [[ENTRY_5:%.*]], label [[LAND_END:%.*]]
-; REG: entry5:
+; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT: br i1 [[CMP1]], label [[ENTRY_4:%.*]], label [[LAND_END:%.*]]
-; REG: entry4:
; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT: br i1 [[CMP2]], label [[LAND_LHS_113:%.*]], label [[LAND_END]]
-; REG: land.lhs.true113:
+; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
+; REG: land.lhs.true11:
; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_162:%.*]], label [[LAND_END]]
-; REG: land.lhs.true162:
+; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
+; REG: land.lhs.true16:
; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
-; REG: land.lhs.true211:
+; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
+; REG: land.lhs.true21:
; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; REG-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; REG: land.rhs:
+; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
+; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; REG-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
; REG-NEXT: br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, [[LAND_LHS_162]] ], [ false, [[LAND_LHS_113]] ], [ false, [[ENTRY_4]] ], [ false, [[ENTRY_5]] ], [ false, %"entry+land.rhs" ]
+; REG: land.end:
+; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
; REG-NEXT: ret i1 [[RES]]
;
-;
-; CFG-LABEL: @cmp_partially_mergable_select_array(
-; CFG: "entry+land.rhs":
-; CFG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; CFG-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8 }>
-; CFG-NEXT: store <{ i8, i8 }> <{ i8 -1, i8 9 }>, ptr [[TMP0]], align 1
-; CFG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX0]], ptr [[TMP0]], i64 2)
-; CFG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CFG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; CFG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; CFG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1:%.*]], -56
-; CFG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; CFG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; CFG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; CFG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; CFG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; CFG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; CFG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; CFG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; CFG-NEXT: [[SEL2:%.*]] = select i1 [[SEL1]], i1 [[CMP3]], i1 false
-; CFG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CFG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; CFG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; CFG-NEXT: [[SEL3:%.*]] = select i1 [[SEL2]], i1 [[CMP4]], i1 false
-; CFG-NEXT: br i1 [[SEL3]], label [[LAND_LHS_211:%.*]], label [[LAND_END]]
-; CFG: land.lhs.true211:
-; CFG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; CFG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; CFG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; CFG-NEXT: br label [[LAND_END]]
-; CFG: land.end:
-; CFG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_211]] ], [ false, %"entry+land.rhs" ]
-; CFG-NEXT: ret i1 [[RES]]
-;
entry:
%arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
%0 = load i8, ptr %arrayidx, align 1
>From ebf207543d463f2875ecddb032fa9d24a87a8345 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 21:03:44 +0100
Subject: [PATCH 17/23] [MergeICmps] Ensure cmp-chains that require splitting
come first
---
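A rough, standalone sketch of the ordering rule this patch introduces, for readers who want to see it outside of LLVM. `CmpBlock` and `mergeOrder` are hypothetical names invented for this note, not part of the pass; the struct only carries the two flags the decision looks at. The sketch uses `std::stable_partition` so the result is deterministic, whereas the patch itself uses `std::partition`.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for SingleBCECmpBlock: just a name plus the two
// properties that the ordering decision inspects.
struct CmpBlock {
  std::string Name;
  bool IsConstCmp;   // true: comparison against a constant
  bool RequireSplit; // true: the block has extra instructions to split off
};

// Returns the block names in the order the two groups would be handed to
// the merge step.
std::vector<std::string> mergeOrder(std::vector<CmpBlock> Cmps) {
  auto IsConst = [](const CmpBlock &C) { return C.IsConstCmp; };
  // Const-comparisons go to the front, all other comparisons behind them.
  auto BceIt = std::stable_partition(Cmps.begin(), Cmps.end(), IsConst);

  const bool ConstNeedsSplit =
      std::any_of(Cmps.begin(), BceIt,
                  [](const CmpBlock &C) { return C.RequireSplit; });

  std::vector<std::string> Order;
  auto Append = [&Order](auto First, auto Last) {
    for (; First != Last; ++First)
      Order.push_back(First->Name);
  };

  if (ConstNeedsSplit) {
    // The const group contains the block that must be split: emit it first.
    Append(Cmps.begin(), BceIt);
    Append(BceIt, Cmps.end());
  } else {
    // Default: the BCE-comparisons come first.
    Append(BceIt, Cmps.end());
    Append(Cmps.begin(), BceIt);
  }
  return Order;
}
```

With a const-comparison that requires splitting, the const group leads; otherwise the BCE group leads, which matches the two new tests in mixed-cmp-split.ll.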
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 22 ++-
.../MergeICmps/X86/mixed-cmp-split.ll | 175 ++++++++++++++++++
2 files changed, 190 insertions(+), 7 deletions(-)
create mode 100644 llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 5943d717276c4..a5d2895c9f0e1 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -320,7 +320,7 @@ class MultBCECmpBlock {
// instructions in the block.
bool canSplit(AliasAnalysis &AA) const;
- // Return true if this all the relevant instructions in the BCE-cmp-block can
+ // Return true if all the relevant instructions in the BCE-cmp-block can
// be sunk below this instruction. By doing this, we know we can separate the
// BCE-cmp-block instructions from the non-BCE-cmp-block instructions in the
// block.
@@ -761,9 +761,16 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
- // this will order the merged BCE-comparisons before the BCE-const-comparisons
- mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
- mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+ // The chain that requires splitting should always be first.
+ // If no chain requires splitting then defaults to BCE-comparisons coming first.
+ if (std::any_of(Comparisons.begin(), BceIt,
+ [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
+ mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+ mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+ } else {
+ mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
+ mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
+ }
}
namespace {
@@ -956,10 +963,11 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
LLVMContext &Context = NextCmpBlock->getContext();
for (const auto &Cmps : reverse(MergedBlocks_)) {
- // TODO: Check if single comparisons should also be split!
- // If there is only a single comparison then nothing should be merged and can use original block.
+ // If there is only a single comparison then nothing should
+ // be merged and can use original block.
if (Cmps.size() == 1) {
- // If a comparison from a mult-block is already handled then don't emit same block again.
+ // If a comparison from a mult-block is already handled
+ // then don't emit same block again.
BasicBlock *const BB = Cmps[0].BB;
if (ExistingBlocksToKeep.contains(BB))
continue;
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
new file mode 100644
index 0000000000000..61fdd2b7e17e9
--- /dev/null
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -0,0 +1,175 @@
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+
+; define dso_local noundef zeroext i1 @cmp_mixed_split(ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %a, ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %b) local_unnamed_addr {
+; entry:
+; %0 = load i32, ptr %a, align 4
+; %1 = load i32, ptr %b, align 4
+; %cmp = icmp eq i32 %0, %1
+; br i1 %cmp, label %land.lhs.true, label %land.end
+;
+; land.lhs.true: ; preds = %entry
+; %e = getelementptr inbounds nuw i8, ptr %a, i64 20
+; %2 = load i32, ptr %e, align 4
+; %a3 = getelementptr inbounds nuw i8, ptr %b, i64 4
+; %3 = load i32, ptr %a3, align 4
+; %b2 = getelementptr inbounds nuw i8, ptr %a, i64 8
+; %4 = load i32, ptr %b2, align 4
+; %c = getelementptr inbounds nuw i8, ptr %a, i64 12
+; %5 = load i8, ptr %c, align 4
+; %a1 = getelementptr inbounds nuw i8, ptr %a, i64 4
+; %6 = load i32, ptr %a1, align 4
+; %d = getelementptr inbounds nuw i8, ptr %a, i64 16
+; %7 = load i32, ptr %d, align 4
+; %cmp5 = icmp eq i32 %6, %3
+; %cmp7 = icmp eq i8 %5, 43
+; %or.cond = select i1 %cmp5, i1 %cmp7, i1 false
+; %cmp9 = icmp eq i32 %4, 1
+; %or.cond13 = select i1 %or.cond, i1 %cmp9, i1 false
+; %cmp11 = icmp eq i32 %7, 12
+; %or.cond14 = select i1 %or.cond13, i1 %cmp11, i1 false
+; %cmp12 = icmp eq i32 %2, 3
+; %spec.select = select i1 %or.cond14, i1 %cmp12, i1 false
+; br label %land.end
+;
+; land.end: ; preds = %land.lhs.true, %entry
+; %8 = phi i1 [ false, %entry ], [ %spec.select, %land.lhs.true ]
+; ret i1 %8
+; }
+
+
+
+
+declare void @foo(...)
+
+; Tests that if both the const-cmp and bce-cmp chains can be merged, the split block is still at the beginning.
+
+define dso_local noundef zeroext i1 @cmp_mixed_const_first(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed_const_first(
+; This merged-block should come first as it should be split.
+; CHECK: "entry+land.rhs+land.lhs.true8":
+; CHECK-NEXT: call void (...) @foo() #[[ATTR2:[0-9]+]]
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
+; CHECK-NEXT: [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP0]], 0
+; CHECK-NEXT: br i1 [[CMP0]], label [[LAND_LHS_TRUE10:%.*]], label [[LAND_END:%.*]]
+; CHECK: "land.lhs.true+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT: [[MEMCMP1:%.*]] = call i32 @memcmp(ptr [[A]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP1]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ [[CMP1]], [[LAND_LHS_TRUE10]] ], [ false, [[ENTRY_LAND_RHS:%.*]] ]
+; CHECK-NEXT: ret i1 [[RES]]
+;
+entry:
+ %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %0 = load i32, ptr %e, align 4
+ %cmp = icmp eq i32 %0, 255
+ call void (...) @foo() inaccessiblememonly
+ br i1 %cmp, label %land.lhs.true, label %land.end
+
+land.lhs.true: ; preds = %entry
+ %1 = load i32, ptr %a, align 4
+ %2 = load i32, ptr %b, align 4
+ %cmp3 = icmp eq i32 %1, %2
+ br i1 %cmp3, label %land.lhs.true4, label %land.end
+
+land.lhs.true4: ; preds = %land.lhs.true
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+ %3 = load i8, ptr %c, align 1
+ %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+ %4 = load i8, ptr %c5, align 1
+ %cmp7 = icmp eq i8 %3, %4
+ br i1 %cmp7, label %land.lhs.true8, label %land.end
+
+land.lhs.true8: ; preds = %land.lhs.true4
+ %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+ %5 = load i32, ptr %g, align 4
+ %cmp9 = icmp eq i32 %5, 100
+ br i1 %cmp9, label %land.lhs.true10, label %land.end
+
+land.lhs.true10: ; preds = %land.lhs.true8
+ %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %6 = load i8, ptr %b11, align 4
+ %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+ %7 = load i8, ptr %b13, align 4
+ %cmp15 = icmp eq i8 %6, %7
+ br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true10
+ %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+ %8 = load i32, ptr %f, align 4
+ %cmp16 = icmp eq i32 %8, 200
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true8, %land.lhs.true4, %land.lhs.true, %entry
+ %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true8 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+ ret i1 %9
+}
+
+; If the block to split is in the BCE-comparison chain, then that block should come first.
+
+define dso_local noundef zeroext i1 @cmp_mixed_bce_first(
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
+ ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
+; CHECK-LABEL: @cmp_mixed_bce_first(
+; CHECK: "entry+land.lhs.true10+land.lhs.true4":
+; CHECK-NEXT: call void (...) @foo() #[[ATTR2:[0-9]+]]
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[A:%.*]], ptr [[B:%.*]], i64 6)
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[CMP1]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; CHECK: "land.lhs.true+land.rhs+land.lhs.true4":
+; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
+; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
+; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[TMP4:%.*]] = phi i1 [ [[CMP2]], [[LAND_LHS_TRUE]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret i1 [[TMP4]]
+;
+entry:
+ %0 = load i32, ptr %a, align 4
+ %1 = load i32, ptr %b, align 4
+ call void (...) @foo() inaccessiblememonly
+ %cmp3 = icmp eq i32 %0, %1
+ br i1 %cmp3, label %land.lhs.true, label %land.end
+
+land.lhs.true:
+ %e = getelementptr inbounds nuw i8, ptr %a, i64 8
+ %2 = load i32, ptr %e, align 4
+ %cmp = icmp eq i32 %2, 255
+ br i1 %cmp, label %land.lhs.true4, label %land.end
+
+land.lhs.true4: ; preds = %land.lhs.true
+ %c = getelementptr inbounds nuw i8, ptr %a, i64 5
+ %3 = load i8, ptr %c, align 1
+ %c5 = getelementptr inbounds nuw i8, ptr %b, i64 5
+ %4 = load i8, ptr %c5, align 1
+ %cmp7 = icmp eq i8 %3, %4
+ %g = getelementptr inbounds nuw i8, ptr %a, i64 16
+ %5 = load i32, ptr %g, align 4
+ %cmp9 = icmp eq i32 %5, 100
+ %or.cond = select i1 %cmp7, i1 %cmp9, i1 false
+ br i1 %or.cond, label %land.lhs.true10, label %land.end
+
+land.lhs.true10: ; preds = %land.lhs.true4
+ %b11 = getelementptr inbounds nuw i8, ptr %a, i64 4
+ %6 = load i8, ptr %b11, align 4
+ %b13 = getelementptr inbounds nuw i8, ptr %b, i64 4
+ %7 = load i8, ptr %b13, align 4
+ %cmp15 = icmp eq i8 %6, %7
+ br i1 %cmp15, label %land.rhs, label %land.end
+
+land.rhs: ; preds = %land.lhs.true10
+ %f = getelementptr inbounds nuw i8, ptr %a, i64 12
+ %8 = load i32, ptr %f, align 4
+ %cmp16 = icmp eq i32 %8, 200
+ br label %land.end
+
+land.end: ; preds = %land.rhs, %land.lhs.true10, %land.lhs.true4, %land.lhs.true, %entry
+ %9 = phi i1 [ false, %land.lhs.true10 ], [ false, %land.lhs.true4 ], [ false, %land.lhs.true ], [ false, %entry ], [ %cmp16, %land.rhs ]
+ ret i1 %9
+}
>From b57565426049fe2cc63435b435ea75bc1c81cb56 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sat, 22 Mar 2025 21:21:20 +0100
Subject: [PATCH 18/23] [MergeICmps] Made instruction splicing more robust by
not assuming second to last inst holds cond-result
---
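A toy model of the selection this patch makes, assuming nothing about LLVM's types: `Block`, `PhiIncoming` and `condResult` are hypothetical names for this note, and values are represented by plain strings. It only illustrates the rule that a conditional branch supplies its own condition, while an unconditional branch takes the value the phi receives from that block.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of a basic block: a name plus the condition of its
// terminating branch (std::nullopt stands for an unconditional branch).
struct Block {
  std::string Name;
  std::optional<std::string> BranchCond;
};

// Hypothetical model of the phi node: block name -> incoming value.
using PhiIncoming = std::map<std::string, std::string>;

// Picks the value that holds the block's comparison result without assuming
// it is the second-to-last instruction of the block.
std::string condResult(const Block &BB, const PhiIncoming &Phi) {
  if (!BB.BranchCond)
    return Phi.at(BB.Name); // unconditional: the result flows into the phi
  return *BB.BranchCond;    // conditional: the result is the branch condition
}
```

For example, a last block `land.rhs` that branches unconditionally to `land.end` resolves through the phi entry, while an `entry` block ending in a conditional branch resolves to that branch's condition.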
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index a5d2895c9f0e1..30852a846f3d7 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -942,9 +942,14 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
NextCmpBlock->getParent(), InsertBefore);
+ auto *const BranchI = cast<BranchInst>(BB->getTerminator());
+ Value* CondResult = nullptr;
+ if (BranchI->isUnconditional())
+ CondResult = Phi.getIncomingValueForBlock(BB);
+ else
+ CondResult = cast<Value>(BranchI->getCondition());
// Transfer all instructions except the branching terminator to the new block.
MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
- Value* CondResult = cast<Value>(&MultBB->back());
IRBuilder<> Builder(MultBB);
updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
return MultBB;
>From cb53e53e790d0dffbace8d8bd86c4d48a8752bbe Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Sun, 23 Mar 2025 20:42:10 +0100
Subject: [PATCH 19/23] [MergeICmps] Cleaned up code and added new debug info
---
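The cleanup drops the virtual `getLoads()` in favour of kind-based casts. The following is a standalone sketch of that pattern, assuming a hand-rolled `dynCast` helper in place of `llvm::dyn_cast` so it compiles without LLVM headers; `Atom`, `ConstCmp`, `BceCmp`, `dynCast` and `lhsOf` are simplified, hypothetical stand-ins and not the classes in MergeICmps.cpp.

```cpp
#include <cassert>

// Simplified stand-in for BCEAtom.
struct Atom {
  int Offset;
};

struct Comparison {
  enum CompKind { CK_ConstCmp, CK_BceCmp };
  explicit Comparison(CompKind K) : Kind(K) {}
  virtual ~Comparison() = default;
  CompKind getKind() const { return Kind; }

private:
  CompKind Kind;
};

// Comparison of one atom against a constant.
struct ConstCmp : Comparison {
  Atom Lhs;
  explicit ConstCmp(Atom L) : Comparison(CK_ConstCmp), Lhs(L) {}
  static bool classof(const Comparison *C) {
    return C->getKind() == CK_ConstCmp;
  }
};

// Comparison of two atoms.
struct BceCmp : Comparison {
  Atom Lhs, Rhs;
  BceCmp(Atom L, Atom R) : Comparison(CK_BceCmp), Lhs(L), Rhs(R) {}
  static bool classof(const Comparison *C) {
    return C->getKind() == CK_BceCmp;
  }
};

// Minimal stand-in for llvm::dyn_cast: nullptr when the kind does not match.
template <typename To> const To *dynCast(const Comparison *C) {
  return To::classof(C) ? static_cast<const To *>(C) : nullptr;
}

// Mirrors the shape of SingleBCECmpBlock::Lhs() after this patch.
const Atom *lhsOf(const Comparison *C) {
  if (const auto *CC = dynCast<ConstCmp>(C))
    return &CC->Lhs;
  const auto *BC = dynCast<BceCmp>(C);
  assert(BC && "unknown comparison kind");
  return &BC->Lhs;
}
```

The trade-off is that each call site spells out both cases, but the base class no longer needs the `LoadOperands` pair with an optional second operand.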
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 170 ++++++++++--------
.../MergeICmps/X86/mixed-cmp-split.ll | 39 ----
2 files changed, 93 insertions(+), 116 deletions(-)
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 30852a846f3d7..21b97a0b45faf 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -185,6 +185,9 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, Instructi
}
+// An abstract parent class that can either be a comparison of
+// two BCEAtoms with the same offsets to a base pointer (BCECmp)
+// or a comparison of a single BCEAtom with a constant (BCEConstCmp).
struct Comparison {
public:
enum CompKind {
@@ -197,14 +200,11 @@ struct Comparison {
int SizeBits;
const ICmpInst *CmpI;
- using LoadOperands = std::pair<BCEAtom*, std::optional<BCEAtom*>>;
-
Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
: Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
CompKind getKind() const { return Kind; }
virtual ~Comparison() = default;
- virtual LoadOperands getLoads() = 0;
bool areContiguous(const Comparison& Other) const;
bool operator<(const Comparison &Other) const;
};
@@ -226,10 +226,6 @@ struct BCEConstCmp : public Comparison {
static bool classof(const Comparison* C) {
return C->getKind() == CK_ConstCmp;
}
-
- Comparison::LoadOperands getLoads() override {
- return std::make_pair(&Lhs,std::nullopt);
- }
};
// A comparison between two BCE atoms, e.g. `a == o.a` in the example at the
@@ -249,10 +245,6 @@ struct BCECmp : public Comparison {
static bool classof(const Comparison* C) {
return C->getKind() == CK_BceCmp;
}
-
- Comparison::LoadOperands getLoads() override {
- return std::make_pair(&Lhs,&Rhs);
- }
};
// TODO: this can be improved to take alignment into account.
@@ -286,7 +278,7 @@ bool Comparison::operator<(const Comparison& Other) const {
}
// Represents multiple comparisons inside of a single basic block.
-// This happens if multiple basic blocks have previously been merged into a single using a select node.
+// This happens if multiple basic blocks have previously been merged into a single block using a select node.
class IntraCmpChain {
// TODO: this could probably be a unique-ptr but current impl relies on some copies
std::vector<std::shared_ptr<Comparison>> CmpChain;
@@ -302,7 +294,6 @@ class IntraCmpChain {
}
};
-
// A basic block that contains one or more comparisons.
class MultBCECmpBlock {
public:
@@ -326,6 +317,9 @@ class MultBCECmpBlock {
// block.
bool canSinkBCECmpInst(const Instruction *, AliasAnalysis &AA) const;
+ // Returns all instructions that should be split off of the comparison chain.
+ llvm::SmallVector<Instruction *, 4> getAllSplitInsts(AliasAnalysis &AA) const;
+
// The basic block where this comparison happens.
BasicBlock *BB;
// Instructions relating to the BCECmp and branch.
@@ -342,15 +336,20 @@ class MultBCECmpBlock {
// (see canSplit()).
class SingleBCECmpBlock {
public:
- SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder)
- : BB(M.BB), OrigOrder(OrigOrder), Cmp(std::move(M.getCmps()[I])) {}
-
- SingleBCECmpBlock(MultBCECmpBlock M, unsigned I, unsigned OrigOrder, llvm::SmallVector<Instruction *, 4> SplitInsts)
- : BB(M.BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(M.getCmps()[I])), SplitInsts(SplitInsts) {}
-
- const BCEAtom* Lhs() const { return Cmp->getLoads().first; }
+ SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder)
+ : BB(BB), OrigOrder(OrigOrder), Cmp(std::move(Cmp)) {}
+
+ SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder,
+ llvm::SmallVector<Instruction *, 4> SplitInsts)
+ : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)), SplitInsts(SplitInsts) {}
+
+ const BCEAtom* Lhs() const {
+ if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
+ return &BceConstCmp->Lhs;
+ auto *const BceCmp = cast<BCECmp>(Cmp.get());
+ return &BceCmp->Lhs;
+ }
const Comparison* getCmp() const { return Cmp.get(); }
-
bool operator<(const SingleBCECmpBlock &O) const {
return *Cmp < *O.Cmp;
}
@@ -383,11 +382,14 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
};
- for (auto& Cmp : Cmps) {
- auto [Lhs,Rhs] = Cmp->getLoads();
- if (MayClobber(Lhs->LoadI) || (Rhs && MayClobber((*Rhs)->LoadI)))
- return false;
- }
+ auto CmpLoadsAreClobbered = [&](const auto& Cmp) {
+ if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
+ return MayClobber(BceConstCmp->Lhs.LoadI);
+ auto *const BceCmp = cast<BCECmp>(Cmp.get());
+ return MayClobber(BceCmp->Lhs.LoadI) || MayClobber(BceCmp->Rhs.LoadI);
+ };
+ if (llvm::any_of(Cmps, CmpLoadsAreClobbered))
+ return false;
}
// Make sure this instruction does not use any of the BCE cmp block
// instructions as operand.
@@ -425,6 +427,20 @@ bool MultBCECmpBlock::doesOtherWork() const {
return false;
}
+llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
+ llvm::SmallVector<Instruction *, 4> SplitInsts;
+ for (Instruction& Inst : *BB) {
+ if (BlockInsts.count(&Inst))
+ continue;
+ assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
+ // This is a non-BCE-cmp-block instruction. And it can be separated
+ // from the BCE-cmp-block instructions.
+ SplitInsts.push_back(&Inst);
+ }
+ return SplitInsts;
+}
+
+
// Visit the given comparison. If this is a comparison between two valid
// BCE atoms, or between a BCE atom and a constant, returns the comparison.
std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
@@ -552,42 +568,37 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
}
-// void emitDebugInfo(BCECmpBlock &&Comparison) {
-// LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-// << "': Found constant-cmp of " << Comparison.getCmp().SizeBits
-// << " bits including " << Comparison.getCmp()->Lhs.BaseId << " + "
-// << Comparison.getCmp().Lhs.Offset << "\n");
-
-// LLVM_DEBUG(dbgs() << "Block '" << Comparison.BB->getName()
-// << "': Found cmp of " << Comparison.getCmp().SizeBits
-// << " bits between " << Comparison.getCmp().Lhs.BaseId << " + "
-// << Comparison.Lhs.Offset << " and "
-// << Comparison.Rhs.BaseId << " + "
-// << Comparison.Rhs.Offset << "\n");
-// LLVM_DEBUG(dbgs() << "\n");
-// }
-
-// Enqueues a single comparison and if it's the first comparison block then adds the `OtherInsts` to the block too to split it.
-static inline void enqueueSingleCmp(std::vector<SingleBCECmpBlock> &Comparisons,
+void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock* BB) {
+ LLVM_DEBUG(dbgs() << "Block '" << BB->getName());
+ if (auto* ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
+ LLVM_DEBUG(dbgs() << "': Found constant-cmp of " << Cmp->SizeBits
+ << " bits including " << ConstCmp->Lhs.BaseId << " + "
+ << ConstCmp->Lhs.Offset << "\n");
+ return;
+ }
+ auto* BceCmp = cast<BCECmp>(Cmp.get());
+ LLVM_DEBUG(dbgs() << "': Found cmp of " << BceCmp->SizeBits
+ << " bits between " << BceCmp->Lhs.BaseId << " + "
+ << BceCmp->Lhs.Offset << " and "
+ << BceCmp->Rhs.BaseId << " + "
+ << BceCmp->Rhs.Offset << "\n");
+}
+
+// Enqueues all comparisons of a mult-block.
+// If the block requires splitting then adds `OtherInsts` to the block too.
+static inline void enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
- // emitDebugInfo(Comparison);
- for (unsigned i = 0; i < CmpBlock.getCmps().size(); i++) {
+ bool hasAlreadySplit = false;
+ for (auto& Cmp : CmpBlock.getCmps()) {
+ emitDebugInfo(Cmp, CmpBlock.BB);
unsigned OrigOrder = Comparisons.size();
- if (!RequireSplit || i != 0) {
- Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder));
+ if (RequireSplit && !hasAlreadySplit) {
+ hasAlreadySplit = true;
+ auto SplitInsts = CmpBlock.getAllSplitInsts(AA);
+ Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
continue;
}
- // If should split mult block then put all instructions at the beginning of the first block
- llvm::SmallVector<Instruction *, 4> OtherInsts;
- for (Instruction &Inst : *CmpBlock.BB) {
- if (CmpBlock.BlockInsts.count(&Inst))
- continue;
- assert(CmpBlock.canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
- // This is a non-BCE-cmp-block instruction. And it can be separated
- // from the BCE-cmp-block instruction.
- OtherInsts.push_back(&Inst);
- }
- Comparisons.push_back(SingleBCECmpBlock(CmpBlock, i, OrigOrder, OtherInsts));
+ Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder));
}
}
@@ -609,7 +620,6 @@ class BCECmpChain {
[](const auto &Blocks) { return Blocks.size() > 1; });
};
-
private:
PHINode &Phi_;
// The list of all blocks in the chain, grouped by contiguity.
@@ -714,7 +724,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
LLVM_DEBUG(dbgs()
<< "Split initial block '" << CmpBlock->BB->getName()
<< "' that does extra work besides compare\n");
- enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, true);
+ enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, true);
} else {
LLVM_DEBUG(dbgs()
<< "ignoring initial block '" << CmpBlock->BB->getName()
@@ -747,7 +757,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
// We could still merge bb1 and bb2 though.
return;
}
- enqueueSingleCmp(Comparisons, std::move(*CmpBlock), AA, false);
+ enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, false);
}
// It is possible we have no suitable comparison to merge.
@@ -827,6 +837,7 @@ class MergedBlockName {
} // namespace
+// Add a branch to the next basic block in the chain.
void updateBranching(Value* CondResult,
IRBuilder<>& Builder,
BasicBlock *BB,
@@ -836,7 +847,6 @@ void updateBranching(Value* CondResult,
const TargetLibraryInfo &TLI,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
BasicBlock *const PhiBB = Phi.getParent();
- // Add a branch to the next basic block in the chain.
if (NextCmpBlock == PhiBB) {
// Continue to phi, passing it the comparison result.
Builder.CreateBr(PhiBB);
@@ -851,6 +861,25 @@ void updateBranching(Value* CondResult,
}
}
+// Builds constant-struct to compare pointer to during memcmp(). Has to be a chain of const-comparisons.
+AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context) {
+ std::vector<Constant*> Constants;
+ std::vector<Type*> Types;
+
+ for (const auto& BceBlock : Comparisons) {
+ assert(isa<BCEConstCmp>(BceBlock.getCmp()) && "Const-cmp-chain can only contain const comparisons");
+ auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+ Constants.emplace_back(ConstCmp->Const);
+ Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
+ }
+ // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
+ auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+ auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
+ auto *StructConstant = ConstantStruct::get(StructType, Constants);
+ Builder.CreateStore(StructConstant, StructAlloca);
+
+ return StructAlloca;
+}
// Merges the given contiguous comparison blocks into one memcmp block.
static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
@@ -870,27 +899,13 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
IRBuilder<> Builder(BB);
// Add the GEPs from the first BCECmpBlock.
Value *Lhs, *Rhs;
-
if (FirstCmp.Lhs()->GEP)
Lhs = Builder.Insert(FirstCmp.Lhs()->GEP->clone());
else
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
- // Build constant-struct to compare pointer to. Has to be a chain of const-comparisons.
if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
- std::vector<Constant*> Constants;
- std::vector<Type*> Types;
- for (const auto& BceBlock : Comparisons) {
- auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
- Constants.emplace_back(ConstCmp->Const);
- Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
- }
- // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
- auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
- auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
- auto *StructConstant = ConstantStruct::get(StructType, Constants);
- Builder.CreateStore(StructConstant, StructAlloca);
- Rhs = StructAlloca;
+ Rhs = buildStruct(Comparisons, Builder, Context);
} else {
auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
if (FirstBceCmp->Rhs.GEP)
@@ -952,6 +967,7 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
IRBuilder<> Builder(MultBB);
updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+
return MultBB;
}
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
index 61fdd2b7e17e9..1bdd4fef67136 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -1,44 +1,5 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
-; define dso_local noundef zeroext i1 @cmp_mixed_split(ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %a, ptr noundef nonnull readonly align 4 captures(none) dereferenceable(40) %b) local_unnamed_addr {
-; entry:
-; %0 = load i32, ptr %a, align 4
-; %1 = load i32, ptr %b, align 4
-; %cmp = icmp eq i32 %0, %1
-; br i1 %cmp, label %land.lhs.true, label %land.end
-;
-; land.lhs.true: ; preds = %entry
-; %e = getelementptr inbounds nuw i8, ptr %a, i64 20
-; %2 = load i32, ptr %e, align 4
-; %a3 = getelementptr inbounds nuw i8, ptr %b, i64 4
-; %3 = load i32, ptr %a3, align 4
-; %b2 = getelementptr inbounds nuw i8, ptr %a, i64 8
-; %4 = load i32, ptr %b2, align 4
-; %c = getelementptr inbounds nuw i8, ptr %a, i64 12
-; %5 = load i8, ptr %c, align 4
-; %a1 = getelementptr inbounds nuw i8, ptr %a, i64 4
-; %6 = load i32, ptr %a1, align 4
-; %d = getelementptr inbounds nuw i8, ptr %a, i64 16
-; %7 = load i32, ptr %d, align 4
-; %cmp5 = icmp eq i32 %6, %3
-; %cmp7 = icmp eq i8 %5, 43
-; %or.cond = select i1 %cmp5, i1 %cmp7, i1 false
-; %cmp9 = icmp eq i32 %4, 1
-; %or.cond13 = select i1 %or.cond, i1 %cmp9, i1 false
-; %cmp11 = icmp eq i32 %7, 12
-; %or.cond14 = select i1 %or.cond13, i1 %cmp11, i1 false
-; %cmp12 = icmp eq i32 %2, 3
-; %spec.select = select i1 %or.cond14, i1 %cmp12, i1 false
-; br label %land.end
-;
-; land.end: ; preds = %land.lhs.true, %entry
-; %8 = phi i1 [ false, %entry ], [ %spec.select, %land.lhs.true ]
-; ret i1 %8
-; }
-
-
-
-
declare void @foo(...)
; Tests that if both const-cmp and bce-cmp chains can be merged that the splitted block is still at the beginning.
From dd4cd885616b23b1a327c686c3085572f0ddac93 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 24 Mar 2025 20:50:48 +0100
Subject: [PATCH 20/23] [MergeICmps] Use GlobalConstant instead of local alloca
for const-cmp; Is deleted when folded during expand-memcmp pass
---
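For readers skimming this patch, a source-level analogy of what the merged const-comparison computes may help. This is only a sketch in plain C++, not the IR the pass emits: `Fields`, `cmpFieldwise`, `cmpMerged` and `MemcmpConstOp` are hypothetical names chosen for illustration, and the static array merely stands in for the private constant global (`@memcmp_const_op` in the tests). It assumes three adjacent 32-bit fields with no padding, matching the `<{ i32 255, i32 200, i32 100 }>` case.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct for illustration: three adjacent 32-bit fields.
struct Fields {
  std::int32_t e;
  std::int32_t f;
  std::int32_t g;
};

// Field-by-field form, roughly what the source looks like before merging.
bool cmpFieldwise(const Fields &S) {
  return S.e == 255 && S.f == 200 && S.g == 100;
}

// Stand-in for the private constant global: the expected bytes of the
// three fields, laid out contiguously in the same order.
static const std::int32_t MemcmpConstOp[3] = {255, 200, 100};

// Merged form: one memcmp of the contiguous fields against the constant.
bool cmpMerged(const Fields &S) {
  static_assert(sizeof(Fields) == sizeof(MemcmpConstOp),
                "sketch assumes the fields are contiguous without padding");
  return std::memcmp(&S, MemcmpConstOp, sizeof(MemcmpConstOp)) == 0;
}
```

Because the right-hand operand is a constant with static storage rather than a stack slot filled by a store, a later pass can see its bytes directly, which is the property this patch relies on for folding in expand-memcmp.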
llvm/lib/CodeGen/ExpandMemCmp.cpp | 19 +++
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 12 +-
.../Transforms/MergeICmps/X86/const-cmp-bb.ll | 5 +-
.../MergeICmps/X86/many-const-cmp-select.ll | 31 ++--
.../MergeICmps/X86/mixed-cmp-bb-select.ll | 6 +-
.../MergeICmps/X86/mixed-cmp-split.ll | 11 +-
.../MergeICmps/X86/mixed-comparisons.ll | 6 +-
.../X86/mixed-type-const-comparisons.ll | 11 +-
.../X86/not-split-unmerged-select.ll | 144 ++++++++--------
.../MergeICmps/X86/partial-select-merge.ll | 154 +++++++++---------
.../MergeICmps/X86/split-block-does-work.ll | 7 +-
11 files changed, 209 insertions(+), 197 deletions(-)
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 74f93e1979532..e32cb2db1c954 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -879,8 +879,27 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
if (Value *Res = Expansion.getMemCmpExpansion()) {
// Replace call with result of expansion and erase call.
+ auto* GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
CI->replaceAllUsesWith(Res);
CI->eraseFromParent();
+
+ // If the mergeicmps pass used a global constant to merge comparisons and
+ // the global constants were folded then the variable can be deleted since it isn't used anymore.
+ if (GV) {
+ // NOTE: There is still a use lingering around but that use itself isn't
+ // used so it is fine to erase this instruction.
+ static bool (*hasActiveUses)(Value*) = [](Value* V) {
+ for (User* U: V->users()){
+ if (hasActiveUses(U))
+ return true;
+ }
+ return false;
+ };
+ if (!hasActiveUses(GV)) {
+ LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName() << " that was introduced by the previous mergeicmps pass\n");
+ GV->eraseFromParent();
+ }
+ }
}
return true;
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 21b97a0b45faf..2eb1c9761d32e 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -861,8 +861,9 @@ void updateBranching(Value* CondResult,
}
}
-// Builds constant-struct to compare pointer to during memcmp(). Has to be a chain of const-comparisons.
-AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context) {
+// Builds global constant-struct to compare to pointer during memcmp().
+// Has to be global in order for expand-memcmp pass to be able to fold constants.
+GlobalVariable* buildConstantStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context, Module& M) {
std::vector<Constant*> Constants;
std::vector<Type*> Types;
@@ -872,13 +873,10 @@ AllocaInst* buildStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& B
Constants.emplace_back(ConstCmp->Const);
Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
}
- // NOTE: Could check if all elements are of the same type and then use an array instead, if that is more performat.
auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
- auto* StructAlloca = Builder.CreateAlloca(StructType,nullptr);
auto *StructConstant = ConstantStruct::get(StructType, Constants);
- Builder.CreateStore(StructConstant, StructAlloca);
- return StructAlloca;
+ return new GlobalVariable(M, StructType, true, GlobalVariable::PrivateLinkage, StructConstant, "memcmp_const_op");
}
// Merges the given contiguous comparison blocks into one memcmp block.
@@ -905,7 +903,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
Lhs = FirstCmp.Lhs()->LoadI->getPointerOperand();
if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
- Rhs = buildStruct(Comparisons, Builder, Context);
+ Rhs = buildConstantStruct(Comparisons, Builder, Context, *Phi.getModule());
} else {
auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
if (FirstBceCmp->Rhs.GEP)
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index 51c3c27583602..c39d586d2f174 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -4,11 +4,10 @@
; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
+; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
; CHECK-LABEL: @test(
; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>, align 8
-; CHECK-NEXT: store <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>, ptr [[TMP0]], align 1
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[TMP0:%.*]], i64 3)
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[MEMCMP_OP]], i64 3)
; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END5:%.*]]
; CHECK: land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index 0ca0f671d98a4..bca4dacbefbfa 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -2,29 +2,28 @@
; Can merge contiguous const-comparison basic blocks that include a select statement.
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
+
define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
; CHECK-LABEL: @is_all_ones_many(
; CHECK-NEXT: "entry+land.lhs.true11":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8, i8 }>
-; CHECK-NEXT: store <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>, ptr [[TMP0]], align 1
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 4)
-; CHECK-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT: br i1 [[TMP1]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[MEMCMP_OP1]], i64 4)
+; CHECK-NEXT: [[TMP0:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[TMP0]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true16+land.lhs.true21":
-; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; CHECK-NEXT: store <{ i8, i8 }> <{ i8 2, i8 7 }>, ptr [[TMP3]], align 1
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP2]], ptr [[TMP3]], i64 2)
-; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; CHECK-NEXT: br i1 [[TMP4]], label [[LAST_CMP:%.*]], label [[LAND_END]]
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[TMP2]], label [[LAST_CMP:%.*]], label [[LAND_END]]
; CHECK: land.rhs1:
-; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
-; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[TMP5]], align 1
-; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[TMP3]], align 1
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i8 [[TMP4]], 9
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
-; CHECK-NEXT: [[TMP8:%.*]] = phi i1 [ [[TMP7]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
-; CHECK-NEXT: ret i1 [[TMP8]]
+; CHECK-NEXT: [[TMP6:%.*]] = phi i1 [ [[TMP5]], [[LAST_CMP]] ], [ false, [[NEXT_MEMCMP]] ], [ false, [[ENTRY:%.*]] ]
+; CHECK-NEXT: ret i1 [[TMP6]]
;
entry:
%0 = load i8, ptr %p, align 1
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index dfe57e6ef930a..3990af69d6c83 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -2,6 +2,8 @@
; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
define dso_local noundef zeroext i1 @cmp_mixed(
ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %b) local_unnamed_addr {
@@ -12,9 +14,7 @@ define dso_local noundef zeroext i1 @cmp_mixed(
; CHECK-NEXT: br i1 [[CMP1]], label [[ENTRY_LAND_RHS:%.*]], label [[LAND_END:%.*]]
; CHECK: "entry+land.rhs+land.lhs.true4":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
index 1bdd4fef67136..3e4e4c3eaf6be 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-split.ll
@@ -4,15 +4,16 @@ declare void @foo(...)
; Tests that if both const-cmp and bce-cmp chains can be merged that the splitted block is still at the beginning.
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
define dso_local noundef zeroext i1 @cmp_mixed_const_first(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
; CHECK-LABEL: @cmp_mixed_const_first(
; This merged-block should come first as it should be split.
; CHECK: "entry+land.rhs+land.lhs.true8":
; CHECK-NEXT: call void (...) @foo() #[[ATTR2:[0-9]+]]
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT: [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[MEMCMP0:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[MEMCMP0]], 0
; CHECK-NEXT: br i1 [[CMP0]], label [[LAND_LHS_TRUE10:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true+land.lhs.true10+land.lhs.true4":
@@ -82,9 +83,7 @@ define dso_local noundef zeroext i1 @cmp_mixed_bce_first(
; CHECK-NEXT: br i1 [[CMP1]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true+land.rhs+land.lhs.true4":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP1]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
index d88d7d824b5ed..b5e85d3a09dfb 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-comparisons.ll
@@ -4,6 +4,8 @@
; Tests if a mixed chain of comparisons can still be merged into two memcmp calls.
; a.e == 255 && a.a == b.a && a.c == b.c && a.g == 100 && a.b == b.b && a.f == 200;
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+
define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 dereferenceable(20) %a, ptr noundef nonnull align 4 dereferenceable(20) %b) local_unnamed_addr {
; CHECK-LABEL: @cmp_mixed(
; This is the classic BCE comparison block
@@ -14,9 +16,7 @@ define dso_local noundef zeroext i1 @cmp_mixed(ptr noundef nonnull align 4 deref
; This is the new BCE to constant comparison block
; CHECK: "entry+land.rhs+land.lhs.true8":
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 8
-; CHECK-NEXT: [[TMP1:%.*]] = alloca <{ i32, i32, i32 }>
-; CHECK-NEXT: store <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>, ptr [[TMP1]], align 1
-; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[TMP1]], i64 12)
+; CHECK-NEXT: [[MEMCMP2:%.*]] = call i32 @memcmp(ptr [[TMP0]], ptr [[MEMCMP_OP0]], i64 12)
; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[MEMCMP2]], 0
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
index 15c5a382d1f46..3a5bf5585d46a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-type-const-comparisons.ll
@@ -3,6 +3,9 @@
; Tests if a const-cmp-chain of different types can still be merged.
; This is usually the case when comparing different struct fields to constants.
+; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i8 }> <{ i32 3, i8 100 }>
+; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>
+
; Can only merge gep 0 with gep 4 due to alignment since gep 8 is not directly adjacent to gep 4.
define dso_local zeroext i1 @is_all_ones_struct(
; CHECK-LABEL: @is_all_ones_struct(
@@ -12,9 +15,7 @@ define dso_local zeroext i1 @is_all_ones_struct(
; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP1]], 200
; CHECK-NEXT: br i1 [[CMP0]], label [[MERGED:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.rhs+land.lhs.true":
-; CHECK-NEXT: [[TMP2:%.*]] = alloca <{ i32, i8 }>
-; CHECK-NEXT: store <{ i32, i8 }> <{ i32 3, i8 100 }>, ptr [[TMP2]]
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[TMP2]], i64 5)
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P]], ptr [[MEMCMP_OP0]], i64 5)
; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
@@ -49,9 +50,7 @@ land.end: ; preds = %land.rhs, %land.lhs
define dso_local noundef zeroext i1 @is_all_ones_struct_select_block(
; CHECK-LABEL: @is_all_ones_struct_select_block(
; CHECK: "entry+land.rhs":
-; CHECK-NEXT: [[TMP0:%.*]] = alloca <{ i32, i8, i8 }>
-; CHECK-NEXT: store <{ i32, i8, i8 }> <{ i32 200, i8 3, i8 100 }>, ptr [[TMP0]]
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[TMP0]], i64 6)
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[P:%.*]], ptr [[MEMCMP_OP1]], i64 6)
; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; CHECK-NEXT: br label [[LAND_END]]
; CHECK: land.end:
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index 582b57d8c60ce..d3e882a226ac7 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,46 +1,48 @@
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
+; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
+
define dso_local noundef zeroext i1 @unmergable_select(
ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @unmergable_select(
-; REG: entry:
-; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
-; REG: land.lhs.true11:
-; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
-; REG: land.lhs.true16:
-; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; REG: land.lhs.true21:
-; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG: land.rhs:
-; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
-; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
-; REG-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
-; REG-NEXT: br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ %cmp28, [[LAND_RHS]] ]
-; REG-NEXT: ret i1 [[RES]]
+; CHECK-LABEL: @unmergable_select(
+; CHECK: entry:
+; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT: br i1 [[SEL1]], label [[LAND_LHS_11:%.*]], label [[LAND_END:%.*]]
+; CHECK: land.lhs.true11:
+; CHECK-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CHECK-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true16:
+; CHECK-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true21:
+; CHECK-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK: land.rhs:
+; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 14
+; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; CHECK-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_11]] ], [ false, %entry ], [ %cmp28, [[LAND_RHS]] ]
+; CHECK-NEXT: ret i1 [[RES]]
;
entry:
%arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
@@ -88,40 +90,38 @@ land.end: ; preds = %land.rhs, %land.lhs
; p[12] and p[13] mergable, select mult-block is part of the chain but isn't merged and won't get split up into its single comparisons.
define dso_local noundef zeroext i1 @partial_merge_not_select(ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @partial_merge_not_select(
-; REG: entry3:
-; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
-; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
-; REG: "land.lhs.true11+land.rhs":
-; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
-; REG-NEXT: [[TMP3:%.*]] = alloca <{ i8, i8 }>
-; REG-NEXT: store <{ i8, i8 }> <{ i8 1, i8 9 }>, ptr [[TMP3]], align 1
-; REG-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[TMP3]], i64 2)
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
-; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
-; REG: land.lhs.true162:
-; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
-; REG: land.lhs.true211:
-; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT: br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
-; REG-NEXT: ret i1 [[RES]]
+; CHECK-LABEL: @partial_merge_not_select(
+; CHECK: entry3:
+; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 10
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT: br i1 [[SEL1]], label [[LAND_LHS_LAND_RHS:%.*]], label [[LAND_END:%.*]]
+; CHECK: "land.lhs.true11+land.rhs":
+; CHECK-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 12
+; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[IDX3]], ptr [[MEMCMP_OP]], i64 2)
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: br i1 [[CMP3]], label [[LAND_LHS_16:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true162:
+; CHECK-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT: br i1 [[CMP4]], label [[LAND_LHS_21:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true211:
+; CHECK-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ [[CMP5]], [[LAND_LHS_21]] ], [ false, [[LAND_LHS_16]] ], [ false, [[LAND_LHS_LAND_RHS]] ], [ false, %entry3 ]
+; CHECK-NEXT: ret i1 [[RES]]
;
entry:
%arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 10
diff --git a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
index 317a3a1464536..f67743ed6fcc1 100644
--- a/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/partial-select-merge.ll
@@ -1,49 +1,49 @@
-; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s --check-prefix=REG
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
; Cannot merge only part of a select block if not entire block mergable.
define zeroext i1 @cmp_partially_mergable_select(
ptr nocapture readonly align 4 dereferenceable(24) %a,
ptr nocapture readonly align 4 dereferenceable(24) %b) local_unnamed_addr {
-; REG-LABEL: @cmp_partially_mergable_select(
-; REG: entry:
-; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
-; REG-NEXT: [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
-; REG-NEXT: br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
-; REG: land.lhs.true:
-; REG-NEXT: [[TMP1:%.*]] = load i32, ptr [[A]], align 4
-; REG-NEXT: [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
-; REG-NEXT: br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
-; REG: land.lhs.true4:
-; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
-; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
-; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
-; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
-; REG-NEXT: [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
-; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
-; REG-NEXT: br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
-; REG: land.lhs.true10:
-; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
-; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
-; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
-; REG-NEXT: [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
-; REG-NEXT: br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG: land.rhs:
-; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
-; REG-NEXT: [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
-; REG-NEXT: [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
-; REG-NEXT: [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
-; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
-; REG-NEXT: br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
-; REG-NEXT: ret i1 [[RES]]
+; CHECK-LABEL: @cmp_partially_mergable_select(
+; CHECK: entry:
+; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 8
+; CHECK-NEXT: [[TMP0:%.*]] = load i32, ptr [[IDX0]], align 4
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i32 [[TMP0]], 255
+; CHECK-NEXT: br i1 [[CMP0]], label [[LAND_LHS_TRUE:%.*]], label [[LAND_END:%.*]]
+; CHECK: land.lhs.true:
+; CHECK-NEXT: [[TMP1:%.*]] = load i32, ptr [[A]], align 4
+; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr [[B:%.*]], align 4
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i32 [[TMP1]], [[TMP2]]
+; CHECK-NEXT: br i1 [[CMP1]], label [[LAND_LHS_TRUE_4:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true4:
+; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 5
+; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 5
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP3]], [[TMP4]]
+; CHECK-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 16
+; CHECK-NEXT: [[TMP5:%.*]] = load i32, ptr [[IDX3]], align 4
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i32 [[TMP5]], 100
+; CHECK-NEXT: [[SEL0:%.*]] = select i1 [[CMP2]], i1 [[CMP3]], i1 false
+; CHECK-NEXT: br i1 [[SEL0]], label [[LAND_LHS_TRUE_10:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true10:
+; CHECK-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 20
+; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX4]], align 4
+; CHECK-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 20
+; CHECK-NEXT: [[TMP7:%.*]] = load i8, ptr [[IDX5]], align 4
+; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP6]], [[TMP7]]
+; CHECK-NEXT: br i1 [[CMP4]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK: land.rhs:
+; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[A]], i64 4
+; CHECK-NEXT: [[TMP8:%.*]] = load i8, ptr [[IDX6]], align 4
+; CHECK-NEXT: [[IDX7:%.*]] = getelementptr inbounds nuw i8, ptr [[B]], i64 4
+; CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr [[IDX7]], align 4
+; CHECK-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP8]], [[TMP9]]
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_10]] ], [ false, [[LAND_LHS_TRUE_4]] ], [ false, [[LAND_LHS_TRUE]] ], [ false, %entry ], [ [[CMP5]], [[LAND_RHS]] ]
+; CHECK-NEXT: ret i1 [[RES]]
;
entry:
%e = getelementptr inbounds nuw i8, ptr %a, i64 8
@@ -95,43 +95,43 @@ land.end: ; preds = %land.rhs, %land.lhs
define dso_local zeroext i1 @cmp_partially_mergable_select_array(
ptr nocapture readonly align 1 dereferenceable(24) %p) local_unnamed_addr {
-; REG-LABEL: @cmp_partially_mergable_select_array(
-; REG: entry:
-; REG-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
-; REG-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
-; REG-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
-; REG-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
-; REG-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
-; REG-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
-; REG-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
-; REG-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
-; REG-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
-; REG-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
-; REG-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
-; REG-NEXT: br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
-; REG: land.lhs.true11:
-; REG-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
-; REG-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
-; REG-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
-; REG-NEXT: br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
-; REG: land.lhs.true16:
-; REG-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; REG-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
-; REG-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
-; REG-NEXT: br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
-; REG: land.lhs.true21:
-; REG-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
-; REG-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
-; REG-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
-; REG-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
-; REG: land.rhs:
-; REG-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
-; REG-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
-; REG-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
-; REG-NEXT: br label [[LAND_END]]
-; REG: land.end:
-; REG-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
-; REG-NEXT: ret i1 [[RES]]
+; CHECK-LABEL: @cmp_partially_mergable_select_array(
+; CHECK: entry:
+; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds nuw i8, ptr [[P:%.*]], i64 12
+; CHECK-NEXT: [[TMP0:%.*]] = load i8, ptr [[IDX0]], align 1
+; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 1
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr [[IDX1]], align 1
+; CHECK-NEXT: [[IDX2:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 3
+; CHECK-NEXT: [[TMP2:%.*]] = load i8, ptr [[IDX2]], align 1
+; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i8 [[TMP0]], -1
+; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i8 [[TMP1]], -56
+; CHECK-NEXT: [[SEL0:%.*]] = select i1 [[CMP0]], i1 [[CMP1]], i1 false
+; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i8 [[TMP2]], -66
+; CHECK-NEXT: [[SEL1:%.*]] = select i1 [[SEL0]], i1 [[CMP2]], i1 false
+; CHECK-NEXT: br i1 [[SEL1]], label [[LAND_LHS_TRUE_11:%.*]], label [[LAND_END:%.*]]
+; CHECK: land.lhs.true11:
+; CHECK-NEXT: [[IDX3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 10
+; CHECK-NEXT: [[TMP3:%.*]] = load i8, ptr [[IDX3]], align 1
+; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i8 [[TMP3]], 1
+; CHECK-NEXT: br i1 [[CMP3]], label [[LAND_LHS_TRUE_16:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true16:
+; CHECK-NEXT: [[IDX4:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
+; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr [[IDX4]], align 1
+; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i8 [[TMP4]], 2
+; CHECK-NEXT: br i1 [[CMP4]], label [[LAND_LHS_TRUE_21:%.*]], label [[LAND_END]]
+; CHECK: land.lhs.true21:
+; CHECK-NEXT: [[IDX5:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 8
+; CHECK-NEXT: [[TMP5:%.*]] = load i8, ptr [[IDX5]], align 1
+; CHECK-NEXT: [[CMP5:%.*]] = icmp eq i8 [[TMP5]], 7
+; CHECK-NEXT: br i1 [[CMP5]], label [[LAND_RHS:%.*]], label [[LAND_END]]
+; CHECK: land.rhs:
+; CHECK-NEXT: [[IDX6:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 13
+; CHECK-NEXT: [[TMP6:%.*]] = load i8, ptr [[IDX6]], align 1
+; CHECK-NEXT: [[CMP6:%.*]] = icmp eq i8 [[TMP6]], 9
+; CHECK-NEXT: br label [[LAND_END]]
+; CHECK: land.end:
+; CHECK-NEXT: [[RES:%.*]] = phi i1 [ false, [[LAND_LHS_TRUE_21]] ], [ false, [[LAND_LHS_TRUE_16]] ], [ false, [[LAND_LHS_TRUE_11]] ], [ false, %entry ], [ [[CMP6]], [[LAND_RHS]] ]
+; CHECK-NEXT: ret i1 [[RES]]
;
entry:
%arrayidx = getelementptr inbounds nuw i8, ptr %p, i64 12
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 5381d88ed7f52..00306a2b5f22c 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -1,4 +1,3 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=mergeicmps -verify-dom-info -mtriple=x86_64-unknown-unknown -S | FileCheck %s --check-prefix=X86
%S = type { i32, i32, i32, i32 }
@@ -6,6 +5,8 @@
declare void @foo(...)
declare void @bar(...)
+; X86: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
+
; We can split %entry and create a memcmp(16 bytes).
define zeroext i1 @opeq1(
; X86-LABEL: @opeq1(
@@ -250,9 +251,7 @@ define dso_local noundef zeroext i1 @unclobbered_select_cmp(
; X86-NEXT: call void (...) @foo() #[[ATTR2]]
; X86-NEXT: call void (...) @bar() #[[ATTR2]]
; X86-NEXT: [[OFFSET:%.*]] = getelementptr inbounds nuw i8, ptr [[A:%.*]], i64 2
-; X86-NEXT: [[TMP0:%.*]] = alloca <{ i8, i8, i8 }>
-; X86-NEXT: store <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>, ptr [[TMP0]], align 1
-; X86-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[TMP0]], i64 3)
+; X86-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[OFFSET]], ptr [[MEMCMP_OP]], i64 3)
; X86-NEXT: [[TMP1:%.*]] = icmp eq i32 [[MEMCMP]], 0
; X86-NEXT: br label [[LAND_END:%.*]]
; X86: land.end:
>From e029bebe32b4d36d62dda9903e89f23538621852 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 18:17:09 +0200
Subject: [PATCH 21/23] [MergerICmps] Tested that global constant is properly
removed in expand-memcmp
---
llvm/lib/CodeGen/ExpandMemCmp.cpp | 4 +--
.../Transforms/MergeICmps/X86/const-cmp-bb.ll | 25 +++++++++++++++++--
.../MergeICmps/X86/many-const-cmp-select.ll | 8 ++++--
.../MergeICmps/X86/mixed-cmp-bb-select.ll | 2 ++
.../X86/not-split-unmerged-select.ll | 2 ++
.../MergeICmps/X86/split-block-does-work.ll | 2 ++
6 files changed, 37 insertions(+), 6 deletions(-)
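
As a sanity check on the EXPANDED lines this patch adds to const-cmp-bb.ll, the sketch below recomputes the immediates that expand-memcmp is expected to fold out of the `<{ i8 -1, i8 -56, i8 -66 }>` constant. It assumes the first two bytes are read as one little-endian i16 (the tests use an x86_64 triple) and the third byte is zero-extended. The helper names `foldLE16` and `zext8` are hypothetical and not part of LLVM; this is plain arithmetic, not the pass's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the signed i16 value obtained when two consecutive
// bytes are loaded as a single little-endian 16-bit integer.
constexpr int foldLE16(int8_t Lo, int8_t Hi) {
  const int Bits = static_cast<uint8_t>(Lo) | (static_cast<uint8_t>(Hi) << 8);
  // Reinterpret the 16-bit pattern as a two's-complement signed value.
  return Bits >= 0x8000 ? Bits - 0x10000 : Bits;
}

// Hypothetical helper: zero-extension of a single byte, as in `zext i8 ... to i16`.
constexpr int zext8(int8_t B) { return static_cast<uint8_t>(B); }

// Bytes -1 (0xFF) and -56 (0xC8) form 0xC8FF, the `xor i16 ..., -14081` operand.
static_assert(foldLE16(-1, -56) == -14081, "first two bytes fold to -14081");
// Byte -66 (0xBE) zero-extends to the `xor i16 ..., 190` operand.
static_assert(zext8(-66) == 190, "third byte folds to 190");
```

Both immediates match the `xor` operands in the EXPANDED check lines, which is consistent with the global no longer being referenced after expansion.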
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index e32cb2db1c954..41b43a131932a 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -878,14 +878,14 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
NumMemCmpInlined++;
if (Value *Res = Expansion.getMemCmpExpansion()) {
- // Replace call with result of expansion and erase call.
auto* GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
+ // Replace call with result of expansion and erase call.
CI->replaceAllUsesWith(Res);
CI->eraseFromParent();
// If the mergeicmps pass used a global constant to merge comparisons and
// the the global constants were folded then the variable can be deleted since it isn't used anymore.
- if (GV) {
+ if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
// NOTE: There is still a use lingering around but that use itself isn't
// used so it is fine to erase this instruction.
static bool (*hasActiveUses)(Value*) = [](Value* V) {
diff --git a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
index c39d586d2f174..3956c62579986 100644
--- a/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/const-cmp-bb.ll
@@ -1,10 +1,14 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --force-update
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
; adjacent byte pointer accesses compared to constants, should be merged into single memcmp, spanning multiple basic blocks
-define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
+
+; Global should be removed once its constant has been folded.
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66 }>
+
+define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) local_unnamed_addr #0 {
; CHECK-LABEL: @test(
; CHECK-NEXT: "entry+land.lhs.true+land.rhs":
; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[p:%.*]], ptr [[MEMCMP_OP]], i64 3)
@@ -13,6 +17,23 @@ define zeroext i1 @test(ptr nocapture noundef nonnull dereferenceable(3) %p) loc
; CHECK: land.end:
; CHECK-NEXT: ret i1 [[TMP1]]
;
+; EXPANDED-LABEL: define zeroext i1 @test(
+; EXPANDED-SAME: ptr nocapture noundef nonnull dereferenceable(3) [[P:%.*]]) local_unnamed_addr {
+; EXPANDED-NEXT: "entry+land.lhs.true+land.rhs":
+; EXPANDED-NEXT: [[TMP0:%.*]] = load i16, ptr [[P]], align 1
+; EXPANDED-NEXT: [[TMP8:%.*]] = xor i16 [[TMP0]], -14081
+; EXPANDED-NEXT: [[TMP2:%.*]] = getelementptr i8, ptr [[P]], i64 2
+; EXPANDED-NEXT: [[TMP3:%.*]] = load i8, ptr [[TMP2]], align 1
+; EXPANDED-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i16
+; EXPANDED-NEXT: [[TMP5:%.*]] = xor i16 [[TMP4]], 190
+; EXPANDED-NEXT: [[TMP6:%.*]] = or i16 [[TMP8]], [[TMP5]]
+; EXPANDED-NEXT: [[TMP7:%.*]] = icmp ne i16 [[TMP6]], 0
+; EXPANDED-NEXT: [[CMP:%.*]] = zext i1 [[TMP7]] to i32
+; EXPANDED-NEXT: [[RES:%.*]] = icmp eq i32 [[CMP]], 0
+; EXPANDED-NEXT: br label %[[LAND_END:.*]]
+; EXPANDED: [[LAND_END]]:
+; EXPANDED-NEXT: ret i1 [[RES]]
+;
entry:
%0 = load i8, ptr %p, align 1
%cmp = icmp eq i8 %0, -1
diff --git a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
index bca4dacbefbfa..c4c2fe7e6a222 100644
--- a/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/many-const-cmp-select.ll
@@ -1,10 +1,14 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S 2>&1 | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
; Can merge contiguous const-comparison basic blocks that include a select statement.
; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
; CHECK: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
+; EXPANDED-NOT: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 2, i8 7 }>
+; EXPANDED-NOT: [[MEMCMP_OP1:@memcmp_const_op.1]] = private constant <{ i8, i8, i8, i8 }> <{ i8 -1, i8 -56, i8 -66, i8 1 }>
+
define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dereferenceable(24) %p) local_unnamed_addr {
; CHECK-LABEL: @is_all_ones_many(
; CHECK-NEXT: "entry+land.lhs.true11":
@@ -13,8 +17,8 @@ define dso_local zeroext i1 @is_all_ones_many(ptr nocapture noundef nonnull dere
; CHECK-NEXT: br i1 [[TMP0]], label [[NEXT_MEMCMP:%.*]], label [[LAND_END:%.*]]
; CHECK: "land.lhs.true16+land.lhs.true21":
; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 6
-; CHECK-NEXT: [[MEMCMP:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
-; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[MEMCMP]], 0
+; CHECK-NEXT: [[MEMCMP1:%.*]] = call i32 @memcmp(ptr [[TMP1]], ptr [[MEMCMP_OP0]], i64 2)
+; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i32 [[MEMCMP1]], 0
; CHECK-NEXT: br i1 [[TMP2]], label [[LAST_CMP:%.*]], label [[LAND_END]]
; CHECK: land.rhs1:
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds nuw i8, ptr [[P]], i64 9
diff --git a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
index 3990af69d6c83..d81aecc76ea4a 100644
--- a/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/mixed-cmp-bb-select.ll
@@ -1,8 +1,10 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
; Tests if a mixed chain of comparisons (including a select block) can still be merged into two memcmp calls.
; CHECK: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
+; EXPANDED-NOT: [[MEMCMP_OP0:@memcmp_const_op]] = private constant <{ i32, i32, i32 }> <{ i32 255, i32 200, i32 100 }>
define dso_local noundef zeroext i1 @cmp_mixed(
ptr noundef nonnull readonly align 4 captures(none) dereferenceable(20) %a,
diff --git a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
index d3e882a226ac7..d059609afe292 100644
--- a/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/not-split-unmerged-select.ll
@@ -1,8 +1,10 @@
; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes=mergeicmps -verify-dom-info -S | FileCheck %s
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
; No adjacent accesses to the same pointer so nothing should be merged. Select blocks won't get split.
; CHECK: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8 }> <{ i8 1, i8 9 }>
define dso_local noundef zeroext i1 @unmergable_select(
ptr noundef nonnull readonly align 8 captures(none) dereferenceable(24) %p) local_unnamed_addr {
diff --git a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
index 00306a2b5f22c..442d11f9c77fa 100644
--- a/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
+++ b/llvm/test/Transforms/MergeICmps/X86/split-block-does-work.ll
@@ -1,4 +1,5 @@
; RUN: opt < %s -passes=mergeicmps -verify-dom-info -mtriple=x86_64-unknown-unknown -S | FileCheck %s --check-prefix=X86
+; RUN: opt < %s -mtriple=x86_64-unknown-unknown -passes='mergeicmps,expand-memcmp' -verify-dom-info -S 2>&1 | FileCheck %s --check-prefix=EXPANDED
%S = type { i32, i32, i32, i32 }
@@ -6,6 +7,7 @@ declare void @foo(...)
declare void @bar(...)
; X86: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
+; EXPANDED-NOT: [[MEMCMP_OP:@memcmp_const_op]] = private constant <{ i8, i8, i8 }> <{ i8 100, i8 3, i8 -56 }>
; We can split %entry and create a memcmp(16 bytes).
define zeroext i1 @opeq1(
>From e1fe5286bb4e3cbc045c1b5ecee020451a601993 Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 19:27:40 +0200
Subject: [PATCH 22/23] [MergerICmps] Formatted
---
llvm/lib/CodeGen/ExpandMemCmp.cpp | 13 +-
llvm/lib/Transforms/Scalar/MergeICmps.cpp | 363 ++++++++++++----------
2 files changed, 205 insertions(+), 171 deletions(-)
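
For readers skimming the reformatted `Comparison::areContiguous`, the const-comparison branch reduces to the check sketched below: same base, and the first access ends exactly where the second begins. The `Atom` and `ConstCmp` structs are simplified, hypothetical stand-ins for `BCEAtom` and `BCEConstCmp` that keep only the fields the check reads; the real classes carry more state.

```cpp
#include <cassert>

// Hypothetical stand-in for BCEAtom: only the fields used by the check.
struct Atom {
  int BaseId; // identifies the base pointer
  int Offset; // byte offset from that base
};

// Hypothetical stand-in for BCEConstCmp.
struct ConstCmp {
  Atom Lhs;
  int SizeBits; // width of the compared value in bits
};

// Two const-comparisons are contiguous when they share a base and the first
// one ends at the byte where the second one starts.
constexpr bool areContiguous(const ConstCmp &First, const ConstCmp &Second) {
  return First.Lhs.BaseId == Second.Lhs.BaseId &&
         First.Lhs.Offset + First.SizeBits / 8 == Second.Lhs.Offset;
}
```

Under this model, the i8 comparisons at offsets 12 and 13 from the tests are contiguous, while those at offsets 6 and 8 are not, since a one-byte gap separates them.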
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 41b43a131932a..323d34f838b27 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -878,25 +878,28 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
NumMemCmpInlined++;
if (Value *Res = Expansion.getMemCmpExpansion()) {
- auto* GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
+ auto *GV = dyn_cast<GlobalVariable>(CI->getArgOperand(1));
// Replace call with result of expansion and erase call.
CI->replaceAllUsesWith(Res);
CI->eraseFromParent();
// If the mergeicmps pass used a global constant to merge comparisons and
- // the the global constants were folded then the variable can be deleted since it isn't used anymore.
+ // the global constants were folded then the variable can be deleted
+ // since it isn't used anymore.
if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
// NOTE: There is still a use lingering around but that use itself isn't
// used so it is fine to erase this instruction.
- static bool (*hasActiveUses)(Value*) = [](Value* V) {
- for (User* U: V->users()){
+ static bool (*hasActiveUses)(Value *) = [](Value *V) {
+ for (User *U : V->users()) {
if (hasActiveUses(U))
return true;
}
return false;
};
if (!hasActiveUses(GV)) {
- LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName() << " that was introduced by the previous mergeicmps pass\n");
+ LLVM_DEBUG(
+ dbgs() << "Removing global constant " << GV->getName()
+ << " that was introduced by the previous mergeicmps pass\n");
GV->eraseFromParent();
}
}
diff --git a/llvm/lib/Transforms/Scalar/MergeICmps.cpp b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
index 2eb1c9761d32e..0167fdddf7f7f 100644
--- a/llvm/lib/Transforms/Scalar/MergeICmps.cpp
+++ b/llvm/lib/Transforms/Scalar/MergeICmps.cpp
@@ -52,8 +52,8 @@
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
-#include "llvm/IR/Instruction.h"
#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
@@ -132,14 +132,14 @@ class BaseIdentifier {
DenseMap<const Value*, int> BaseToIndex;
};
-
// All Instructions related to a comparison.
typedef SmallDenseSet<const Instruction *, 8> InstructionSet;
// If this value is a load from a constant offset w.r.t. a base address, and
// there are no other users of the load or address, returns the base address and
// the offset.
-BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, InstructionSet* BlockInsts) {
+BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId,
+ InstructionSet *BlockInsts) {
auto *const LoadI = dyn_cast<LoadInst>(Val);
if (!LoadI)
return {};
@@ -184,7 +184,6 @@ BCEAtom visitICmpLoadOperand(Value *const Val, BaseIdentifier &BaseId, Instructi
return BCEAtom(GEP, LoadI, BaseId.getBaseId(Base), Offset);
}
-
// An abstract parent class that can either be a comparison of
// two BCEAtoms with the same offsets to a base pointer (BCECmp)
// or a comparison of a single BCEAtom with a constant (BCEConstCmp).
@@ -194,23 +193,26 @@ struct Comparison {
CK_ConstCmp,
CK_BceCmp,
};
+
private:
const CompKind Kind;
+
public:
int SizeBits;
const ICmpInst *CmpI;
Comparison(CompKind K, int SizeBits, const ICmpInst *CmpI)
- : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
+ : Kind(K), SizeBits(SizeBits), CmpI(CmpI) {}
CompKind getKind() const { return Kind; }
virtual ~Comparison() = default;
- bool areContiguous(const Comparison& Other) const;
+ bool areContiguous(const Comparison &Other) const;
bool operator<(const Comparison &Other) const;
};
// A comparison between a BCE atom and an integer constant.
-// If these BCE atoms are chained and access adjacent memory then they too can be merged, e.g.
+// If these BCE atoms are chained and access adjacent memory then they too can
+// be merged, e.g.
// ```
// int *p = ...;
// int a = p[0];
@@ -219,11 +221,12 @@ struct Comparison {
// ```
struct BCEConstCmp : public Comparison {
BCEAtom Lhs;
- Constant* Const;
+ Constant *Const;
- BCEConstCmp(BCEAtom L, Constant* Const, int SizeBits, const ICmpInst *CmpI)
- : Comparison(CK_ConstCmp, SizeBits,CmpI), Lhs(std::move(L)), Const(Const) {}
- static bool classof(const Comparison* C) {
+ BCEConstCmp(BCEAtom L, Constant *Const, int SizeBits, const ICmpInst *CmpI)
+ : Comparison(CK_ConstCmp, SizeBits, CmpI), Lhs(std::move(L)),
+ Const(Const) {}
+ static bool classof(const Comparison *C) {
return C->getKind() == CK_ConstCmp;
}
};
@@ -239,54 +242,58 @@ struct BCECmp : public Comparison {
BCEAtom Rhs;
BCECmp(BCEAtom L, BCEAtom R, int SizeBits, const ICmpInst *CmpI)
- : Comparison(CK_BceCmp, SizeBits,CmpI), Lhs(std::move(L)), Rhs(std::move(R)) {
+ : Comparison(CK_BceCmp, SizeBits, CmpI), Lhs(std::move(L)),
+ Rhs(std::move(R)) {
if (Rhs < Lhs) std::swap(Rhs, Lhs);
}
- static bool classof(const Comparison* C) {
- return C->getKind() == CK_BceCmp;
- }
+ static bool classof(const Comparison *C) { return C->getKind() == CK_BceCmp; }
};
// TODO: this can be improved to take alignment into account.
-bool Comparison::areContiguous(const Comparison& Other) const {
- assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+bool Comparison::areContiguous(const Comparison &Other) const {
+ assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) &&
+ "Comparisons are of same kind");
if (isa<BCEConstCmp>(this)) {
- const auto& First = cast<BCEConstCmp>(this);
- const auto& Second = cast<BCEConstCmp>(Other);
+ const auto &First = cast<BCEConstCmp>(this);
+ const auto &Second = cast<BCEConstCmp>(Other);
return First->Lhs.BaseId == Second.Lhs.BaseId &&
First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset;
}
- const auto& First = cast<BCECmp>(this);
- const auto& Second = cast<BCECmp>(Other);
+ const auto &First = cast<BCECmp>(this);
+ const auto &Second = cast<BCECmp>(Other);
return First->Lhs.BaseId == Second.Lhs.BaseId &&
First->Rhs.BaseId == Second.Rhs.BaseId &&
First->Lhs.Offset + First->SizeBits / 8 == Second.Lhs.Offset &&
First->Rhs.Offset + First->SizeBits / 8 == Second.Rhs.Offset;
}
-bool Comparison::operator<(const Comparison& Other) const {
- assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) && "Comparisons are of same kind");
+bool Comparison::operator<(const Comparison &Other) const {
+ assert(isa<BCEConstCmp>(this) == isa<BCEConstCmp>(Other) &&
+ "Comparisons are of same kind");
if (isa<BCEConstCmp>(this)) {
- const auto& First = cast<BCEConstCmp>(this);
- const auto& Second = cast<BCEConstCmp>(Other);
+ const auto &First = cast<BCEConstCmp>(this);
+ const auto &Second = cast<BCEConstCmp>(Other);
return First->Lhs < Second.Lhs;
}
- const auto& First = cast<BCECmp>(this);
- const auto& Second = cast<BCECmp>(Other);
- return std::tie(First->Lhs,First->Rhs) < std::tie(Second.Lhs,Second.Rhs);
+ const auto &First = cast<BCECmp>(this);
+ const auto &Second = cast<BCECmp>(Other);
+ return std::tie(First->Lhs, First->Rhs) < std::tie(Second.Lhs, Second.Rhs);
}
// Represents multiple comparisons inside of a single basic block.
-// This happens if multiple basic blocks have previously been merged into a single block using a select node.
+// This happens if multiple basic blocks have previously been merged into a
+// single block using a select node.
class IntraCmpChain {
- // TODO: this could probably be a unique-ptr but current impl relies on some copies
+ // TODO: this could probably be a unique-ptr but current impl relies on some
+ // copies
std::vector<std::shared_ptr<Comparison>> CmpChain;
public:
IntraCmpChain(std::shared_ptr<Comparison> C) : CmpChain{C} {}
IntraCmpChain combine(const IntraCmpChain OtherChain) {
- CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(), OtherChain.CmpChain.end());
+ CmpChain.insert(CmpChain.end(), OtherChain.CmpChain.begin(),
+ OtherChain.CmpChain.end());
return *this;
}
std::vector<std::shared_ptr<Comparison>> getCmpChain() const {
@@ -296,13 +303,12 @@ class IntraCmpChain {
// A basic block that contains one or more comparisons.
class MultBCECmpBlock {
- public:
- MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB, InstructionSet BlockInsts)
+public:
+ MultBCECmpBlock(std::vector<std::shared_ptr<Comparison>> Cmps, BasicBlock *BB,
+ InstructionSet BlockInsts)
: BB(BB), BlockInsts(std::move(BlockInsts)), Cmps(std::move(Cmps)) {}
- std::vector<std::shared_ptr<Comparison>> getCmps() {
- return Cmps;
- }
+ std::vector<std::shared_ptr<Comparison>> getCmps() { return Cmps; }
// Returns true if the block does other works besides comparison.
bool doesOtherWork() const;
@@ -335,24 +341,25 @@ class MultBCECmpBlock {
// split into the atom comparison part and the "other work" part
// (see canSplit()).
class SingleBCECmpBlock {
- public:
- SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder)
+public:
+ SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock *BB,
+ unsigned OrigOrder)
: BB(BB), OrigOrder(OrigOrder), Cmp(std::move(Cmp)) {}
- SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock* BB, unsigned OrigOrder,
+ SingleBCECmpBlock(std::shared_ptr<Comparison> Cmp, BasicBlock *BB,
+ unsigned OrigOrder,
llvm::SmallVector<Instruction *, 4> SplitInsts)
- : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)), SplitInsts(SplitInsts) {}
+ : BB(BB), OrigOrder(OrigOrder), RequireSplit(true), Cmp(std::move(Cmp)),
+ SplitInsts(SplitInsts) {}
- const BCEAtom* Lhs() const {
+ const BCEAtom *Lhs() const {
if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
return &BceConstCmp->Lhs;
auto *const BceCmp = cast<BCECmp>(Cmp.get());
return &BceCmp->Lhs;
}
- const Comparison* getCmp() const { return Cmp.get(); }
- bool operator<(const SingleBCECmpBlock &O) const {
- return *Cmp < *O.Cmp;
- }
+ const Comparison *getCmp() const { return Cmp.get(); }
+ bool operator<(const SingleBCECmpBlock &O) const { return *Cmp < *O.Cmp; }
// We can separate the BCE-cmp-block instructions and the non-BCE-cmp-block
// instructions. Split the old block and move all non-BCE-cmp-insts into the
@@ -372,7 +379,7 @@ class SingleBCECmpBlock {
};
bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
- AliasAnalysis &AA) const {
+ AliasAnalysis &AA) const {
// If this instruction may clobber the loads and is in middle of the BCE cmp
// block instructions, then bail for now.
if (Inst->mayWriteToMemory()) {
@@ -382,7 +389,7 @@ bool MultBCECmpBlock::canSinkBCECmpInst(const Instruction *Inst,
return (Inst->getParent() != LI->getParent() || !Inst->comesBefore(LI)) &&
isModSet(AA.getModRefInfo(Inst, MemoryLocation::get(LI)));
};
- auto CmpLoadsAreClobbered = [&](const auto& Cmp) {
+ auto CmpLoadsAreClobbered = [&](const auto &Cmp) {
if (auto *const BceConstCmp = dyn_cast<BCEConstCmp>(Cmp.get()))
return MayClobber(BceConstCmp->Lhs.LoadI);
auto *const BceCmp = cast<BCECmp>(Cmp.get());
@@ -427,9 +434,10 @@ bool MultBCECmpBlock::doesOtherWork() const {
return false;
}
-llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
+llvm::SmallVector<Instruction *, 4>
+MultBCECmpBlock::getAllSplitInsts(AliasAnalysis &AA) const {
llvm::SmallVector<Instruction *, 4> SplitInsts;
- for (Instruction& Inst : *BB) {
+ for (Instruction &Inst : *BB) {
if (BlockInsts.count(&Inst))
continue;
assert(canSinkBCECmpInst(&Inst, AA) && "Split unsplittable block");
@@ -440,12 +448,12 @@ llvm::SmallVector<Instruction *, 4> MultBCECmpBlock::getAllSplitInsts(AliasAnaly
return SplitInsts;
}
-
// Visit the given comparison. If this is a comparison between two valid
// BCE atoms, or between a BCE atom and a constant, returns the comparison.
-std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
- const ICmpInst::Predicate ExpectedPredicate,
- BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+std::optional<std::shared_ptr<Comparison>>
+visitICmp(const ICmpInst *const CmpI,
+ const ICmpInst::Predicate ExpectedPredicate, BaseIdentifier &BaseId,
+ InstructionSet *BlockInsts) {
// The comparison can only be used once:
// - For intermediate blocks, as a branch condition.
// - For the final block, as an incoming value for the Phi.
@@ -465,43 +473,51 @@ std::optional<std::shared_ptr<Comparison>> visitICmp(const ICmpInst *const CmpI,
if (!Lhs.BaseId)
return std::nullopt;
- // Second operand can either be load if doing compare between two BCE atoms or
+ // Second operand can either be load if doing compare between two BCE atoms or
// can be constant if comparing adjacent memory to constant
- auto* RhsOperand = CmpI->getOperand(1);
+ auto *RhsOperand = CmpI->getOperand(1);
const auto &DL = CmpI->getDataLayout();
int SizeBits = DL.getTypeSizeInBits(CmpI->getOperand(0)->getType());
BlockInsts->insert(CmpI);
- if (auto const& Const = dyn_cast<Constant>(RhsOperand))
- return std::make_shared<BCEConstCmp>(BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI));
+ if (auto const &Const = dyn_cast<Constant>(RhsOperand))
+ return std::make_shared<BCEConstCmp>(
+ BCEConstCmp(std::move(Lhs), Const, SizeBits, CmpI));
auto Rhs = visitICmpLoadOperand(RhsOperand, BaseId, BlockInsts);
if (!Rhs.BaseId)
return std::nullopt;
- return std::make_shared<BCECmp>(BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI));
+ return std::make_shared<BCECmp>(
+ BCECmp(std::move(Lhs), std::move(Rhs), SizeBits, CmpI));
}
-// Chain of comparisons inside a single basic block connected using `select` nodes.
-std::optional<IntraCmpChain> visitComparison(Value*, ICmpInst::Predicate, BaseIdentifier&, InstructionSet*);
+// Chain of comparisons inside a single basic block connected using `select`
+// nodes.
+std::optional<IntraCmpChain> visitComparison(Value *, ICmpInst::Predicate,
+ BaseIdentifier &,
+ InstructionSet *);
std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
- ICmpInst::Predicate ExpectedPredicate, BaseIdentifier& BaseId, InstructionSet *BlockInsts) {
+ ICmpInst::Predicate ExpectedPredicate,
+ BaseIdentifier &BaseId,
+ InstructionSet *BlockInsts) {
if (!SelectI->hasOneUse()) {
LLVM_DEBUG(dbgs() << "select has several uses\n");
return std::nullopt;
}
- auto* Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
- auto* Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
- auto const& Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
- auto const& ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
+ auto *Cmp1 = dyn_cast<ICmpInst>(SelectI->getOperand(0));
+ auto *Sel1 = dyn_cast<SelectInst>(SelectI->getOperand(0));
+ auto const &Cmp2 = dyn_cast<ICmpInst>(SelectI->getOperand(1));
+ auto const &ConstantI = dyn_cast<Constant>(SelectI->getOperand(2));
if (!(Cmp1 || Sel1) || !Cmp2 || !ConstantI || !ConstantI->isZeroValue())
return std::nullopt;
- auto Lhs = visitComparison(SelectI->getOperand(0),ExpectedPredicate,BaseId,BlockInsts);
+ auto Lhs = visitComparison(SelectI->getOperand(0), ExpectedPredicate, BaseId,
+ BlockInsts);
if (!Lhs)
return std::nullopt;
- auto Rhs = visitComparison(Cmp2,ExpectedPredicate,BaseId,BlockInsts);
+ auto Rhs = visitComparison(Cmp2, ExpectedPredicate, BaseId, BlockInsts);
if (!Rhs)
return std::nullopt;
@@ -509,8 +525,9 @@ std::optional<IntraCmpChain> visitSelect(const SelectInst *const SelectI,
return Lhs->combine(std::move(*Rhs));
}
-std::optional<IntraCmpChain> visitComparison(Value *Cond,
- ICmpInst::Predicate ExpectedPredicate,BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
+std::optional<IntraCmpChain>
+visitComparison(Value *Cond, ICmpInst::Predicate ExpectedPredicate,
+ BaseIdentifier &BaseId, InstructionSet *BlockInsts) {
if (auto *CmpI = dyn_cast<ICmpInst>(Cond)) {
auto CmpVisit = visitICmp(CmpI, ExpectedPredicate, BaseId, BlockInsts);
if (!CmpVisit)
@@ -526,9 +543,9 @@ std::optional<IntraCmpChain> visitComparison(Value *Cond,
// Visit the given comparison block. If this is a comparison between two valid
// BCE atoms, returns the comparison.
std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
- BasicBlock *const Block,
- const BasicBlock *const PhiBlock,
- BaseIdentifier &BaseId) {
+ BasicBlock *const Block,
+ const BasicBlock *const PhiBlock,
+ BaseIdentifier &BaseId) {
if (Block->empty())
return std::nullopt;
auto *const BranchI = dyn_cast<BranchInst>(Block->getTerminator());
@@ -560,7 +577,8 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
}
InstructionSet BlockInsts;
- std::optional<IntraCmpChain> Result = visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
+ std::optional<IntraCmpChain> Result =
+ visitComparison(Cond, ExpectedPredicate, BaseId, &BlockInsts);
if (!Result)
return std::nullopt;
@@ -568,34 +586,36 @@ std::optional<MultBCECmpBlock> visitCmpBlock(Value *const Val,
return MultBCECmpBlock(Result->getCmpChain(), Block, BlockInsts);
}
-void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock* BB) {
+void emitDebugInfo(std::shared_ptr<Comparison> Cmp, BasicBlock *BB) {
LLVM_DEBUG(dbgs() << "Block '" << BB->getName());
- if (auto* ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
+ if (auto *ConstCmp = dyn_cast<BCEConstCmp>(Cmp.get())) {
LLVM_DEBUG(dbgs() << "': Found constant-cmp of " << Cmp->SizeBits
- << " bits including " << ConstCmp->Lhs.BaseId << " + "
- << ConstCmp->Lhs.Offset << "\n");
+ << " bits including " << ConstCmp->Lhs.BaseId << " + "
+ << ConstCmp->Lhs.Offset << "\n");
return;
}
- auto* BceCmp = cast<BCECmp>(Cmp.get());
+ auto *BceCmp = cast<BCECmp>(Cmp.get());
LLVM_DEBUG(dbgs() << "': Found cmp of " << BceCmp->SizeBits
- << " bits between " << BceCmp->Lhs.BaseId << " + "
- << BceCmp->Lhs.Offset << " and "
- << BceCmp->Rhs.BaseId << " + "
- << BceCmp->Rhs.Offset << "\n");
+ << " bits between " << BceCmp->Lhs.BaseId << " + "
+ << BceCmp->Lhs.Offset << " and " << BceCmp->Rhs.BaseId
+ << " + " << BceCmp->Rhs.Offset << "\n");
}
// Enqueues all comparisons of a mult-block.
// If the block requires splitting then adds `OtherInsts` to the block too.
-static inline void enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
- MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA, bool RequireSplit) {
+static inline void
+enqueueSingleCmps(std::vector<SingleBCECmpBlock> &Comparisons,
+ MultBCECmpBlock &&CmpBlock, AliasAnalysis &AA,
+ bool RequireSplit) {
bool hasAlreadySplit = false;
- for (auto& Cmp : CmpBlock.getCmps()) {
+ for (auto &Cmp : CmpBlock.getCmps()) {
emitDebugInfo(Cmp, CmpBlock.BB);
unsigned OrigOrder = Comparisons.size();
if (RequireSplit && !hasAlreadySplit) {
hasAlreadySplit = true;
auto SplitInsts = CmpBlock.getAllSplitInsts(AA);
- Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
+ Comparisons.push_back(
+ SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder, SplitInsts));
continue;
}
Comparisons.push_back(SingleBCECmpBlock(Cmp, CmpBlock.BB, OrigOrder));
@@ -629,21 +649,21 @@ class BCECmpChain {
BasicBlock *EntryBlock_;
};
-
-// Returns true if a merge in the chain depends on a basic block where not every comparison is merged.
-// NOTE: This is pretty restrictive and could potentially be handled using an improved tradeoff heuristic.
+// Returns true if a merge in the chain depends on a basic block where not every
+// comparison is merged. NOTE: This is pretty restrictive and could potentially
+// be handled using an improved tradeoff heuristic.
bool BCECmpChain::multBlockOnlyPartiallyMerged() {
- llvm::SmallDenseSet<const BasicBlock*, 8> UnmergedBlocks, MergedBB;
+ llvm::SmallDenseSet<const BasicBlock *, 8> UnmergedBlocks, MergedBB;
- for (auto& Merged : MergedBlocks_) {
+ for (auto &Merged : MergedBlocks_) {
if (Merged.size() == 1) {
UnmergedBlocks.insert(Merged[0].BB);
continue;
}
- for (auto& C : Merged)
+ for (auto &C : Merged)
MergedBB.insert(C.BB);
}
- return llvm::any_of(MergedBB, [&](const BasicBlock* BB){
+ return llvm::any_of(MergedBB, [&](const BasicBlock *BB) {
return UnmergedBlocks.contains(BB);
});
}
@@ -655,39 +675,43 @@ static unsigned getMinOrigOrder(const BCECmpChain::ContiguousBlocks &Blocks) {
return MinOrigOrder;
}
-/// Given a chain of comparison blocks (of the same kind), groups the blocks into contiguous
-/// ranges that can be merged together into a single comparison.
-template<class RandomIt>
-static void mergeBlocks(RandomIt First, RandomIt Last,
- std::vector<BCECmpChain::ContiguousBlocks>* MergedBlocks) {
+/// Given a chain of comparison blocks (of the same kind), groups the blocks
+/// into contiguous ranges that can be merged together into a single comparison.
+template <class RandomIt>
+static void
+mergeBlocks(RandomIt First, RandomIt Last,
+ std::vector<BCECmpChain::ContiguousBlocks> *MergedBlocks) {
// Sort to detect continuous offsets.
- llvm::sort(First, Last,
- [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
- return LhsBlock < RhsBlock;
- });
+ llvm::sort(
+ First, Last,
+ [](const SingleBCECmpBlock &LhsBlock, const SingleBCECmpBlock &RhsBlock) {
+ return LhsBlock < RhsBlock;
+ });
BCECmpChain::ContiguousBlocks *LastMergedBlock = nullptr;
int Offset = MergedBlocks->size();
- for (auto& BlockIt = First; BlockIt != Last; ++BlockIt) {
- if (!LastMergedBlock || !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
+ for (auto &BlockIt = First; BlockIt != Last; ++BlockIt) {
+ if (!LastMergedBlock ||
+ !LastMergedBlock->back().getCmp()->areContiguous(*BlockIt->getCmp())) {
MergedBlocks->emplace_back();
LastMergedBlock = &MergedBlocks->back();
} else {
- LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName() << " into "
- << LastMergedBlock->back().BB->getName() << "\n");
+ LLVM_DEBUG(dbgs() << "Merging block " << BlockIt->BB->getName()
+ << " into " << LastMergedBlock->back().BB->getName()
+ << "\n");
}
LastMergedBlock->push_back(std::move(*BlockIt));
}
// While we allow reordering for merging, do not reorder unmerged comparisons.
// Doing so may introduce branch on poison.
- llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(), [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
- const BCECmpChain::ContiguousBlocks &RhsBlocks) {
- return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
- });
+ llvm::sort(MergedBlocks->begin() + Offset, MergedBlocks->end(),
+ [](const BCECmpChain::ContiguousBlocks &LhsBlocks,
+ const BCECmpChain::ContiguousBlocks &RhsBlocks) {
+ return getMinOrigOrder(LhsBlocks) < getMinOrigOrder(RhsBlocks);
+ });
}
-
BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
AliasAnalysis &AA)
: Phi_(Phi) {
@@ -759,7 +783,7 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
}
enqueueSingleCmps(Comparisons, std::move(*CmpBlock), AA, false);
}
-
+
// It is possible we have no suitable comparison to merge.
if (Comparisons.empty()) {
LLVM_DEBUG(dbgs() << "chain with no BCE basic blocks, no merge\n");
@@ -768,13 +792,17 @@ BCECmpChain::BCECmpChain(const std::vector<BasicBlock *> &Blocks, PHINode &Phi,
EntryBlock_ = Comparisons[0].BB;
- auto isConstCmp = [](SingleBCECmpBlock& C) { return isa<BCEConstCmp>(C.getCmp()); };
- auto BceIt = std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
+ auto isConstCmp = [](SingleBCECmpBlock &C) {
+ return isa<BCEConstCmp>(C.getCmp());
+ };
+ auto BceIt =
+ std::partition(Comparisons.begin(), Comparisons.end(), isConstCmp);
// The chain that requires splitting should always be first.
- // If no chain requires splitting then defaults to BCE-comparisons coming first.
+ // If no chain requires splitting then defaults to BCE-comparisons coming
+ // first.
if (std::any_of(Comparisons.begin(), BceIt,
- [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
+ [](const SingleBCECmpBlock &B) { return B.RequireSplit; })) {
mergeBlocks(Comparisons.begin(), BceIt, &MergedBlocks_);
mergeBlocks(BceIt, Comparisons.end(), &MergedBlocks_);
} else {
@@ -805,12 +833,11 @@ class MergedBlockName {
// Since multiple comparisons can come from the same basic block
// (when using select inst) don't want to repeat same name twice
UniqueVector<StringRef> UniqueNames;
- for (const auto& B : Comparisons)
+ for (const auto &B : Comparisons)
UniqueNames.insert(B.BB->getName());
- const int size = std::accumulate(UniqueNames.begin(), UniqueNames.end(), 0,
- [](int i, const StringRef &Name) {
- return i + Name.size();
- });
+ const int size = std::accumulate(
+ UniqueNames.begin(), UniqueNames.end(), 0,
+ [](int i, const StringRef &Name) { return i + Name.size(); });
if (size == 0)
return StringRef("", 0);
@@ -836,15 +863,10 @@ class MergedBlockName {
};
} // namespace
-
// Add a branch to the next basic block in the chain.
-void updateBranching(Value* CondResult,
- IRBuilder<>& Builder,
- BasicBlock *BB,
- BasicBlock *const NextCmpBlock,
- PHINode &Phi,
- LLVMContext &Context,
- const TargetLibraryInfo &TLI,
+void updateBranching(Value *CondResult, IRBuilder<> &Builder, BasicBlock *BB,
+ BasicBlock *const NextCmpBlock, PHINode &Phi,
+ LLVMContext &Context, const TargetLibraryInfo &TLI,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
BasicBlock *const PhiBB = Phi.getParent();
if (NextCmpBlock == PhiBB) {
@@ -862,29 +884,34 @@ void updateBranching(Value* CondResult,
}
// Builds global constant-struct to compare to pointer during memcmp().
-// Has to be global in order for expand-memcmp pass to be able to fold constants.
-GlobalVariable* buildConstantStruct(ArrayRef<SingleBCECmpBlock>& Comparisons, IRBuilder<>& Builder, LLVMContext &Context, Module& M) {
- std::vector<Constant*> Constants;
- std::vector<Type*> Types;
-
- for (const auto& BceBlock : Comparisons) {
- assert(isa<BCEConstCmp>(BceBlock.getCmp()) && "Const-cmp-chain can only contain const comparisons");
- auto* ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
+// Has to be global in order for expand-memcmp pass to be able to fold
+// constants.
+GlobalVariable *buildConstantStruct(ArrayRef<SingleBCECmpBlock> &Comparisons,
+ IRBuilder<> &Builder, LLVMContext &Context,
+ Module &M) {
+ std::vector<Constant *> Constants;
+ std::vector<Type *> Types;
+
+ for (const auto &BceBlock : Comparisons) {
+ assert(isa<BCEConstCmp>(BceBlock.getCmp()) &&
+ "Const-cmp-chain can only contain const comparisons");
+ auto *ConstCmp = cast<BCEConstCmp>(BceBlock.getCmp());
Constants.emplace_back(ConstCmp->Const);
Types.emplace_back(ConstCmp->Lhs.LoadI->getType());
}
- auto* StructType = StructType::get(Context, Types, /* currently only matches packed offsets */ true);
+ auto *StructType = StructType::get(
+ Context, Types, /* currently only matches packed offsets */ true);
auto *StructConstant = ConstantStruct::get(StructType, Constants);
- return new GlobalVariable(M, StructType, true, GlobalVariable::PrivateLinkage, StructConstant, "memcmp_const_op");
+ return new GlobalVariable(M, StructType, true, GlobalVariable::PrivateLinkage,
+ StructConstant, "memcmp_const_op");
}
// Merges the given contiguous comparison blocks into one memcmp block.
static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
BasicBlock *const InsertBefore,
BasicBlock *const NextCmpBlock,
- PHINode &Phi,
- LLVMContext &Context,
+ PHINode &Phi, LLVMContext &Context,
const TargetLibraryInfo &TLI,
AliasAnalysis &AA, DomTreeUpdater &DTU) {
assert(Comparisons.size() > 1 && "merging multiple comparisons");
@@ -905,7 +932,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
if (isa<BCEConstCmp>(FirstCmp.getCmp())) {
Rhs = buildConstantStruct(Comparisons, Builder, Context, *Phi.getModule());
} else {
- auto* FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
+ auto *FirstBceCmp = cast<BCECmp>(FirstCmp.getCmp());
if (FirstBceCmp->Rhs.GEP)
Rhs = Builder.Insert(FirstBceCmp->Rhs.GEP->clone());
else
@@ -917,7 +944,7 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
// If there is one block that requires splitting, we do it now, i.e.
// just before we know we will collapse the chain. The instructions
// can be executed before any of the instructions in the chain.
- const auto* ToSplit = llvm::find_if(
+ const auto *ToSplit = llvm::find_if(
Comparisons, [](const SingleBCECmpBlock &B) { return B.RequireSplit; });
if (ToSplit != Comparisons.end()) {
LLVM_DEBUG(dbgs() << "Splitting non_BCE work to header\n");
@@ -927,9 +954,11 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
// memcmp expects a 'size_t' argument and returns 'int'.
unsigned SizeTBits = TLI.getSizeTSize(*Phi.getModule());
unsigned IntBits = TLI.getIntSize();
- const unsigned TotalSizeBits = std::accumulate(
- Comparisons.begin(), Comparisons.end(), 0u,
- [](int Size, const SingleBCECmpBlock &C) { return Size + C.getCmp()->SizeBits; });
+ const unsigned TotalSizeBits =
+ std::accumulate(Comparisons.begin(), Comparisons.end(), 0u,
+ [](int Size, const SingleBCECmpBlock &C) {
+ return Size + C.getCmp()->SizeBits;
+ });
// Create memcmp() == 0.
const auto &DL = Phi.getDataLayout();
@@ -937,26 +966,26 @@ static BasicBlock *mergeComparisons(ArrayRef<SingleBCECmpBlock> Comparisons,
Lhs, Rhs,
ConstantInt::get(Builder.getIntNTy(SizeTBits), TotalSizeBits / 8),
Builder, DL, &TLI);
- Value* IsEqual = Builder.CreateICmpEQ(
+ Value *IsEqual = Builder.CreateICmpEQ(
MemCmpCall, ConstantInt::get(Builder.getIntNTy(IntBits), 0));
- updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+ updateBranching(IsEqual, Builder, BB, NextCmpBlock, Phi, Context, TLI, AA,
+ DTU);
return BB;
}
// Keep existing block if it isn't merged. Only change the branches.
// Also handles not splitting mult-blocks that use select instructions.
static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
- BasicBlock *const InsertBefore,
- BasicBlock *const NextCmpBlock,
- PHINode &Phi,
- LLVMContext &Context,
- const TargetLibraryInfo &TLI,
- AliasAnalysis &AA, DomTreeUpdater &DTU) {
- BasicBlock *MultBB = BasicBlock::Create(Context, BB->getName(),
- NextCmpBlock->getParent(), InsertBefore);
+ BasicBlock *const InsertBefore,
+ BasicBlock *const NextCmpBlock,
+ PHINode &Phi, LLVMContext &Context,
+ const TargetLibraryInfo &TLI,
+ AliasAnalysis &AA, DomTreeUpdater &DTU) {
+ BasicBlock *MultBB = BasicBlock::Create(
+ Context, BB->getName(), NextCmpBlock->getParent(), InsertBefore);
auto *const BranchI = cast<BranchInst>(BB->getTerminator());
- Value* CondResult = nullptr;
+ Value *CondResult = nullptr;
if (BranchI->isUnconditional())
CondResult = Phi.getIncomingValueForBlock(BB);
else
@@ -964,7 +993,8 @@ static BasicBlock *updateOriginalBlock(BasicBlock *const BB,
// Transfer all instructions except the branching terminator to the new block.
MultBB->splice(MultBB->end(), BB, BB->begin(), std::prev(BB->end()));
IRBuilder<> Builder(MultBB);
- updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI, AA, DTU);
+ updateBranching(CondResult, Builder, MultBB, NextCmpBlock, Phi, Context, TLI,
+ AA, DTU);
return MultBB;
}
@@ -979,7 +1009,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
// so that the next block is always available to branch to.
BasicBlock *InsertBefore = EntryBlock_;
BasicBlock *NextCmpBlock = Phi_.getParent();
- SmallDenseSet<const BasicBlock*, 8> ExistingBlocksToKeep;
+ SmallDenseSet<const BasicBlock *, 8> ExistingBlocksToKeep;
LLVMContext &Context = NextCmpBlock->getContext();
for (const auto &Cmps : reverse(MergedBlocks_)) {
// If there is only a single comparison then nothing should
@@ -992,7 +1022,7 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
continue;
ExistingBlocksToKeep.insert(BB);
InsertBefore = NextCmpBlock = updateOriginalBlock(
- BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
+ BB, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
} else {
InsertBefore = NextCmpBlock = mergeComparisons(
Cmps, InsertBefore, NextCmpBlock, Phi_, Context, TLI, AA, DTU);
@@ -1027,7 +1057,8 @@ bool BCECmpChain::simplify(const TargetLibraryInfo &TLI, AliasAnalysis &AA,
SmallVector<BasicBlock *, 16> DeadBlocks;
for (const auto &Blocks : MergedBlocks_) {
for (const SingleBCECmpBlock &Block : Blocks) {
- // Many single blocks can refer to the same multblock coming from an select instruction.
+ // Many single blocks can refer to the same multblock coming from a
+ // select instruction.
// TODO: preferrably use a set instead
if (llvm::is_contained(DeadBlocks, Block.BB))
continue;
@@ -1077,11 +1108,10 @@ std::vector<BasicBlock *> getOrderedBlocks(PHINode &Phi,
return Blocks;
}
-template<typename T>
-bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
- auto* IncomingValue = Phi.getIncomingValue(I);
+template <typename T> bool isInvalidPrevBlock(PHINode &Phi, unsigned I) {
+ auto *IncomingValue = Phi.getIncomingValue(I);
return !isa<T>(IncomingValue) ||
- cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
+ cast<T>(IncomingValue)->getParent() != Phi.getIncomingBlock(I);
}
bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
@@ -1115,7 +1145,8 @@ bool processPhi(PHINode &Phi, const TargetLibraryInfo &TLI, AliasAnalysis &AA,
LLVM_DEBUG(dbgs() << "skip: several non-constant values\n");
return false;
}
- if (isInvalidPrevBlock<ICmpInst>(Phi,I) && isInvalidPrevBlock<SelectInst>(Phi,I)) {
+ if (isInvalidPrevBlock<ICmpInst>(Phi, I) &&
+ isInvalidPrevBlock<SelectInst>(Phi, I)) {
// Non-constant incoming value is not from a cmp instruction or not
// produced by the last block. We could end up processing the value
// producing block more than once.
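For readers following the mergeBlocks() changes in the patch above, the sort-then-group step can be sketched in standalone C++. This is only an illustration under simplifying assumptions: `Cmp`, `groupContiguous` and `minOrigOrder` are hypothetical names, only constant comparisons (a single base plus offset) are modelled, and no LLVM types are involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a constant comparison: SizeBits bits loaded
// from base BaseId at byte offset Offset. Not the patch's actual types.
struct Cmp {
  int BaseId;
  std::uint64_t Offset; // byte offset from the base
  unsigned SizeBits;    // width of the compared value
  unsigned OrigOrder;   // position in the original comparison chain
};

// Same base, and Next starts exactly where Prev ends.
static bool areContiguous(const Cmp &Prev, const Cmp &Next) {
  return Prev.BaseId == Next.BaseId &&
         Prev.Offset + Prev.SizeBits / 8 == Next.Offset;
}

// Smallest original position of any comparison in the group.
static unsigned minOrigOrder(const std::vector<Cmp> &Group) {
  assert(!Group.empty() && "groups always hold at least one comparison");
  unsigned Min = Group.front().OrigOrder;
  for (const Cmp &C : Group)
    Min = std::min(Min, C.OrigOrder);
  return Min;
}

// Sort by (base, offset), split into runs of contiguous comparisons, then
// restore the original relative order of the resulting groups.
std::vector<std::vector<Cmp>> groupContiguous(std::vector<Cmp> Cmps) {
  std::sort(Cmps.begin(), Cmps.end(), [](const Cmp &A, const Cmp &B) {
    return std::tie(A.BaseId, A.Offset) < std::tie(B.BaseId, B.Offset);
  });
  std::vector<std::vector<Cmp>> Groups;
  for (const Cmp &C : Cmps) {
    if (Groups.empty() || !areContiguous(Groups.back().back(), C))
      Groups.emplace_back(); // start a new run
    Groups.back().push_back(C);
  }
  std::sort(Groups.begin(), Groups.end(),
            [](const std::vector<Cmp> &A, const std::vector<Cmp> &B) {
              return minOrigOrder(A) < minOrigOrder(B);
            });
  return Groups;
}
```

Groups with more than one element correspond to comparisons that could be folded into a single memcmp(); single-element groups are left as ordinary comparisons.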
From 221fe3506eae132dea2b93954a9515fb84c1f2db Mon Sep 17 00:00:00 2001
From: PhilippR <phil.rados at gmail.com>
Date: Mon, 31 Mar 2025 23:49:31 +0200
Subject: [PATCH 23/23] [MergerICmps] Fixed global var removal for failing
memcmp codegen tests
---
llvm/lib/CodeGen/ExpandMemCmp.cpp | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)
diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index 323d34f838b27..c6f7f850c29fb 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -883,25 +883,16 @@ static bool expandMemCmp(CallInst *CI, const TargetTransformInfo *TTI,
CI->replaceAllUsesWith(Res);
CI->eraseFromParent();
- // If the mergeicmps pass used a global constant to merge comparisons and
- // the the global constants were folded then the variable can be deleted
+ // If the memcmp call used a global constant to merge comparisons and
+ // the global constant was folded then the variable can be deleted
// since it isn't used anymore.
- if (GV && GV->hasPrivateLinkage() && GV->isConstant()) {
- // NOTE: There is still a use lingering around but that use itself isn't
- // used so it is fine to erase this instruction.
- static bool (*hasActiveUses)(Value *) = [](Value *V) {
- for (User *U : V->users()) {
- if (hasActiveUses(U))
- return true;
- }
- return false;
- };
- if (!hasActiveUses(GV)) {
- LLVM_DEBUG(
- dbgs() << "Removing global constant " << GV->getName()
- << " that was introduced by the previous mergeicmps pass\n");
- GV->eraseFromParent();
- }
+ // This is mostly done when mergeicmps used a global constant to merge
+ // constant comparisons.
+ if (GV && GV->hasPrivateLinkage() && GV->isConstant() &&
+ !GV->isConstantUsed()) {
+ LLVM_DEBUG(dbgs() << "Removing global constant " << GV->getName()
+ << " that was used by the dead memcmp() call\n");
+ GV->eraseFromParent();
}
}
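The private constant erased by this hunk is the right-hand operand of the merged memcmp(). As a rough source-level analogy (not the IR the passes actually emit), the following sketch shows why comparing adjacent fields against constants is equivalent to one memcmp() against a constant image. `S`, `kConstOp`, `cmpFields` and `cmpMerged` are hypothetical names, and the struct is assumed to have no padding between its fields.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct with three adjacent 32-bit fields.
struct S {
  std::int32_t e;
  std::int32_t f;
  std::int32_t g;
};
static_assert(sizeof(S) == 12, "fields must be contiguous for the merge");

// Field-by-field form, as written in the source program.
bool cmpFields(const S &s) { return s.e == 255 && s.f == 200 && s.g == 100; }

// Merged form: a single memcmp() against a constant image of the expected
// values. kConstOp plays the role of the private global constant.
bool cmpMerged(const S &s) {
  static const S kConstOp = {255, 200, 100};
  return std::memcmp(&s, &kConstOp, sizeof(S)) == 0;
}
```

Once the memcmp() has been expanded and the loads from the constant have been folded to immediates, nothing refers to the constant any more, which is the situation the `!GV->isConstantUsed()` check above is looking for.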