[llvm] Revert "[LSV] Merge contiguous chains across scalar types" (PR #170381)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 2 14:25:29 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-llvm-globalisel
Author: Drew Kersnar (dakersnar)
Changes:
Reverts llvm/llvm-project#154069. I pointed out a number of issues post-merge, most importantly examples of miscompiles: https://github.com/llvm/llvm-project/pull/154069#issuecomment-3603854626.
While the motivation of the change is clear, I think the implementation approach is flawed. It seems like the goal is to allow elements like `load <2xi16>` and `load i32` to be vectorized together even though the current algorithm does not group them into the same equivalence class. I personally think that if we want to attempt this, it should be a more holistic approach, maybe even redefining the concept of an equivalence class. The current solution seems like it would be really hard to get bug-free, and even if the bugs were fixed, it can only merge chains that happen to be adjacent to each other after `splitChainByContiguity`. But we can discuss more in the re-land. Maybe the broader approach I'm proposing is too difficult and a narrow optimization is worthwhile. Regardless, this should be reverted; it needs more iteration before it is correct.
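To make the motivation concrete, here is a minimal IR sketch (my own illustration, not taken from the patch or its tests) of the pattern the reverted change was trying to handle: two contiguous loads whose scalar element types differ, so LSV's equivalence-class grouping keeps them apart and they are never considered for vectorization together, even though they span a contiguous 8-byte region.

```llvm
define i32 @mixed_scalar_types(ptr addrspace(1) %p) {
  ; Contiguous accesses: %a covers bytes [0, 4), %b covers bytes [4, 8).
  ; Because the classes are keyed on the scalar element type, the <2 x i16>
  ; load and the i32 load land in different equivalence classes.
  %a = load <2 x i16>, ptr addrspace(1) %p, align 4
  %gep = getelementptr inbounds i8, ptr addrspace(1) %p, i64 4
  %b = load i32, ptr addrspace(1) %gep, align 4
  ; Uses that keep both loads alive.
  %a.hi = extractelement <2 x i16> %a, i32 1
  %a.ext = zext i16 %a.hi to i32
  %sum = add i32 %a.ext, %b
  ret i32 %sum
}
```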
---
Patch is 411.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170381.diff
50 Files Affected:
- (modified) llvm/include/llvm/Transforms/Utils/Local.h (+1-1)
- (modified) llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp (+1-1)
- (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+2-2)
- (modified) llvm/lib/Transforms/Scalar/SROA.cpp (+1-1)
- (modified) llvm/lib/Transforms/Utils/Local.cpp (+22-38)
- (modified) llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp (+63-219)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/amdgpu-irtranslator.ll (-2)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+129-133)
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll (+792-825)
- (modified) llvm/test/CodeGen/AMDGPU/build_vector.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.bf16.ll (+40-18)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.f16.ll (+14-14)
- (modified) llvm/test/CodeGen/AMDGPU/fabs.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll (+26-28)
- (modified) llvm/test/CodeGen/AMDGPU/fdiv.ll (+162-171)
- (modified) llvm/test/CodeGen/AMDGPU/fnearbyint.ll (+11-12)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.bf16.ll (+39-45)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.f16.ll (+7-7)
- (modified) llvm/test/CodeGen/AMDGPU/fneg-fabs.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fneg.ll (+15-16)
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_sint.ll (+49-57)
- (modified) llvm/test/CodeGen/AMDGPU/fp_to_uint.ll (+30-39)
- (modified) llvm/test/CodeGen/AMDGPU/fshl.ll (+149-141)
- (modified) llvm/test/CodeGen/AMDGPU/fshr.ll (+206-197)
- (modified) llvm/test/CodeGen/AMDGPU/global_atomics.ll (-82)
- (modified) llvm/test/CodeGen/AMDGPU/half.ll (+109-65)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+27-27)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/kernel-args.ll (+12-12)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx950.ll (+120-192)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.ll (+58-59)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp10.ll (+58-59)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.ll (+33-34)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log.ll (+56-57)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log10.ll (+56-57)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.log2.ll (+12-14)
- (modified) llvm/test/CodeGen/AMDGPU/max.ll (+16-18)
- (modified) llvm/test/CodeGen/AMDGPU/min.ll (+108-108)
- (modified) llvm/test/CodeGen/AMDGPU/packed-fp32.ll (+30-58)
- (modified) llvm/test/CodeGen/AMDGPU/rotl.ll (+36-38)
- (modified) llvm/test/CodeGen/AMDGPU/rotr.ll (+28-30)
- (modified) llvm/test/CodeGen/AMDGPU/s_addk_i32.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll (+63-64)
- (modified) llvm/test/CodeGen/AMDGPU/store-to-constant.ll (+2-4)
- (modified) llvm/test/CodeGen/AMDGPU/udivrem.ll (+61-65)
- (modified) llvm/test/CodeGen/AMDGPU/uint_to_fp.f64.ll (+1-1)
- (removed) llvm/test/Transforms/InstCombine/copy-access-metadata.ll (-215)
- (removed) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/copy-metadata-load-store.ll (-159)
- (modified) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors-complex.ll (+66-258)
- (modified) llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-vectors.ll (+3-281)
``````````diff
diff --git a/llvm/include/llvm/Transforms/Utils/Local.h b/llvm/include/llvm/Transforms/Utils/Local.h
index d0af2d3d2e4c2..9acfd872e574b 100644
--- a/llvm/include/llvm/Transforms/Utils/Local.h
+++ b/llvm/include/llvm/Transforms/Utils/Local.h
@@ -431,7 +431,7 @@ LLVM_ABI void combineAAMetadata(Instruction *K, const Instruction *J);
/// Copy the metadata from the source instruction to the destination (the
/// replacement for the source instruction).
-LLVM_ABI void copyMetadataForAccess(Instruction &Dest, Instruction &Source);
+LLVM_ABI void copyMetadataForLoad(LoadInst &Dest, const LoadInst &Source);
/// Patch the replacement so that it is not more restrictive than the value
/// being replaced. It assumes that the replacement does not get moved from
diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 98884c441096e..fdff21b6ef8df 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1035,7 +1035,7 @@ bool LegalizeBufferContentTypesVisitor::visitLoadImpl(
LoadInst *NewLI = IRB.CreateAlignedLoad(
LoadableType, NewPtr, commonAlignment(OrigLI.getAlign(), ByteOffset),
Name + ".off." + Twine(ByteOffset));
- copyMetadataForAccess(*NewLI, OrigLI);
+ copyMetadataForLoad(*NewLI, OrigLI);
NewLI->setAAMetadata(
AANodes.adjustForAccess(ByteOffset, LoadableType, DL));
NewLI->setAtomic(OrigLI.getOrdering(), OrigLI.getSyncScopeID());
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index 3e04aeb675d2a..9491610190c10 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -415,7 +415,7 @@ void PointerReplacer::replace(Instruction *I) {
LT->getAlign(), LT->getOrdering(),
LT->getSyncScopeID());
NewI->takeName(LT);
- copyMetadataForAccess(*NewI, *LT);
+ copyMetadataForLoad(*NewI, *LT);
IC.InsertNewInstWith(NewI, LT->getIterator());
IC.replaceInstUsesWith(*LT, NewI);
@@ -606,7 +606,7 @@ LoadInst *InstCombinerImpl::combineLoadToNewType(LoadInst &LI, Type *NewTy,
Builder.CreateAlignedLoad(NewTy, LI.getPointerOperand(), LI.getAlign(),
LI.isVolatile(), LI.getName() + Suffix);
NewLoad->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
- copyMetadataForAccess(*NewLoad, LI);
+ copyMetadataForLoad(*NewLoad, LI);
return NewLoad;
}
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index a7c322bfcb981..70afe833c9f47 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -3272,7 +3272,7 @@ class AllocaSliceRewriter : public InstVisitor<AllocaSliceRewriter, bool> {
// Copy any metadata that is valid for the new load. This may require
// conversion to a different kind of metadata, e.g. !nonnull might change
// to !range or vice versa.
- copyMetadataForAccess(*NewLI, LI);
+ copyMetadataForLoad(*NewLI, LI);
// Do this after copyMetadataForLoad() to preserve the TBAA shift.
if (AATags)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index dec2e019333b9..a03cf6e953e35 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3100,70 +3100,54 @@ void llvm::combineAAMetadata(Instruction *K, const Instruction *J) {
combineMetadata(K, J, /*DoesKMove=*/true, /*AAOnly=*/true);
}
-void llvm::copyMetadataForAccess(Instruction &DestI, Instruction &SourceI) {
+void llvm::copyMetadataForLoad(LoadInst &Dest, const LoadInst &Source) {
SmallVector<std::pair<unsigned, MDNode *>, 8> MD;
- SourceI.getAllMetadata(MD);
- MDBuilder MDB(DestI.getContext());
- Type *NewType = DestI.getType();
-
- // Only needed for range metadata on loads.
- const DataLayout *DL = nullptr;
- const LoadInst *LSource = dyn_cast<LoadInst>(&SourceI);
- if (LSource)
- DL = &LSource->getDataLayout();
-
+ Source.getAllMetadata(MD);
+ MDBuilder MDB(Dest.getContext());
+ Type *NewType = Dest.getType();
+ const DataLayout &DL = Source.getDataLayout();
for (const auto &MDPair : MD) {
unsigned ID = MDPair.first;
MDNode *N = MDPair.second;
-
+ // Note, essentially every kind of metadata should be preserved here! This
+ // routine is supposed to clone a load instruction changing *only its type*.
+ // The only metadata it makes sense to drop is metadata which is invalidated
+ // when the pointer type changes. This should essentially never be the case
+ // in LLVM, but we explicitly switch over only known metadata to be
+ // conservatively correct. If you are adding metadata to LLVM which pertains
+ // to loads, you almost certainly want to add it here.
switch (ID) {
- // Applies to both loads and stores as-is.
case LLVMContext::MD_dbg:
+ case LLVMContext::MD_tbaa:
case LLVMContext::MD_prof:
+ case LLVMContext::MD_fpmath:
case LLVMContext::MD_tbaa_struct:
+ case LLVMContext::MD_invariant_load:
case LLVMContext::MD_alias_scope:
case LLVMContext::MD_noalias:
case LLVMContext::MD_nontemporal:
+ case LLVMContext::MD_mem_parallel_loop_access:
case LLVMContext::MD_access_group:
case LLVMContext::MD_noundef:
case LLVMContext::MD_noalias_addrspace:
- case LLVMContext::MD_mem_parallel_loop_access:
- DestI.setMetadata(ID, N);
- break;
-
- // Load-only metadata.
- case LLVMContext::MD_fpmath:
- case LLVMContext::MD_invariant_load:
- if (isa<LoadInst>(DestI))
- DestI.setMetadata(ID, N);
+ // All of these directly apply.
+ Dest.setMetadata(ID, N);
break;
case LLVMContext::MD_nonnull:
- if (auto *LDest = dyn_cast<LoadInst>(&DestI)) {
- if (LSource)
- copyNonnullMetadata(*LSource, N, *LDest);
- }
+ copyNonnullMetadata(Source, N, Dest);
break;
case LLVMContext::MD_align:
case LLVMContext::MD_dereferenceable:
case LLVMContext::MD_dereferenceable_or_null:
- // Applies to both loads and stores only if the new type is also a
- // pointer.
+ // These only directly apply if the new type is also a pointer.
if (NewType->isPointerTy())
- DestI.setMetadata(ID, N);
+ Dest.setMetadata(ID, N);
break;
case LLVMContext::MD_range:
- if (auto *LDest = dyn_cast<LoadInst>(&DestI)) {
- if (LSource && DL)
- copyRangeMetadata(*DL, *LSource, N, *LDest);
- }
- break;
-
- case LLVMContext::MD_tbaa:
- if (isa<LoadInst>(DestI))
- DestI.setMetadata(ID, N);
+ copyRangeMetadata(DL, Source, N, Dest);
break;
}
}
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index 114df653bad83..c28314f6ab124 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -112,7 +112,6 @@
#include <optional>
#include <tuple>
#include <type_traits>
-#include <unordered_map>
#include <utility>
#include <vector>
@@ -269,6 +268,11 @@ class Vectorizer {
/// isGuaranteedToTransferExecutionToSuccessor(I) == true.
bool runOnPseudoBB(BasicBlock::iterator Begin, BasicBlock::iterator End);
+ /// Runs the vectorizer on one equivalence class, i.e. one set of loads/stores
+ /// in the same BB with the same value for getUnderlyingObject() etc.
+ bool runOnEquivalenceClass(const EqClassKey &EqClassKey,
+ ArrayRef<Instruction *> EqClass);
+
/// Runs the vectorizer on one chain, i.e. a subset of an equivalence class
/// where all instructions access a known, constant offset from the first
/// instruction.
@@ -334,22 +338,12 @@ class Vectorizer {
EquivalenceClassMap collectEquivalenceClasses(BasicBlock::iterator Begin,
BasicBlock::iterator End);
- /// Inserts a cast instruction to convert Inst to DstTy.
- Value *insertCast(Value *Val, Type *DstTy);
-
/// Partitions Instrs into "chains" where every instruction has a known
/// constant offset from the first instr in the chain.
///
/// Postcondition: For all i, ret[i][0].second == 0, because the first instr
/// in the chain is the leader, and an instr touches distance 0 from itself.
std::vector<Chain> gatherChains(ArrayRef<Instruction *> Instrs);
-
- // Helpers for chain merging.
- std::optional<APInt> computeLeaderDelta(Instruction *I1, Instruction *I2);
- bool chainsOverlapAfterRebase(const Chain &A, const Chain &B,
- const APInt &Delta) const;
- static void rebaseChain(Chain &C, const APInt &Delta);
- void normalizeChainToType(Chain &C, Type *CastTy);
};
class LoadStoreVectorizerLegacyPass : public FunctionPass {
@@ -431,20 +425,6 @@ PreservedAnalyses LoadStoreVectorizerPass::run(Function &F,
return Changed ? PA : PreservedAnalyses::all();
}
-static const Value *getUnderlyingObject(const Value *Ptr) {
- const Value *ObjPtr = llvm::getUnderlyingObject(Ptr);
- if (const auto *Sel = dyn_cast<SelectInst>(ObjPtr)) {
- // The select's themselves are distinct instructions even if they share
- // the same condition and evaluate to consecutive pointers for true and
- // false values of the condition. Therefore using the select's themselves
- // for grouping instructions would put consecutive accesses into different
- // lists and they won't be even checked for being consecutive, and won't
- // be vectorized.
- return Sel->getCondition();
- }
- return ObjPtr;
-}
-
bool Vectorizer::run() {
bool Changed = false;
// Break up the BB if there are any instrs which aren't guaranteed to transfer
@@ -488,88 +468,6 @@ bool Vectorizer::run() {
return Changed;
}
-Value *Vectorizer::insertCast(Value *Val, Type *DstTy) {
- if (DL.getTypeSizeInBits(Val->getType()) == DL.getTypeSizeInBits(DstTy)) {
- return Builder.CreateBitOrPointerCast(Val, DstTy, Val->getName() + ".bc");
- }
-
- // If the types are of different sizes and both are integers, we can use
- // zext or sext to cast.
- if (Val->getType()->isIntegerTy() && DstTy->isIntegerTy()) {
- if (DL.getTypeSizeInBits(Val->getType()) < DL.getTypeSizeInBits(DstTy)) {
- return Builder.CreateZExt(Val, DstTy, Val->getName() + ".bc");
- }
- return Builder.CreateTrunc(Val, DstTy, Val->getName() + ".bc");
- }
-
- return nullptr;
-}
-
-std::optional<APInt> Vectorizer::computeLeaderDelta(Instruction *I1,
- Instruction *I2) {
- assert(((isa<LoadInst>(I1) && isa<LoadInst>(I2)) ||
- (isa<StoreInst>(I1) && isa<StoreInst>(I2))) &&
- "computeLeaderDelta must be called with two load or two store "
- "instructions");
- Instruction *CtxInst = I1->comesBefore(I2) ? I2 : I1;
- const Value *Ptr1 = getLoadStorePointerOperand(I1);
- const Value *Ptr2 = getLoadStorePointerOperand(I2);
- return getConstantOffset(const_cast<Value *>(Ptr1), const_cast<Value *>(Ptr2),
- CtxInst);
-}
-
-bool Vectorizer::chainsOverlapAfterRebase(const Chain &A, const Chain &B,
- const APInt &Delta) const {
- ConstantRange ARange(
- A.front().OffsetFromLeader,
- A.back().OffsetFromLeader +
- DL.getTypeStoreSize(getLoadStoreType(A.back().Inst)));
- ConstantRange BRange(
- B.front().OffsetFromLeader + Delta,
- B.back().OffsetFromLeader + Delta +
- DL.getTypeStoreSize(getLoadStoreType(B.back().Inst)));
- return !ARange.intersectWith(BRange).isEmptySet();
-}
-
-void Vectorizer::rebaseChain(Chain &C, const APInt &Delta) {
- for (ChainElem &E : C)
- E.OffsetFromLeader += Delta;
-}
-
-void Vectorizer::normalizeChainToType(Chain &C, Type *CastTy) {
- for (ChainElem &Elem : C) {
- Instruction *Inst = Elem.Inst;
- Type *OrigValTy = getLoadStoreType(Inst);
- if (OrigValTy == CastTy)
- continue;
-
- if (auto *LI = dyn_cast<LoadInst>(Inst)) {
- Builder.SetInsertPoint(LI);
- LoadInst *NewLoad = Builder.CreateLoad(CastTy, LI->getPointerOperand(),
- LI->getName() + ".mut");
- copyMetadataForAccess(*NewLoad, *LI);
- Value *CastBack = insertCast(NewLoad, OrigValTy);
- if (!CastBack)
- llvm_unreachable("Failed to insert cast");
- LI->replaceAllUsesWith(CastBack);
- ToErase.emplace_back(LI);
- Elem.Inst = NewLoad;
- } else if (auto *SI = dyn_cast<StoreInst>(Inst)) {
- Builder.SetInsertPoint(SI);
- Value *CastVal = insertCast(SI->getValueOperand(), CastTy);
- if (!CastVal)
- llvm_unreachable("Failed to insert cast");
- StoreInst *NewStore =
- Builder.CreateStore(CastVal, SI->getPointerOperand());
- NewStore->setAlignment(SI->getAlign());
- NewStore->setVolatile(SI->isVolatile());
- copyMetadataForAccess(*NewStore, *SI);
- ToErase.emplace_back(SI);
- Elem.Inst = NewStore;
- }
- }
-}
-
bool Vectorizer::runOnPseudoBB(BasicBlock::iterator Begin,
BasicBlock::iterator End) {
LLVM_DEBUG({
@@ -582,120 +480,49 @@ bool Vectorizer::runOnPseudoBB(BasicBlock::iterator Begin,
});
bool Changed = false;
- SmallVector<Chain> ContiguousSubChains;
-
for (const auto &[EqClassKey, EqClass] :
- collectEquivalenceClasses(Begin, End)) {
-
- LLVM_DEBUG({
- dbgs() << "LSV: Running on equivalence class of size " << EqClass.size()
- << " keyed on " << EqClassKey << ":\n";
- for (Instruction *I : EqClass)
- dbgs() << " " << *I << "\n";
- });
-
- for (Chain &C : gatherChains(EqClass)) {
-
- // Split up the chain into increasingly smaller chains, until we can
- // finally vectorize the chains.
- //
- // (Don't be scared by the depth of the loop nest here. These operations
- // are all at worst O(n lg n) in the number of instructions, and splitting
- // chains doesn't change the number of instrs. So the whole loop nest is
- // O(n lg n).)
- for (auto &C : splitChainByMayAliasInstrs(C)) {
- for (auto &C : splitChainByContiguity(C)) {
- ContiguousSubChains.emplace_back(C);
- }
- }
- }
- }
-
- // Merge chains in reverse order, so that the first chain is the largest.
- for (int I = ContiguousSubChains.size() - 1; I > 0; I--) {
- Chain &C1 = ContiguousSubChains[I - 1];
- Chain &C2 = ContiguousSubChains[I];
+ collectEquivalenceClasses(Begin, End))
+ Changed |= runOnEquivalenceClass(EqClassKey, EqClass);
- // If the scalar types of the chains are the same, we can merge them
- // without inserting any casts.
- if (getLoadStoreType(C1[0].Inst)->getScalarType() ==
- getLoadStoreType(C2[0].Inst)->getScalarType())
- continue;
-
- const Value *C1Ptr = getLoadStorePointerOperand(C1[0].Inst);
- const Value *C2Ptr = getLoadStorePointerOperand(C2[0].Inst);
- unsigned AS1 = C1Ptr->getType()->getPointerAddressSpace();
- unsigned AS2 = C2Ptr->getType()->getPointerAddressSpace();
- bool C1IsLoad = isa<LoadInst>(C1[0].Inst);
- bool C2IsLoad = isa<LoadInst>(C2[0].Inst);
-
- // If the chains are mapped to different types, have distinct underlying
- // pointer objects, or include both loads and stores, skip.
- if (C1IsLoad != C2IsLoad || AS1 != AS2 ||
- ::getUnderlyingObject(C1Ptr) != ::getUnderlyingObject(C2Ptr))
- continue;
-
- // Compute constant offset between chain leaders; if unknown, skip.
- std::optional<APInt> DeltaOpt = computeLeaderDelta(C1[0].Inst, C2[0].Inst);
- if (!DeltaOpt)
- continue;
-
- // Check that rebasing C2 into C1's coordinate space will not overlap C1.
- if (chainsOverlapAfterRebase(C1, C2, *DeltaOpt))
- continue;
-
- // Determine the common integer cast type for normalization and ensure total
- // bitwidth matches across all elements of both chains.
- Type *C1ElemTy = getLoadStoreType(C1[0].Inst);
- unsigned TotalBits = DL.getTypeSizeInBits(C1ElemTy);
- auto AllElemsMatchTotalBits = [&](const Chain &C) {
- return llvm::all_of(C, [&](const ChainElem &E) {
- return DL.getTypeSizeInBits(getLoadStoreType(E.Inst)) == TotalBits;
- });
- };
- if (!AllElemsMatchTotalBits(C1) || !AllElemsMatchTotalBits(C2))
- continue;
+ return Changed;
+}
- // Power-of-two span ensures we can form a legal, single vector access
- // without padding or splitting. Many targets and cost models assume POT
- // widths, and it guarantees an integral element count for the chosen
- // VecElemTy.
- APInt Sz = C2.front().OffsetFromLeader +
- DL.getTypeStoreSize(getLoadStoreType(C2.front().Inst)) -
- C1.back().OffsetFromLeader + *DeltaOpt;
- if (!Sz.isPowerOf2())
- continue;
+bool Vectorizer::runOnEquivalenceClass(const EqClassKey &EqClassKey,
+ ArrayRef<Instruction *> EqClass) {
+ bool Changed = false;
- // Rebase C2's offsets into C1's coordinate space prior to merging and
- // merge C2 into C1 by appending all elements of C2 to C1, then erase C2
- // from ContiguousSubChains.
- rebaseChain(C2, *DeltaOpt);
- C1.insert(C1.end(), C2.begin(), C2.end());
- ContiguousSubChains.erase(ContiguousSubChains.begin() + I);
-
- // Normalize the value operand/result type of each instruction in C1 to
- // C1CastTy.
- Type *C1CastTy =
- Type::getIntNTy(C1ElemTy->getContext(), DL.getTypeSizeInBits(C1ElemTy));
- normalizeChainToType(C1, C1CastTy);
- }
+ LLVM_DEBUG({
+ dbgs() << "LSV: Running on equivalence class of size " << EqClass.size()
+ << " keyed on " << EqClassKey << ":\n";
+ for (Instruction *I : EqClass)
+ dbgs() << " " << *I << "\n";
+ });
- for (auto &C : ContiguousSubChains) {
- if (C.size() <= 1)
- continue;
- for (auto &AlignedSubChain : splitChainByAlignment(C))
- Changed |= vectorizeChain(AlignedSubChain);
- }
+ std::vector<Chain> Chains = gatherChains(EqClass);
+ LLVM_DEBUG(dbgs() << "LSV: Got " << Chains.size()
+ << " nontrivial chains.\n";);
+ for (Chain &C : Chains)
+ Changed |= runOnChain(C);
+ return Changed;
+}
- // Erase all instructions scheduled for deletion in this pseudo-BB.
- for (Instruction *I : ToErase) {
- auto *PtrOperand = getLoadStorePointerOperand(I);
- if (I->use_empty())
- I->eraseFromParent();
- RecursivelyDeleteTriviallyDeadInstructions(PtrOperand);
- }
- ToErase.clear();
+bool Vectorizer::runOnChain(Chain &C) {
+ LLVM_DEBUG({
+ dbgs() << "LSV: Running on chain with " << C.size() << " instructions:\n";
+ dumpChain(C);
+ });
+ // Split up the chain into increasingly smaller chains, until we can finally
+ // vectorize the chains.
+ //
+ // (Don't be scared by the depth of the loop nest here. These operations are
+ // all at worst O(n lg n) in the number of instructions, and splitting chains
+ // doesn't change the number of instrs. So the whole loop nest is O(n lg n).)
+ bool Changed = false;
+ for (auto &C : splitChainByMayAliasInstrs(C))
+ for (auto &C : splitChainByContiguity(C))
+ for (auto &C : splitChainByAlignment(C))
+ Changed |= vectorizeChain(C);
return Changed;
}
@@ -756,7 +583,7 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
LLVM_DEBUG(
dbgs() << "LSV: Found intervening may-alias instrs; cannot merge "
<< *ChainIt->Inst << " into " << *ChainBegin->Inst << "\n");
- if (!NewChain.empty()) {
+ if (NewChain.size() > 1) {
LLVM_DEBUG({
dbgs() << "LSV: got nontrivial chain without aliasing instrs:\n";
dumpChain(NewChain);
@@ -768,7 +595,7 @@ std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &C) {
NewChain = SmallVector<ChainElem, 1>({*ChainIt});
}
}
- if (!NewChain.empty()) {
+ if (NewChain.size() > 1) {
LLVM_DEBUG({
dbgs() << "LSV: got ...
[truncated]
``````````
https://github.com/llvm/llvm-project/pull/170381