[llvm] [MachinePipeliner] Fix loop-carried dependencies analysis (PR #121907)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 7 01:25:42 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-aarch64
Author: Ryotaro Kasuga (kasuga-fj)
<details>
<summary>Changes</summary>
In the current MachinePipeliner, several loop-carried edges are missed. It can result generating invalid code. At least following loop-carried dependencies can be missed.
- Memory dependencies from top to bottom.
- Example:
```
// There is a loop-carried dependence from the first store to
// the second one.
for (int i=1; i<n; i++) {
a[i] = ...;
a[i-1] = ...;
}
```
- Store to store dependencies.
- Store to load dependencies.
- Output (write-after-write for physical register) dependencies.
- Use of alias analysis results that are valid only in the single iteration.
- Example:
```
void f(double * restrict a, double * restrict b);
...
for (int i=0; i<n; i++)
f(ptr0, ptr1); // will be inlined
```
This patch added these dependencies to fix correctness issues.
In addition, the current analysis may add excessive dependencies because loop-carried memory dependencies from bottom to top by are expressed by using dependencies in the forward direction (i.e., from top to bottom edge). This patch also removes such dependencies.
I tested performance changes with and without this patch. I used llvm-test-suite as test cases and checked the following:
- Changes in Initiation Interval on Hexagon (since I don't have a real machine, I couldn't check the actual execution time).
- Changes of Initiation Interval and execution time on AArch64 (Neoverse V1).
- (Note: As described below, loop unrolling is disabled).
As far as I have tested, there has been no significant performance impact. It's worth noting, however, that a huge performance degradation can occur when
- Loop unrolling is enabled and
- The target architecture doesn't implement `TargetInstrInfo::getIncrementValue`.
This is because loop-carried edges are added to each pair of unrolled memory instructions. For example, suppose a loop contains a store to a[i] and it's unrolled 4 times. In this case, the loop has store to a[i], a[i+1], a[i+2], and a[i+3]. If `getIncrementValue` isn't implemented, we cannot be sure that these stores are independent, so loop-carried dependencies are added against each pair of them. We can avoid this problem by disabling loop unrolling or implementing getIncrementValue. I don't think it makes much sense to enable both loop unrolling and software pipelining, so I believe disabling loop unrolling is not a big problem.
---
Patch is 84.58 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/121907.diff
18 Files Affected:
- (modified) llvm/include/llvm/CodeGen/MachinePipeliner.h (+45-35)
- (modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+497-230)
- (modified) llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir (+6-1)
- (added) llvm/test/CodeGen/AArch64/sms-loop-carried-fp-exceptions1.mir (+107)
- (added) llvm/test/CodeGen/AArch64/sms-loop-carried-fp-exceptions2.mir (+100)
- (modified) llvm/test/CodeGen/Hexagon/swp-carried-dep1.mir (+5-6)
- (modified) llvm/test/CodeGen/Hexagon/swp-epilog-phi7.ll (+5)
- (modified) llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll (+4-4)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep1.mir (+110)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep2.mir (+104)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep3.mir (+108)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep4.mir (+107)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep5.mir (+106)
- (added) llvm/test/CodeGen/Hexagon/swp-loop-carried-order-dep6.mir (+153)
- (modified) llvm/test/CodeGen/Hexagon/swp-loop-carried-unknown.ll (+9-6)
- (modified) llvm/test/CodeGen/Hexagon/swp-resmii-1.ll (+1-1)
- (modified) llvm/test/CodeGen/PowerPC/sms-recmii.ll (+1-1)
- (modified) llvm/test/CodeGen/PowerPC/sms-store-dependence.ll (+8-41)
``````````diff
diff --git a/llvm/include/llvm/CodeGen/MachinePipeliner.h b/llvm/include/llvm/CodeGen/MachinePipeliner.h
index 8e47d0cead7571..810a5d9f6dff00 100644
--- a/llvm/include/llvm/CodeGen/MachinePipeliner.h
+++ b/llvm/include/llvm/CodeGen/MachinePipeliner.h
@@ -42,6 +42,7 @@
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"
+#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/DFAPacketizer.h"
#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
@@ -190,6 +191,33 @@ class SwingSchedulerDDGEdge {
bool ignoreDependence(bool IgnoreAnti) const;
};
+struct LoopCarriedEdges {
+ using OutputDep = SmallDenseMap<Register, SmallSetVector<SUnit *, 4>>;
+ using OrderDep = SmallSetVector<SUnit *, 8>;
+ using OutputDepsType = DenseMap<SUnit *, OutputDep>;
+ using OrderDepsType = DenseMap<SUnit *, OrderDep>;
+
+ OutputDepsType OutputDeps;
+ OrderDepsType OrderDeps;
+
+ const OutputDep *getOutputDepOrNull(SUnit *Key) const {
+ auto Ite = OutputDeps.find(Key);
+ if (Ite == OutputDeps.end())
+ return nullptr;
+ return &Ite->second;
+ }
+
+ const OrderDep *getOrderDepOrNull(SUnit *Key) const {
+ auto Ite = OrderDeps.find(Key);
+ if (Ite == OrderDeps.end())
+ return nullptr;
+ return &Ite->second;
+ }
+
+ void dump(SUnit *SU, const TargetRegisterInfo *TRI,
+ const MachineRegisterInfo *MRI) const;
+};
+
/// Represents dependencies between instructions. This class is a wrapper of
/// `SUnits` and its dependencies to manipulate back-edges in a natural way.
/// Currently it only supports back-edges via PHI, which are expressed as
@@ -217,8 +245,12 @@ class SwingSchedulerDDG {
SwingSchedulerDDGEdges &getEdges(const SUnit *SU);
const SwingSchedulerDDGEdges &getEdges(const SUnit *SU) const;
+ void addLoopCarriedEdges(std::vector<SUnit> &SUnits,
+ const LoopCarriedEdges &LCE);
+
public:
- SwingSchedulerDDG(std::vector<SUnit> &SUnits, SUnit *EntrySU, SUnit *ExitSU);
+ SwingSchedulerDDG(std::vector<SUnit> &SUnits, SUnit *EntrySU, SUnit *ExitSU,
+ const LoopCarriedEdges &LCE);
const EdgesType &getInEdges(const SUnit *SU) const;
@@ -285,22 +317,14 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
BitVector Blocked;
SmallVector<SmallPtrSet<SUnit *, 4>, 10> B;
SmallVector<SmallVector<int, 4>, 16> AdjK;
- // Node to Index from ScheduleDAGTopologicalSort
- std::vector<int> *Node2Idx;
+ SmallVector<BitVector, 16> LoopCarried;
unsigned NumPaths = 0u;
- static unsigned MaxPaths;
public:
- Circuits(std::vector<SUnit> &SUs, ScheduleDAGTopologicalSort &Topo)
- : SUnits(SUs), Blocked(SUs.size()), B(SUs.size()), AdjK(SUs.size()) {
- Node2Idx = new std::vector<int>(SUs.size());
- unsigned Idx = 0;
- for (const auto &NodeNum : Topo)
- Node2Idx->at(NodeNum) = Idx++;
- }
+ Circuits(std::vector<SUnit> &SUs)
+ : SUnits(SUs), Blocked(SUs.size()), B(SUs.size()), AdjK(SUs.size()) {}
Circuits &operator=(const Circuits &other) = delete;
Circuits(const Circuits &other) = delete;
- ~Circuits() { delete Node2Idx; }
/// Reset the data structures used in the circuit algorithm.
void reset() {
@@ -310,9 +334,9 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
NumPaths = 0;
}
- void createAdjacencyStructure(SwingSchedulerDAG *DAG);
+ void createAdjacencyStructure(const SwingSchedulerDDG *DDG);
bool circuit(int V, int S, NodeSetType &NodeSets,
- const SwingSchedulerDAG *DAG, bool HasBackedge = false);
+ const SwingSchedulerDDG *DDG, bool HasLoopCarriedEdge = false);
void unblock(int U);
};
@@ -366,7 +390,8 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
return ScheduleInfo[Node->NodeNum].ZeroLatencyHeight;
}
- bool isLoopCarriedDep(const SwingSchedulerDDGEdge &Edge) const;
+ bool hasLoopCarriedMemDep(const MachineInstr *Src, const MachineInstr *Dst,
+ BatchAAResults *BAA) const;
void applyInstrChange(MachineInstr *MI, SMSchedule &Schedule);
@@ -391,7 +416,9 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
const SwingSchedulerDDG *getDDG() const { return DDG.get(); }
private:
- void addLoopCarriedDependences(AAResults *AA);
+ LoopCarriedEdges addLoopCarriedDependences(AAResults *AA);
+ AliasResult::Kind checkLoopCarriedMemDep(const MachineInstr *Src,
+ const MachineInstr *Dst) const;
void updatePhiDependences();
void changeDependences();
unsigned calculateResMII();
@@ -409,7 +436,7 @@ class SwingSchedulerDAG : public ScheduleDAGInstrs {
void computeNodeOrder(NodeSetType &NodeSets);
void checkValidNodeOrder(const NodeSetType &Circuits) const;
bool schedulePipeline(SMSchedule &Schedule);
- bool computeDelta(MachineInstr &MI, unsigned &Delta) const;
+ bool computeDelta(const MachineInstr &MI, unsigned &Delta) const;
MachineInstr *findDefInLoop(Register Reg);
bool canUseLastOffsetValue(MachineInstr *MI, unsigned &BasePos,
unsigned &OffsetPos, unsigned &NewBase,
@@ -437,7 +464,7 @@ class NodeSet {
using iterator = SetVector<SUnit *>::const_iterator;
NodeSet() = default;
- NodeSet(iterator S, iterator E, const SwingSchedulerDAG *DAG)
+ NodeSet(iterator S, iterator E, const SwingSchedulerDDG *DDG)
: Nodes(S, E), HasRecurrence(true) {
// Calculate the latency of this node set.
// Example to demonstrate the calculation:
@@ -453,7 +480,6 @@ class NodeSet {
//
// Hold a map from each SUnit in the circle to the maximum distance from the
// source node by only considering the nodes.
- const SwingSchedulerDDG *DDG = DAG->getDDG();
DenseMap<SUnit *, unsigned> SUnitToDistance;
for (auto *Node : Nodes)
SUnitToDistance[Node] = 0;
@@ -470,22 +496,6 @@ class NodeSet {
}
}
}
- // Handle a back-edge in loop carried dependencies
- SUnit *FirstNode = Nodes[0];
- SUnit *LastNode = Nodes[Nodes.size() - 1];
-
- for (auto &PI : DDG->getInEdges(LastNode)) {
- // If we have an order dep that is potentially loop carried then a
- // back-edge exists between the last node and the first node that isn't
- // modeled in the DAG. Handle it manually by adding 1 to the distance of
- // the last node.
- if (PI.getSrc() != FirstNode || !PI.isOrderDep() ||
- !DAG->isLoopCarriedDep(PI))
- continue;
- SUnitToDistance[FirstNode] =
- std::max(SUnitToDistance[FirstNode], SUnitToDistance[LastNode] + 1);
- }
-
// The latency is the distance from the source node to itself.
Latency = SUnitToDistance[Nodes.front()];
}
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index acd42aa497c6fe..f0731ccdaa6532 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -194,6 +194,10 @@ static cl::opt<bool>
MVECodeGen("pipeliner-mve-cg", cl::Hidden, cl::init(false),
cl::desc("Use the MVE code generator for software pipelining"));
+static cl::opt<unsigned> MaxCircuitPaths(
+ "pipeliner-max-circuit-paths", cl::Hidden, cl::init(5),
+ cl::desc("Maximum number of circles to be detected for each vertex"));
+
namespace llvm {
// A command line option to enable the CopyToPhi DAG mutation.
@@ -221,7 +225,6 @@ cl::opt<WindowSchedulingFlag> WindowSchedulingOption(
} // end namespace llvm
-unsigned SwingSchedulerDAG::Circuits::MaxPaths = 5;
char MachinePipeliner::ID = 0;
#ifndef NDEBUG
int MachinePipeliner::NumTries = 0;
@@ -562,14 +565,20 @@ void SwingSchedulerDAG::setMAX_II() {
void SwingSchedulerDAG::schedule() {
AliasAnalysis *AA = &Pass.getAnalysis<AAResultsWrapperPass>().getAAResults();
buildSchedGraph(AA);
- addLoopCarriedDependences(AA);
updatePhiDependences();
Topo.InitDAGTopologicalSorting();
changeDependences();
postProcessDAG();
- DDG = std::make_unique<SwingSchedulerDDG>(SUnits, &EntrySU, &ExitSU);
LLVM_DEBUG(dump());
+ auto LCE = addLoopCarriedDependences(AA);
+ LLVM_DEBUG({
+ dbgs() << "Loop Carried Edges:\n";
+ for (SUnit &SU : SUnits)
+ LCE.dump(&SU, TRI, &MRI);
+ });
+ DDG = std::make_unique<SwingSchedulerDDG>(SUnits, &EntrySU, &ExitSU, LCE);
+
NodeSetType NodeSets;
findCircuits(NodeSets);
NodeSetType Circuits = NodeSets;
@@ -779,42 +788,18 @@ static unsigned getLoopPhiReg(const MachineInstr &Phi,
return 0;
}
-/// Return true if SUb can be reached from SUa following the chain edges.
-static bool isSuccOrder(SUnit *SUa, SUnit *SUb) {
- SmallPtrSet<SUnit *, 8> Visited;
- SmallVector<SUnit *, 8> Worklist;
- Worklist.push_back(SUa);
- while (!Worklist.empty()) {
- const SUnit *SU = Worklist.pop_back_val();
- for (const auto &SI : SU->Succs) {
- SUnit *SuccSU = SI.getSUnit();
- if (SI.getKind() == SDep::Order) {
- if (Visited.count(SuccSU))
- continue;
- if (SuccSU == SUb)
- return true;
- Worklist.push_back(SuccSU);
- Visited.insert(SuccSU);
- }
- }
- }
- return false;
-}
-
/// Return true if the instruction causes a chain between memory
/// references before and after it.
-static bool isDependenceBarrier(MachineInstr &MI) {
- return MI.isCall() || MI.mayRaiseFPException() ||
- MI.hasUnmodeledSideEffects() ||
- (MI.hasOrderedMemoryRef() &&
- (!MI.mayLoad() || !MI.isDereferenceableInvariantLoad()));
+static bool isGlobalMemoryObject(MachineInstr &MI) {
+ return MI.isCall() || MI.hasUnmodeledSideEffects() ||
+ (MI.hasOrderedMemoryRef() && !MI.isDereferenceableInvariantLoad());
}
/// Return the underlying objects for the memory references of an instruction.
/// This function calls the code in ValueTracking, but first checks that the
/// instruction has a memory operand.
-static void getUnderlyingObjects(const MachineInstr *MI,
- SmallVectorImpl<const Value *> &Objs) {
+static void getUnderlyingObjectsForInstr(const MachineInstr *MI,
+ SmallVectorImpl<const Value *> &Objs) {
if (!MI->hasOneMemOperand())
return;
MachineMemOperand *MM = *MI->memoperands_begin();
@@ -829,97 +814,63 @@ static void getUnderlyingObjects(const MachineInstr *MI,
}
}
-/// Add a chain edge between a load and store if the store can be an
-/// alias of the load on a subsequent iteration, i.e., a loop carried
-/// dependence. This code is very similar to the code in ScheduleDAGInstrs
-/// but that code doesn't create loop carried dependences.
-void SwingSchedulerDAG::addLoopCarriedDependences(AliasAnalysis *AA) {
- MapVector<const Value *, SmallVector<SUnit *, 4>> PendingLoads;
- Value *UnknownValue =
- UndefValue::get(Type::getVoidTy(MF.getFunction().getContext()));
- for (auto &SU : SUnits) {
- MachineInstr &MI = *SU.getInstr();
- if (isDependenceBarrier(MI))
- PendingLoads.clear();
- else if (MI.mayLoad()) {
- SmallVector<const Value *, 4> Objs;
- ::getUnderlyingObjects(&MI, Objs);
- if (Objs.empty())
- Objs.push_back(UnknownValue);
- for (const auto *V : Objs) {
- SmallVector<SUnit *, 4> &SUs = PendingLoads[V];
- SUs.push_back(&SU);
- }
- } else if (MI.mayStore()) {
- SmallVector<const Value *, 4> Objs;
- ::getUnderlyingObjects(&MI, Objs);
- if (Objs.empty())
- Objs.push_back(UnknownValue);
- for (const auto *V : Objs) {
- MapVector<const Value *, SmallVector<SUnit *, 4>>::iterator I =
- PendingLoads.find(V);
- if (I == PendingLoads.end())
- continue;
- for (auto *Load : I->second) {
- if (isSuccOrder(Load, &SU))
- continue;
- MachineInstr &LdMI = *Load->getInstr();
- // First, perform the cheaper check that compares the base register.
- // If they are the same and the load offset is less than the store
- // offset, then mark the dependence as loop carried potentially.
- const MachineOperand *BaseOp1, *BaseOp2;
- int64_t Offset1, Offset2;
- bool Offset1IsScalable, Offset2IsScalable;
- if (TII->getMemOperandWithOffset(LdMI, BaseOp1, Offset1,
- Offset1IsScalable, TRI) &&
- TII->getMemOperandWithOffset(MI, BaseOp2, Offset2,
- Offset2IsScalable, TRI)) {
- if (BaseOp1->isIdenticalTo(*BaseOp2) &&
- Offset1IsScalable == Offset2IsScalable &&
- (int)Offset1 < (int)Offset2) {
- assert(TII->areMemAccessesTriviallyDisjoint(LdMI, MI) &&
- "What happened to the chain edge?");
- SDep Dep(Load, SDep::Barrier);
- Dep.setLatency(1);
- SU.addPred(Dep);
- continue;
- }
- }
- // Second, the more expensive check that uses alias analysis on the
- // base registers. If they alias, and the load offset is less than
- // the store offset, the mark the dependence as loop carried.
- if (!AA) {
- SDep Dep(Load, SDep::Barrier);
- Dep.setLatency(1);
- SU.addPred(Dep);
- continue;
- }
- MachineMemOperand *MMO1 = *LdMI.memoperands_begin();
- MachineMemOperand *MMO2 = *MI.memoperands_begin();
- if (!MMO1->getValue() || !MMO2->getValue()) {
- SDep Dep(Load, SDep::Barrier);
- Dep.setLatency(1);
- SU.addPred(Dep);
- continue;
- }
- if (MMO1->getValue() == MMO2->getValue() &&
- MMO1->getOffset() <= MMO2->getOffset()) {
- SDep Dep(Load, SDep::Barrier);
- Dep.setLatency(1);
- SU.addPred(Dep);
- continue;
- }
- if (!AA->isNoAlias(
- MemoryLocation::getAfter(MMO1->getValue(), MMO1->getAAInfo()),
- MemoryLocation::getAfter(MMO2->getValue(),
- MMO2->getAAInfo()))) {
- SDep Dep(Load, SDep::Barrier);
- Dep.setLatency(1);
- SU.addPred(Dep);
- }
- }
- }
- }
+static std::optional<MemoryLocation>
+getMemoryLocationForAA(const MachineInstr *MI) {
+ const MachineMemOperand *MMO = *MI->memoperands_begin();
+ const Value *Val = MMO->getValue();
+ if (!Val)
+ return std::nullopt;
+ auto MemLoc = MemoryLocation::getBeforeOrAfter(Val, MMO->getAAInfo());
+
+ // Peel off noalias information from `AATags` because it might be valid only
+ // in single iteration.
+ // FIXME: This is too conservative. Checking
+ // `llvm.experimental.noalias.scope.decl` instrinsics in the original LLVM IR
+ // can perform more accuurately.
+ MemLoc.AATags.NoAlias = nullptr;
+ return MemLoc;
+}
+
+/// Return true for an memory dependence that is loop carried
+/// potentially. A dependence is loop carried if the destination defines a value
+/// that may be used or defined by the source in a subsequent iteration.
+bool SwingSchedulerDAG::hasLoopCarriedMemDep(const MachineInstr *Src,
+ const MachineInstr *Dst,
+ BatchAAResults *BAA) const {
+ if (!SwpPruneLoopCarried)
+ return true;
+
+ // First, check the dependence by comparing base register, offset, and
+ // step value of the loop.
+ switch (checkLoopCarriedMemDep(Src, Dst)) {
+ case AliasResult::Kind::MustAlias:
+ return true;
+ case AliasResult::Kind::NoAlias:
+ return false;
+ case AliasResult::Kind::MayAlias:
+ break;
+ default:
+ llvm_unreachable("Unexpected alias");
+ }
+
+ // If we cannot determine the dependence by previouse check, then
+ // check by using alias analysis.
+ if (!BAA)
+ return true;
+
+ const auto MemLoc1 = getMemoryLocationForAA(Src);
+ const auto MemLoc2 = getMemoryLocationForAA(Dst);
+ if (!MemLoc1.has_value() || !MemLoc2.has_value())
+ return true;
+ switch (BAA->alias(*MemLoc1, *MemLoc2)) {
+ case AliasResult::Kind::MayAlias:
+ case AliasResult::Kind::MustAlias:
+ case AliasResult::Kind::PartialAlias:
+ return true;
+ case AliasResult::Kind::NoAlias:
+ return false;
+ default:
+ llvm_unreachable("Unexpected alias");
}
}
@@ -1544,8 +1495,311 @@ class HighRegisterPressureDetector {
}
};
+/// Add loop-carried chain dependencies. This class handles the same type of
+/// dependencies added by `ScheduleDAGInstrs::buildSchedGraph`, but takes into
+/// account dependencies across iterations.
+class LoopCarriedOrderDepsTracker {
+ // Type of instruction that is relevant to order-dependencies
+ enum class InstrTag {
+ // Instruction related to global memory objects. There are order
+ // dependencies between instructions that may load or store or raise
+ // floating-point exception before and after this one.
+ GlobalMemoryObject = 0,
+
+ // Instruction that may load or store memory, but does not form a global
+ // barrier.
+ LoadOrStore = 1,
+
+ // Instruction that does not match above, but may raise floatin-point
+ // exceptions.
+ FPExceptions = 2,
+ };
+
+ struct TaggedSUnit : PointerIntPair<SUnit *, 2> {
+ TaggedSUnit(SUnit *SU, InstrTag Tag)
+ : PointerIntPair<SUnit *, 2>(SU, unsigned(Tag)) {}
+
+ InstrTag getTag() const { return InstrTag(getInt()); }
+ };
+
+ using SUsType = SmallVector<SUnit *, 4>;
+ using Value2SUs = MapVector<const Value *, SUsType>;
+
+ // Retains loads and stores classified by the underlying objects.
+ struct LoadStoreChunk {
+ Value2SUs Loads, Stores;
+ SUsType UnknownLoads, UnknownStores;
+ };
+
+ SwingSchedulerDAG *DAG;
+ std::unique_ptr<BatchAAResults> BAA;
+ const Value *UnknownValue;
+ std::vector<SUnit> &SUnits;
+
+ // The size of SUnits, for convenience.
+ const unsigned N;
+
+ // Adjacency matrix consisiting of order dependencies of the original DAG.
+ std::vector<BitVector> AdjMatrix;
+
+ // Loop-carried Edges.
+ std::vector<BitVector> LoopCarried;
+
+ // Instructions related to chain dependencies. They are one of the following.
+ //
+ // 1. Global memory object.
+ // 2. Load, but not a global memory object, not invariant, or may load trap
+ // value.
+ // 3. Store, but not global memory object.
+ // 4. None of them, but may raise floating-point exceptions.
+ //
+ // This is used when analyzing loop-carried dependencies that access global
+ // barrier instructions.
+ std::vector<TaggedSUnit> TaggedSUnits;
+
+public:
+ LoopCarriedOrderDepsTracker(SwingSchedulerDAG *SSD, AAResults *AA)
+ : DAG(SSD), BAA(nullptr), SUnits(DAG->SUnits), N(SUnits.size()),
+ AdjMatrix(N, BitVector(N)), LoopCarried(N, BitVector(N)) {
+ UnknownValue =
+ UndefValue::get(Type::getVoidTy(DAG->MF.getFunction().getContext()));
+ if (AA) {
+ BAA = std::make_unique<BatchAAResults>(*AA);
+ BAA->enableCrossIterationMode();
+ }
+ initAdjMatrix();
+ }
+
+ void computeDependencies() {
+ // Traverse all instructions and extract only what we are targetting.
+ for (auto &SU : SUnits) {
+ auto Tagged = checkInstrType(&SU);
+
+ // This instruction has no loop-carried order-dependencies.
+ if (!Tagged)
+ continue;
+
+ TaggedSUnits.push_back(*Tagged);
+ }
+
+ addLoopCarriedDependencies();
+
+ // Finalize the results.
+ for (int I = 0; I != int(N); I++) {
+ // If the dependence between two instructions already exists in the
+ // original DAG, then loop-carried dependence of the same instructions is
+ // unnecessary because the original one expresses stricter
+ // constraint than loop-carried one.
+ LoopCarried[I].reset(AdjMatrix[I]);
+
+ // Self-loops are noisy.
+ LoopCarried[I].reset(I);
+ }
+ }
+
+ const BitVector &getLoopCarried(unsigned Idx) const {
+ return LoopCarried[Idx];
+ }
+
+private:
+ // Calculate reachability induced by the adjacency matrix. The original graph
+ // is DAG, so we can compute them from bottom to top.
+ void initAdjMatrix() {
+ for (int RI = 0; RI != int(N); RI++) {
+ int I = SUnits.size() - (RI + 1);
+ f...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/121907
More information about the llvm-commits
mailing list