[llvm] b3bce6a - [MachineVerifier] Doing ::calcRegsPassed over faster sets: ~15-20% faster MV, NFC

Roman Tereshin via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 24 19:01:52 PST 2020


Author: Roman Tereshin
Date: 2020-02-24T19:01:21-08:00
New Revision: b3bce6a3ddb77a6c6b55ec9de3e36c8de608384c

URL: https://github.com/llvm/llvm-project/commit/b3bce6a3ddb77a6c6b55ec9de3e36c8de608384c
DIFF: https://github.com/llvm/llvm-project/commit/b3bce6a3ddb77a6c6b55ec9de3e36c8de608384c.diff

LOG: [MachineVerifier] Doing ::calcRegsPassed over faster sets: ~15-20% faster MV, NFC

MachineVerifier still takes 45-50% of total compile time with
-verify-machineinstrs, with the calcRegsPassed dataflow taking ~50-60% of
MachineVerifier.

The majority of that time is spent in BBInfo::addPassed, mostly within
DenseSet implementing the sets the dataflow is operating over.

In particular, 1/4 of that DenseSet time is spent just iterating over it
(operator++), 40-50% on insertions, and most of the rest in ::count.

Given that, we implement custom sets just for this analysis, focusing on
cheap insertions and O(n) iteration time, where n is the size of the set
(as opposed to O(U), where U is the size of the universe).

As it's based _mostly_ on BitVector for the sparse part and SmallVector
for the dense part, it may remotely resemble SparseSet. The difference is
that our solution is a lot less clever: it has no constant-time `clear`,
which we wouldn't use anyway since reusing these sets across analyses is
cumbersome, and it is therefore more space efficient and safer (it has a
resizable universe and falls back to DenseSet when the sparse part would
get too big).

With this patch MachineVerifier gets ~15-20% faster, its contribution to
total compile time drops from 45-50% to ~35%, while contribution of
calcRegsPassed to MachineVerifier drops from 50-60% to ~35% as well.

calcRegsPassed itself gets another 2x faster here.

All measured on a large suite of shaders targeting a number of GPUs.
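The trick that keeps iteration O(n) is letting the filter do double duty
as the membership test for the accumulated set, so the elements themselves
can sit in a plain vector. A rough sketch of that idea, with
std::unordered_set and std::vector standing in for VRegFilter and
SmallVector (names here are illustrative, not LLVM's):

```cpp
#include <unordered_set>
#include <vector>

// Illustrative filtering set: add() appends only elements that pass the
// filter, and adds them to the filter too, so the vector never contains
// duplicates and can be iterated in O(n).
class FilteringSet {
  std::unordered_set<unsigned> Filter; // blocks already-seen elements
  std::vector<unsigned> Elements;      // survivors, in insertion order

public:
  // Pre-populate the filter (e.g. with regsKilled/regsLiveOut).
  void addToFilter(const std::vector<unsigned> &RS) {
    Filter.insert(RS.begin(), RS.end());
  }
  // Pass RS through the filter; append survivors and add them to the
  // filter so later adds see them. Returns true if anything changed.
  bool add(const std::vector<unsigned> &RS) {
    bool Changed = false;
    for (unsigned Reg : RS)
      if (Filter.insert(Reg).second) {
        Elements.push_back(Reg);
        Changed = true;
      }
    return Changed;
  }
  auto begin() const { return Elements.begin(); }
  auto end() const { return Elements.end(); }
};
```

The `add` returning false doubles as the dataflow's convergence check: a
successor only re-enters the worklist when its set actually changed.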

Reviewers: bogner, stoklund, rudkx, qcolombet

Reviewed By: rudkx

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75033

Added: 
    

Modified: 
    llvm/lib/CodeGen/MachineVerifier.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/MachineVerifier.cpp b/llvm/lib/CodeGen/MachineVerifier.cpp
index 69ecb45a317c..f298ab811fd4 100644
--- a/llvm/lib/CodeGen/MachineVerifier.cpp
+++ b/llvm/lib/CodeGen/MachineVerifier.cpp
@@ -156,25 +156,6 @@ namespace {
 
       BBInfo() = default;
 
-      // Add register to vregsPassed if it belongs there. Return true if
-      // anything changed.
-      bool addPassed(unsigned Reg) {
-        if (!Register::isVirtualRegister(Reg))
-          return false;
-        if (regsKilled.count(Reg) || regsLiveOut.count(Reg))
-          return false;
-        return vregsPassed.insert(Reg).second;
-      }
-
-      // Same for a full set.
-      bool addPassed(const RegSet &RS) {
-        bool changed = false;
-        for (RegSet::const_iterator I = RS.begin(), E = RS.end(); I != E; ++I)
-          if (addPassed(*I))
-            changed = true;
-        return changed;
-      }
-
       // Add register to vregsRequired if it belongs there. Return true if
       // anything changed.
       bool addRequired(unsigned Reg) {
@@ -2144,6 +2125,109 @@ MachineVerifier::visitMachineBasicBlockAfter(const MachineBasicBlock *MBB) {
   }
 }
 
+namespace {
+// This implements a set of registers that serves as a filter: can filter other
+// sets by passing through elements not in the filter and blocking those that
+// are. Any filter implicitly includes the full set of physical registers upon
+// creation, thus filtering them all out. The filter itself as a set only grows,
+// and needs to be as efficient as possible.
+struct VRegFilter {
+  // Add elements to the filter itself. \pre Input set \p FromRegSet must have
+  // no duplicates. Both virtual and physical registers are fine.
+  template <typename RegSetT> void add(const RegSetT &FromRegSet) {
+    SmallVector<unsigned, 0> VRegsBuffer;
+    filterAndAdd(FromRegSet, VRegsBuffer);
+  }
+  // Filter \p FromRegSet through the filter and append passed elements into \p
+  // ToVRegs. All elements appended are then added to the filter itself.
+  // \returns true if anything changed.
+  template <typename RegSetT>
+  bool filterAndAdd(const RegSetT &FromRegSet,
+                    SmallVectorImpl<unsigned> &ToVRegs) {
+    unsigned SparseUniverse = Sparse.size();
+    unsigned NewSparseUniverse = SparseUniverse;
+    unsigned NewDenseSize = Dense.size();
+    size_t Begin = ToVRegs.size();
+    for (unsigned Reg : FromRegSet) {
+      if (!Register::isVirtualRegister(Reg))
+        continue;
+      unsigned Index = Register::virtReg2Index(Reg);
+      if (Index < SparseUniverseMax) {
+        if (Index < SparseUniverse && Sparse.test(Index))
+          continue;
+        NewSparseUniverse = std::max(NewSparseUniverse, Index + 1);
+      } else {
+        if (Dense.count(Reg))
+          continue;
+        ++NewDenseSize;
+      }
+      ToVRegs.push_back(Reg);
+    }
+    size_t End = ToVRegs.size();
+    if (Begin == End)
+      return false;
+    // Reserving space in sets once performs better than doing so continuously
+    // and pays easily for double look-ups (even in Dense with SparseUniverseMax
+    // tuned all the way down) and double iteration (the second one is over a
+    // SmallVector, which is a lot cheaper compared to DenseSet or BitVector).
+    Sparse.resize(NewSparseUniverse);
+    Dense.reserve(NewDenseSize);
+    for (unsigned I = Begin; I < End; ++I) {
+      unsigned Reg = ToVRegs[I];
+      unsigned Index = Register::virtReg2Index(Reg);
+      if (Index < SparseUniverseMax)
+        Sparse.set(Index);
+      else
+        Dense.insert(Reg);
+    }
+    return true;
+  }
+
+private:
+  static constexpr unsigned SparseUniverseMax = 10 * 1024 * 8;
+  // VRegs indexed within SparseUniverseMax are tracked by Sparse, those beyond
+  // are tracked by Dense. The only purpose of the threshold and the Dense set
+  // is to have a reasonably growing memory usage in pathological cases (large
+  // number of very sparse VRegFilter instances live at the same time). In
+  // practice even in the worst-by-execution time cases having all elements
+  // tracked by Sparse (very large SparseUniverseMax scenario) tends to be more
+  // space efficient than if tracked by Dense. The threshold is set to keep the
+  // worst-case memory usage within 2x of figures determined empirically for
+  // "all Dense" scenario in such worst-by-execution-time cases.
+  BitVector Sparse;
+  DenseSet<unsigned> Dense;
+};
+
+// Implements both a transfer function and a (binary, in-place) join operator
+// for a dataflow over register sets with set union join and filtering transfer
+// (out_b = in_b \ filter_b). filter_b is expected to be set-up ahead of time.
+// Maintains out_b as its state, allowing for O(n) iteration over it at any
+// time, where n is the size of the set (as opposed to O(U) where U is the
+// universe). filter_b implicitly contains all physical registers at all times.
+class FilteringVRegSet {
+  VRegFilter Filter;
+  SmallVector<unsigned, 0> VRegs;
+
+public:
+  // Set up the filter_b. \pre Input register set \p RS must have no duplicates.
+  // Both virtual and physical registers are fine.
+  template <typename RegSetT> void addToFilter(const RegSetT &RS) {
+    Filter.add(RS);
+  }
+  // Passes \p RS through the filter_b (transfer function) and adds what's left
+  // to itself (out_b).
+  template <typename RegSetT> bool add(const RegSetT &RS) {
+    // Double-duty the Filter: to maintain VRegs as a set (and the join operation
+    // a set union) just add everything being added here to the Filter as well.
+    return Filter.filterAndAdd(RS, VRegs);
+  }
+  using const_iterator = decltype(VRegs)::const_iterator;
+  const_iterator begin() const { return VRegs.begin(); }
+  const_iterator end() const { return VRegs.end(); }
+  size_t size() const { return VRegs.size(); }
+};
+} // namespace
+
 // Calculate the largest possible vregsPassed sets. These are the registers that
 // can pass through an MBB live, but may not be live every time. It is assumed
 // that all vregsPassed sets are empty before the call.
@@ -2157,22 +2241,28 @@ void MachineVerifier::calcRegsPassed() {
     // ReversePostOrderTraversal doesn't handle empty functions.
     return;
   }
+  std::vector<FilteringVRegSet> VRegsPassedSets(MF->size());
   for (const MachineBasicBlock *MBB :
        ReversePostOrderTraversal<const MachineFunction *>(MF)) {
     // Careful with the evaluation order, fetch next number before allocating.
     unsigned Number = RPONumbers.size();
     RPONumbers[MBB] = Number;
+    // Set up the transfer functions for all blocks.
+    const BBInfo &MInfo = MBBInfoMap[MBB];
+    VRegsPassedSets[Number].addToFilter(MInfo.regsKilled);
+    VRegsPassedSets[Number].addToFilter(MInfo.regsLiveOut);
   }
   // First push live-out regs to successors' vregsPassed. Remember the MBBs that
   // have any vregsPassed.
   for (const MachineBasicBlock &MBB : *MF) {
-    BBInfo &MInfo = MBBInfoMap[&MBB];
+    const BBInfo &MInfo = MBBInfoMap[&MBB];
     if (!MInfo.reachable)
       continue;
     for (const MachineBasicBlock *Succ : MBB.successors()) {
-      BBInfo &SInfo = MBBInfoMap[Succ];
-      if (SInfo.addPassed(MInfo.regsLiveOut))
-        RPOWorklist.emplace(RPONumbers[Succ], Succ);
+      unsigned SuccNumber = RPONumbers[Succ];
+      FilteringVRegSet &SuccSet = VRegsPassedSets[SuccNumber];
+      if (SuccSet.add(MInfo.regsLiveOut))
+        RPOWorklist.emplace(SuccNumber, Succ);
     }
   }
 
@@ -2181,15 +2271,25 @@ void MachineVerifier::calcRegsPassed() {
     auto Next = RPOWorklist.begin();
     const MachineBasicBlock *MBB = Next->second;
     RPOWorklist.erase(Next);
-    BBInfo &MInfo = MBBInfoMap[MBB];
+    FilteringVRegSet &MSet = VRegsPassedSets[RPONumbers[MBB]];
     for (const MachineBasicBlock *Succ : MBB->successors()) {
       if (Succ == MBB)
         continue;
-      BBInfo &SInfo = MBBInfoMap[Succ];
-      if (SInfo.addPassed(MInfo.vregsPassed))
-        RPOWorklist.emplace(RPONumbers[Succ], Succ);
+      unsigned SuccNumber = RPONumbers[Succ];
+      FilteringVRegSet &SuccSet = VRegsPassedSets[SuccNumber];
+      if (SuccSet.add(MSet))
+        RPOWorklist.emplace(SuccNumber, Succ);
     }
   }
+  // Copy the results back to BBInfos.
+  for (const MachineBasicBlock &MBB : *MF) {
+    BBInfo &MInfo = MBBInfoMap[&MBB];
+    if (!MInfo.reachable)
+      continue;
+    const FilteringVRegSet &MSet = VRegsPassedSets[RPONumbers[&MBB]];
+    MInfo.vregsPassed.reserve(MSet.size());
+    MInfo.vregsPassed.insert(MSet.begin(), MSet.end());
+  }
 }
 
 // Calculate the set of virtual registers that must be passed through each basic

