[PATCH] Switch lowering: extract jump tables and bit tests before building binary tree (PR22262)

Daniel Jasper djasper at google.com
Tue Mar 31 05:03:10 PDT 2015


================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7337
@@ +7336,3 @@
+
+bool SelectionDAGBuilder::isSuitableForBitTests(unsigned NumDests,
+                                                unsigned NumClusters,
----------------
Everything in this function needs comments. Even if it didn't have any before, write down what you think this does.

Also, I think I'd probably pull out two different function that have names corresponding to what they do.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7374
@@ +7373,3 @@
+  const int BW = DAG.getTargetLoweringInfo().getPointerTy().getSizeInBits();
+  assert((High - Low + 1).sle(BW));
+
----------------
This assert basically test part of isSuitableForBitTests. Pull that part out into a separate well-named function instead.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7430
@@ +7429,3 @@
+
+void SelectionDAGBuilder::findBitTestClusters(CaseClusterVector &Clusters,
+                                              const SwitchInst *SI) {
----------------
With the new structure, I'd probably approach this entirely differently. IIUC, bit tests are really good to handle many (potentially relatively spares) case labels leading to the same destination.

Now, to make use of this, I'd first group all case clusters (maybe even before rangeifying them) and group them by destination. Now, start with the biggest group (possibly biggest weight by taking branch probabilities into account) and look at whether they can be lowered as bit tests. Thereby, it is possible to peel away the most bit-test-worthy groups (grouped by common destination) one at a time. I haven't completely thought about how to do this and there are a few additional things to consider:

- The range of a group might be too large (greater than the bit width). However, it might be possible to split that one group into multiple sub groups.
- It might be possible to lower multiple groups (Group1 leading to Dest1, Group2 leading to Dest2, ..) as a single bit test to conserve extra shifts (if the entire range fits into the bit width). This is currently (and in the previous implementation) sort of done by saying that there can be up to three unique destinations, but I don't that this is actually a good heuristic.
- At some point there will be a cut-off, i.e. lowering a group as bit test is not worth it. Trivially, single-destination groups are never worth to be lowered as bit tests (a single cmp is faster than and+cmp). But as we might be able to lower the others as jump tables, the cut-off might be higher.

Lets chat some more about this.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7448
@@ +7447,3 @@
+
+  int BW = PTy.getSizeInBits();
+  // Partition Clusters into as few subsets as possible, where each subset has a
----------------
I think this function is too long and BW should be replaced with something not abbreviated.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7449
@@ +7448,3 @@
+  int BW = PTy.getSizeInBits();
+  // Partition Clusters into as few subsets as possible, where each subset has a
+  // range <= BW, and <= 3 unique destinations.
----------------
This should be extended and made into a function comment, not hidden somewhere in the code.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7454
@@ +7453,3 @@
+
+  // Share one buffer to save allocations.
+  std::unique_ptr<unsigned[]> Buf(new unsigned[N * 2]);
----------------
Do you actually observe a difference here? Seems like premature optimization.

And also, there seem to be better ways to optimize this (e.g. use a SmallVector to keep the common case of small N on the stack or to re-use and allocated area for this pass instead of allocating a new one for each switch.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7470
@@ +7469,3 @@
+  // Note: loop indexes are signed to avoid underflow.
+  for (int64_t i = N - 2; i >= 0; i--) {
+    // Find optimal partitioning of Clusters[i..N-1].
----------------
--i?

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7476
@@ +7475,3 @@
+
+    // Search for a better solution.
+    for (int64_t j = std::min(N - 1, i + BW - 1); j > i; j--) {
----------------
Please write down what 'better' means.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7482
@@ +7481,3 @@
+      APInt R = Clusters[j].High->getValue() - Clusters[i].Low->getValue();
+      uint64_t Range = R.getLimitedValue(UINT64_MAX - 1) + 1;
+      if (Range > (uint64_t)BW)
----------------
If you pull apart the isSuitableForBitTests() function as mentioned above, you can likely re-use it here.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7489
@@ +7488,3 @@
+      SmallPtrSet<MachineBasicBlock *, 8> Dests;
+      for (int64_t k = i; k <= j; k++) {
+        if (Clusters[k].Kind != CC_Range) {
----------------
So, this is actually a O(N^3) algorithm?

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7837
@@ +7836,3 @@
+    findJumpTables(Clusters, &SI, DefaultMBB);
+    findBitTestClusters(Clusters, &SI);
+  }
----------------
IIUC, this seems to be a fundamental difference between the previous and current implementation. The previous implementation called handleBitTestsSwitchCase first and only later handleJTSwitchCase. Why did you reverse the order?

IMO, it could be really important to prefer bit tests if there are many labels leading to the same destination. These inherently cause redundancy in jump tables.

================
Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h:206
@@ -167,26 +205,3 @@
 
-  typedef std::vector<Case>           CaseVector;
-  typedef std::vector<CaseBits>       CaseBitsVector;
-  typedef CaseVector::iterator        CaseItr;
-  typedef std::pair<CaseItr, CaseItr> CaseRange;
-
-  /// CaseRec - A struct with ctor used in lowering switches to a binary tree
-  /// of conditional branches.
-  struct CaseRec {
-    CaseRec(MachineBasicBlock *bb, const ConstantInt *lt, const ConstantInt *ge,
-            CaseRange r) :
-    CaseBB(bb), LT(lt), GE(ge), Range(r) {}
-
-    /// CaseBB - The MBB in which to emit the compare and branch
-    MachineBasicBlock *CaseBB;
-    /// LT, GE - If nonzero, we know the current case value must be less-than or
-    /// greater-than-or-equal-to these Constants.
-    const ConstantInt *LT;
-    const ConstantInt *GE;
-    /// Range - A pair of iterators representing the range of case values to be
-    /// processed at this point in the binary search tree.
-    CaseRange Range;
-  };
-
-  typedef std::vector<CaseRec> CaseRecVector;
+    CaseBits(): Mask(0), BB(nullptr), Bits(0), ExtraWeight(0) { }
 
----------------
Use space before ":" or ideally clang-format :-).

http://reviews.llvm.org/D8649

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list