Switch containing holes via table lookup, version 4

Jasper Neumann jn at sirrida.de
Sat Mar 8 03:01:45 PST 2014


Hello Hans, hello folks,

let me thank you for helping me getting this little patch to work.


  >> @@ -3453,7 +3454,8 @@
  >>           ConstantInt *Offset,
  >>  const SmallVectorImpl<std::pair<ConstantInt*, Constant*> >& Values,
  >>           Constant *DefaultValue,
  >> -         const DataLayout *DL);
  >> +         const DataLayout *DL,
  >> +         bool IgnoreHoles);

 > We're not really ignoring the holes though, we're just going to fill
 > them with TableContents[0]. How about we rename "DefaultValue" to
 > "HoleValue" or something like that, and just pass in TableContents[0]
 > to it directly when we're using a bitmask?

You are right. The default value can easily be passed directly even when 
we deal with holes.
After thinking about renaming, I decided that DefaultValue already is 
the proper description.
The SwitchLookupTable constructor can be left untouched.


 >> @@ -3745,12 +3753,26 @@
 >>    uint64_t TableSize = RangeSpread.getLimitedValue() + 1;
 >>    bool TableHasHoles = (NumResults < TableSize);
 >>
 >> -  // If the table has holes, we need a constant result for the 
default case.
 >> +  // If the table has holes, we need a constant result for the 
default case...
 >>    SmallVector<std::pair<PHINode*, Constant*>, 4> DefaultResultsList;
 >> -  if (TableHasHoles && !GetCaseResults(SI, 0, SI->getDefaultDest(), 
&CommonDest,
 >> -                                       DefaultResultsList, DL))
 >> +  bool GotCaseResults = false;
 >> +  if (TableHasHoles) {
 >> +    GotCaseResults = GetCaseResults(SI, 0, SI->getDefaultDest(), 
&CommonDest,
 >> +                                    DefaultResultsList, DL);
 >> +  }
 >> +  // ...or need to check for valid values with a sufficiently small 
bit set.
 >> +  bool NeedMask = (TableHasHoles && !GotCaseResults);

 > How about renmaing GotCaseResults to HasDefaultResults?
 > I think that's easier to understand.

OK.


 >> +  if (NeedMask && !DL->fitsInLegalInteger(TableSize))

 > DL can be NULL here.
 > This is causing the segfaults in the lit tests you mentioned.

Thank you for finding this bug!
I assume that if DL is NULL we cannot do anything and therefore have to 
bail out.


 >> +  // Now we either can fill all holes (if present) with the default 
value
 >> +  // or we can check for the holes with a suitable bitmask.
 >> +  if (NeedMask) {
 >> +    // As an extra penalty for the validity test we require more cases.
 >> +    if (SI->getNumCases() < 4)  // TODO: Make threshold configurable.
 >> +      return false;

 > I think this extra getNumCases() check should come right after we've
 > defined NeedMask above.

OK, I swap the ifs.


 >> @@ -3796,12 +3818,53 @@
 >>
 >>    // Populate the BB that does the lookups.
 >>    Builder.SetInsertPoint(LookupBB);
 >> +
 >> +  if (NeedMask) {
 >> +    // Insert code which checks a bitmask to check for valid table 
indices.
 >> +    BasicBlock *MaskBB = LookupBB;
 >> +    MaskBB->setName("hole_check");

 > It feels a little odd that we kind of take LookupBB here, rename it,
 > and then create a new one :) The code above which creates branches to
 > LookupBB is actually branching to this hole_check bb.

 > I wonder if it would be easier to follow if we first created LookupBB,
 > then if NeedMask is true we create HoleCheckBB. When we create the
 > branches above we branch to HoleCheckBB if it's non-NULL, and LookupBB
 > otherwise.

The problem is that the BB descriptions match the following block and 
not the block currently dealt with.
I have thought a while about this problem. My solution introduces the 
new block only when needed and all the code is generated within one if.
Your approach exchanges the renaming oddity with additional tests, i.e. 
the code gets longer.
Which approach is better readable might depend on personal taste.


 >> +    LookupBB = BasicBlock::Create(Mod.getContext(),
 >> +                                  "switch.lookup",
 >> +                                  CommonDest->getParent(),
 >> +                                  CommonDest);
 >> +
 >> +    // Build bitmask; fill in a 1 bit for every case.
 >> +    APInt MaskInt(TableSize, 0);
 >> +    APInt One(TableSize, 1);
 >> +    APInt Zero(TableSize, 0);
 >> +    const ResultListTy &ResultList = ResultLists[PHIs[0]];
 >> +    for (size_t I = 0, E = ResultList.size(); I != E; ++I) {
 >> +      uint64_t Idx = (ResultList[I].first->getValue() -
 >> +                      MinCaseVal->getValue()).getLimitedValue();
 >> +      MaskInt |= One << Idx;
 >> +    }
 >> +    ConstantInt *TableMask = ConstantInt::get(Mod.getContext(), 
MaskInt);
 >> +
 >> +    // Get the TableIndex'th bit of the bitmask.
 >> +    // If this bit is 0 (meaning hole) jump to the default destination,
 >> +    // else continue with table lookup.
 >> +    IntegerType *MapTy = TableMask->getType();
 >> +    Value *CmpVal = Builder.CreateZExtOrTrunc(TableIndex, MapTy,
 >> +                                              "switch.cmpval");
 >> +    Value *Shifted = Builder.CreateLShr(TableMask, CmpVal,
 >> +                                        "switch.shifted");
 >> +    Value *Masked = Builder.CreateAnd(Shifted,
 >> + ConstantInt::get(Mod.getContext(), One),
 >> +                                      "switch.masked");
 >> +    Value *CmpZ = Builder.CreateICmpNE(Masked,
 >> + ConstantInt::get(Mod.getContext(), Zero),
 >> +                                       "switch.cmpz");
 >> +    Builder.CreateCondBr(CmpZ, LookupBB, SI->getDefaultDest());

 > Like I said before, I wonder if this could be simplified. To read the
 > n'th least significant bit from x, we should be able to just do
 > (x >> n), then truncate that to i1, and stick that in the CondBr
 > directly.
 > What do you think? (We'd have to check that it still gets lowered to
 > a bt instruction of course.)

Hmm. That only saves one Value* exchanging the explicit masking and 
comparing with a truncating operation which a processor usually cannot 
do directly.
The solution above is very similar to machine code without a BT instruction.
Your solution also works and produces the expected BT instruction.

Here comes an updated version together with an adapted test file which 
failed because of the new hole detection.

Best regards
Jasper
-------------- next part --------------
Index: lib/Transforms/Utils/SimplifyCFG.cpp
===================================================================
--- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 203284)
+++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
@@ -68,6 +68,7 @@
 
 STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
 STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
+STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
 STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
 STATISTIC(NumSpeculations, "Number of speculative executed instructions");
 
@@ -3745,12 +3746,25 @@
   uint64_t TableSize = RangeSpread.getLimitedValue() + 1;
   bool TableHasHoles = (NumResults < TableSize);
 
-  // If the table has holes, we need a constant result for the default case.
+  // If the table has holes, we need a constant result for the default case...
   SmallVector<std::pair<PHINode*, Constant*>, 4> DefaultResultsList;
-  if (TableHasHoles && !GetCaseResults(SI, 0, SI->getDefaultDest(), &CommonDest,
-                                       DefaultResultsList, DL))
-    return false;
+  bool HasCaseResults = false;
+  if (TableHasHoles) {
+    HasCaseResults = GetCaseResults(SI, 0, SI->getDefaultDest(), &CommonDest,
+                                    DefaultResultsList, DL);
+  }
+  // ...or need to check for valid values with a sufficiently small bit set.
+  bool NeedMask = (TableHasHoles && !HasCaseResults);
+  if (NeedMask) {
+    // As an extra penalty for the validity test we require more cases.
+    if (SI->getNumCases() < 4)  // TODO: Make threshold configurable.
+      return false;  // The tests are judged too costly.
+    if (!(DL && DL->fitsInLegalInteger(TableSize)))
+      return false;  // The needed mask is too big.
+  }
 
+  // Now we either can fill all holes (if present) with the default value
+  // or we can check for the holes with a suitable bitmask.
   for (size_t I = 0, E = DefaultResultsList.size(); I != E; ++I) {
     PHINode *PHI = DefaultResultsList[I].first;
     Constant *Result = DefaultResultsList[I].second;
@@ -3796,12 +3810,58 @@
 
   // Populate the BB that does the lookups.
   Builder.SetInsertPoint(LookupBB);
+
+  if (NeedMask) {
+    // Before doing the lookup we do the hole check.
+    // The LookupBB is therefore re-purposed to do the hole check
+    // and we create a new LookupBB.
+    BasicBlock *MaskBB = LookupBB;
+    MaskBB->setName("switch.hole_check");
+    LookupBB = BasicBlock::Create(Mod.getContext(),
+                                  "switch.lookup",
+                                  CommonDest->getParent(),
+                                  CommonDest);
+
+    // Build bitmask; fill in a 1 bit for every case.
+    APInt MaskInt(TableSize, 0);
+    APInt One(TableSize, 1);
+    APInt Zero(TableSize, 0);
+    const ResultListTy &ResultList = ResultLists[PHIs[0]];
+    for (size_t I = 0, E = ResultList.size(); I != E; ++I) {
+      uint64_t Idx = (ResultList[I].first->getValue() -
+                      MinCaseVal->getValue()).getLimitedValue();
+      MaskInt |= One << Idx;
+    }
+    ConstantInt *TableMask = ConstantInt::get(Mod.getContext(), MaskInt);
+
+    // Get the TableIndex'th bit of the bitmask.
+    // If this bit is 0 (meaning hole) jump to the default destination,
+    // else continue with table lookup.
+    IntegerType *MapTy = TableMask->getType();
+    Value *CmpVal = Builder.CreateZExtOrTrunc(TableIndex, MapTy,
+                                              "switch.cmpval");
+    Value *Shifted = Builder.CreateLShr(TableMask, CmpVal,
+                                        "switch.shifted");
+    Value *LoBit = Builder.CreateTrunc(Shifted,
+                                       Type::getInt1Ty(Mod.getContext()),
+                                       "switch.lobit");
+    Builder.CreateCondBr(LoBit, LookupBB, SI->getDefaultDest());
+
+    Builder.SetInsertPoint(LookupBB);
+    AddPredecessorToBlock(SI->getDefaultDest(), MaskBB, SI->getParent());
+  }
+
   bool ReturnedEarly = false;
   for (size_t I = 0, E = PHIs.size(); I != E; ++I) {
     PHINode *PHI = PHIs[I];
 
+    Constant *DV;
+    if (NeedMask)  // To fill the holes any valid value suffices.
+      DV = ResultLists[PHI][0].second;  // The first entry is present.
+    else
+      DV = DefaultResults[PHI];  // This is the proper default value.
     SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultLists[PHI],
-                            DefaultResults[PHI], DL);
+                            DV, DL);
 
     Value *Result = Table.BuildLookup(TableIndex, Builder);
 
@@ -3831,6 +3891,8 @@
   SI->eraseFromParent();
 
   ++NumLookupTables;
+  if (NeedMask)
+    ++NumLookupTablesHoles;
   return true;
 }
 
Index: test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
===================================================================
--- test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll	(revision 203284)
+++ test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll	(working copy)
@@ -853,8 +853,8 @@
   ret i32 %x
 
 ; CHECK-LABEL: @nodefaultwithholes(
-; CHECK-NOT: @switch.table
-; CHECK: switch i32
+; CHECK: switch.hole_check
+; CHECK-NOT: switch i32
 }
 
 ; We build lookup tables for switches with three or more cases.


More information about the llvm-commits mailing list