[PATCH] peephole optimization in switch table lookup: reuse the guarding table comparison if possible

Wed Nov 26 11:47:00 PST 2014

Thanks for the detailed review!
I have put a new version it into phabricator: http://reviews.llvm.org/D6423 <http://reviews.llvm.org/D6423>
It should cover all your comments.

> On 25 Nov 2014, at 20:48, Hans Wennborg <hans at chromium.org> wrote:
> 
> On Tue, Nov 25, 2014 at 6:19 AM, Erik Eckstein <eeckstein at apple.com> wrote:
>>> Could jump threading or one of its analyses be taught to handle this?
>>> So that we could also handle a case like:
>> 
>> Actually the current jump threading can handle this, but only if the "r == 0" is a compare + branch. E.g. if do_something is a call, it will work.
>> It currently does not handle select instructions. So if do_something is a simple variable assignment, then it will not work. I think this could be added easily.
>> 
>> But we have a phase ordering problem: jump threading is obviously done after switch table generation (so it does not work currently for switches which are converted to tables).
>> If we would do jump threading before, then it might prevent switch table generation.
>> 
>> I suggest the following:
>> 1) Use my patch to do this kind of "jump threading" for switch tables (solves the phase ordering problem).
>> 2) Teach the jump threading pass to handle select instructions.
>> 
>> 1) and 2) are unrelated.
> 
> Yeah, I guess the phase ordering makes things tricky. Fair enough.
> 
> Comments on the actual patch below:
> 
>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>> ===================================================================
>> --- lib/Transforms/Utils/SimplifyCFG.cpp (revision 222430)
>> +++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy)
>> @@ -73,6 +73,7 @@
>> STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");
>> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>> STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");
>> +STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");
>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>> STATISTIC(NumSpeculations, "Number of speculative executed instructions");
>> 
>> @@ -3963,6 +3964,57 @@
>>   return SI->getNumCases() * 10 >= TableSize * 4;
>> }
>> 
>> +/// Try to reuse the result of the compare for guarding the switch table lookup.
>> +/// If the value of the resulting phi is used in a compare which yields the same
>> +/// result as the guarding compare, we can reuse the guarding compare.
> 
> The comment should probably say that the purpose of reusing the
> compare is to facilitate jump threading.
> 
>> +void reuseTableCompare(ICmpInst *CmpInst, BranchInst *BR,
>> +            Value *&InvertedTableCmp,
> 
> I'm not sure caching InvertedTableCmp is worth the extra book-keeping.
> I assume jump threading also works for the inverted case?
> 
>> +            const SmallVectorImpl<std::pair<ConstantInt*, Constant*> >& Values,
>> +            Constant *DefaultValue) {
>> +
>> +  Constant *CmpOp1 = dyn_cast<Constant>(CmpInst->getOperand(1));
>> +  if (!CmpOp1)
>> +    return;
>> +
>> +
>> +  Constant *TrueConst = ConstantInt::getTrue(CmpInst->getType());
>> +  Constant *FalseConst = ConstantInt::getFalse(CmpInst->getType());
>> +
>> +  // Check if the compare with the default value is constant true or false.
>> +  Constant *DefaultConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
>> +                                                 DefaultValue, CmpOp1, true);
>> +  if (DefaultConst != TrueConst && DefaultConst != FalseConst)
>> +    return;
>> +
>> +  // Check if we have a consistent compare result for all case values.
>> +  Constant *CommonCaseConst = nullptr;
>> +  for (auto ValuePair : Values) {
>> +    Constant *CaseConst = ConstantExpr::getICmp(CmpInst->getPredicate(),
>> +                              ValuePair.second, CmpOp1, true);
>> +    if (CommonCaseConst && CommonCaseConst != CaseConst)
>> +      return;
>> +    CommonCaseConst = CaseConst;
> 
> I would have written this as:
> 
>  if (!CommonCaseConst)
>    CommonCaseConst = CaseConst;
>  if (CaseConst != CommonCaseConst)
>    return;
> 
> Actually, instead of checking against CommonCaseConst, couldn't we
> just check that CaseConst is always the opposite of DefaultConst? I
> think that could simplify the loop a bit, and also the code below?
> 
>> +  }
>> +  if (CommonCaseConst != TrueConst && CommonCaseConst != FalseConst)
>> +    return;
>> +
>> +  Value *TableCmp = BR->getCondition();
>> +  if (DefaultConst == FalseConst && CommonCaseConst == TrueConst) {
>> +    // The compare yields the same result. We can replace it.
>> +    CmpInst->replaceAllUsesWith(TableCmp);
>> +    ++NumTableCmpReuses;
>> +  } else if (DefaultConst == TrueConst && CommonCaseConst == FalseConst) {
>> +    // The compare yields the same result, just inverted. We can replace it.
>> +    if (!InvertedTableCmp) {
>> +      // Create a boolean invert, if we don't have it yet.
>> +      InvertedTableCmp = BinaryOperator::CreateXor(TableCmp,
>> +                ConstantInt::get(TableCmp->getType(), 1), "inverted.cmp", BR);
>> +    }
>> +    CmpInst->replaceAllUsesWith(InvertedTableCmp);
>> +    ++NumTableCmpReuses;
>> +  }
>> +}
>> +
>> /// SwitchToLookupTable - If the switch is only used to initialize one or more
>> /// phi nodes in a common successor block with different constant values,
>> /// replace the switch with lookup tables.
>> @@ -4039,11 +4091,8 @@
>>   // If the table has holes, we need a constant result for the default case
>>   // or a bitmask that fits in a register.
>>   SmallVector<std::pair<PHINode*, Constant*>, 4> DefaultResultsList;
>> -  bool HasDefaultResults = false;
>> -  if (TableHasHoles) {
>> -    HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),
>> +  bool HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),
>>                                        &CommonDest, DefaultResultsList, DL);
>> -  }
>> 
>>   bool NeedMask = (TableHasHoles && !HasDefaultResults);
>>   if (NeedMask) {
>> @@ -4087,6 +4136,8 @@
>>   // lookup table BB. Otherwise, check if the condition value is within the case
>>   // range. If it is so, branch to the new BB. Otherwise branch to SI's default
>>   // destination.
>> +  BranchInst *BranchInst = nullptr;
> 
> Since we create a branch also for "covered" lookup tables, maybe this
> should be called CondBranchInst or something? Or actually, could we
> keep track of the comparison instruction instead, e.g. "Value
> *TableRangeCheck"? Or maybe the conditional branch could be called
> TableRangeCheck.
> 
>> +
>>   const bool GeneratingCoveredLookupTable = MaxTableSize == TableSize;
>>   if (GeneratingCoveredLookupTable) {
>>     Builder.CreateBr(LookupBB);
>> @@ -4097,7 +4148,7 @@
>>   } else {
>>     Value *Cmp = Builder.CreateICmpULT(TableIndex, ConstantInt::get(
>>                                        MinCaseVal->getType(), TableSize));
>> -    Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
>> +    BranchInst = Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());
>>   }
>> 
>>   // Populate the BB that does the lookups.
>> @@ -4148,11 +4199,11 @@
>>   bool ReturnedEarly = false;
>>   for (size_t I = 0, E = PHIs.size(); I != E; ++I) {
>>     PHINode *PHI = PHIs[I];
>> +    const ResultListTy &ResultList = ResultLists[PHI];
>> 
>>     // If using a bitmask, use any value to fill the lookup table holes.
>>     Constant *DV = NeedMask ? ResultLists[PHI][0].second : DefaultResults[PHI];
>> -    SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultLists[PHI],
>> -                            DV, DL);
>> +    SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultList, DV, DL);
>> 
>>     Value *Result = Table.BuildLookup(TableIndex, Builder);
>> 
>> @@ -4164,6 +4215,22 @@
>>       ReturnedEarly = true;
>>       break;
>>     }
>> +
>> +    // Do a small peephole optimization: re-use the switch table compare if
>> +    // possible.
>> +    // This is similiar to InstCombiner::FoldOpIntoPhi. FoldOpIntoPhi can't
>> +    // handle switch tables so we do it explicitly here.
>> +    if (!TableHasHoles && HasDefaultResults && BranchInst) {
> 
> I wish the body of this if statement could be extracted to a separate
> utility function to keep the code here a little simpler. I feel if it
> was just something like:
> 
>  if (BranchInst && HasDefaultResults && !TableHasHoles)
>    reuseTableRangeCheck(...)
> 
> It would feel less intrusive. Maybe just move the search for the Cmp
> instruction into reuseTableCompare.
> 
>> +      Value *InvertedTableCmp = nullptr;
>> +      for (auto UI = PHI->user_begin(), E = PHI->user_end(); UI != E; ++UI) {
> 
> Could probably use a range-based for loop over PHI->users() instead.
> 
>> +        // Check if we have an icmp in the same block.
>> +        ICmpInst *CmpInst = dyn_cast<ICmpInst>(*UI);
>> +        if (CmpInst && CmpInst->getParent() == PHI->getParent()) {
>> +          reuseTableCompare(CmpInst, BranchInst, InvertedTableCmp, ResultList,
>> +                            DV);
>> +        }
>> +      }
>> +    }
>> 
>>     PHI->addIncoming(Result, LookupBB);
>>   }
>> Index: test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (revision 222430)
>> +++ test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (working copy)
>> @@ -1078,3 +1078,93 @@
>> ; CHECK-NEXT: ret i8 %switch.idx.cast
>> }
> 
> Please add a comment for each test to make it easier for a casual
> reader to see what they're doing.
> 
>> 
>> +define i32 @reuse_cmp1(i32 %x) {
>> +entry:
>> +  switch i32 %x, label %sw.default [
>> +    i32 0, label %sw.bb
>> +    i32 1, label %sw.bb1
>> +    i32 2, label %sw.bb2
>> +    i32 3, label %sw.bb3
>> +  ]
>> +sw.bb: br label %sw.epilog
>> +sw.bb1: br label %sw.epilog
>> +sw.bb2: br label %sw.epilog
>> +sw.bb3: br label %sw.epilog
>> +sw.default: br label %sw.epilog
>> +sw.epilog:
>> +  %r.0 = phi i32 [ 0, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]
>> +  %cmp = icmp eq i32 %r.0, 0
>> +  br i1 %cmp, label %if.then, label %if.end
>> +if.then: br label %return
>> +if.end: br label %return
>> +return:
>> +  %retval.0 = phi i32 [ 100, %if.then ], [ %r.0, %if.end ]
>> +  ret i32 %retval.0
>> +; CHECK-LABEL: @reuse_cmp1(
>> +; CHECK: entry:
>> +; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0
>> +; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4
>> +; CHECK-NEXT: %inverted.cmp = xor i1 [[C]], true
>> +; CHECK:      [[R:%.+]] = select i1 %inverted.cmp, i32 100, i32 {{.*}}
>> +; CHECK-NEXT: ret i32 [[R]]
>> +}
>> +
>> +define i32 @reuse_cmp2(i32 %x) {
>> +entry:
>> +  switch i32 %x, label %sw.default [
>> +    i32 0, label %sw.bb
>> +    i32 1, label %sw.bb1
>> +    i32 2, label %sw.bb2
>> +    i32 3, label %sw.bb3
>> +  ]
>> +sw.bb: br label %sw.epilog
>> +sw.bb1: br label %sw.epilog
>> +sw.bb2: br label %sw.epilog
>> +sw.bb3: br label %sw.epilog
>> +sw.default: br label %sw.epilog
>> +sw.epilog:
>> +  %r.0 = phi i32 [ 4, %sw.default ], [ 3, %sw.bb3 ], [ 2, %sw.bb2 ], [ 1, %sw.bb1 ], [ 0, %sw.bb ]
>> +  %cmp = icmp ne i32 %r.0, 4
>> +  br i1 %cmp, label %if.then, label %if.end
>> +if.then: br label %return
>> +if.end: br label %return
>> +return:
>> +  %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]
>> +  ret i32 %retval.0
>> +; CHECK-LABEL: @reuse_cmp2(
>> +; CHECK: entry:
>> +; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0
>> +; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4
>> +; CHECK:      [[R:%.+]] = select i1 [[C]], i32 {{.*}}, i32 100
>> +; CHECK-NEXT: ret i32 [[R]]
>> +}
>> +
>> +define i32 @no_reuse_cmp(i32 %x) {
>> +entry:
>> +  switch i32 %x, label %sw.default [
>> +    i32 0, label %sw.bb
>> +    i32 1, label %sw.bb1
>> +    i32 2, label %sw.bb2
>> +    i32 3, label %sw.bb3
>> +  ]
>> +sw.bb: br label %sw.epilog
>> +sw.bb1: br label %sw.epilog
>> +sw.bb2: br label %sw.epilog
>> +sw.bb3: br label %sw.epilog
>> +sw.default: br label %sw.epilog
>> +sw.epilog:
>> +  %r.0 = phi i32 [ 12, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]
>> +  %cmp = icmp ne i32 %r.0, 0
>> +  br i1 %cmp, label %if.then, label %if.end
>> +if.then: br label %return
>> +if.end: br label %return
>> +return:
>> +  %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]
>> +  ret i32 %retval.0
>> +; CHECK-LABEL: @no_reuse_cmp(
>> +; CHECK:  [[S:%.+]] = select
>> +; CHECK-NEXT:  %cmp = icmp ne i32 [[S]], 0
>> +; CHECK-NEXT:  [[R:%.+]] = select i1 %cmp, i32 [[S]], i32 100
>> +; CHECK-NEXT:  ret i32 [[R]]
>> +}
>> +

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141126/486bd7c4/attachment.html>