<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Thanks for the detailed review!</div>I have put a new version it into phabricator: <a href="http://reviews.llvm.org/D6423" class="">http://reviews.llvm.org/D6423</a><div class="">It should cover all your comments.</div><div class=""><br class=""><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><div><blockquote type="cite" class=""><div class="">On 25 Nov 2014, at 20:48, Hans Wennborg <<a href="mailto:hans@chromium.org" class="">hans@chromium.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">On Tue, Nov 25, 2014 at 6:19 AM, Erik Eckstein <<a href="mailto:eeckstein@apple.com" class="">eeckstein@apple.com</a>> wrote:<br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">Could jump threading or one of its analyses be taught to handle this?<br class="">So that we could also handle a case like:<br class=""></blockquote><br class="">Actually the current jump threading can handle this, but only if the "r == 0" is a compare + branch. E.g. if do_something is a call, it will work.<br class="">It currently does not handle select instructions. So if do_something is a simple variable assignment, then it will not work. I think this could be added easily.<br class=""><br class="">But we have a phase ordering problem: jump threading is obviously done after switch table generation (so it does not work currently for switches which are converted to tables).<br class="">If we would do jump threading before, then it might prevent switch table generation.<br class=""><br class="">I suggest the following:<br class="">1) Use my patch to do this kind of "jump threading" for switch tables (solves the phase ordering problem).<br class="">2) Teach the jump threading pass to handle select instructions.<br class=""><br class="">1) and 2) are unrelated.<br class=""></blockquote><br class="">Yeah, I guess the phase ordering makes things tricky. Fair enough.<br class=""><br class="">Comments on the actual patch below:<br class=""><br class=""><blockquote type="cite" class="">Index: lib/Transforms/Utils/SimplifyCFG.cpp<br class="">===================================================================<br class="">--- lib/Transforms/Utils/SimplifyCFG.cpp (revision 222430)<br class="">+++ lib/Transforms/Utils/SimplifyCFG.cpp (working copy)<br class="">@@ -73,6 +73,7 @@<br class=""> STATISTIC(NumLinearMaps, "Number of switch instructions turned into linear mapping");<br class=""> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");<br class=""> STATISTIC(NumLookupTablesHoles, "Number of switch instructions turned into lookup tables (holes checked)");<br class="">+STATISTIC(NumTableCmpReuses, "Number of reused switch table lookup compares");<br class=""> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");<br class=""> STATISTIC(NumSpeculations, "Number of speculative executed instructions");<br class=""><br class="">@@ -3963,6 +3964,57 @@<br class=""> return SI->getNumCases() * 10 >= TableSize * 4;<br class=""> }<br class=""><br class="">+/// Try to reuse the result of the compare for guarding the switch table lookup.<br class="">+/// If the value of the resulting phi is used in a compare which yields the same<br class="">+/// result as the guarding compare, we can reuse the guarding compare.<br class=""></blockquote><br class="">The comment should probably say that the purpose of reusing the<br class="">compare is to facilitate jump threading.<br class=""><br class=""><blockquote type="cite" class="">+void reuseTableCompare(ICmpInst *CmpInst, BranchInst *BR,<br class="">+ Value *&InvertedTableCmp,<br class=""></blockquote><br class="">I'm not sure caching InvertedTableCmp is worth the extra book-keeping.<br class="">I assume jump threading also works for the inverted case?<br class=""><br class=""><blockquote type="cite" class="">+ const SmallVectorImpl<std::pair<ConstantInt*, Constant*> >& Values,<br class="">+ Constant *DefaultValue) {<br class="">+<br class="">+ Constant *CmpOp1 = dyn_cast<Constant>(CmpInst->getOperand(1));<br class="">+ if (!CmpOp1)<br class="">+ return;<br class="">+<br class="">+<br class="">+ Constant *TrueConst = ConstantInt::getTrue(CmpInst->getType());<br class="">+ Constant *FalseConst = ConstantInt::getFalse(CmpInst->getType());<br class="">+<br class="">+ // Check if the compare with the default value is constant true or false.<br class="">+ Constant *DefaultConst = ConstantExpr::getICmp(CmpInst->getPredicate(),<br class="">+ DefaultValue, CmpOp1, true);<br class="">+ if (DefaultConst != TrueConst && DefaultConst != FalseConst)<br class="">+ return;<br class="">+<br class="">+ // Check if we have a consistent compare result for all case values.<br class="">+ Constant *CommonCaseConst = nullptr;<br class="">+ for (auto ValuePair : Values) {<br class="">+ Constant *CaseConst = ConstantExpr::getICmp(CmpInst->getPredicate(),<br class="">+ ValuePair.second, CmpOp1, true);<br class="">+ if (CommonCaseConst && CommonCaseConst != CaseConst)<br class="">+ return;<br class="">+ CommonCaseConst = CaseConst;<br class=""></blockquote><br class="">I would have written this as:<br class=""><br class=""> if (!CommonCaseConst)<br class=""> CommonCaseConst = CaseConst;<br class=""> if (CaseConst != CommonCaseConst)<br class=""> return;<br class=""><br class="">Actually, instead of checking against CommonCaseConst, couldn't we<br class="">just check that CaseConst is always the opposite of DefaultConst? I<br class="">think that could simplify the loop a bit, and also the code below?<br class=""><br class=""><blockquote type="cite" class="">+ }<br class="">+ if (CommonCaseConst != TrueConst && CommonCaseConst != FalseConst)<br class="">+ return;<br class="">+<br class="">+ Value *TableCmp = BR->getCondition();<br class="">+ if (DefaultConst == FalseConst && CommonCaseConst == TrueConst) {<br class="">+ // The compare yields the same result. We can replace it.<br class="">+ CmpInst->replaceAllUsesWith(TableCmp);<br class="">+ ++NumTableCmpReuses;<br class="">+ } else if (DefaultConst == TrueConst && CommonCaseConst == FalseConst) {<br class="">+ // The compare yields the same result, just inverted. We can replace it.<br class="">+ if (!InvertedTableCmp) {<br class="">+ // Create a boolean invert, if we don't have it yet.<br class="">+ InvertedTableCmp = BinaryOperator::CreateXor(TableCmp,<br class="">+ ConstantInt::get(TableCmp->getType(), 1), "inverted.cmp", BR);<br class="">+ }<br class="">+ CmpInst->replaceAllUsesWith(InvertedTableCmp);<br class="">+ ++NumTableCmpReuses;<br class="">+ }<br class="">+}<br class="">+<br class=""> /// SwitchToLookupTable - If the switch is only used to initialize one or more<br class=""> /// phi nodes in a common successor block with different constant values,<br class=""> /// replace the switch with lookup tables.<br class="">@@ -4039,11 +4091,8 @@<br class=""> // If the table has holes, we need a constant result for the default case<br class=""> // or a bitmask that fits in a register.<br class=""> SmallVector<std::pair<PHINode*, Constant*>, 4> DefaultResultsList;<br class="">- bool HasDefaultResults = false;<br class="">- if (TableHasHoles) {<br class="">- HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),<br class="">+ bool HasDefaultResults = GetCaseResults(SI, nullptr, SI->getDefaultDest(),<br class=""> &CommonDest, DefaultResultsList, DL);<br class="">- }<br class=""><br class=""> bool NeedMask = (TableHasHoles && !HasDefaultResults);<br class=""> if (NeedMask) {<br class="">@@ -4087,6 +4136,8 @@<br class=""> // lookup table BB. Otherwise, check if the condition value is within the case<br class=""> // range. If it is so, branch to the new BB. Otherwise branch to SI's default<br class=""> // destination.<br class="">+ BranchInst *BranchInst = nullptr;<br class=""></blockquote><br class="">Since we create a branch also for "covered" lookup tables, maybe this<br class="">should be called CondBranchInst or something? Or actually, could we<br class="">keep track of the comparison instruction instead, e.g. "Value<br class="">*TableRangeCheck"? Or maybe the conditional branch could be called<br class="">TableRangeCheck.<br class=""><br class=""><blockquote type="cite" class="">+<br class=""> const bool GeneratingCoveredLookupTable = MaxTableSize == TableSize;<br class=""> if (GeneratingCoveredLookupTable) {<br class=""> Builder.CreateBr(LookupBB);<br class="">@@ -4097,7 +4148,7 @@<br class=""> } else {<br class=""> Value *Cmp = Builder.CreateICmpULT(TableIndex, ConstantInt::get(<br class=""> MinCaseVal->getType(), TableSize));<br class="">- Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());<br class="">+ BranchInst = Builder.CreateCondBr(Cmp, LookupBB, SI->getDefaultDest());<br class=""> }<br class=""><br class=""> // Populate the BB that does the lookups.<br class="">@@ -4148,11 +4199,11 @@<br class=""> bool ReturnedEarly = false;<br class=""> for (size_t I = 0, E = PHIs.size(); I != E; ++I) {<br class=""> PHINode *PHI = PHIs[I];<br class="">+ const ResultListTy &ResultList = ResultLists[PHI];<br class=""><br class=""> // If using a bitmask, use any value to fill the lookup table holes.<br class=""> Constant *DV = NeedMask ? ResultLists[PHI][0].second : DefaultResults[PHI];<br class="">- SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultLists[PHI],<br class="">- DV, DL);<br class="">+ SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultList, DV, DL);<br class=""><br class=""> Value *Result = Table.BuildLookup(TableIndex, Builder);<br class=""><br class="">@@ -4164,6 +4215,22 @@<br class=""> ReturnedEarly = true;<br class=""> break;<br class=""> }<br class="">+<br class="">+ // Do a small peephole optimization: re-use the switch table compare if<br class="">+ // possible.<br class="">+ // This is similiar to InstCombiner::FoldOpIntoPhi. FoldOpIntoPhi can't<br class="">+ // handle switch tables so we do it explicitly here.<br class="">+ if (!TableHasHoles && HasDefaultResults && BranchInst) {<br class=""></blockquote><br class="">I wish the body of this if statement could be extracted to a separate<br class="">utility function to keep the code here a little simpler. I feel if it<br class="">was just something like:<br class=""><br class=""> if (BranchInst && HasDefaultResults && !TableHasHoles)<br class=""> reuseTableRangeCheck(...)<br class=""><br class="">It would feel less intrusive. Maybe just move the search for the Cmp<br class="">instruction into reuseTableCompare.<br class=""><br class=""><blockquote type="cite" class="">+ Value *InvertedTableCmp = nullptr;<br class="">+ for (auto UI = PHI->user_begin(), E = PHI->user_end(); UI != E; ++UI) {<br class=""></blockquote><br class="">Could probably use a range-based for loop over PHI->users() instead.<br class=""><br class=""><blockquote type="cite" class="">+ // Check if we have an icmp in the same block.<br class="">+ ICmpInst *CmpInst = dyn_cast<ICmpInst>(*UI);<br class="">+ if (CmpInst && CmpInst->getParent() == PHI->getParent()) {<br class="">+ reuseTableCompare(CmpInst, BranchInst, InvertedTableCmp, ResultList,<br class="">+ DV);<br class="">+ }<br class="">+ }<br class="">+ }<br class=""><br class=""> PHI->addIncoming(Result, LookupBB);<br class=""> }<br class="">Index: test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll<br class="">===================================================================<br class="">--- test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (revision 222430)<br class="">+++ test/Transforms/SimplifyCFG/X86/switch_to_lookup_table.ll (working copy)<br class="">@@ -1078,3 +1078,93 @@<br class=""> ; CHECK-NEXT: ret i8 %switch.idx.cast<br class=""> }<br class=""></blockquote><br class="">Please add a comment for each test to make it easier for a casual<br class="">reader to see what they're doing.<br class=""><br class=""><blockquote type="cite" class=""><br class="">+define i32 @reuse_cmp1(i32 %x) {<br class="">+entry:<br class="">+ switch i32 %x, label %sw.default [<br class="">+ i32 0, label %sw.bb<br class="">+ i32 1, label %sw.bb1<br class="">+ i32 2, label %sw.bb2<br class="">+ i32 3, label %sw.bb3<br class="">+ ]<br class="">+sw.bb: br label %sw.epilog<br class="">+sw.bb1: br label %sw.epilog<br class="">+sw.bb2: br label %sw.epilog<br class="">+sw.bb3: br label %sw.epilog<br class="">+sw.default: br label %sw.epilog<br class="">+sw.epilog:<br class="">+ %r.0 = phi i32 [ 0, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]<br class="">+ %cmp = icmp eq i32 %r.0, 0<br class="">+ br i1 %cmp, label %if.then, label %if.end<br class="">+if.then: br label %return<br class="">+if.end: br label %return<br class="">+return:<br class="">+ %retval.0 = phi i32 [ 100, %if.then ], [ %r.0, %if.end ]<br class="">+ ret i32 %retval.0<br class="">+; CHECK-LABEL: @reuse_cmp1(<br class="">+; CHECK: entry:<br class="">+; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0<br class="">+; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4<br class="">+; CHECK-NEXT: %inverted.cmp = xor i1 [[C]], true<br class="">+; CHECK: [[R:%.+]] = select i1 %inverted.cmp, i32 100, i32 {{.*}}<br class="">+; CHECK-NEXT: ret i32 [[R]]<br class="">+}<br class="">+<br class="">+define i32 @reuse_cmp2(i32 %x) {<br class="">+entry:<br class="">+ switch i32 %x, label %sw.default [<br class="">+ i32 0, label %sw.bb<br class="">+ i32 1, label %sw.bb1<br class="">+ i32 2, label %sw.bb2<br class="">+ i32 3, label %sw.bb3<br class="">+ ]<br class="">+sw.bb: br label %sw.epilog<br class="">+sw.bb1: br label %sw.epilog<br class="">+sw.bb2: br label %sw.epilog<br class="">+sw.bb3: br label %sw.epilog<br class="">+sw.default: br label %sw.epilog<br class="">+sw.epilog:<br class="">+ %r.0 = phi i32 [ 4, %sw.default ], [ 3, %sw.bb3 ], [ 2, %sw.bb2 ], [ 1, %sw.bb1 ], [ 0, %sw.bb ]<br class="">+ %cmp = icmp ne i32 %r.0, 4<br class="">+ br i1 %cmp, label %if.then, label %if.end<br class="">+if.then: br label %return<br class="">+if.end: br label %return<br class="">+return:<br class="">+ %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]<br class="">+ ret i32 %retval.0<br class="">+; CHECK-LABEL: @reuse_cmp2(<br class="">+; CHECK: entry:<br class="">+; CHECK-NEXT: %switch.tableidx = sub i32 %x, 0<br class="">+; CHECK-NEXT: [[C:%.+]] = icmp ult i32 %switch.tableidx, 4<br class="">+; CHECK: [[R:%.+]] = select i1 [[C]], i32 {{.*}}, i32 100<br class="">+; CHECK-NEXT: ret i32 [[R]]<br class="">+}<br class="">+<br class="">+define i32 @no_reuse_cmp(i32 %x) {<br class="">+entry:<br class="">+ switch i32 %x, label %sw.default [<br class="">+ i32 0, label %sw.bb<br class="">+ i32 1, label %sw.bb1<br class="">+ i32 2, label %sw.bb2<br class="">+ i32 3, label %sw.bb3<br class="">+ ]<br class="">+sw.bb: br label %sw.epilog<br class="">+sw.bb1: br label %sw.epilog<br class="">+sw.bb2: br label %sw.epilog<br class="">+sw.bb3: br label %sw.epilog<br class="">+sw.default: br label %sw.epilog<br class="">+sw.epilog:<br class="">+ %r.0 = phi i32 [ 12, %sw.default ], [ 13, %sw.bb3 ], [ 12, %sw.bb2 ], [ 11, %sw.bb1 ], [ 10, %sw.bb ]<br class="">+ %cmp = icmp ne i32 %r.0, 0<br class="">+ br i1 %cmp, label %if.then, label %if.end<br class="">+if.then: br label %return<br class="">+if.end: br label %return<br class="">+return:<br class="">+ %retval.0 = phi i32 [ %r.0, %if.then ], [ 100, %if.end ]<br class="">+ ret i32 %retval.0<br class="">+; CHECK-LABEL: @no_reuse_cmp(<br class="">+; CHECK: [[S:%.+]] = select<br class="">+; CHECK-NEXT: %cmp = icmp ne i32 [[S]], 0<br class="">+; CHECK-NEXT: [[R:%.+]] = select i1 %cmp, i32 [[S]], i32 100<br class="">+; CHECK-NEXT: ret i32 [[R]]<br class="">+}<br class="">+<br class=""></blockquote></div></blockquote></div><br class=""></div></div></body></html>