<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Mon, Aug 1, 2016 at 12:52 AM James Molloy via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: jamesm<br>
Date: Mon Aug 1 02:45:11 2016<br>
New Revision: 277325<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=277325&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=277325&view=rev</a><br>
Log:<br>
[SimplifyCFG] Range reduce switches<br>
<br>
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:<br>
<br>
switch (i) {<br>
case 5: ...<br>
case 9: ...<br>
case 13: ...<br>
case 17: ...<br>
}<br>
<br>
can become:<br>
<br>
if ( (i - 5) % 4 ) goto default;<br>
switch ((i - 5) / 4) {<br>
case 0: ...<br>
case 1: ...<br>
case 2: ...<br>
case 3: ...<br>
}<br>
<br>
or even better:<br>
<br>
switch (ROTR(i - 5, 2)) {<br>
case 0: ...<br>
case 1: ...<br>
case 2: ...<br>
case 3: ...<br>
}<br>
<br>
The division and remainder operations could be costly, so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.<br></blockquote><div><br></div><div>So I don't necessarily disagree with the idea here, but I thought this kind of factoring of switches was already thoroughly handled in the code generator.</div><div><br></div><div>If LLVM should be doing this transform in SimplifyCFG, I don't think ease of *lowering* is the right motivation for doing it. It should be exposing combines with other transforms or otherwise making the common factor visible, and there should be some rationale or argument for why this is better done at this level.</div><div><br></div><div>And even if we *do* keep this for canonicalization purposes, if the switch lowering is failing to build lookup tables or perform other basic optimizations merely because we need to extract a common factor, that sounds like a serious bug in the lowering code, as it was designed to take advantage of such properties.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Added:<br>
llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll<br>
Modified:<br>
llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp<br>
<br>
Modified: llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=277325&r1=277324&r2=277325&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp?rev=277325&r1=277324&r2=277325&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp (original)<br>
+++ llvm/trunk/lib/Transforms/Utils/SimplifyCFG.cpp Mon Aug 1 02:45:11 2016<br>
@@ -5038,6 +5038,109 @@ static bool SwitchToLookupTable(SwitchIn<br>
return true;<br>
}<br>
<br>
+static bool isSwitchDense(ArrayRef<int64_t> Values) {<br>
+ // See also SelectionDAGBuilder::isDense(), which this function was based on.<br>
+ uint64_t Diff = (uint64_t)Values.back() - (uint64_t)Values.front();<br>
+ uint64_t Range = Diff + 1;<br>
+ uint64_t NumCases = Values.size();<br>
+ // 40% is the default density for building a jump table in optsize/minsize mode.<br>
+ uint64_t MinDensity = 40;<br>
+<br>
+ return NumCases * 100 >= Range * MinDensity;<br>
+}<br>
+<br>
+// Try and transform a switch that has "holes" in it to a contiguous sequence<br>
+// of cases.<br>
+//<br>
+// A switch such as: switch(i) {case 5: case 9: case 13: case 17:} can be<br>
+// range-reduced to: switch ((i-5) / 4) {case 0: case 1: case 2: case 3:}.<br>
+//<br>
+// This converts a sparse switch into a dense switch which allows better<br>
+// lowering and could also allow transforming into a lookup table.<br>
+static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder,<br>
+ const DataLayout &DL,<br>
+ const TargetTransformInfo &TTI) {<br>
+ auto *CondTy = cast<IntegerType>(SI->getCondition()->getType());<br>
+ if (CondTy->getIntegerBitWidth() > 64 ||<br>
+ !DL.fitsInLegalInteger(CondTy->getIntegerBitWidth()))<br>
+ return false;<br>
+ // Only bother with this optimization if there are more than 3 switch cases;<br>
+ // SDAG will only bother creating jump tables for 4 or more cases.<br>
+ if (SI->getNumCases() < 4)<br>
+ return false;<br>
+<br>
+ // This transform is agnostic to the signedness of the input or case values. We<br>
+ // can treat the case values as signed or unsigned. We can optimize more common<br>
+ // cases such as a sequence crossing zero {-4,0,4,8} if we interpret case values<br>
+ // as signed.<br>
+ SmallVector<int64_t,4> Values;<br>
+ for (auto &C : SI->cases())<br>
+ Values.push_back(C.getCaseValue()->getValue().getSExtValue());<br>
+ std::sort(Values.begin(), Values.end());<br>
+<br>
+ // If the switch is already dense, there's nothing useful to do here.<br>
+ if (isSwitchDense(Values))<br>
+ return false;<br>
+<br>
+ // First, transform the values such that they start at zero and ascend.<br>
+ int64_t Base = Values[0];<br>
+ for (auto &V : Values)<br>
+ V -= Base;<br>
+<br>
+ // Now we have signed numbers that have been shifted so that, given enough<br>
+ // precision, there are no negative values. Since the rest of the transform<br>
+ // is bitwise only, we switch now to an unsigned representation.<br>
+ uint64_t GCD = 0;<br>
+ for (auto &V : Values)<br>
+ GCD = llvm::GreatestCommonDivisor64(GCD, (uint64_t)V);<br>
+<br>
+ // This transform can be done speculatively because it is so cheap - it results<br>
+ // in a single rotate operation being inserted. This can only happen if the<br>
+ // factor extracted is a power of 2.<br>
+ // FIXME: If the GCD is an odd number we can multiply by the multiplicative<br>
+ // inverse of GCD and then perform this transform.<br>
+ // FIXME: It's possible that optimizing a switch on powers of two might also<br>
+ // be beneficial - flag values are often powers of two and we could use a CLZ<br>
+ // as the key function.<br>
+ if (GCD <= 1 || !llvm::isPowerOf2_64(GCD))<br>
+ // No common divisor found or too expensive to compute key function.<br>
+ return false;<br>
+<br>
+ unsigned Shift = llvm::Log2_64(GCD);<br>
+ for (auto &V : Values)<br>
+ V = (int64_t)((uint64_t)V >> Shift);<br>
+<br>
+ if (!isSwitchDense(Values))<br>
+ // Transform didn't create a dense switch.<br>
+ return false;<br>
+<br>
+ // The obvious transform is to shift the switch condition right and emit a<br>
+ // check that the condition actually cleanly divided by GCD, i.e.<br>
+ // (C & ((1 << Shift) - 1)) == 0<br>
+ // inserting a new CFG edge to handle the case where it didn't divide cleanly.<br>
+ //<br>
+ // A cheaper way of doing this is a simple ROTR(C, Shift). This performs the<br>
+ // shift and puts the shifted-off bits in the uppermost bits. If any of these<br>
+ // are nonzero then the switch condition will be very large and will hit the<br>
+ // default case.<br>
+<br>
+ auto *Ty = cast<IntegerType>(SI->getCondition()->getType());<br>
+ Builder.SetInsertPoint(SI);<br>
+ auto *ShiftC = ConstantInt::get(Ty, Shift);<br>
+ auto *Sub = Builder.CreateSub(SI->getCondition(), ConstantInt::get(Ty, Base));<br>
+ auto *Rot = Builder.CreateOr(Builder.CreateLShr(Sub, ShiftC),<br>
+ Builder.CreateShl(Sub, Ty->getBitWidth() - Shift));<br>
+ SI->replaceUsesOfWith(SI->getCondition(), Rot);<br>
+<br>
+ for (auto &C : SI->cases()) {<br>
+ auto *Orig = C.getCaseValue();<br>
+ auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base);<br>
+ SI->replaceUsesOfWith(Orig,<br>
+ ConstantInt::get(Ty, Sub.lshr(ShiftC->getValue())));<br>
+ }<br>
+ return true;<br>
+}<br>
+<br>
bool SimplifyCFGOpt::SimplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {<br>
BasicBlock *BB = SI->getParent();<br>
<br>
@@ -5081,6 +5184,9 @@ bool SimplifyCFGOpt::SimplifySwitch(Swit<br>
if (SwitchToLookupTable(SI, Builder, DL, TTI))<br>
return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;<br>
<br>
+ if (ReduceSwitchRange(SI, Builder, DL, TTI))<br>
+ return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;<br>
+<br>
return false;<br>
}<br>
<br>
<br>
Added: llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll?rev=277325&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll?rev=277325&view=auto</a><br>
==============================================================================<br>
--- llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll (added)<br>
+++ llvm/trunk/test/Transforms/SimplifyCFG/rangereduce.ll Mon Aug 1 02:45:11 2016<br>
@@ -0,0 +1,195 @@<br>
+; RUN: opt < %s -simplifycfg -S | FileCheck %s<br>
+<br>
+target datalayout = "e-n32"<br>
+<br>
+; CHECK-LABEL: @test1<br>
+; CHECK: %1 = sub i32 %a, 97<br>
+; CHECK: %2 = lshr i32 %1, 2<br>
+; CHECK: %3 = shl i32 %1, 30<br>
+; CHECK: %4 = or i32 %2, %3<br>
+; CHECK: switch i32 %4, label %def [<br>
+; CHECK: i32 0, label %one<br>
+; CHECK: i32 1, label %two<br>
+; CHECK: i32 2, label %three<br>
+; CHECK: ]<br>
+define i32 @test1(i32 %a) {<br>
+ switch i32 %a, label %def [<br>
+ i32 97, label %one<br>
+ i32 101, label %two<br>
+ i32 105, label %three<br>
+ i32 109, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
+<br>
+; Optimization shouldn't trigger; bitwidth > 64<br>
+; CHECK-LABEL: @test2<br>
+; CHECK: switch i128 %a, label %def<br>
+define i128 @test2(i128 %a) {<br>
+ switch i128 %a, label %def [<br>
+ i128 97, label %one<br>
+ i128 101, label %two<br>
+ i128 105, label %three<br>
+ i128 109, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i128 8867<br>
+<br>
+one:<br>
+ ret i128 11984<br>
+two:<br>
+ ret i128 1143<br>
+three:<br>
+ ret i128 99783<br>
+}<br>
+<br>
+<br>
+; Optimization shouldn't trigger; no holes present<br>
+; CHECK-LABEL: @test3<br>
+; CHECK: switch i32 %a, label %def<br>
+define i32 @test3(i32 %a) {<br>
+ switch i32 %a, label %def [<br>
+ i32 97, label %one<br>
+ i32 98, label %two<br>
+ i32 99, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
+<br>
+; Optimization shouldn't trigger; not an arithmetic progression<br>
+; CHECK-LABEL: @test4<br>
+; CHECK: switch i32 %a, label %def<br>
+define i32 @test4(i32 %a) {<br>
+ switch i32 %a, label %def [<br>
+ i32 97, label %one<br>
+ i32 102, label %two<br>
+ i32 105, label %three<br>
+ i32 109, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
+<br>
+; Optimization shouldn't trigger; not a power of two<br>
+; CHECK-LABEL: @test5<br>
+; CHECK: switch i32 %a, label %def<br>
+define i32 @test5(i32 %a) {<br>
+ switch i32 %a, label %def [<br>
+ i32 97, label %one<br>
+ i32 102, label %two<br>
+ i32 107, label %three<br>
+ i32 112, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
+<br>
+; CHECK-LABEL: @test6<br>
+; CHECK: %1 = sub i32 %a, -109<br>
+; CHECK: %2 = lshr i32 %1, 2<br>
+; CHECK: %3 = shl i32 %1, 30<br>
+; CHECK: %4 = or i32 %2, %3<br>
+; CHECK: switch i32 %4, label %def [<br>
+define i32 @test6(i32 %a) optsize {<br>
+ switch i32 %a, label %def [<br>
+ i32 -97, label %one<br>
+ i32 -101, label %two<br>
+ i32 -105, label %three<br>
+ i32 -109, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
+<br>
+; CHECK-LABEL: @test7<br>
+; CHECK: %1 = sub i8 %a, -36<br>
+; CHECK: %2 = lshr i8 %1, 2<br>
+; CHECK: %3 = shl i8 %1, 6<br>
+; CHECK: %4 = or i8 %2, %3<br>
+; CHECK: switch.tableidx = {{.*}} %4<br>
+define i8 @test7(i8 %a) optsize {<br>
+ switch i8 %a, label %def [<br>
+ i8 220, label %one<br>
+ i8 224, label %two<br>
+ i8 228, label %three<br>
+ i8 232, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i8 8867<br>
+<br>
+one:<br>
+ ret i8 11984<br>
+two:<br>
+ ret i8 1143<br>
+three:<br>
+ ret i8 99783<br>
+}<br>
+<br>
+; CHECK-LABEL: @test8<br>
+; CHECK: %1 = sub i32 %a, 97<br>
+; CHECK: %2 = lshr i32 %1, 2<br>
+; CHECK: %3 = shl i32 %1, 30<br>
+; CHECK: %4 = or i32 %2, %3<br>
+; CHECK: switch i32 %4, label %def [<br>
+define i32 @test8(i32 %a) optsize {<br>
+ switch i32 %a, label %def [<br>
+ i32 97, label %one<br>
+ i32 101, label %two<br>
+ i32 105, label %three<br>
+ i32 113, label %three<br>
+ ]<br>
+<br>
+def:<br>
+ ret i32 8867<br>
+<br>
+one:<br>
+ ret i32 11984<br>
+two:<br>
+ ret i32 1143<br>
+three:<br>
+ ret i32 99783<br>
+}<br>
\ No newline at end of file<br>
</blockquote></div></div>
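<div>For readers following the rotate trick in the quoted commit message, here is a minimal standalone sketch of the key function for 32-bit switch conditions. It is not the LLVM implementation: the names <code>rotr32</code>, <code>RangeKey</code>, <code>computeKey</code> and <code>applyKey</code> are hypothetical, and the density check from the patch is left out. It only illustrates why ROTR(i - Base, Shift) sends every value that is not a clean multiple of the common factor to a very large key, which would fall through to the default case.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical helper: rotate a 32-bit value right by Shift bits.
static uint32_t rotr32(uint32_t V, unsigned Shift) {
  assert(Shift > 0 && Shift < 32 && "shift must be in (0, 32)");
  return (V >> Shift) | (V << (32 - Shift));
}

// Hypothetical description of the key function ROTR(i - Base, Shift).
struct RangeKey {
  uint32_t Base;  // smallest case value (in signed order), as raw bits
  unsigned Shift; // log2 of the power-of-two common factor
};

// Returns the key parameters, or nullopt when the common factor of the
// rebased case values is not a power of two greater than one.
static std::optional<RangeKey> computeKey(std::vector<int32_t> Cases) {
  if (Cases.empty())
    return std::nullopt;
  std::sort(Cases.begin(), Cases.end());

  // Rebase so the smallest case becomes zero, then take the GCD of the
  // non-negative differences. gcd(0, x) == x, so starting from 0 is fine.
  int64_t Base = Cases.front();
  uint64_t GCD = 0;
  for (int32_t C : Cases)
    GCD = std::gcd(GCD, static_cast<uint64_t>(C - Base));

  // Only a power-of-two factor can be removed with a single rotate.
  if (GCD <= 1 || (GCD & (GCD - 1)) != 0)
    return std::nullopt;

  unsigned Shift = 0;
  while ((GCD >> Shift) != 1)
    ++Shift;
  return RangeKey{static_cast<uint32_t>(Cases.front()), Shift};
}

// Maps a switch condition to its dense key. Conditions that are not
// Base + k * 2^Shift end up with non-zero high bits after the rotate.
static uint32_t applyKey(const RangeKey &K, int32_t I) {
  return rotr32(static_cast<uint32_t>(I) - K.Base, K.Shift);
}
```

<div>With the cases from the commit message, <code>computeKey({5, 9, 13, 17})</code> yields Base 5 and Shift 2, so the cases map to keys 0 through 3. A condition of 6 maps to 0x40000000 and a condition of 21 maps to 4, both outside the dense range, so the single rotate replaces the separate remainder check.</div>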