[PATCH] Expand SimplifyCFG to convert certain simple switches to selects

Philip Reames listmail at philipreames.com
Mon Jun 23 17:23:32 PDT 2014


Answering inline questions.  Will glance at the revised version in separate review.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3686
@@ +3685,3 @@
+    Constant *&DefaultResult) {
+  for (auto I = SI->case_begin(), E = SI->case_end(); I != E; ++I) {
+    ConstantInt *CaseVal = I.getCaseValue();
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > range loop?
> I checked here. I don't think it is possible with SwitchInst to iterate over the cases of the switch with range loop, because there are no begin()/end() methods and range loops only work over classes that expose those.
Maybe something we should add.  Not asking that of you now. 

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3700
@@ +3699,3 @@
+    // intermediate unconditional block to the common destination.
+    if (CaseSuccessor != CommonDest &&
+        !(BI && BI->isUnconditional() && BI->getSuccessor(0) == CommonDest))
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > split the two early exit cases?  might improve readability, particularly if the CaseSuccessor != CommonDest check is above BI.  
> The conditions for the exit cases here are &&ed and not ||ed . So the exit case here is actually one (when both the conditions are verified). What exactly do you mean here?
I had misread the code when scanning quickly.  Please ignore previous comment.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3725
@@ +3724,3 @@
+                 DL);
+  // No default case makes us assume that default is not possible.
+  DefaultResult =
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > It looks like you might end up with PHI unset if you had a switch with only a default case.  Could such a switch reach here?  
> I tried to catch a case here where that happens, but I couldn't reproduce this case by trying to compile something like this:
> 
> define i32 @foo(i32 %a) #0 {
> entry:
>   switch i32 %a, label %sw.default [
>   ]
> 
> sw.default:                                       ; preds = %entry
>   ret i32 10
> }
> 
> with opt -simplifycfg
> 
> Anyway, if that is the case the test:
> 
>   if (UniqueResults.size() != 2)
>     return false;
> 
> in SwitchToSelect() should fail, because a PHI must be set if at least one Result is found, which would prevent the assertion on the next line to trigger.
> 
> I think bailing out in a case like that would be ok, because I don't think this is the right place where to handle a case like this one. 
Your answer makes sense.  I worry a bit that we're developing a series of small optimization tests where a single more general one would be better, but that's a concern for later.  

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3766
@@ +3765,3 @@
+// select.
+static Value *
+ConvertTwoCaseSwitch(const SmallVectorImpl<ConstantInt *> &FirstGroupCases,
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > It looks like this code is actually handling two cases + default.  Update the comment?
> > 
> > Also, is it always profitable to replace one switch with two selects?  I would guess not, but will defer to actual measurement.  
> On X86 from the measurements I did with a very simple micro benchmark it seems profitable.
> 
> This is the code of the benchmark:
> 
> int foo1_with_def(int a) {
>   switch(a) {
>     case 10:
>       return 10;
>     case 20:
>       return 2;
>   }
>   return 4;
> }
> 
> int main() {
>   int a = 0;
>   int ret = 0;
>   for (; a < 2000000000; ++a) {
>     ret |= foo1_with_def(a);
>   }
> 
>   return ret;
> }
> 
> And the generated code for the functions containing the switch is (+ performance results):
> 
> LLVM without patch:
> _foo1_with_def:
> 	movl	$10, %eax
> 	cmpl	$10, %edi
> 	je	LBB0_4
> 	cmpl	$20, %edi
> 	jne	LBB0_3
> 	movl	$2, %eax
> 	retq
> LBB0_3:
> 	movl	$4, %eax
> LBB0_4:
> 	retq
> 
> real	0m5.522s
> user	0m5.519s
> sys	0m0.002s
> 
> LLVM with patch:
> 
> _foo1_with_def:
> 	cmpl	$10, %edi
> 	movl	$4, %ecx
> 	cmovel	%edi, %ecx
> 	cmpl	$20, %edi
> 	movl	$2, %eax
> 	cmovnel	%ecx, %eax
> 	retq
> 
> real	0m3.233s
> user	0m3.230s
> sys	0m0.002s
> 
> I don't have any real world application though to back this up (it looks like a good result though).
> 
> In addition to that X86 is quite a control flow "friendly" architecture with respect to many others and if the results are positive on X86 I would expect that on some more custom architectures where control flow is more expensive this optimization might actually be even more effective.
Thanks for the data.  I'm not entirely sure about how this will play out on other architectures (i.e. ones without a conditional move).  I am not requesting you hold the patch on this mind you.  Just be prepared in case someone objects based on non-x86 perf data.  :)

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3814
@@ +3813,3 @@
+// transform a switch with cases arranged into two ordered not overlapping
+// groups into a value select
+static Value *
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > Could you add examples to your function comments?  It would make them a bit more obvious.  
> Very good point, I'll do that definitely in the next revision
Thanks for doing this.  It helped a lot.

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3902
@@ +3901,3 @@
+  // Selects choose between two values only.
+  if (UniqueResults.size() != 2)
+    return false;
----------------
Marcello Maggioni wrote:
> Marcello Maggioni wrote:
> > Philip Reames wrote:
> > > Since you're handling defaults, wouldn't it also make sense to handle one unique result and a default?  i.e.
> > > switch(i) {
> > > case 4,5,6: return 5;
> > > default: return 0;
> > > }
> > This is a good idea. I need to check if that is profitable in performance though (needs two comparisons to check if the value falls in the range). If it seems profitable I'll add it.
> I actually checked and seems like this specific case is already handled by TurnSwitchRangeIntoICmp()
Thanks for following up.  This fits in with my "series of small optimizations" comment above, but not something which needs to be addressed now.  

================
Comment at: lib/Transforms/Utils/SimplifyCFG.cpp:3955
@@ +3954,3 @@
+                          (FirstPossibleHits + SecondPossibleHits)) != 0);
+    DefaultCanTrigger &= DefaultResult != nullptr;
+
----------------
Marcello Maggioni wrote:
> Philip Reames wrote:
> > It seems like proving the switch default is dead is a useful optimization even if we can make no other changes.  Should this be separated and generalized?
> Do you mean making it a generic independent static function in SimplifyCFG.cpp ?
For this change, no action needed.

I was essentially wondering whether the optimization "prove default case is dead based AnalyzeSwitchCases" would apply to switches with more than two groups or for which we can't completely fold to selects.  This might be worth looking at as a follow up change.

http://reviews.llvm.org/D4219






More information about the llvm-commits mailing list