[PATCH] D17288: [CodeGenPrepare] Do select to branch transform when cmp's operand is expensive.

Thu Mar 31 17:10:47 PDT 2016

Gerolf added a comment.

A little more elaboration on the combiner idea:
In its current form the machine combiner only evaluates a few instruction patterns and picks the "best" one. Generalizing this to regions, and in particular to code regions with if-converted code, would be a necessary step toward making better code generation decisions in that case. Whether this is a good idea depends on the architecture/microarchitecture and the compile-time budget. With a select, multiple parameters come into play, and the scheduler is in the best position to evaluate the different code sequences. The parameters that must be evaluated include resources (in general, more instructions have to execute in parallel in if-converted code), branch predictability, scheduling gains (for example, on architectures without control speculation the select might enable it), etc. From a compile-time perspective, not all combinations can be tried. So a hierarchical approach, where simple heuristics (e.g., filtering out branches that are highly predictable) catch most or at least many cases and the combiner only evaluates some of the "hard" ones, will likely result in the best code quality.
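A rough sketch of the hierarchical approach described above, assuming a hypothetical `SelectCandidate` summary that carries only an estimated probability for the condition. A cheap filter settles strongly biased (and therefore presumably well-predicted) conditions, and only the remaining "hard" candidates pay for the expensive evaluation, which here is just a callback standing in for a region-aware combiner/scheduler comparison. None of these names exist in LLVM, and the 0.95 threshold is an arbitrary placeholder.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical summary of one select candidate (not an LLVM class).
struct SelectCandidate {
  double TrueProb; // estimated probability that the condition is true
};

enum class Lowering { Select, Branch };

// Tier 1: a cheap filter that only decides the "easy" cases.
// A strongly biased condition is assumed to be well predicted, so a branch
// is chosen; everything else is left undecided.
std::optional<Lowering> cheapFilter(const SelectCandidate &C,
                                    double BiasThreshold = 0.95) {
  if (C.TrueProb >= BiasThreshold || C.TrueProb <= 1.0 - BiasThreshold)
    return Lowering::Branch;
  return std::nullopt;
}

// Tier 2: placeholder for an expensive, scheduler-driven comparison of the
// select and branch sequences (e.g. a region-aware machine combiner).
using ExpensiveEval = std::function<Lowering(const SelectCandidate &)>;

// Only candidates the cheap filter cannot settle pay for tier 2.
Lowering chooseLowering(const SelectCandidate &C, const ExpensiveEval &Eval) {
  if (std::optional<Lowering> Easy = cheapFilter(C))
    return *Easy;
  return Eval(C);
}
```

The compile-time argument is visible in the control flow: a biased candidate never invokes `Eval`, so the expensive tier only runs on the cases the heuristic cannot classify.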


================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:4525
@@ +4524,3 @@
+    if (I && I->getOpcode() == Instruction::FDiv &&
+        STI->getSchedModel().FdivLatency >
+            STI->getSchedModel().MispredictPenalty)
----------------
Is that really a good heuristic? Even when the divide latency is less than or equal to the branch misprediction penalty, issuing a branch can be the better choice. That depends on the program behavior. I believe the reasoning you are looking for is this: in the presence of a long-latency instruction, assume the dependent branch is well predicted most of the time. In practice, the long latency of the divide covers for the (dynamic) instances when that assumption is wrong.
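To make the point about `FdivLatency > MispredictPenalty` concrete, here is a deliberately simplified expected-cost model. It is an illustrative assumption, not LLVM's cost model: it ignores resource usage and assumes the selected values themselves do not depend on the long-latency producer. With a select, consumers always wait for the producer; with a branch, a correct prediction lets dependent work proceed speculatively, while a misprediction is only detected once the producer finishes and then costs the refill penalty on top.

```cpp
#include <cassert>

enum class Lowering { Select, Branch };

// Simplified model (assumption, for illustration only):
//   Latency        - cycles of the long-latency producer (e.g. an fdiv)
//                    feeding the compare
//   Penalty        - branch misprediction penalty in cycles
//   MispredictRate - fraction of dynamic executions that are mispredicted
// select cost ~ Latency                           (always waits)
// branch cost ~ MispredictRate * (Latency + Penalty) (waits only when wrong)
Lowering chooseLowering(unsigned Latency, unsigned Penalty,
                        double MispredictRate) {
  double SelectCost = Latency;
  double BranchCost = MispredictRate * (Latency + Penalty);
  return BranchCost < SelectCost ? Lowering::Branch : Lowering::Select;
}
```

Under this model the break-even misprediction rate is `Latency / (Latency + Penalty)`. With `Latency = 10` and `Penalty = 20` (latency below the penalty, so the patch's check would not fire), a branch that is mispredicted less than about a third of the time already beats the select; the decision hinges on predictability rather than on the latency/penalty comparison alone.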

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:4532
@@ +4531,3 @@
+
+  if (IsExpensiveCostInst(CmpOp0) || IsExpensiveCostInst(CmpOp1))
+    return true;
----------------
If both paths consume the long-latency value, the select is still the better choice.
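One way to express that refinement, sketched on a toy value graph: before treating an expensive compare operand as an argument for a branch, check whether both select arms transitively use it. `Node`, `dependsOn`, and `expensiveOperandFavorsBranch` are hypothetical names standing in for LLVM's use-def chains, not existing APIs.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy value graph (hypothetical; stands in for LLVM's use-def chains).
struct Node {
  std::vector<const Node *> Operands;
};

// True if V is Target or transitively uses it. The visited set keeps the
// walk linear on DAGs with shared operands.
bool dependsOn(const Node *V, const Node *Target) {
  std::vector<const Node *> Worklist{V};
  std::unordered_set<const Node *> Visited;
  while (!Worklist.empty()) {
    const Node *N = Worklist.back();
    Worklist.pop_back();
    if (N == Target)
      return true;
    if (!Visited.insert(N).second)
      continue;
    for (const Node *Op : N->Operands)
      Worklist.push_back(Op);
  }
  return false;
}

// Hypothetical refinement of the expensive-operand check: if both select
// arms consume the long-latency value, either path waits for it anyway, so
// the expensive operand alone is no argument for a branch.
bool expensiveOperandFavorsBranch(const Node *TrueVal, const Node *FalseVal,
                                  const Node *Expensive) {
  return !(dependsOn(TrueVal, Expensive) && dependsOn(FalseVal, Expensive));
}
```

For example, with `Div` as the long-latency value, arms `Add(Div, A)` and `Mul(Div, B)` both depend on it, so the check returns false; a select between `Div` and an unrelated leaf returns true.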

================
Comment at: lib/CodeGen/CodeGenPrepare.cpp:4538
@@ -4513,3 +4537,3 @@
   // probably another cmov or setcc around, so it's not worth emitting a branch.
-  if (!Cmp || !Cmp->hasOneUse())
+  if (!Cmp->hasOneUse())
     return false;
----------------
Why? The cmp could feed multiple selects originating from PHI nodes, and a branch would still be preferable.
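One possible reading of this suggestion, sketched with a hypothetical, simplified view of the compare's users (not LLVM types): rather than bailing out whenever the compare has more than one use, accept several uses as long as all of them are selects on that condition, since a single branch could then replace all of them. Whether those selects are close enough together to share one branch is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified classification of a compare's users.
enum class UserKind { Select, Other };

struct CmpInfo {
  std::vector<UserKind> Users;
};

// Rule in the patch: give up unless the compare has exactly one use.
bool singleUseRule(const CmpInfo &Cmp) { return Cmp.Users.size() == 1; }

// Possible relaxation: several uses are fine if every user is a select on
// this condition, because one branch could then replace all of them.
bool allSelectUsersRule(const CmpInfo &Cmp) {
  return !Cmp.Users.empty() &&
         std::all_of(Cmp.Users.begin(), Cmp.Users.end(),
                     [](UserKind K) { return K == UserKind::Select; });
}
```

A compare feeding two selects is rejected by the single-use rule but accepted by the relaxed one; a compare that also feeds, say, a setcc-style user is still rejected, which preserves the intent of the original comment.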

================
Comment at: test/Transforms/CodeGenPrepare/X86/select.ll:145
@@ +144,3 @@
+  %cmp = fcmp ogt float %div, %b
+  %sel = select i1 %cmp, float %div, float 8.0
+  ret float %sel
----------------
I find this example misleading. The use of %div in the select is irrelevant; the only issue is whether the branch is predictable.


http://reviews.llvm.org/D17288