[PATCH] D16836: [CodeGenPrepare] Don't transform select instructions into branches when both of operands are cheap

Thu Feb 4 22:06:45 PST 2016

flyingforyou updated this revision to Diff 46992.
flyingforyou added a comment.

Addressed Hal's comment.

Thanks, Hal. It is great advice.

Sanjay, thank you for sharing the long history of the related commits.

I agree with your concern as well. But the first commit was merged in 2012, almost 4 years ago. Recent OoO cores have more logic for avoiding cache misses (for example, HW prefetchers).
I also don't think your previous patch is a hack. Even if we can avoid the cache-miss penalty, there is not much to gain unless enough heavy instructions, such as division, are involved. (Of course, it depends on the instruction stream after the branch.)

As you said, this approach can bypass the load-cmp heuristic. But that should affect only a very small portion of cases.

I have added a further restriction to this patch, as Hal suggested. Can we try applying this heuristic as well?

I also see some improvements on a commercial benchmark on both X86 and AArch64.
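
For readers who want to try the restriction outside of LLVM, here is a minimal standalone sketch of the check. It is not the patch itself: `Kind`, `Val`, `leaf`, `inst`, `isFreeCostInst` and `keepSelect` are hypothetical stand-ins, and `freeCost` replaces the real `TTI->getUserCost(I) == TargetTransformInfo::TCC_Free` query. It assumes the intended rule is "keep the select only when both arms are free to lower and are fed only by arguments, PHI nodes, or constants".

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for LLVM value kinds (not the real llvm::Value classes).
enum class Kind { Argument, Phi, Constant, Instruction };

struct Val {
  Kind kind;
  bool freeCost;                     // assumption: models a TCC_Free user cost
  std::vector<const Val *> operands; // only meaningful for instructions
};

// Build a non-instruction value (argument, PHI, or constant).
Val leaf(Kind k) { return Val{k, false, {}}; }

// Build an instruction with the given modeled cost and operands.
Val inst(bool freeCost, std::vector<const Val *> ops) {
  for (const Val *op : ops)
    assert(op != nullptr && "operands must be non-null");
  return Val{Kind::Instruction, freeCost, std::move(ops)};
}

// True when v is an instruction that is free to lower and whose operands are
// all arguments, PHI nodes, or constants.
bool isFreeCostInst(const Val &v) {
  if (v.kind != Kind::Instruction || !v.freeCost)
    return false;
  for (const Val *op : v.operands) {
    bool available = op->kind == Kind::Argument || op->kind == Kind::Phi ||
                     op->kind == Kind::Constant;
    if (!available)
      return false;
  }
  return true;
}

// Keep the select (do not form a branch) when both arms are free.
bool keepSelect(const Val &trueVal, const Val &falseVal) {
  return isFreeCostInst(trueVal) && isFreeCostInst(falseVal);
}
```

In this model, two GEP-like values built from an argument and a constant index keep the select, which matches the `csel` expected by the new AArch64 test. An arm that depends on another instruction, such as a load, does not qualify.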


http://reviews.llvm.org/D16836

Files:
  lib/CodeGen/CodeGenPrepare.cpp
  test/CodeGen/AArch64/arm64-select.ll

Index: test/CodeGen/AArch64/arm64-select.ll
===================================================================

--- /dev/null
+++ test/CodeGen/AArch64/arm64-select.ll
@@ -0,0 +1,26 @@
+; RUN: llc -march=arm64 -mcpu=cortex-a57 < %s | FileCheck %s
+; The select below has cheap operands, so it should not be turned into a branch.
+
+target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
+
+%class.A = type { i32, i32, i32, i32 }
+
+define i32 @test(%class.A* nocapture readonly %cla, float* nocapture readonly %b, i32 %c) #0 {
+entry:
+; CHECK-LABEL: test:
+; CHECK: csel
+  %call = tail call fast float @_Z6getvalv()
+  %0 = load float, float* %b, align 4, !tbaa !0
+  %cmp = fcmp fast olt float %call, %0
+  %a1 = getelementptr inbounds %class.A, %class.A* %cla, i64 0, i32 1
+  %a2 = getelementptr inbounds %class.A, %class.A* %cla, i64 0, i32 2
+  %cond.in = select i1 %cmp, i32* %a1, i32* %a2
+  %cond = load i32, i32* %cond.in, align 4, !tbaa !0
+  ret i32 %cond
+}
+
+declare float @_Z6getvalv() #0
+
+!0 = !{!1, !1, i64 0}
+!1 = distinct !{!"int", !1, i64 0}
\ No newline at end of file
Index: lib/CodeGen/CodeGenPrepare.cpp
===================================================================
--- lib/CodeGen/CodeGenPrepare.cpp
+++ lib/CodeGen/CodeGenPrepare.cpp
@@ -4475,6 +4475,26 @@
   if (!Cmp || !Cmp->hasOneUse())
     return false;
 
+  // If both operands of the select are expected to fold away in lowering,
+  // a mispredicted branch might be more painful than the select.
+  auto IsFreeCostInst = [&](Value *V) -> bool {
+    auto *I = dyn_cast<Instruction>(V);
+    if (I == nullptr)
+      return false;
+
+    if (TTI->getUserCost(I) != TargetTransformInfo::TCC_Free)
+      return false;
+    for (const Use &U : I->operands()) {
+      const Value *OpVal = U.get();
+      if (!(isa<Argument>(OpVal) || isa<PHINode>(OpVal) ||
+            isa<Constant>(OpVal)))
+        return false;
+    }
+    return true;
+  };
+  if (IsFreeCostInst(SI->getTrueValue()) && IsFreeCostInst(SI->getFalseValue()))
+    return false;
+
   Value *CmpOp0 = Cmp->getOperand(0);
   Value *CmpOp1 = Cmp->getOperand(1);
 

