[PATCH][AArch64] Enable sign check optimization by CSE

Mon Jul 28 01:40:09 PDT 2014

Hi Jiangning,

You're right, I just thought it might make sense to leave a chance of
catching illegal immediate values after optimizations, which is actually
not something to consider in this case with one and zero.

Also, maybe

    C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);

worth changing to

    C = 0ULL;

It's left from my tries to generalize optimization, but it might be
useful if someone will add additional condition.

Thanks,
Sergey

On Sun, Jul 27, 2014 at 10:30:45PM -0700, Jiangning Liu wrote:
>    Hi Sergey,
>    Would it be more clear on logic if you move that piece of code to be
>    inside the else branch of "if (!isLegalArithImmed(C)) { ... } else {...}"?
>    I think it would be clear that (C==1) is legal, but we still want to
>    optimize it, and any more optimization cases falling into "legal" scenario
>    would be easily added in future.
>    I will be OK if you can simply make this change.
>    Thanks,
>    -Jiangning
> 
>    2014-07-25 15:21 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
> 
>      Hi Jiangning,
>      > Did you forget to attach your new patch?
> 
>      Yes, sorry for that. A It's attached now.
>      > Hopefully we can understand why this could happen, but maybe this is
>      just
>      > a heuristic result depending on the real control flow and workload.
> 
>      As that code is common for all supported architectures, maybe it is
>      important for only some of them and doesn't cause significant
>      performance
>      changes for others?
> 
>      Best regards,
>      Sergey
>      On Thu, Jul 24, 2014 at 07:05:43PM -0700, Jiangning Liu wrote:
>      > A  A Hi Sergey,
>      > A  A Did you forget to attach your new patch?
>      > A  A I tried the spec benchmarks by disabling that code of inverting
>      the
>      > A  A condition, and only see the following performance changes and no
>      change
>      > A  A for all others.
>      > A  A 164.gzip (ref) A +1.76%
>      > A  A 458.sjeng (train) + 2.19%
>      > A  A 471.omnetpp (train) -1.43%
>      > A  A 473.astar (train) -1.51%
>      > A  A Hopefully we can understand why this could happen, but maybe this
>      is just
>      > A  A a heuristic result depending on the real control flow and
>      workload.
>      > A  A Thanks,
>      > A  A -Jiangning
>      >
>      > A  A 2014-07-24 16:17 GMT+08:00 Sergey Dmitrouk
>      <sdmitrouk at accesssoftek.com>:
>      >
>      > A  A  A Hello Jiangning,
>      >
>      > A  A  A Thanks for your comments.
>      >
>      > A  A  A > 1) I expect the fix should be insideA getAArch64Cmp.
>      > A  A  A >...
>      > A  A  A > but essentially getAArch64Cmp missed the case of (x < 1) ->
>      (x <= 0).
>      >
>      > A  A  A It was initial placement of the fix, but the function doesn't
>      seem to
>      > A  A  A perform transformations like that. A It updates conditions
>      only when
>      > A  A  A immediate values are not legal. A There is no comment for the
>      function,
>      > A  A  A so I'm not sure whether such checks fit there, but I moved the
>      change.
>      > A  A  A > 2) Your comment is inconsistent with your code.
>      >
>      > A  A  A Thanks, it's probably because of inverted conditions in DAGs.
>      > A  A  A > So now I'm wondering how to justify this is always
>      meaningful for
>      > A  A  A AArch64?
>      >
>      > A  A  A I wasn't sure whether it's worth such change, but as an option
>      something
>      > A  A  A like TargetLowering::isInversionBeneficial(SDValue Cond) can
>      be added,
>      > A  A  A but I don't know whether it's possible to check for conditions
>      like
>      > A  A  A "(a < 0 && b == c || a > 0 && b == d)" to do not block
>      inversion for all
>      > A  A  A cases.
>      >
>      > A  A  A Attached updated patch at least to see whether the fix fits
>      well in
>      > A  A  A getAArch64Cmp().
>      >
>      > A  A  A Regards,
>      > A  A  A Sergey
>      > A  A  A On Wed, Jul 23, 2014 at 10:05:37PM -0700, Jiangning Liu wrote:
>      > A  A  A > A A A Hi Sergey,
>      > A  A  A > A A A 1) I expect the fix should be insideA getAArch64Cmp.
>      > A  A  A > A A A 2) Your comment is inconsistent with your code. Your
>      code is to
>      > A  A  A transform
>      > A  A  A > A A A (x < 1) to be (x<=0), rather than "Turn "x > 1"
>      condition into "x
>      > A  A  A >= 0"".
>      > A  A  A > A A A I also noticed we have the following transformation
>      for if
>      > A  A  A condition (x <
>      > A  A  A > A A A 0) in back-end,
>      > A  A  A > A A A stage 1: (x < 0) -> (x >= 0), i.e. (x<0) and invert
>      the targets.
>      > A  A  A > A A A stage 2: (x >= 0) -> (x > -1). This happens in
>      combine1.
>      > A  A  A > A A A stage 3: (x > -1) -> (x >= 0) in getAArch64Cmp.
>      > A  A  A > A A A For if condition (x > 0), the transformation is
>      similar. Your
>      > A  A  A patch is
>      > A  A  A > A A A trying to cover this case, but essentially
>      getAArch64Cmp missed
>      > A  A  A the case
>      > A  A  A > A A A of (x < 1) -> (x <= 0).
>      > A  A  A > A A A However, as you can see the root cause of generating
>      the
>      > A  A  A comparison with
>      > A  A  A > A A A constant 1 is stage 1. This happens
>      > A  A  A > A A A insideA SelectionDAGBuilder::visitSwitchCase
>      > A  A  A > A A A A A // If the lhs block is the next block, invert the
>      condition
>      > A  A  A so that we
>      > A  A  A > A A A can
>      > A  A  A > A A A A A // fall through to the lhs instead of the rhs
>      block.
>      > A  A  A > A A A A A if (CB.TrueBB == NextBlock) {
>      > A  A  A > A A A A A A A std::swap(CB.TrueBB, CB.FalseBB);
>      > A  A  A > A A A A A A A SDValue True = DAG.getConstant(1,
>      Cond.getValueType());
>      > A  A  A > A A A A A A A Cond = DAG.getNode(ISD::XOR, dl,
>      Cond.getValueType(),
>      > A  A  A Cond, True);
>      > A  A  A > A A A A A }
>      > A  A  A > A A A So now I'm wondering how to justify this is always
>      meaningful for
>      > A  A  A AArch64?
>      > A  A  A > A A A Thanks,
>      > A  A  A > A A A -Jiangning
>      > A  A  A >
>      > A  A  A > A A A 2014-07-23 23:54 GMT+08:00 Sergey Dmitrouk
>      > A  A  A <sdmitrouk at accesssoftek.com>:
>      > A  A  A >
>      > A  A  A > A A A A A Hi,
>      > A  A  A >
>      > A  A  A > A A A A A Basing on the following information from [this
>      post][0] by
>      > A  A  A James Molloy:
>      > A  A  A >
>      > A  A  A > A A A A A A A 2. "if (a < 0 && b == c || a > 0 && b == d)" -
>      the first
>      > A  A  A comparison
>      > A  A  A > A A A A A of
>      > A  A  A > A A A A A A A 'a' against zero is done twice, when the flag
>      results of
>      > A  A  A the first
>      > A  A  A > A A A A A A A comparison could be used for the second
>      comparison.
>      > A  A  A >
>      > A  A  A > A A A A A I've made a patch (attached) that removes this
>      extra
>      > A  A  A comparison. A More
>      > A  A  A > A A A A A complex cases like comparisons with non-zero
>      immediate values
>      > A  A  A or with
>      > A  A  A > A A A A A registers doesn't seem to be task for a code
>      generator. A
>      > A  A  A Comparing with
>      > A  A  A > A A A A A zero is quite common, so I seems to be worth
>      adding.
>      > A  A  A >
>      > A  A  A > A A A A A Please review the patch. A Couldn't find a better
>      place to
>      > A  A  A make the
>      > A  A  A > A A A A A change, but I'll be happy to adjust the patch if
>      anyone has
>      > A  A  A better
>      > A  A  A > A A A A A ideas.
>      > A  A  A >
>      > A  A  A > A A A A A Best regards,
>      > A  A  A > A A A A A Sergey
>      > A  A  A >
>      > A  A  A > A A A A A 0:
>      > A  A  A http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
>      > A  A  A >
>      > A  A  A > A A A A A _______________________________________________
>      > A  A  A > A A A A A llvm-commits mailing list
>      > A  A  A > A A A A A llvm-commits at cs.uiuc.edu
>      > A  A  A > A A A A A
>      http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
[AArch64] Enable sign check optimization by CSE

Turn "x < 1" condition into "x <= 0" to make both "x < 0" and "x > 0"
checks generate the same code.  This allows CSE pass to remove
duplicated compare instructions in this quite common branching case.

diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2c677ab..83e7930 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1023,6 +1023,7 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
   if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS.getNode())) {
     EVT VT = RHS.getValueType();
     uint64_t C = RHSC->getZExtValue();
+
     if (!isLegalArithImmed(C)) {
       // Constant does not fit, try adjusting it by one?
       switch (CC) {
@@ -1072,6 +1073,15 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
         }
         break;
       }
+    } else {
+      // Transform "x < 1" to be "x <= 0" to make both "x < 0" and "x > 0" checks
+      // generate the same code.  This allows CSE pass to remove duplicated
+      // compare instructions in this quite common branching case.
+      if (RHSC->isOne() && CC == ISD::SETLT) {
+        CC = ISD::SETLE;
+        C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
+        RHS = DAG.getConstant(C, RHS.getValueType());
+      }
     }
   }
 
diff --git a/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
new file mode 100644
index 0000000..aa74be9
--- /dev/null
+++ b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
@@ -0,0 +1,39 @@
+; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; marked as external to prevent possible optimizations
+ at a = external global i32
+ at b = external global i32
+ at c = external global i32
+ at d = external global i32
+
+define void @combine-sign-comparisons-by-cse() {
+; CHECK-LABEL: %lor.lhs.false
+; CHECK-NEXT: b.l
+entry:
+  %0 = load i32* @a, align 4
+  %cmp = icmp slt i32 %0, 0
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:
+  %1 = load i32* @b, align 4
+  %2 = load i32* @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:
+  %cmp2 = icmp sgt i32 %0, 0
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:
+  %3 = load i32* @b, align 4
+  %4 = load i32* @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:
+  br label %return
+
+return:
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret void
+}