[PATCH][AArch64] Enable sign check optimization by CSE

Fri Jul 25 00:21:12 PDT 2014

Hi Jiangning,

> Did you forget to attach your new patch?

Yes, sorry for that.  It's attached now.

> Hopefully we can understand why this could happen, but maybe this is just
> a heuristic result depending on the real control flow and workload.

As that code is common for all supported architectures, maybe it is
important for only some of them and doesn't cause significant performance
changes for others?

Best regards,
Sergey

On Thu, Jul 24, 2014 at 07:05:43PM -0700, Jiangning Liu wrote:
>    Hi Sergey,
>    Did you forget to attach your new patch?
>    I tried the spec benchmarks by disabling that code of inverting the
>    condition, and only see the following performance changes and no change
>    for all others.
>    164.gzip (ref) A +1.76%
>    458.sjeng (train) + 2.19%
>    471.omnetpp (train) -1.43%
>    473.astar (train) -1.51%
>    Hopefully we can understand why this could happen, but maybe this is just
>    a heuristic result depending on the real control flow and workload.
>    Thanks,
>    -Jiangning
> 
>    2014-07-24 16:17 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
> 
>      Hello Jiangning,
> 
>      Thanks for your comments.
> 
>      > 1) I expect the fix should be insideA getAArch64Cmp.
>      >...
>      > but essentially getAArch64Cmp missed the case of (x < 1) -> (x <= 0).
> 
>      It was initial placement of the fix, but the function doesn't seem to
>      perform transformations like that. A It updates conditions only when
>      immediate values are not legal. A There is no comment for the function,
>      so I'm not sure whether such checks fit there, but I moved the change.
>      > 2) Your comment is inconsistent with your code.
> 
>      Thanks, it's probably because of inverted conditions in DAGs.
>      > So now I'm wondering how to justify this is always meaningful for
>      AArch64?
> 
>      I wasn't sure whether it's worth such change, but as an option something
>      like TargetLowering::isInversionBeneficial(SDValue Cond) can be added,
>      but I don't know whether it's possible to check for conditions like
>      "(a < 0 && b == c || a > 0 && b == d)" to do not block inversion for all
>      cases.
> 
>      Attached updated patch at least to see whether the fix fits well in
>      getAArch64Cmp().
> 
>      Regards,
>      Sergey
>      On Wed, Jul 23, 2014 at 10:05:37PM -0700, Jiangning Liu wrote:
>      > A  A Hi Sergey,
>      > A  A 1) I expect the fix should be insideA getAArch64Cmp.
>      > A  A 2) Your comment is inconsistent with your code. Your code is to
>      transform
>      > A  A (x < 1) to be (x<=0), rather than "Turn "x > 1" condition into "x
>      >= 0"".
>      > A  A I also noticed we have the following transformation for if
>      condition (x <
>      > A  A 0) in back-end,
>      > A  A stage 1: (x < 0) -> (x >= 0), i.e. (x<0) and invert the targets.
>      > A  A stage 2: (x >= 0) -> (x > -1). This happens in combine1.
>      > A  A stage 3: (x > -1) -> (x >= 0) in getAArch64Cmp.
>      > A  A For if condition (x > 0), the transformation is similar. Your
>      patch is
>      > A  A trying to cover this case, but essentially getAArch64Cmp missed
>      the case
>      > A  A of (x < 1) -> (x <= 0).
>      > A  A However, as you can see the root cause of generating the
>      comparison with
>      > A  A constant 1 is stage 1. This happens
>      > A  A insideA SelectionDAGBuilder::visitSwitchCase
>      > A  A A A // If the lhs block is the next block, invert the condition
>      so that we
>      > A  A can
>      > A  A A A // fall through to the lhs instead of the rhs block.
>      > A  A A A if (CB.TrueBB == NextBlock) {
>      > A  A A A A A std::swap(CB.TrueBB, CB.FalseBB);
>      > A  A A A A A SDValue True = DAG.getConstant(1, Cond.getValueType());
>      > A  A A A A A Cond = DAG.getNode(ISD::XOR, dl, Cond.getValueType(),
>      Cond, True);
>      > A  A A A }
>      > A  A So now I'm wondering how to justify this is always meaningful for
>      AArch64?
>      > A  A Thanks,
>      > A  A -Jiangning
>      >
>      > A  A 2014-07-23 23:54 GMT+08:00 Sergey Dmitrouk
>      <sdmitrouk at accesssoftek.com>:
>      >
>      > A  A  A Hi,
>      >
>      > A  A  A Basing on the following information from [this post][0] by
>      James Molloy:
>      >
>      > A  A  A A A 2. "if (a < 0 && b == c || a > 0 && b == d)" - the first
>      comparison
>      > A  A  A of
>      > A  A  A A A 'a' against zero is done twice, when the flag results of
>      the first
>      > A  A  A A A comparison could be used for the second comparison.
>      >
>      > A  A  A I've made a patch (attached) that removes this extra
>      comparison. A More
>      > A  A  A complex cases like comparisons with non-zero immediate values
>      or with
>      > A  A  A registers doesn't seem to be task for a code generator. A
>      Comparing with
>      > A  A  A zero is quite common, so I seems to be worth adding.
>      >
>      > A  A  A Please review the patch. A Couldn't find a better place to
>      make the
>      > A  A  A change, but I'll be happy to adjust the patch if anyone has
>      better
>      > A  A  A ideas.
>      >
>      > A  A  A Best regards,
>      > A  A  A Sergey
>      >
>      > A  A  A 0:
>      http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
>      >
>      > A  A  A _______________________________________________
>      > A  A  A llvm-commits mailing list
>      > A  A  A llvm-commits at cs.uiuc.edu
>      > A  A  A http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
[AArch64] Enable sign check optimization by CSE

Turn "x < 1" condition into "x <= 0" to make both "x < 0" and "x > 0"
checks generate the same code.  This allows CSE pass to remove
duplicated compare instructions in this quite common branching case.

diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 4921826..50a3ef6 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1023,6 +1023,16 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
   if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS.getNode())) {
     EVT VT = RHS.getValueType();
     uint64_t C = RHSC->getZExtValue();
+
+    // Transform "x < 1" to be "x <= 0" to make both "x < 0" and "x > 0" checks
+    // generate the same code.  This allows CSE pass to remove duplicated
+    // compare instructions in this quite common branching case.
+    if (RHSC->isOne() && CC == ISD::SETLT) {
+      CC = ISD::SETLE;
+      C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
+      RHS = DAG.getConstant(C, RHS.getValueType());
+    }
+
     if (!isLegalArithImmed(C)) {
       // Constant does not fit, try adjusting it by one?
       switch (CC) {
diff --git a/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
new file mode 100644
index 0000000..aa74be9
--- /dev/null
+++ b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
@@ -0,0 +1,39 @@
+; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; marked as external to prevent possible optimizations
+ at a = external global i32
+ at b = external global i32
+ at c = external global i32
+ at d = external global i32
+
+define void @combine-sign-comparisons-by-cse() {
+; CHECK-LABEL: %lor.lhs.false
+; CHECK-NEXT: b.l
+entry:
+  %0 = load i32* @a, align 4
+  %cmp = icmp slt i32 %0, 0
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:
+  %1 = load i32* @b, align 4
+  %2 = load i32* @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:
+  %cmp2 = icmp sgt i32 %0, 0
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:
+  %3 = load i32* @b, align 4
+  %4 = load i32* @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:
+  br label %return
+
+return:
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret void
+}