[PATCH][AArch64] Enable sign check optimization by CSE

Tue Jul 29 02:51:22 PDT 2014

Hi Jiangning,

> I think it would be better if you can simply check sequential b.xx
> instructions like,

Thanks, updated that, I though it won't be changed as "lor.lhs.false" is
in the .ll file.

> I found your patch would trigger two failures for llvm regression test,
> test/CodeGen/AArch64/arm64-ccmp.ll
> test/CodeGen/AArch64/arm64-cse.ll
> Have you tried that?

Sorry, I did run all test before, but maybe failed to do this with correct
version of llc binary.

"test/CodeGen/AArch64/arm64-ccmp.ll" required a small update in
conditional check, which doesn't seem to affect anything.

"test/CodeGen/AArch64/arm64-cse.ll" explicitly checks for old behaviour and
assumes comparison with 1.  Modified to perform two separate checks:
 1. No CSE for that case when comparing with 1.
 2. CSE for all other cases.
It seems to be fine as comparison with zero should be more frequent.  I
guess we can't support both kinds of optimizations at the same time.

By the way, thanks for you patience.

Cheers,
Sergey

On Mon, Jul 28, 2014 at 09:32:40PM -0700, Jiangning Liu wrote:
>    Hi Sergey,
>    I found your patch would trigger two failures for llvm regression test,
>    test/CodeGen/AArch64/arm64-ccmp.ll
>    test/CodeGen/AArch64/arm64-cse.ll
>    Have you tried that?
>    Thanks,
>    -Jiangning
> 
>    2014-07-29 10:02 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
> 
>      Hi Sergey,
>      I just noticed your test case is checking the "comment" in assembly
>      code. I think it would be better if you can simply check sequential b.xx
>      instructions like,
>      ; CHECK: cmp
>      ; CHECK: b.lt
>      ; CHECK-NOT: cmp
>      ; CHECK: b.le
>      This way the test can be more stable, because the internal symbol like
>      %lor.lhs.false can be easily changed from time to time.
>      Thanks,
>      -Jiangning
> 
>      2014-07-28 16:40 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
> 
>        Hi Jiangning,
> 
>        You're right, I just thought it might make sense to leave a chance of
>        catching illegal immediate values after optimizations, which is
>        actually
>        not something to consider in this case with one and zero.
> 
>        Also, maybe
> 
>        A  A  C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C -
>        1);
> 
>        worth changing to
> 
>        A  A  C = 0ULL;
> 
>        It's left from my tries to generalize optimization, but it might be
>        useful if someone will add additional condition.
> 
>        Thanks,
>        Sergey
>        On Sun, Jul 27, 2014 at 10:30:45PM -0700, Jiangning Liu wrote:
>        > A  A Hi Sergey,
>        > A  A Would it be more clear on logic if you move that piece of code
>        to be
>        > A  A inside the else branch of "if (!isLegalArithImmed(C)) { ... }
>        else {...}"?
>        > A  A I think it would be clear that (C==1) is legal, but we still
>        want to
>        > A  A optimize it, and any more optimization cases falling into
>        "legal" scenario
>        > A  A would be easily added in future.
>        > A  A I will be OK if you can simply make this change.
>        > A  A Thanks,
>        > A  A -Jiangning
>        >
>        > A  A 2014-07-25 15:21 GMT+08:00 Sergey Dmitrouk
>        <sdmitrouk at accesssoftek.com>:
>        >
>        > A  A  A Hi Jiangning,
>        > A  A  A > Did you forget to attach your new patch?
>        >
>        > A  A  A Yes, sorry for that. A It's attached now.
>        > A  A  A > Hopefully we can understand why this could happen, but
>        maybe this is
>        > A  A  A just
>        > A  A  A > a heuristic result depending on the real control flow and
>        workload.
>        >
>        > A  A  A As that code is common for all supported architectures,
>        maybe it is
>        > A  A  A important for only some of them and doesn't cause
>        significant
>        > A  A  A performance
>        > A  A  A changes for others?
>        >
>        > A  A  A Best regards,
>        > A  A  A Sergey
>        > A  A  A On Thu, Jul 24, 2014 at 07:05:43PM -0700, Jiangning Liu
>        wrote:
>        > A  A  A > A A A Hi Sergey,
>        > A  A  A > A A A Did you forget to attach your new patch?
>        > A  A  A > A A A I tried the spec benchmarks by disabling that code
>        of inverting
>        > A  A  A the
>        > A  A  A > A A A condition, and only see the following performance
>        changes and no
>        > A  A  A change
>        > A  A  A > A A A for all others.
>        > A  A  A > A A A 164.gzip (ref) A +1.76%
>        > A  A  A > A A A 458.sjeng (train) + 2.19%
>        > A  A  A > A A A 471.omnetpp (train) -1.43%
>        > A  A  A > A A A 473.astar (train) -1.51%
>        > A  A  A > A A A Hopefully we can understand why this could happen,
>        but maybe this
>        > A  A  A is just
>        > A  A  A > A A A a heuristic result depending on the real control
>        flow and
>        > A  A  A workload.
>        > A  A  A > A A A Thanks,
>        > A  A  A > A A A -Jiangning
>        > A  A  A >
>        > A  A  A > A A A 2014-07-24 16:17 GMT+08:00 Sergey Dmitrouk
>        > A  A  A <sdmitrouk at accesssoftek.com>:
>        > A  A  A >
>        > A  A  A > A A A A A Hello Jiangning,
>        > A  A  A >
>        > A  A  A > A A A A A Thanks for your comments.
>        > A  A  A >
>        > A  A  A > A A A A A > 1) I expect the fix should be insideA
>        getAArch64Cmp.
>        > A  A  A > A A A A A >...
>        > A  A  A > A A A A A > but essentially getAArch64Cmp missed the case
>        of (x < 1) ->
>        > A  A  A (x <= 0).
>        > A  A  A >
>        > A  A  A > A A A A A It was initial placement of the fix, but the
>        function doesn't
>        > A  A  A seem to
>        > A  A  A > A A A A A perform transformations like that. A It updates
>        conditions
>        > A  A  A only when
>        > A  A  A > A A A A A immediate values are not legal. A There is no
>        comment for the
>        > A  A  A function,
>        > A  A  A > A A A A A so I'm not sure whether such checks fit there,
>        but I moved the
>        > A  A  A change.
>        > A  A  A > A A A A A > 2) Your comment is inconsistent with your
>        code.
>        > A  A  A >
>        > A  A  A > A A A A A Thanks, it's probably because of inverted
>        conditions in DAGs.
>        > A  A  A > A A A A A > So now I'm wondering how to justify this is
>        always
>        > A  A  A meaningful for
>        > A  A  A > A A A A A AArch64?
>        > A  A  A >
>        > A  A  A > A A A A A I wasn't sure whether it's worth such change,
>        but as an option
>        > A  A  A something
>        > A  A  A > A A A A A like
>        TargetLowering::isInversionBeneficial(SDValue Cond) can
>        > A  A  A be added,
>        > A  A  A > A A A A A but I don't know whether it's possible to check
>        for conditions
>        > A  A  A like
>        > A  A  A > A A A A A "(a < 0 && b == c || a > 0 && b == d)" to do not
>        block
>        > A  A  A inversion for all
>        > A  A  A > A A A A A cases.
>        > A  A  A >
>        > A  A  A > A A A A A Attached updated patch at least to see whether
>        the fix fits
>        > A  A  A well in
>        > A  A  A > A A A A A getAArch64Cmp().
>        > A  A  A >
>        > A  A  A > A A A A A Regards,
>        > A  A  A > A A A A A Sergey
>        > A  A  A > A A A A A On Wed, Jul 23, 2014 at 10:05:37PM -0700,
>        Jiangning Liu wrote:
>        > A  A  A > A A A A A > A A A Hi Sergey,
>        > A  A  A > A A A A A > A A A 1) I expect the fix should be insideA
>        getAArch64Cmp.
>        > A  A  A > A A A A A > A A A 2) Your comment is inconsistent with
>        your code. Your
>        > A  A  A code is to
>        > A  A  A > A A A A A transform
>        > A  A  A > A A A A A > A A A (x < 1) to be (x<=0), rather than "Turn
>        "x > 1"
>        > A  A  A condition into "x
>        > A  A  A > A A A A A >= 0"".
>        > A  A  A > A A A A A > A A A I also noticed we have the following
>        transformation
>        > A  A  A for if
>        > A  A  A > A A A A A condition (x <
>        > A  A  A > A A A A A > A A A 0) in back-end,
>        > A  A  A > A A A A A > A A A stage 1: (x < 0) -> (x >= 0), i.e. (x<0)
>        and invert
>        > A  A  A the targets.
>        > A  A  A > A A A A A > A A A stage 2: (x >= 0) -> (x > -1). This
>        happens in
>        > A  A  A combine1.
>        > A  A  A > A A A A A > A A A stage 3: (x > -1) -> (x >= 0) in
>        getAArch64Cmp.
>        > A  A  A > A A A A A > A A A For if condition (x > 0), the
>        transformation is
>        > A  A  A similar. Your
>        > A  A  A > A A A A A patch is
>        > A  A  A > A A A A A > A A A trying to cover this case, but
>        essentially
>        > A  A  A getAArch64Cmp missed
>        > A  A  A > A A A A A the case
>        > A  A  A > A A A A A > A A A of (x < 1) -> (x <= 0).
>        > A  A  A > A A A A A > A A A However, as you can see the root cause
>        of generating
>        > A  A  A the
>        > A  A  A > A A A A A comparison with
>        > A  A  A > A A A A A > A A A constant 1 is stage 1. This happens
>        > A  A  A > A A A A A > A A A insideA
>        SelectionDAGBuilder::visitSwitchCase
>        > A  A  A > A A A A A > A A A A A // If the lhs block is the next
>        block, invert the
>        > A  A  A condition
>        > A  A  A > A A A A A so that we
>        > A  A  A > A A A A A > A A A can
>        > A  A  A > A A A A A > A A A A A // fall through to the lhs instead
>        of the rhs
>        > A  A  A block.
>        > A  A  A > A A A A A > A A A A A if (CB.TrueBB == NextBlock) {
>        > A  A  A > A A A A A > A A A A A A A std::swap(CB.TrueBB,
>        CB.FalseBB);
>        > A  A  A > A A A A A > A A A A A A A SDValue True =
>        DAG.getConstant(1,
>        > A  A  A Cond.getValueType());
>        > A  A  A > A A A A A > A A A A A A A Cond = DAG.getNode(ISD::XOR, dl,
>        > A  A  A Cond.getValueType(),
>        > A  A  A > A A A A A Cond, True);
>        > A  A  A > A A A A A > A A A A A }
>        > A  A  A > A A A A A > A A A So now I'm wondering how to justify this
>        is always
>        > A  A  A meaningful for
>        > A  A  A > A A A A A AArch64?
>        > A  A  A > A A A A A > A A A Thanks,
>        > A  A  A > A A A A A > A A A -Jiangning
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A 2014-07-23 23:54 GMT+08:00 Sergey
>        Dmitrouk
>        > A  A  A > A A A A A <sdmitrouk at accesssoftek.com>:
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A Hi,
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A Basing on the following information
>        from [this
>        > A  A  A post][0] by
>        > A  A  A > A A A A A James Molloy:
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A A A 2. "if (a < 0 && b == c || a > 0
>        && b == d)" -
>        > A  A  A the first
>        > A  A  A > A A A A A comparison
>        > A  A  A > A A A A A > A A A A A of
>        > A  A  A > A A A A A > A A A A A A A 'a' against zero is done twice,
>        when the flag
>        > A  A  A results of
>        > A  A  A > A A A A A the first
>        > A  A  A > A A A A A > A A A A A A A comparison could be used for the
>        second
>        > A  A  A comparison.
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A I've made a patch (attached) that
>        removes this
>        > A  A  A extra
>        > A  A  A > A A A A A comparison. A More
>        > A  A  A > A A A A A > A A A A A complex cases like comparisons with
>        non-zero
>        > A  A  A immediate values
>        > A  A  A > A A A A A or with
>        > A  A  A > A A A A A > A A A A A registers doesn't seem to be task
>        for a code
>        > A  A  A generator. A
>        > A  A  A > A A A A A Comparing with
>        > A  A  A > A A A A A > A A A A A zero is quite common, so I seems to
>        be worth
>        > A  A  A adding.
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A Please review the patch. A Couldn't
>        find a better
>        > A  A  A place to
>        > A  A  A > A A A A A make the
>        > A  A  A > A A A A A > A A A A A change, but I'll be happy to adjust
>        the patch if
>        > A  A  A anyone has
>        > A  A  A > A A A A A better
>        > A  A  A > A A A A A > A A A A A ideas.
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A Best regards,
>        > A  A  A > A A A A A > A A A A A Sergey
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A 0:
>        > A  A  A > A A A A A
>        http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
>        > A  A  A > A A A A A >
>        > A  A  A > A A A A A > A A A A A
>        _______________________________________________
>        > A  A  A > A A A A A > A A A A A llvm-commits mailing list
>        > A  A  A > A A A A A > A A A A A llvm-commits at cs.uiuc.edu
>        > A  A  A > A A A A A > A A A A A
>        > A  A  A http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
[AArch64] Enable sign check optimization by CSE

Turn "x < 1" condition into "x <= 0" to make both "x < 0" and "x > 0"
checks generate the same code.  This allows CSE pass to remove
duplicated compare instructions in this quite common branching case.

diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2c677ab..83e7930 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1072,6 +1073,15 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
         }
         break;
       }
+    } else {
+      // Transform "x < 1" to be "x <= 0" to make both "x < 0" and "x > 0" checks
+      // generate the same code.  This allows CSE pass to remove duplicated
+      // compare instructions in this quite common branching case.
+      if (RHSC->isOne() && CC == ISD::SETLT) {
+        CC = ISD::SETLE;
+        C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
+        RHS = DAG.getConstant(C, RHS.getValueType());
+      }
     }
   }
 
diff --git a/test/CodeGen/AArch64/arm64-ccmp.ll b/test/CodeGen/AArch64/arm64-ccmp.ll
index 63965f9..6261fe8 100644
--- a/test/CodeGen/AArch64/arm64-ccmp.ll
+++ b/test/CodeGen/AArch64/arm64-ccmp.ll
@@ -131,7 +131,7 @@ if.end:
 ; CHECK: single_fcmp
 ; CHECK: cmp
 ; CHECK-NOT: b.
-; CHECK: fccmp {{.*}}, #8, ge
+; CHECK: fccmp {{.*}}, #8, gt
 ; CHECK: b.lt
 define i32 @single_fcmp(i32 %a, float %b) nounwind ssp {
 entry:
diff --git a/test/CodeGen/AArch64/arm64-cse.ll b/test/CodeGen/AArch64/arm64-cse.ll
index 5d62cfe..22c8a34 100644
--- a/test/CodeGen/AArch64/arm64-cse.ll
+++ b/test/CodeGen/AArch64/arm64-cse.ll
@@ -33,17 +33,40 @@ return:
  ret i8* %retval.0
 }
 
-; CSE between "icmp reg imm" and "sub reg imm".
+; CSE between "icmp reg imm" and "sub reg imm" with imm != 1.
 define i8* @t2(i8* %base, i32* nocapture %offset) nounwind {
 entry:
 ; CHECK-LABEL: t2:
 ; CHECK: subs
 ; CHECK-NOT: cmp
 ; CHECK-NOT: sub
-; CHECK: b.lt
+; CHECK: b.ge
 ; CHECK-NOT: sub
 ; CHECK: ret
  %0 = load i32* %offset, align 4
+ %cmp = icmp slt i32 %0, 2
+ br i1 %cmp, label %return, label %if.end
+
+if.end:
+ %sub = sub nsw i32 %0, 2
+ store i32 %sub, i32* %offset, align 4
+ %add.ptr = getelementptr inbounds i8* %base, i32 %sub
+ br label %return
+
+return:
+ %retval.0 = phi i8* [ %add.ptr, %if.end ], [ null, %entry ]
+ ret i8* %retval.0
+}
+
+; no CSE between "icmp reg imm" and "sub reg imm" with imm == 1.
+define i8* @t3(i8* %base, i32* nocapture %offset) nounwind {
+entry:
+; CHECK-LABEL: t3:
+; CHECK: cmp
+; CHECK: b.le
+; CHECK: sub
+; CHECK: ret
+ %0 = load i32* %offset, align 4
  %cmp = icmp slt i32 %0, 1
  br i1 %cmp, label %return, label %if.end
 
diff --git a/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
new file mode 100644
index 0000000..3fdca0d
--- /dev/null
+++ b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
@@ -0,0 +1,42 @@
+; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; marked as external to prevent possible optimizations
+ at a = external global i32
+ at b = external global i32
+ at c = external global i32
+ at d = external global i32
+
+define void @combine-sign-comparisons-by-cse() {
+; CHECK-LABEL: combine-sign-comparisons-by-cse
+; CHECK: cmp
+; CHECK: b.lt
+; CHECK-NOT: cmp
+; CHECK: b.le
+entry:
+  %0 = load i32* @a, align 4
+  %cmp = icmp slt i32 %0, 0
+  br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:
+  %1 = load i32* @b, align 4
+  %2 = load i32* @c, align 4
+  %cmp1 = icmp eq i32 %1, %2
+  br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:
+  %cmp2 = icmp sgt i32 %0, 0
+  br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:
+  %3 = load i32* @b, align 4
+  %4 = load i32* @d, align 4
+  %cmp4 = icmp eq i32 %3, %4
+  br i1 %cmp4, label %return, label %if.end
+
+if.end:
+  br label %return
+
+return:
+  %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+  ret void
+}