[PATCH][AArch64] Enable sign check optimization by CSE
Sergey Dmitrouk
sdmitrouk at accesssoftek.com
Tue Jul 29 02:51:22 PDT 2014
Hi Jiangning,
> I think it would be better if you can simply check sequential b.xx
> instructions like,
Thanks, updated that, I though it won't be changed as "lor.lhs.false" is
in the .ll file.
> I found your patch would trigger two failures for llvm regression test,
> test/CodeGen/AArch64/arm64-ccmp.ll
> test/CodeGen/AArch64/arm64-cse.ll
> Have you tried that?
Sorry, I did run all test before, but maybe failed to do this with correct
version of llc binary.
"test/CodeGen/AArch64/arm64-ccmp.ll" required a small update in
conditional check, which doesn't seem to affect anything.
"test/CodeGen/AArch64/arm64-cse.ll" explicitly checks for old behaviour and
assumes comparison with 1. Modified to perform two separate checks:
1. No CSE for that case when comparing with 1.
2. CSE for all other cases.
It seems to be fine as comparison with zero should be more frequent. I
guess we can't support both kinds of optimizations at the same time.
By the way, thanks for you patience.
Cheers,
Sergey
On Mon, Jul 28, 2014 at 09:32:40PM -0700, Jiangning Liu wrote:
> Hi Sergey,
> I found your patch would trigger two failures for llvm regression test,
> test/CodeGen/AArch64/arm64-ccmp.ll
> test/CodeGen/AArch64/arm64-cse.ll
> Have you tried that?
> Thanks,
> -Jiangning
>
> 2014-07-29 10:02 GMT+08:00 Jiangning Liu <liujiangning1 at gmail.com>:
>
> Hi Sergey,
> I just noticed your test case is checking the "comment" in assembly
> code. I think it would be better if you can simply check sequential b.xx
> instructions like,
> ; CHECK: cmp
> ; CHECK: b.lt
> ; CHECK-NOT: cmp
> ; CHECK: b.le
> This way the test can be more stable, because the internal symbol like
> %lor.lhs.false can be easily changed from time to time.
> Thanks,
> -Jiangning
>
> 2014-07-28 16:40 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
>
> Hi Jiangning,
>
> You're right, I just thought it might make sense to leave a chance of
> catching illegal immediate values after optimizations, which is
> actually
> not something to consider in this case with one and zero.
>
> Also, maybe
>
> A A C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C -
> 1);
>
> worth changing to
>
> A A C = 0ULL;
>
> It's left from my tries to generalize optimization, but it might be
> useful if someone will add additional condition.
>
> Thanks,
> Sergey
> On Sun, Jul 27, 2014 at 10:30:45PM -0700, Jiangning Liu wrote:
> > A A Hi Sergey,
> > A A Would it be more clear on logic if you move that piece of code
> to be
> > A A inside the else branch of "if (!isLegalArithImmed(C)) { ... }
> else {...}"?
> > A A I think it would be clear that (C==1) is legal, but we still
> want to
> > A A optimize it, and any more optimization cases falling into
> "legal" scenario
> > A A would be easily added in future.
> > A A I will be OK if you can simply make this change.
> > A A Thanks,
> > A A -Jiangning
> >
> > A A 2014-07-25 15:21 GMT+08:00 Sergey Dmitrouk
> <sdmitrouk at accesssoftek.com>:
> >
> > A A A Hi Jiangning,
> > A A A > Did you forget to attach your new patch?
> >
> > A A A Yes, sorry for that. A It's attached now.
> > A A A > Hopefully we can understand why this could happen, but
> maybe this is
> > A A A just
> > A A A > a heuristic result depending on the real control flow and
> workload.
> >
> > A A A As that code is common for all supported architectures,
> maybe it is
> > A A A important for only some of them and doesn't cause
> significant
> > A A A performance
> > A A A changes for others?
> >
> > A A A Best regards,
> > A A A Sergey
> > A A A On Thu, Jul 24, 2014 at 07:05:43PM -0700, Jiangning Liu
> wrote:
> > A A A > A A A Hi Sergey,
> > A A A > A A A Did you forget to attach your new patch?
> > A A A > A A A I tried the spec benchmarks by disabling that code
> of inverting
> > A A A the
> > A A A > A A A condition, and only see the following performance
> changes and no
> > A A A change
> > A A A > A A A for all others.
> > A A A > A A A 164.gzip (ref) A +1.76%
> > A A A > A A A 458.sjeng (train) + 2.19%
> > A A A > A A A 471.omnetpp (train) -1.43%
> > A A A > A A A 473.astar (train) -1.51%
> > A A A > A A A Hopefully we can understand why this could happen,
> but maybe this
> > A A A is just
> > A A A > A A A a heuristic result depending on the real control
> flow and
> > A A A workload.
> > A A A > A A A Thanks,
> > A A A > A A A -Jiangning
> > A A A >
> > A A A > A A A 2014-07-24 16:17 GMT+08:00 Sergey Dmitrouk
> > A A A <sdmitrouk at accesssoftek.com>:
> > A A A >
> > A A A > A A A A A Hello Jiangning,
> > A A A >
> > A A A > A A A A A Thanks for your comments.
> > A A A >
> > A A A > A A A A A > 1) I expect the fix should be insideA
> getAArch64Cmp.
> > A A A > A A A A A >...
> > A A A > A A A A A > but essentially getAArch64Cmp missed the case
> of (x < 1) ->
> > A A A (x <= 0).
> > A A A >
> > A A A > A A A A A It was initial placement of the fix, but the
> function doesn't
> > A A A seem to
> > A A A > A A A A A perform transformations like that. A It updates
> conditions
> > A A A only when
> > A A A > A A A A A immediate values are not legal. A There is no
> comment for the
> > A A A function,
> > A A A > A A A A A so I'm not sure whether such checks fit there,
> but I moved the
> > A A A change.
> > A A A > A A A A A > 2) Your comment is inconsistent with your
> code.
> > A A A >
> > A A A > A A A A A Thanks, it's probably because of inverted
> conditions in DAGs.
> > A A A > A A A A A > So now I'm wondering how to justify this is
> always
> > A A A meaningful for
> > A A A > A A A A A AArch64?
> > A A A >
> > A A A > A A A A A I wasn't sure whether it's worth such change,
> but as an option
> > A A A something
> > A A A > A A A A A like
> TargetLowering::isInversionBeneficial(SDValue Cond) can
> > A A A be added,
> > A A A > A A A A A but I don't know whether it's possible to check
> for conditions
> > A A A like
> > A A A > A A A A A "(a < 0 && b == c || a > 0 && b == d)" to do not
> block
> > A A A inversion for all
> > A A A > A A A A A cases.
> > A A A >
> > A A A > A A A A A Attached updated patch at least to see whether
> the fix fits
> > A A A well in
> > A A A > A A A A A getAArch64Cmp().
> > A A A >
> > A A A > A A A A A Regards,
> > A A A > A A A A A Sergey
> > A A A > A A A A A On Wed, Jul 23, 2014 at 10:05:37PM -0700,
> Jiangning Liu wrote:
> > A A A > A A A A A > A A A Hi Sergey,
> > A A A > A A A A A > A A A 1) I expect the fix should be insideA
> getAArch64Cmp.
> > A A A > A A A A A > A A A 2) Your comment is inconsistent with
> your code. Your
> > A A A code is to
> > A A A > A A A A A transform
> > A A A > A A A A A > A A A (x < 1) to be (x<=0), rather than "Turn
> "x > 1"
> > A A A condition into "x
> > A A A > A A A A A >= 0"".
> > A A A > A A A A A > A A A I also noticed we have the following
> transformation
> > A A A for if
> > A A A > A A A A A condition (x <
> > A A A > A A A A A > A A A 0) in back-end,
> > A A A > A A A A A > A A A stage 1: (x < 0) -> (x >= 0), i.e. (x<0)
> and invert
> > A A A the targets.
> > A A A > A A A A A > A A A stage 2: (x >= 0) -> (x > -1). This
> happens in
> > A A A combine1.
> > A A A > A A A A A > A A A stage 3: (x > -1) -> (x >= 0) in
> getAArch64Cmp.
> > A A A > A A A A A > A A A For if condition (x > 0), the
> transformation is
> > A A A similar. Your
> > A A A > A A A A A patch is
> > A A A > A A A A A > A A A trying to cover this case, but
> essentially
> > A A A getAArch64Cmp missed
> > A A A > A A A A A the case
> > A A A > A A A A A > A A A of (x < 1) -> (x <= 0).
> > A A A > A A A A A > A A A However, as you can see the root cause
> of generating
> > A A A the
> > A A A > A A A A A comparison with
> > A A A > A A A A A > A A A constant 1 is stage 1. This happens
> > A A A > A A A A A > A A A insideA
> SelectionDAGBuilder::visitSwitchCase
> > A A A > A A A A A > A A A A A // If the lhs block is the next
> block, invert the
> > A A A condition
> > A A A > A A A A A so that we
> > A A A > A A A A A > A A A can
> > A A A > A A A A A > A A A A A // fall through to the lhs instead
> of the rhs
> > A A A block.
> > A A A > A A A A A > A A A A A if (CB.TrueBB == NextBlock) {
> > A A A > A A A A A > A A A A A A A std::swap(CB.TrueBB,
> CB.FalseBB);
> > A A A > A A A A A > A A A A A A A SDValue True =
> DAG.getConstant(1,
> > A A A Cond.getValueType());
> > A A A > A A A A A > A A A A A A A Cond = DAG.getNode(ISD::XOR, dl,
> > A A A Cond.getValueType(),
> > A A A > A A A A A Cond, True);
> > A A A > A A A A A > A A A A A }
> > A A A > A A A A A > A A A So now I'm wondering how to justify this
> is always
> > A A A meaningful for
> > A A A > A A A A A AArch64?
> > A A A > A A A A A > A A A Thanks,
> > A A A > A A A A A > A A A -Jiangning
> > A A A > A A A A A >
> > A A A > A A A A A > A A A 2014-07-23 23:54 GMT+08:00 Sergey
> Dmitrouk
> > A A A > A A A A A <sdmitrouk at accesssoftek.com>:
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A Hi,
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A Basing on the following information
> from [this
> > A A A post][0] by
> > A A A > A A A A A James Molloy:
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A A A 2. "if (a < 0 && b == c || a > 0
> && b == d)" -
> > A A A the first
> > A A A > A A A A A comparison
> > A A A > A A A A A > A A A A A of
> > A A A > A A A A A > A A A A A A A 'a' against zero is done twice,
> when the flag
> > A A A results of
> > A A A > A A A A A the first
> > A A A > A A A A A > A A A A A A A comparison could be used for the
> second
> > A A A comparison.
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A I've made a patch (attached) that
> removes this
> > A A A extra
> > A A A > A A A A A comparison. A More
> > A A A > A A A A A > A A A A A complex cases like comparisons with
> non-zero
> > A A A immediate values
> > A A A > A A A A A or with
> > A A A > A A A A A > A A A A A registers doesn't seem to be task
> for a code
> > A A A generator. A
> > A A A > A A A A A Comparing with
> > A A A > A A A A A > A A A A A zero is quite common, so I seems to
> be worth
> > A A A adding.
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A Please review the patch. A Couldn't
> find a better
> > A A A place to
> > A A A > A A A A A make the
> > A A A > A A A A A > A A A A A change, but I'll be happy to adjust
> the patch if
> > A A A anyone has
> > A A A > A A A A A better
> > A A A > A A A A A > A A A A A ideas.
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A Best regards,
> > A A A > A A A A A > A A A A A Sergey
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A 0:
> > A A A > A A A A A
> http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
> > A A A > A A A A A >
> > A A A > A A A A A > A A A A A
> _______________________________________________
> > A A A > A A A A A > A A A A A llvm-commits mailing list
> > A A A > A A A A A > A A A A A llvm-commits at cs.uiuc.edu
> > A A A > A A A A A > A A A A A
> > A A A http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
[AArch64] Enable sign check optimization by CSE
Turn "x < 1" condition into "x <= 0" to make both "x < 0" and "x > 0"
checks generate the same code. This allows CSE pass to remove
duplicated compare instructions in this quite common branching case.
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2c677ab..83e7930 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1072,6 +1073,15 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
}
break;
}
+ } else {
+ // Transform "x < 1" to be "x <= 0" to make both "x < 0" and "x > 0" checks
+ // generate the same code. This allows CSE pass to remove duplicated
+ // compare instructions in this quite common branching case.
+ if (RHSC->isOne() && CC == ISD::SETLT) {
+ CC = ISD::SETLE;
+ C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
+ RHS = DAG.getConstant(C, RHS.getValueType());
+ }
}
}
diff --git a/test/CodeGen/AArch64/arm64-ccmp.ll b/test/CodeGen/AArch64/arm64-ccmp.ll
index 63965f9..6261fe8 100644
--- a/test/CodeGen/AArch64/arm64-ccmp.ll
+++ b/test/CodeGen/AArch64/arm64-ccmp.ll
@@ -131,7 +131,7 @@ if.end:
; CHECK: single_fcmp
; CHECK: cmp
; CHECK-NOT: b.
-; CHECK: fccmp {{.*}}, #8, ge
+; CHECK: fccmp {{.*}}, #8, gt
; CHECK: b.lt
define i32 @single_fcmp(i32 %a, float %b) nounwind ssp {
entry:
diff --git a/test/CodeGen/AArch64/arm64-cse.ll b/test/CodeGen/AArch64/arm64-cse.ll
index 5d62cfe..22c8a34 100644
--- a/test/CodeGen/AArch64/arm64-cse.ll
+++ b/test/CodeGen/AArch64/arm64-cse.ll
@@ -33,17 +33,40 @@ return:
ret i8* %retval.0
}
-; CSE between "icmp reg imm" and "sub reg imm".
+; CSE between "icmp reg imm" and "sub reg imm" with imm != 1.
define i8* @t2(i8* %base, i32* nocapture %offset) nounwind {
entry:
; CHECK-LABEL: t2:
; CHECK: subs
; CHECK-NOT: cmp
; CHECK-NOT: sub
-; CHECK: b.lt
+; CHECK: b.ge
; CHECK-NOT: sub
; CHECK: ret
%0 = load i32* %offset, align 4
+ %cmp = icmp slt i32 %0, 2
+ br i1 %cmp, label %return, label %if.end
+
+if.end:
+ %sub = sub nsw i32 %0, 2
+ store i32 %sub, i32* %offset, align 4
+ %add.ptr = getelementptr inbounds i8* %base, i32 %sub
+ br label %return
+
+return:
+ %retval.0 = phi i8* [ %add.ptr, %if.end ], [ null, %entry ]
+ ret i8* %retval.0
+}
+
+; no CSE between "icmp reg imm" and "sub reg imm" with imm == 1.
+define i8* @t3(i8* %base, i32* nocapture %offset) nounwind {
+entry:
+; CHECK-LABEL: t3:
+; CHECK: cmp
+; CHECK: b.le
+; CHECK: sub
+; CHECK: ret
+ %0 = load i32* %offset, align 4
%cmp = icmp slt i32 %0, 1
br i1 %cmp, label %return, label %if.end
diff --git a/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
new file mode 100644
index 0000000..3fdca0d
--- /dev/null
+++ b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
@@ -0,0 +1,42 @@
+; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; marked as external to prevent possible optimizations
+ at a = external global i32
+ at b = external global i32
+ at c = external global i32
+ at d = external global i32
+
+define void @combine-sign-comparisons-by-cse() {
+; CHECK-LABEL: combine-sign-comparisons-by-cse
+; CHECK: cmp
+; CHECK: b.lt
+; CHECK-NOT: cmp
+; CHECK: b.le
+entry:
+ %0 = load i32* @a, align 4
+ %cmp = icmp slt i32 %0, 0
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:
+ %1 = load i32* @b, align 4
+ %2 = load i32* @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:
+ %cmp2 = icmp sgt i32 %0, 0
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:
+ %3 = load i32* @b, align 4
+ %4 = load i32* @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end:
+ br label %return
+
+return:
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret void
+}
More information about the llvm-commits
mailing list