[PATCH][AArch64] Enable sign check optimization by CSE
Sergey Dmitrouk
sdmitrouk at accesssoftek.com
Mon Jul 28 01:40:09 PDT 2014
Hi Jiangning,
You're right, I just thought it might make sense to leave a chance of
catching illegal immediate values after optimizations, which is actually
not something to consider in this case with one and zero.
Also, maybe
C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
worth changing to
C = 0ULL;
It's left from my tries to generalize optimization, but it might be
useful if someone will add additional condition.
Thanks,
Sergey
On Sun, Jul 27, 2014 at 10:30:45PM -0700, Jiangning Liu wrote:
> Hi Sergey,
> Would it be more clear on logic if you move that piece of code to be
> inside the else branch of "if (!isLegalArithImmed(C)) { ... } else {...}"?
> I think it would be clear that (C==1) is legal, but we still want to
> optimize it, and any more optimization cases falling into "legal" scenario
> would be easily added in future.
> I will be OK if you can simply make this change.
> Thanks,
> -Jiangning
>
> 2014-07-25 15:21 GMT+08:00 Sergey Dmitrouk <sdmitrouk at accesssoftek.com>:
>
> Hi Jiangning,
> > Did you forget to attach your new patch?
>
> Yes, sorry for that. A It's attached now.
> > Hopefully we can understand why this could happen, but maybe this is
> just
> > a heuristic result depending on the real control flow and workload.
>
> As that code is common for all supported architectures, maybe it is
> important for only some of them and doesn't cause significant
> performance
> changes for others?
>
> Best regards,
> Sergey
> On Thu, Jul 24, 2014 at 07:05:43PM -0700, Jiangning Liu wrote:
> > A A Hi Sergey,
> > A A Did you forget to attach your new patch?
> > A A I tried the spec benchmarks by disabling that code of inverting
> the
> > A A condition, and only see the following performance changes and no
> change
> > A A for all others.
> > A A 164.gzip (ref) A +1.76%
> > A A 458.sjeng (train) + 2.19%
> > A A 471.omnetpp (train) -1.43%
> > A A 473.astar (train) -1.51%
> > A A Hopefully we can understand why this could happen, but maybe this
> is just
> > A A a heuristic result depending on the real control flow and
> workload.
> > A A Thanks,
> > A A -Jiangning
> >
> > A A 2014-07-24 16:17 GMT+08:00 Sergey Dmitrouk
> <sdmitrouk at accesssoftek.com>:
> >
> > A A A Hello Jiangning,
> >
> > A A A Thanks for your comments.
> >
> > A A A > 1) I expect the fix should be insideA getAArch64Cmp.
> > A A A >...
> > A A A > but essentially getAArch64Cmp missed the case of (x < 1) ->
> (x <= 0).
> >
> > A A A It was initial placement of the fix, but the function doesn't
> seem to
> > A A A perform transformations like that. A It updates conditions
> only when
> > A A A immediate values are not legal. A There is no comment for the
> function,
> > A A A so I'm not sure whether such checks fit there, but I moved the
> change.
> > A A A > 2) Your comment is inconsistent with your code.
> >
> > A A A Thanks, it's probably because of inverted conditions in DAGs.
> > A A A > So now I'm wondering how to justify this is always
> meaningful for
> > A A A AArch64?
> >
> > A A A I wasn't sure whether it's worth such change, but as an option
> something
> > A A A like TargetLowering::isInversionBeneficial(SDValue Cond) can
> be added,
> > A A A but I don't know whether it's possible to check for conditions
> like
> > A A A "(a < 0 && b == c || a > 0 && b == d)" to do not block
> inversion for all
> > A A A cases.
> >
> > A A A Attached updated patch at least to see whether the fix fits
> well in
> > A A A getAArch64Cmp().
> >
> > A A A Regards,
> > A A A Sergey
> > A A A On Wed, Jul 23, 2014 at 10:05:37PM -0700, Jiangning Liu wrote:
> > A A A > A A A Hi Sergey,
> > A A A > A A A 1) I expect the fix should be insideA getAArch64Cmp.
> > A A A > A A A 2) Your comment is inconsistent with your code. Your
> code is to
> > A A A transform
> > A A A > A A A (x < 1) to be (x<=0), rather than "Turn "x > 1"
> condition into "x
> > A A A >= 0"".
> > A A A > A A A I also noticed we have the following transformation
> for if
> > A A A condition (x <
> > A A A > A A A 0) in back-end,
> > A A A > A A A stage 1: (x < 0) -> (x >= 0), i.e. (x<0) and invert
> the targets.
> > A A A > A A A stage 2: (x >= 0) -> (x > -1). This happens in
> combine1.
> > A A A > A A A stage 3: (x > -1) -> (x >= 0) in getAArch64Cmp.
> > A A A > A A A For if condition (x > 0), the transformation is
> similar. Your
> > A A A patch is
> > A A A > A A A trying to cover this case, but essentially
> getAArch64Cmp missed
> > A A A the case
> > A A A > A A A of (x < 1) -> (x <= 0).
> > A A A > A A A However, as you can see the root cause of generating
> the
> > A A A comparison with
> > A A A > A A A constant 1 is stage 1. This happens
> > A A A > A A A insideA SelectionDAGBuilder::visitSwitchCase
> > A A A > A A A A A // If the lhs block is the next block, invert the
> condition
> > A A A so that we
> > A A A > A A A can
> > A A A > A A A A A // fall through to the lhs instead of the rhs
> block.
> > A A A > A A A A A if (CB.TrueBB == NextBlock) {
> > A A A > A A A A A A A std::swap(CB.TrueBB, CB.FalseBB);
> > A A A > A A A A A A A SDValue True = DAG.getConstant(1,
> Cond.getValueType());
> > A A A > A A A A A A A Cond = DAG.getNode(ISD::XOR, dl,
> Cond.getValueType(),
> > A A A Cond, True);
> > A A A > A A A A A }
> > A A A > A A A So now I'm wondering how to justify this is always
> meaningful for
> > A A A AArch64?
> > A A A > A A A Thanks,
> > A A A > A A A -Jiangning
> > A A A >
> > A A A > A A A 2014-07-23 23:54 GMT+08:00 Sergey Dmitrouk
> > A A A <sdmitrouk at accesssoftek.com>:
> > A A A >
> > A A A > A A A A A Hi,
> > A A A >
> > A A A > A A A A A Basing on the following information from [this
> post][0] by
> > A A A James Molloy:
> > A A A >
> > A A A > A A A A A A A 2. "if (a < 0 && b == c || a > 0 && b == d)" -
> the first
> > A A A comparison
> > A A A > A A A A A of
> > A A A > A A A A A A A 'a' against zero is done twice, when the flag
> results of
> > A A A the first
> > A A A > A A A A A A A comparison could be used for the second
> comparison.
> > A A A >
> > A A A > A A A A A I've made a patch (attached) that removes this
> extra
> > A A A comparison. A More
> > A A A > A A A A A complex cases like comparisons with non-zero
> immediate values
> > A A A or with
> > A A A > A A A A A registers doesn't seem to be task for a code
> generator. A
> > A A A Comparing with
> > A A A > A A A A A zero is quite common, so I seems to be worth
> adding.
> > A A A >
> > A A A > A A A A A Please review the patch. A Couldn't find a better
> place to
> > A A A make the
> > A A A > A A A A A change, but I'll be happy to adjust the patch if
> anyone has
> > A A A better
> > A A A > A A A A A ideas.
> > A A A >
> > A A A > A A A A A Best regards,
> > A A A > A A A A A Sergey
> > A A A >
> > A A A > A A A A A 0:
> > A A A http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
> > A A A >
> > A A A > A A A A A _______________________________________________
> > A A A > A A A A A llvm-commits mailing list
> > A A A > A A A A A llvm-commits at cs.uiuc.edu
> > A A A > A A A A A
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
[AArch64] Enable sign check optimization by CSE
Turn "x < 1" condition into "x <= 0" to make both "x < 0" and "x > 0"
checks generate the same code. This allows CSE pass to remove
duplicated compare instructions in this quite common branching case.
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index 2c677ab..83e7930 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -1023,6 +1023,7 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(RHS.getNode())) {
EVT VT = RHS.getValueType();
uint64_t C = RHSC->getZExtValue();
+
if (!isLegalArithImmed(C)) {
// Constant does not fit, try adjusting it by one?
switch (CC) {
@@ -1072,6 +1073,15 @@ static SDValue getAArch64Cmp(SDValue LHS, SDValue RHS, ISD::CondCode CC,
}
break;
}
+ } else {
+ // Transform "x < 1" to be "x <= 0" to make both "x < 0" and "x > 0" checks
+ // generate the same code. This allows CSE pass to remove duplicated
+ // compare instructions in this quite common branching case.
+ if (RHSC->isOne() && CC == ISD::SETLT) {
+ CC = ISD::SETLE;
+ C = (RHS.getValueType() == MVT::i32) ? (uint32_t)(C - 1) : (C - 1);
+ RHS = DAG.getConstant(C, RHS.getValueType());
+ }
}
}
diff --git a/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
new file mode 100644
index 0000000..aa74be9
--- /dev/null
+++ b/test/CodeGen/AArch64/combine-sign-comparisons-by-cse.ll
@@ -0,0 +1,39 @@
+; RUN: llc < %s -march=aarch64 -mtriple=aarch64-linux-gnu | FileCheck %s
+
+; marked as external to prevent possible optimizations
+ at a = external global i32
+ at b = external global i32
+ at c = external global i32
+ at d = external global i32
+
+define void @combine-sign-comparisons-by-cse() {
+; CHECK-LABEL: %lor.lhs.false
+; CHECK-NEXT: b.l
+entry:
+ %0 = load i32* @a, align 4
+ %cmp = icmp slt i32 %0, 0
+ br i1 %cmp, label %land.lhs.true, label %lor.lhs.false
+
+land.lhs.true:
+ %1 = load i32* @b, align 4
+ %2 = load i32* @c, align 4
+ %cmp1 = icmp eq i32 %1, %2
+ br i1 %cmp1, label %return, label %if.end
+
+lor.lhs.false:
+ %cmp2 = icmp sgt i32 %0, 0
+ br i1 %cmp2, label %land.lhs.true3, label %if.end
+
+land.lhs.true3:
+ %3 = load i32* @b, align 4
+ %4 = load i32* @d, align 4
+ %cmp4 = icmp eq i32 %3, %4
+ br i1 %cmp4, label %return, label %if.end
+
+if.end:
+ br label %return
+
+return:
+ %retval.0 = phi i32 [ 0, %if.end ], [ 1, %land.lhs.true3 ], [ 1, %land.lhs.true ]
+ ret void
+}
More information about the llvm-commits
mailing list