[llvm-commits] [PATCH] add-carry/sub-borrow optimization (CodeGen, X86)
Shuxin Yang
shuxin.llvm at gmail.com
Tue Oct 30 15:37:50 PDT 2012
The motivating example:
========================
The attached patch is to fix the performance defect reported in
rdar://problem/12579915.
The motivating example is:
-----------------------------------------
int foo(unsigned x, unsigned y, int ret) {
if (x > y)
++ret; /* or --ret */
return ret;
}
-----------------------------------------
Gcc gives:
movl %edx, %eax // return val = 3rd parameter
cmpl %edi, %esi // carry-bit = (y < x)
adcl $0, %eax // return val = return val + 0 + carry-bit
ret
and LLVM gave:
cmpl %esi, %edi // flags = x cmp y
seta %al // %al = 1 iff x > y
movzbl %al, %eax // cmp-result = zext (%al)
addl %edx, %eax // return-val = <ret> + cmp-result
ret
The unsigned-less-than (ult) compare has a nice property: the carry bit is
set iff the comparison is satisfied.
Code-gen can take advantage of this property to optimize expressions like these:
(ult) ? x + 1 : x
(ult) ? x - 1 : x
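The carry-bit property can be checked with a small C model. This is only a sketch: the helper names (carry_of_cmp, adc32, sbb32, foo_inc_adc, foo_dec_sbb, count_mismatches) are made up for illustration and are not part of the patch, and ret is modelled as uint32_t so that wrap-around is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* CF after comparing a with b (i.e. computing a - b): set iff a <u b. */
unsigned carry_of_cmp(uint32_t a, uint32_t b) {
    uint64_t wide = (uint64_t)a - (uint64_t)b; /* a borrow sets bit 32 */
    return (unsigned)((wide >> 32) & 1u);
}

/* adc: dst + src + CF, wrapping at 32 bits. */
uint32_t adc32(uint32_t dst, uint32_t src, unsigned cf) {
    return (uint32_t)(dst + src + cf);
}

/* sbb: dst - src - CF, wrapping at 32 bits. */
uint32_t sbb32(uint32_t dst, uint32_t src, unsigned cf) {
    return (uint32_t)(dst - src - cf);
}

/* Reference semantics of the motivating example (++ret and --ret forms). */
uint32_t foo_inc_ref(uint32_t x, uint32_t y, uint32_t ret) {
    if (x > y) ++ret;
    return ret;
}
uint32_t foo_dec_ref(uint32_t x, uint32_t y, uint32_t ret) {
    if (x > y) --ret;
    return ret;
}

/* cmp + adc/sbb form: CF = (y <u x), then fold CF into ret. */
uint32_t foo_inc_adc(uint32_t x, uint32_t y, uint32_t ret) {
    return adc32(ret, 0u, carry_of_cmp(y, x));
}
uint32_t foo_dec_sbb(uint32_t x, uint32_t y, uint32_t ret) {
    return sbb32(ret, 0u, carry_of_cmp(y, x));
}

/* Compare both forms against the reference over a few edge values. */
unsigned count_mismatches(void) {
    static const uint32_t v[] = {0u, 1u, 2u, 100u, 101u, 0x7fffffffu,
                                 0x80000000u, 0xfffffffeu, 0xffffffffu};
    const size_t n = sizeof v / sizeof v[0];
    unsigned bad = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < n; ++k) {
                bad += foo_inc_ref(v[i], v[j], v[k]) != foo_inc_adc(v[i], v[j], v[k]);
                bad += foo_dec_ref(v[i], v[j], v[k]) != foo_dec_sbb(v[i], v[j], v[k]);
            }
    return bad;
}
```

If the model is right, count_mismatches() returns 0: the branchy form and the cmp + adc/sbb form agree on every tested input.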
The Fix
========
LLVM is already able to generate the right code if the comparison is "<"
(unsigned).
So this patch simply flips "x >u y" to "y <u x" in
PerformSETCCCombine().
One test case is provided, and a "CHECK: sub" in another test
case is removed
because it is irrelevant and its position depends on scheduling.
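The operand swap can be sanity-checked in plain C. The sketch below models only the CF and ZF results of a 32-bit subtraction. The type and function names are hypothetical, and it says nothing about how the DAG combine itself is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the two flags that matter for unsigned "above" and "below". */
typedef struct {
    unsigned cf; /* carry: set iff a <u b */
    unsigned zf; /* zero:  set iff a == b */
} x86_flags;

/* Flags produced by subtracting b from a (what cmp/sub would set). */
x86_flags sub_flags(uint32_t a, uint32_t b) {
    x86_flags f;
    f.cf = a < b;
    f.zf = a == b;
    return f;
}

/* "above": CF == 0 and ZF == 0. */
unsigned cond_a(x86_flags f) { return !f.cf && !f.zf; }

/* "below": CF == 1. */
unsigned cond_b(x86_flags f) { return f.cf; }

/* Count inputs where "above" on (x - y) differs from "below" on (y - x). */
unsigned flip_mismatches(void) {
    static const uint32_t v[] = {0u, 1u, 2u, 0x7fffffffu, 0x80000000u,
                                 0xfffffffeu, 0xffffffffu};
    const size_t n = sizeof v / sizeof v[0];
    unsigned bad = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            bad += cond_a(sub_flags(v[i], v[j])) != cond_b(sub_flags(v[j], v[i]));
    return bad;
}
```

The point of the flip is that "below" depends on CF alone, which is exactly the bit that adc and sbb consume, whereas "above" also needs ZF.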
TODO:
=====
1. With this patch, the assembly is:
cmpl %edi, %esi
adcl $0, %edx
movl %edx, %eax
Compared to gcc, the critical path has 3 instructions vs. 2 instructions in
gcc. As of writing
this mail, I have not yet had a chance to investigate how gcc shortens
the critical path.
Maybe it is just luck. This optimization seems to be a bit difficult.
2. gcc is way better than llvm in this case:
int test3(unsigned int x, unsigned int y, int res) {
if (x > 100)
res++;
return res;
}
gcc gives:
movl %edx, %eax
cmpl $101, %edi
sbbl $-1, %eax
ret
Gcc handles all these cases in noce_try_addcc() of the if-conversion pass
in ifcvt.c,
while in llvm the logic for such if-conversion is scattered across many places.
3. With -m32, the instruction sequence is worse than with -m64; I have not
yet had a chance
to dig into the root cause.
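For the constant case in item 2, the gcc sequence relies on "x > 100" being the same as "!(x <u 101)", so the carry from "cmpl $101, %edi" can be subtracted from res + 1. A minimal C model follows, with hypothetical helper names, the unused y parameter dropped, and res treated as uint32_t so that wrap-around is well defined:

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics of test3 from item 2. */
uint32_t test3_ref(uint32_t x, uint32_t res) {
    if (x > 100u) res++;
    return res;
}

/* Model of "cmpl $101, %edi; sbbl $-1, %eax". */
uint32_t test3_sbb(uint32_t x, uint32_t res) {
    unsigned cf = x < 101u; /* cmpl $101, %edi: CF = (x <u 101) */
    /* sbbl $-1, %eax: res - (-1) - CF, i.e. res + 1 - CF (mod 2^32) */
    return (uint32_t)(res - 0xffffffffu - cf);
}

/* Count inputs where the model disagrees with the reference. */
unsigned test3_mismatches(void) {
    static const uint32_t xs[] = {0u, 1u, 99u, 100u, 101u, 102u,
                                  0x7fffffffu, 0x80000000u, 0xffffffffu};
    static const uint32_t rs[] = {0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
    unsigned bad = 0;
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; ++i)
        for (size_t j = 0; j < sizeof rs / sizeof rs[0]; ++j)
            bad += test3_ref(xs[i], rs[j]) != test3_sbb(xs[i], rs[j]);
    return bad;
}
```

Rewriting the constant (100 becomes 101) is what lets the compare keep the immediate as its source operand while still producing a usable carry, so no operand swap is needed.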
-------------- next part --------------
Index: test/CodeGen/X86/jump_sign.ll
===================================================================
--- test/CodeGen/X86/jump_sign.ll (revision 166937)
+++ test/CodeGen/X86/jump_sign.ll (working copy)
@@ -219,7 +219,6 @@
; by sbb, we should not optimize cmp away.
define i32 @q(i32 %j.4, i32 %w, i32 %el) {
; CHECK: q:
-; CHECK: sub
; CHECK: cmp
; CHECK-NEXT: sbb
%tmp532 = add i32 %j.4, %w
Index: test/CodeGen/X86/add-of-carry.ll
===================================================================
--- test/CodeGen/X86/add-of-carry.ll (revision 166937)
+++ test/CodeGen/X86/add-of-carry.ll (working copy)
@@ -30,4 +30,17 @@
ret i32 %z.0
}
+; <rdar://problem/12579915>
+define i32 @test3(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
+entry:
+ %cmp = icmp ugt i32 %x, %y
+ %dec = sext i1 %cmp to i32
+ %dec.res = add nsw i32 %dec, %res
+ ret i32 %dec.res
+; CHECK: test3:
+; CHECK: cmpl
+; CHECK: sbbl
+; CHECK: ret
+}
+
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp (revision 166937)
+++ lib/Target/X86/X86ISelLowering.cpp (working copy)
@@ -16474,6 +16474,27 @@
X86::CondCode CC = X86::CondCode(N->getConstantOperandVal(0));
SDValue EFLAGS = N->getOperand(1);
+ if (CC == X86::COND_A) {
+ // Try to convert cond_a into cond_b in an attempt to facilitate
+ // materializing "setb reg"; see the following code.
+ //
+ // Do not flip "e > c", where "c" is a constant, because the Cmp instruction
+ // cannot take an immediate as its first operand.
+ //
+ if (EFLAGS.getOpcode() == X86ISD::SUB && EFLAGS.hasOneUse() &&
+ EFLAGS.getValueType().isInteger() &&
+ !isa<ConstantSDNode>(EFLAGS.getOperand(1))) {
+ CC = X86::COND_B;
+ SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
+ EFLAGS.getNode()->getVTList(),
+ EFLAGS.getOperand(1), EFLAGS.getOperand(0));
+ EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
+ SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
+ DAG.getConstant(CC, MVT::i8), EFLAGS);
+ N = NewVal.getNode ();
+ }
+ }
+
// Materialize "setb reg" as "sbb reg,reg", since it can be extended without
// a zext and produces an all-ones bit which is more useful than 0/1 in some
// cases.