[llvm-commits] [PATCH] add-carry/sub-borrow optimization (CodeGen, X86)

Shuxin Yang shuxin.llvm at gmail.com
Tue Oct 30 15:37:50 PDT 2012


The motivating example:
========================

   The attached patch fixes the performance defect reported in 
rdar://problem/12579915.
The motivating example is:

  -----------------------------------------
  int foo(unsigned x, unsigned y, int ret) {
     if (x > y)
         ++ret; /* or --ret */
     return ret;
  }
  -----------------------------------------

  Gcc gives:
         movl    %edx, %eax // return val = 3rd parameter
         cmpl    %edi, %esi // carry-bit = (y < x)
         adcl    $0, %eax   // return val += carry-bit.
         ret

  and LLVM gives:
         cmpl    %esi, %edi  // flags = x cmp y
         seta    %al         // %al = 1 iff x > y
         movzbl  %al, %eax   // cmp-result = zext (%al)
         addl    %edx, %eax  // return-val = <ret> + cmp-result
         ret

  An unsigned-less-than (ult) comparison has a nice feature: the carry 
bit is set iff the comparison is satisfied.
Code-gen can take advantage of this feature to optimize expressions like:
   x + (a <u b)   // becomes add-with-carry (adc)
   x - (a <u b)   // becomes subtract-with-borrow (sbb)
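
  To make the pattern concrete, here is the same idea at the source 
level (a sketch of mine, not part of the patch; the function names 
are made up):

  -----------------------------------------
  #include <stdio.h>

  /* The unsigned comparison already yields the 0/1 value of the
   * carry bit set by cmp, so the add/sub can absorb it as an
   * adc/sbb with a zero immediate. */
  static int add_if_greater(unsigned x, unsigned y, int ret) {
      return ret + (x > y);       /* hoped-for codegen: cmp; adc $0 */
  }

  static int sub_if_greater(unsigned x, unsigned y, int ret) {
      return ret - (x > y);       /* hoped-for codegen: cmp; sbb $0 */
  }

  int main(void) {
      printf("%d %d\n", add_if_greater(5, 3, 10),   /* prints 11 */
                        sub_if_greater(5, 3, 10));  /* and 9     */
      return 0;
  }
  -----------------------------------------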


The Fix
========

   LLVM is already able to generate the right code when the comparison 
is unsigned less-than ("<u").
So this patch simply flips "x >u y" into "y <u x" in 
PerformSETCCCombine().

   One test case is provided, and a "CHECK: sub" in another test case 
is removed because it is irrelevant there and its position depends on 
scheduling.
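
   For reference, the reason the flipped form is profitable (a sketch 
under my own reading of the combine; not part of the patch): after a 
cmp that leaves CF = (a <u b), "sbb reg,reg" computes reg - reg - CF, 
i.e. 0 or all-ones, which is exactly the sign-extended i1 in the new 
test case.

  -----------------------------------------
  #include <assert.h>

  /* 0 or -1: the same value "sbb reg,reg" materializes from CF. */
  static int sext_ult(unsigned a, unsigned b) {
      return -(int)(a < b);
  }

  int main(void) {
      assert(sext_ult(1, 2) == -1);  /* 1 <u 2: CF set,   all-ones */
      assert(sext_ult(2, 1) == 0);   /* 2 <u 1: CF clear, zero     */
      return 0;
  }
  -----------------------------------------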

TODO:
=====

   1. With this patch, the assembly is:
         cmpl    %edi, %esi
         adcl    $0, %edx
         movl    %edx, %eax
      Compared to gcc, the critical path has 3 instructions vs 2 in 
gcc (gcc schedules the movl copy before the cmpl, so the copy is off 
the flag-dependent chain). As I write this mail, I have not yet had a 
chance to investigate how gcc gets the shorter critical path; maybe it 
is just lucky. This optimization seems to be a bit difficult.

   2. gcc is way better than llvm in this case:

     int test3(unsigned int x, unsigned int y, int res) {
       if (x > 100)
         res++;
       return res;
     }

     gcc gives:
         movl    %edx, %eax
         cmpl    $101, %edi
         sbbl    $-1, %eax
         ret

     Gcc handles all these cases in ce_try_addcc() of the if-conversion 
pass in ifcvt.c, while in llvm the logic for such if-conversion is 
scattered across many places. (A sketch of the flag arithmetic behind 
gcc's sbb sequence follows this list.)

   3. With -m32, the instruction sequence is worse than with -m64; I 
have not yet had a chance to dig into the root cause.
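
   For the record, my reading of the flag arithmetic behind gcc's sbb 
sequence in case 2 (a sketch under my own assumptions, not taken from 
gcc): cmpl $101, %edi sets CF = (x <u 101) = !(x > 100), and 
sbbl $-1, %eax then computes eax - (-1) - CF = eax + 1 - CF, so the 
result is res + 1 exactly when x > 100.

  -----------------------------------------
  #include <assert.h>

  /* Model of "cmpl $101, %edi; sbbl $-1, %eax". */
  static int test3_sbb_model(unsigned x, int res) {
      int cf = (x < 101u);     /* carry from cmpl $101, %edi */
      return res - (-1) - cf;  /* sbbl $-1: res + 1 - CF     */
  }

  static int test3_ref(unsigned x, int res) {
      return res + (x > 100u); /* source-level semantics */
  }

  int main(void) {
      for (unsigned x = 90; x < 110; ++x)
          assert(test3_sbb_model(x, 7) == test3_ref(x, 7));
      return 0;
  }
  -----------------------------------------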

-------------- next part --------------
Index: test/CodeGen/X86/jump_sign.ll
===================================================================
--- test/CodeGen/X86/jump_sign.ll	(revision 166937)
+++ test/CodeGen/X86/jump_sign.ll	(working copy)
@@ -219,7 +219,6 @@
 ; by sbb, we should not optimize cmp away.
 define i32 @q(i32 %j.4, i32 %w, i32 %el) {
 ; CHECK: q:
-; CHECK: sub
 ; CHECK: cmp
 ; CHECK-NEXT: sbb
   %tmp532 = add i32 %j.4, %w
Index: test/CodeGen/X86/add-of-carry.ll
===================================================================
--- test/CodeGen/X86/add-of-carry.ll	(revision 166937)
+++ test/CodeGen/X86/add-of-carry.ll	(working copy)
@@ -30,4 +30,17 @@
   ret i32 %z.0
 }
 
+; <rdar://problem/12579915>
+define i32 @test3(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
+entry:
+  %cmp = icmp ugt i32 %x, %y
+  %dec = sext i1 %cmp to i32
+  %dec.res = add nsw i32 %dec, %res
+  ret i32 %dec.res
+; CHECK: test3:
+; CHECK: cmpl
+; CHECK: sbbl
+; CHECK: ret
+}
+
 declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp	(revision 166937)
+++ lib/Target/X86/X86ISelLowering.cpp	(working copy)
@@ -16474,6 +16474,27 @@
   X86::CondCode CC = X86::CondCode(N->getConstantOperandVal(0));
   SDValue EFLAGS = N->getOperand(1);
 
+  if (CC == X86::COND_A) {
+    // Try to convert cond_a into cond_b in an attempt to facilitate
+    // materializing "setb reg"; see the following code.
+    //
+    // Do not flip "expr > c", where "c" is a constant, because the Cmp
+    // instruction cannot take an immediate as its first operand.
+    //
+    if (EFLAGS.getOpcode() == X86ISD::SUB && EFLAGS.hasOneUse() && 
+        EFLAGS.getValueType().isInteger() &&
+        !isa<ConstantSDNode>(EFLAGS.getOperand(1))) {
+      CC = X86::COND_B;
+      SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
+                                   EFLAGS.getNode()->getVTList(),
+                                   EFLAGS.getOperand(1), EFLAGS.getOperand(0));
+      EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
+      SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
+                                   DAG.getConstant(CC, MVT::i8), EFLAGS);
+      N = NewVal.getNode();
+    }
+  }
+
   // Materialize "setb reg" as "sbb reg,reg", since it can be extended without
   // a zext and produces an all-ones bit which is more useful than 0/1 in some
   // cases.

