[llvm-commits] [PATCH] add-carray/sub-borrow optimization (CodeGen, X86)

Wed Oct 31 09:47:05 PDT 2012

Hi, Manman:

   Thank you so much for the code review.  I will fix the typo. Sorry 
for the sloppiness.
     I break the complicate the conditions into two nesting if-statements,
because I think it is easier to read.
As to "Can we directly return a SETCC_CARRY instead of falling to the 
if"..., certainly we can
but it duplicate the codes, and have to maintain the code. 
Unfortunately, if the duplication
is not very high, compiler is able to duplicate that code for us via 
range-propagation/jump-threading.
I sketch the code bellow:

    if (cc == X86::COND_A) {
       ...
        cc = X86::COND_B;
        /* point x */
     }

     if (cc == X86::COND_B)
         statements xyz;  // will be tail-dup to point x if cost permit.

  Thank you again for the code review.

Shuxin

On 10/31/12 9:17 AM, Manman Ren wrote:
> Hi Shuxin,
>
> A few comments:
> +  if (CC == X86::COND_A) {
> +    // Try to convert cond_a into cond_b in an attemp to facilitate
> Typo attempt
> +    // materializing "setb reg"; see the following code.
> +    //
> +    // Do not flip "e > c", where "c" is a constant, because Cmp instruction
> +    // cannot take an immedidate as its first operand.
> Typo immediate
> +    //
> +    if (EFLAGS.getOpcode() == X86ISD::SUB && EFLAGS.hasOneUse() &&
> +        EFLAGS.getValueType().isInteger() &&
> +        !isa<ConstantSDNode>(EFLAGS.getOperand(1))) {
> +      CC = X86::COND_B;
> +      SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
> +                                   EFLAGS.getNode()->getVTList(),
> +                                   EFLAGS.getOperand(1), EFLAGS.getOperand(0));
> +      EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
> +      SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
> +                                   DAG.getConstant(CC, MVT::i8), EFLAGS);
> +      N = NewVal.getNode ();
> Why not a single "if"?
> Can we directly return a SETCC_CARRY instead of falling to the "if (CC == X86::COND_B)"?
> Otherwise, looks good.
>
> Thanks,
> Manman
> +    }
> +  }
> +
>
> On Oct 30, 2012, at 3:37 PM, Shuxin Yang wrote:
>
>> The motivating example:
>> ========================
>>
>>   The attached patch is to fix the performance defect reported in rdar://problem/12579915.
>> The motivating example is:
>>
>> -----------------------------------------
>> int foo(unsigned x, unsigned y, int ret) {
>>     if (x > y)
>>         ++ret; /* or --ret */
>>     return ret;
>> }
>> -----------------------------------------
>>
>> Gcc gives:
>>         movl    %edx, %eax // return val = 3rd parameter
>>         cmpl    %edi, %esi // carry-bit = (y < x)
>>         adcl    $0, %eax   // return val = 0 + carry-bit.
>>         ret
>>
>> and LLVM gave:
>>         cmpl    %esi, %edi  // flags = x cmp y
>>         seta    %al         // %al = 1 iff x > y
>>         movzbl  %al, %eax   // cmp-result = zext (%al)
>>         addl    %edx, %eax  // return-val = <ret> + cmp-result
>>         ret
>>
>> unsigned-less-than (ult) cmp has a nice feature: carray bit is set iff the cmp is satisified.
>> Code-gen an take advantage of this feature to optimize expr like this:
>>   (ult)  = x + 1
>>   (ult)  = x - 1
>>
>>
>> The Fix
>> ========
>>
>>   LLVM is already able to generate right code if the comparision is "<" (unsigned).
>> So, this patch is simply to flip "x >u y" to "y <u x" in PerformSETCCCombine().
>>
>>   One testing case is provied; and a "CHECK: sub" in another testing case is removed
>> because it is irrelevant, and its position depends on scheduling.
>>
>> TODO:
>> =====
>>
>>   1. With this patch, the assembly is:
>>         cmpl    %edi, %esi
>>         adcl    $0, %edx
>>         movl    %edx, %eax
>>     Compared to gcc, critial path has 3 instructions vs 2 instrs in gcc. As of I write
>>     this mail, I have not yet got chance to investigate how gcc reduce the critial path.
>>     Maybe it is just lucky.  This opt seems to be bit difficult.
>>
>>   2. gcc is way better than llvm in this case:
>>
>>     int test3(unsigned int x, unsigned int y, int res) {
>>       if (x > 100)
>>         res++;
>>       return res;
>>     }
>>
>>     gcc give:
>>         movl    %edx, %eax
>>         cmpl    $101, %edi
>>         sbbl    $-1, %eax
>>         ret
>>
>>     Gcc handles all these cases in ce_try_addcc() of if-conversion pass in @ifcvt.c,
>>     while in llvm, the logic of such if-conv scatter in many places.
>>
>>   3. With -m32, the instruction sequence is worse than -m64, I have not yet got chance
>>      to dig the root cause.
>>
>> <diff.patch>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits