[llvm-commits] [PATCH] add-carray/sub-borrow optimization (CodeGen, X86)

Wed Oct 31 10:05:32 PDT 2012

On Oct 31, 2012, at 9:47 AM, Shuxin Yang wrote:

> Hi, Manman:
> 
>  Thank you so much for the code review.  I will fix the typo. Sorry for the sloppiness.
>    I break the complicate the conditions into two nesting if-statements,
> because I think it is easier to read.
> As to "Can we directly return a SETCC_CARRY instead of falling to the if"..., certainly we can
> but it duplicate the codes, and have to maintain the code. Unfortunately, if the duplication
> is not very high, compiler is able to duplicate that code for us via range-propagation/jump-threading.

>> +      CC = X86::COND_B;
>> +      SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
>> +                                   EFLAGS.getNode()->getVTList(),
>> +                                   EFLAGS.getOperand(1), EFLAGS.getOperand(0));
>> +      EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
>> +      SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
>> +                                   DAG.getConstant(CC, MVT::i8), EFLAGS);
>> +      N = NewVal.getNode ();

I suggested to return instead of falling to the "if" mainly due to readability. 
  if (CC == X86::COND_B)
    return DAG.getNode(ISD::AND, DL, MVT::i8,
                       DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,
                                   DAG.getConstant(CC, MVT::i8), EFLAGS),
                       DAG.getConstant(1, MVT::i8));

If we return SETCC_CARRY directly, there is no need to create NewVal.
I guess we have different opinions on readability :)

Thanks,
Manman

> I sketch the code bellow:
> 
>   if (cc == X86::COND_A) {
>      ...
>       cc = X86::COND_B;
>       /* point x */
>    }
> 
>    if (cc == X86::COND_B)
>        statements xyz;  // will be tail-dup to point x if cost permit.
> 
> Thank you again for the code review.
> 
> Shuxin
> 
> 
> 
> 
> 
> 
> 
> 
> On 10/31/12 9:17 AM, Manman Ren wrote:
>> Hi Shuxin,
>> 
>> A few comments:
>> +  if (CC == X86::COND_A) {
>> +    // Try to convert cond_a into cond_b in an attemp to facilitate
>> Typo attempt
>> +    // materializing "setb reg"; see the following code.
>> +    //
>> +    // Do not flip "e > c", where "c" is a constant, because Cmp instruction
>> +    // cannot take an immedidate as its first operand.
>> Typo immediate
>> +    //
>> +    if (EFLAGS.getOpcode() == X86ISD::SUB && EFLAGS.hasOneUse() &&
>> +        EFLAGS.getValueType().isInteger() &&
>> +        !isa<ConstantSDNode>(EFLAGS.getOperand(1))) {
>> +      CC = X86::COND_B;
>> +      SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
>> +                                   EFLAGS.getNode()->getVTList(),
>> +                                   EFLAGS.getOperand(1), EFLAGS.getOperand(0));
>> +      EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
>> +      SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
>> +                                   DAG.getConstant(CC, MVT::i8), EFLAGS);
>> +      N = NewVal.getNode ();
>> Why not a single "if"?
>> Can we directly return a SETCC_CARRY instead of falling to the "if (CC == X86::COND_B)"?
>> Otherwise, looks good.
>> 
>> Thanks,
>> Manman
>> +    }
>> +  }
>> +
>> 
>> On Oct 30, 2012, at 3:37 PM, Shuxin Yang wrote:
>> 
>>> The motivating example:
>>> ========================
>>> 
>>>  The attached patch is to fix the performance defect reported in rdar://problem/12579915.
>>> The motivating example is:
>>> 
>>> -----------------------------------------
>>> int foo(unsigned x, unsigned y, int ret) {
>>>    if (x > y)
>>>        ++ret; /* or --ret */
>>>    return ret;
>>> }
>>> -----------------------------------------
>>> 
>>> Gcc gives:
>>>        movl    %edx, %eax // return val = 3rd parameter
>>>        cmpl    %edi, %esi // carry-bit = (y < x)
>>>        adcl    $0, %eax   // return val = 0 + carry-bit.
>>>        ret
>>> 
>>> and LLVM gave:
>>>        cmpl    %esi, %edi  // flags = x cmp y
>>>        seta    %al         // %al = 1 iff x > y
>>>        movzbl  %al, %eax   // cmp-result = zext (%al)
>>>        addl    %edx, %eax  // return-val = <ret> + cmp-result
>>>        ret
>>> 
>>> unsigned-less-than (ult) cmp has a nice feature: carray bit is set iff the cmp is satisified.
>>> Code-gen an take advantage of this feature to optimize expr like this:
>>>  (ult)  = x + 1
>>>  (ult)  = x - 1
>>> 
>>> 
>>> The Fix
>>> ========
>>> 
>>>  LLVM is already able to generate right code if the comparision is "<" (unsigned).
>>> So, this patch is simply to flip "x >u y" to "y <u x" in PerformSETCCCombine().
>>> 
>>>  One testing case is provied; and a "CHECK: sub" in another testing case is removed
>>> because it is irrelevant, and its position depends on scheduling.
>>> 
>>> TODO:
>>> =====
>>> 
>>>  1. With this patch, the assembly is:
>>>        cmpl    %edi, %esi
>>>        adcl    $0, %edx
>>>        movl    %edx, %eax
>>>    Compared to gcc, critial path has 3 instructions vs 2 instrs in gcc. As of I write
>>>    this mail, I have not yet got chance to investigate how gcc reduce the critial path.
>>>    Maybe it is just lucky.  This opt seems to be bit difficult.
>>> 
>>>  2. gcc is way better than llvm in this case:
>>> 
>>>    int test3(unsigned int x, unsigned int y, int res) {
>>>      if (x > 100)
>>>        res++;
>>>      return res;
>>>    }
>>> 
>>>    gcc give:
>>>        movl    %edx, %eax
>>>        cmpl    $101, %edi
>>>        sbbl    $-1, %eax
>>>        ret
>>> 
>>>    Gcc handles all these cases in ce_try_addcc() of if-conversion pass in @ifcvt.c,
>>>    while in llvm, the logic of such if-conv scatter in many places.
>>> 
>>>  3. With -m32, the instruction sequence is worse than -m64, I have not yet got chance
>>>     to dig the root cause.
>>> 
>>> <diff.patch>_______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>