[llvm-commits] [PATCH] add-carray/sub-borrow optimization (CodeGen, X86)
Manman Ren
mren at apple.com
Wed Oct 31 10:05:32 PDT 2012
On Oct 31, 2012, at 9:47 AM, Shuxin Yang wrote:
> Hi, Manman:
>
> Thank you so much for the code review. I will fix the typo. Sorry for the sloppiness.
> I break the complicate the conditions into two nesting if-statements,
> because I think it is easier to read.
> As to "Can we directly return a SETCC_CARRY instead of falling to the if"..., certainly we can
> but it duplicate the codes, and have to maintain the code. Unfortunately, if the duplication
> is not very high, compiler is able to duplicate that code for us via range-propagation/jump-threading.
>> + CC = X86::COND_B;
>> + SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
>> + EFLAGS.getNode()->getVTList(),
>> + EFLAGS.getOperand(1), EFLAGS.getOperand(0));
>> + EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
>> + SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
>> + DAG.getConstant(CC, MVT::i8), EFLAGS);
>> + N = NewVal.getNode ();
I suggested to return instead of falling to the "if" mainly due to readability.
if (CC == X86::COND_B)
return DAG.getNode(ISD::AND, DL, MVT::i8,
DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,
DAG.getConstant(CC, MVT::i8), EFLAGS),
DAG.getConstant(1, MVT::i8));
If we return SETCC_CARRY directly, there is no need to create NewVal.
I guess we have different opinions on readability :)
Thanks,
Manman
> I sketch the code bellow:
>
> if (cc == X86::COND_A) {
> ...
> cc = X86::COND_B;
> /* point x */
> }
>
> if (cc == X86::COND_B)
> statements xyz; // will be tail-dup to point x if cost permit.
>
> Thank you again for the code review.
>
> Shuxin
>
>
>
>
>
>
>
>
> On 10/31/12 9:17 AM, Manman Ren wrote:
>> Hi Shuxin,
>>
>> A few comments:
>> + if (CC == X86::COND_A) {
>> + // Try to convert cond_a into cond_b in an attemp to facilitate
>> Typo attempt
>> + // materializing "setb reg"; see the following code.
>> + //
>> + // Do not flip "e > c", where "c" is a constant, because Cmp instruction
>> + // cannot take an immedidate as its first operand.
>> Typo immediate
>> + //
>> + if (EFLAGS.getOpcode() == X86ISD::SUB && EFLAGS.hasOneUse() &&
>> + EFLAGS.getValueType().isInteger() &&
>> + !isa<ConstantSDNode>(EFLAGS.getOperand(1))) {
>> + CC = X86::COND_B;
>> + SDValue NewSub = DAG.getNode(X86ISD::SUB, EFLAGS.getDebugLoc(),
>> + EFLAGS.getNode()->getVTList(),
>> + EFLAGS.getOperand(1), EFLAGS.getOperand(0));
>> + EFLAGS = SDValue(NewSub.getNode(), EFLAGS.getResNo());
>> + SDValue NewVal = DAG.getNode(X86ISD::SETCC, DL, N->getVTList(),
>> + DAG.getConstant(CC, MVT::i8), EFLAGS);
>> + N = NewVal.getNode ();
>> Why not a single "if"?
>> Can we directly return a SETCC_CARRY instead of falling to the "if (CC == X86::COND_B)"?
>> Otherwise, looks good.
>>
>> Thanks,
>> Manman
>> + }
>> + }
>> +
>>
>> On Oct 30, 2012, at 3:37 PM, Shuxin Yang wrote:
>>
>>> The motivating example:
>>> ========================
>>>
>>> The attached patch is to fix the performance defect reported in rdar://problem/12579915.
>>> The motivating example is:
>>>
>>> -----------------------------------------
>>> int foo(unsigned x, unsigned y, int ret) {
>>> if (x > y)
>>> ++ret; /* or --ret */
>>> return ret;
>>> }
>>> -----------------------------------------
>>>
>>> Gcc gives:
>>> movl %edx, %eax // return val = 3rd parameter
>>> cmpl %edi, %esi // carry-bit = (y < x)
>>> adcl $0, %eax // return val = 0 + carry-bit.
>>> ret
>>>
>>> and LLVM gave:
>>> cmpl %esi, %edi // flags = x cmp y
>>> seta %al // %al = 1 iff x > y
>>> movzbl %al, %eax // cmp-result = zext (%al)
>>> addl %edx, %eax // return-val = <ret> + cmp-result
>>> ret
>>>
>>> unsigned-less-than (ult) cmp has a nice feature: carray bit is set iff the cmp is satisified.
>>> Code-gen an take advantage of this feature to optimize expr like this:
>>> (ult) = x + 1
>>> (ult) = x - 1
>>>
>>>
>>> The Fix
>>> ========
>>>
>>> LLVM is already able to generate right code if the comparision is "<" (unsigned).
>>> So, this patch is simply to flip "x >u y" to "y <u x" in PerformSETCCCombine().
>>>
>>> One testing case is provied; and a "CHECK: sub" in another testing case is removed
>>> because it is irrelevant, and its position depends on scheduling.
>>>
>>> TODO:
>>> =====
>>>
>>> 1. With this patch, the assembly is:
>>> cmpl %edi, %esi
>>> adcl $0, %edx
>>> movl %edx, %eax
>>> Compared to gcc, critial path has 3 instructions vs 2 instrs in gcc. As of I write
>>> this mail, I have not yet got chance to investigate how gcc reduce the critial path.
>>> Maybe it is just lucky. This opt seems to be bit difficult.
>>>
>>> 2. gcc is way better than llvm in this case:
>>>
>>> int test3(unsigned int x, unsigned int y, int res) {
>>> if (x > 100)
>>> res++;
>>> return res;
>>> }
>>>
>>> gcc give:
>>> movl %edx, %eax
>>> cmpl $101, %edi
>>> sbbl $-1, %eax
>>> ret
>>>
>>> Gcc handles all these cases in ce_try_addcc() of if-conversion pass in @ifcvt.c,
>>> while in llvm, the logic of such if-conv scatter in many places.
>>>
>>> 3. With -m32, the instruction sequence is worse than -m64, I have not yet got chance
>>> to dig the root cause.
>>>
>>> <diff.patch>_______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
More information about the llvm-commits
mailing list