[PATCH] [Multiprecision Arithmetic Optimization] Teach SelectionDAG how to promote the integer carry result of {add, sub}{c, e} to a larger integer type.

Wed May 22 14:29:22 PDT 2013

On May 22, 2013, at 12:33 AM, Duncan Sands <duncan.sands at gmail.com> wrote:

> Hi Michael,
> 
> On 21/05/13 23:53, Michael Gottesman wrote:
>> Hey Duncan. I am going to respond in a backwards order since I think it will
>> yield additional clarity [if it does not, please tell me and I will elaborate = )]:
>> 
>>> Actually, isn't ADDC really exactly the same as UADDO?  If so, how about
>>> eliminating the ADDC in favour of UADDO?  In any case, there's already
>>> promotion logic for UADDO, DAGTypeLegalizer::PromoteIntRes_UADDSUBO, so
>>> maybe you could take a look.
> 
> actually I should probably have said: eliminate UADDO, replacing it with ADDC
> everywhere.
> 

This seems out of scope for this patch since this patch is a small feature addition patch. Changing UADDO -> ADDC everywhere will definitely require changing a bunch of code in the X86 backend at least if not more backends. If we want to do this, we should file a PR (just due to the magnitude of the work).

>> 
>> While ADDC/UADDO seem similar they are meant to model two different things.
>> 
>> UADDO is meant to model the situation where one performs an addition and wishes
>> to detect overflow as a boolean value and perform some non-arithmetic operation,
> 
> How do you know it will be used for non-arithmetic operations?

I was basing my assertion on the prior art in the X86 backend.

> 
>> not perform additional arithmetic with said overflow value. Thus it makes sense
>> when we are promoting UADDO into an add + setne operation since UADDO is lowered
>> into similar code anyways (see LowerXALUO X86ISelLowering.cpp) meaning no
>> performance hit.
>> 
>> On the other hand ADDC is meant to model the situation where one wishes to
>> perform additional arithmetic with the overflow value (specifically in a
>> multiprecision arithmetic context), i.e. on intel:
>> 
>> addq (%rbx), %rax
>> adcq 8(%rbx), %rcx
>> adcq 16(%rbx), %ddx
>> ...
>> 
>> If we were to lower ADDC like UADDO, we would instead have something along the
>> lines of the following non-optimal codegen:
>> 
>> xorq %rax, %rax
>> addq (%rbx), %rcx
>> setne %al
>> movq %rax, %r8
>> addq 8(%rbx), %r9
>> getne %al
>> addq %r9, %r8
>> ...
>> 
>> Thus the necessity of this distinction. X86 definitely follows said distinction
>> if you look at the lowering.
> 
> I disagree.  The ADDC and UADDO SDAG nodes have exactly the same definition.
> IMO, the fact that UADDO codegen is poor is a codegen weakness that should be
> improved, not a reason to keep two nodes, unless there really is no other way.

Again, I feel that changing UADDO -> ADDC to be out of scope for this patch since the patch is really only about handling the lowering of ADDC, not changing a bunch of code to change all usages of UADDO -> ADDC everywhere.

> 
>> 
>> Which leads me to your first paragraph.
>> 
>>> thanks for adding the ResNo == 0 case, but I don't think it is correct.
>>> That's because you continue to return an ADDC code, yet because you zero
>>> extended the add operands there is no reason why the extended operands
>>> should overflow (set the carry flag) exactly when the original operation
>>> would.
>> 
>> This is not completely true. It depends on how you do it. Specifically it
>> depends on how one legalizes the source operations of the addc.
> 
> No it doesn't, this is not how type legalization works.  Upper bits of promoted
> integers contain *undefined bits*.  If they happen to be zext'd then you just
> got lucky, you can't rely on it.  That's why if you *require* the extra bits to
> be zero-extended you need to use ZExtPromotedInteger, which explicitly zeros out
> the extra bits (the optimizers are often able to get rid of the explicit zeroing
> out if instead they can use an incoming node for which they can prove that the
> bits are zero already).
> 
> Anyway, this is something of a digression, since promotion *doesn't change the
> meaning of what you are calculating*.  Consider the following operation:
> 
>  (i8 x, i1 flag) = addc i8 255, 2
> 
> Then you get:
> 
>   x = i8 1
>   flag = i1 1
> 
> On a platform where i8 and i1 have to be promoted to 32 bits, this *must* get
> legalized to:
> 
>  x = a 32 bit integer, where the 8 lower bits are 00000001 (aka i8 1), and the
> rest can be anything (it doesn't matter what they are, you can choose something
> convenient)
> 
>  flag = a 32 bit integer, where the lowest bit is 1, and the rest can be
> anything.  (If promoting according to the so-called "boolean contents" then
> the extra bits may have additional requirements, such as being zero- or sign-
> extended from the lowest bit).
> 
> Now consider what your logic does, supposing first you have to promote ResNo 0.
> Your logic forms
> 
>  (i32 x', i1 flag') = addc i32 255, 2
> 
> So you get:
> 
>  x' = i32 257
>  flag' = i1 0  (as this i32 operation does not set the carry flag)
> 
> It is OK to use x' for the promoted value of x, since the 8 lowest bits are
> 00000001.  But it is not OK to use flag' for flag, since the lowest bit is
> zero, not 1.
> 
> In short, the problem is correctly calculating the flag (you have to check if
> bit 8 of x' was set), not calculating the sum.  See the UADDO promotion logic
> for one way that calculating the flag can be done.

Ok. My point was that if we had certain guarantees than it *could* work. On the other hand as you have pointed out in this section of your response we do not have any of those guarantees. The end result is my digression was a waste of text and time = /.

Would zero extending just ResNo 0 be a satisfactory lowering (i.e. no inputs, just ResNo 0)?

Thanks for the brain dumping = ).

Michael

> 
> For instance if
>> one is loading from an 8 bit array, zext to a 32 bit operation and then perform
>> the addc you would have the issue you are thinking of. In contrast, if you could
>> guarantee that the memory access was promoted to 32 bit (since addition of
>> multiprecision operations with different word sizes used yields the same
>> result), one would no longer have that issue. On the other hand, I do not think
>> I can assume this will be done and it is irrelevant to the discussion since we
>> are discussing PromotingIntegerResults = p.
>> 
>> Given that we are talking about promoting integer results, perhaps the proper
>> way to do this is to just zero extend ResNo == 0 (i.e. not the operands, just
>> the result). What do you think Duncan?
>> 
>> Michael
>> 
>> P.S. I realized the reason why I returned SDValue was b/c I thought the integer
>> promotion was like DAGCombine where one returns SDValue if one is not doing
>> anything vs asserting in PromoteIntegerResult where it looks like you assert. = (.
> 
> In type legalization, first the target gets to have a go at legalizing the
> operation.  It can return SDValue() if it doesn't know how or doesn't want to.
> Then the generic logic (eg your ADDC promotion method) kicks in.  If it can't
> legalize the value *then there is no fall-back*: no-one knows how to legalize
> the value, type legalization has failed.  So someone has to assert somewhere!
> As returning SDValue() is used to mean "I legalized the type by completely
> zapping the node" there is no convenient way to signal to the machinery that
> you failed (i.e. having it assert for you), so you have to just assert yourself.

> 
> Ciao, Duncan.
> 
>> 
>> On May 21, 2013, at 12:12 PM, Duncan Sands <duncan.sands at gmail.com
>> <mailto:duncan.sands at gmail.com>> wrote:
>> 
>>> Hi Michael,
>>> 
>>> On 17/05/13 23:33, Michael Gottesman wrote:
>>>> Hows this look Duncan/Sean?
>>> 
>>> 
>>> I didn't think about ADDE but at a glance it has the same flaw.
>>> 
>>> Ciao, Duncan.
>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Michael
>>>> 
>>>> On May 16, 2013, at 10:41 AM, Michael Gottesman <mgottesman at apple.com
>>>> <mailto:mgottesman at apple.com>
>>>> <mailto:mgottesman at apple.com>> wrote:
>>>> 
>>>>> I am trying to be incremental and did not have a need for such lowering. On
>>>>> the other hand if you want I can implement that in this patch. Just state your
>>>>> preference = ).
>>>>> 
>>>>> Michael
>>>>> 
>>>>> On May 16, 2013, at 1:34 AM, Duncan Sands <duncan.sands at gmail.com
>>>>> <mailto:duncan.sands at gmail.com>
>>>>> <mailto:duncan.sands at gmail.com>> wrote:
>>>>> 
>>>>>> Hi Michael,
>>>>>> 
>>>>>> On 15/05/13 19:37, Michael Gottesman wrote:
>>>>>>> The attached patch teaches selection DAG how to promote the integer carry
>>>>>>> result of an {add,sub}{c,e} node to a larger integer type. It is a part of a
>>>>>>> larger set of patches that I will be committing over the next little bit to
>>>>>>> enable the optimization of uaddo/usubo chains.
>>>>>>> 
>>>>>>> Please Review,
>>>>>>> Michael
>>>>>> 
>>>>>> +SDValue DAGTypeLegalizer::PromoteIntRes_ADDSUBC(SDNode *N, unsigned ResNo) {
>>>>>> +  if (ResNo == 1)
>>>>>> +    return PromoteIntRes_Overflow(N);
>>>>>> +  return SDValue();
>>>>>> +}
>>>>>> +
>>>>>> +SDValue DAGTypeLegalizer::PromoteIntRes_ADDSUBE(SDNode *N, unsigned ResNo) {
>>>>>> +  if (ResNo == 1)
>>>>>> +    return PromoteIntRes_Overflow(N);
>>>>>> +  return SDValue();
>>>>>> +}
>>>>>> 
>>>>>> why do you return SDValue() if ResNo != 1 ?  Shouldn't you either assert or
>>>>>> promote the value?
>>>>>> 
>>>>>> Ciao, Duncan.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> <mailto:llvm-commits at cs.uiuc.edu><mailto:llvm-commits at cs.uiuc.edu>
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> <mailto:llvm-commits at cs.uiuc.edu><mailto:llvm-commits at cs.uiuc.edu>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130522/234aa3ff/attachment.html>