[PATCH] D35635: [ARM] Optimize {s,u}{add,sub}.with.overflow

Thu Dec 7 10:06:25 PST 2017

jgalenson added inline comments.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:3942
+    // We do not use it in the USUBO case as Value may not be used.
+    Value = DAG.getNode(ISD::ADDC, dl, Op.getValueType(), LHS, RHS).getValue(0);
     OverflowCmp = DAG.getNode(ARMISD::CMP, dl, MVT::Glue, Value, LHS);
----------------
rogfer01 wrote:
> efriedma wrote:
> > Still the wrong type... you need to call getVTList to get the right type for an ADDC.
> > 
> > Also, are you sure you don't want an ARMISD::ADDC, rather than an ISD::ADDC?
> This comment now looks odd because I had to revert D35192 (still working on it, though).
> 
> What was the reason to use a target specific node here? Did you want to save some round trips lowering this or you needed it for better codegen?
I used the target-specific node here because efriedma suggested it, but using the generic ISD::ADDC does seem to work (it passes all the tests).  Should I change it?  I don't know much about the difference between them.

It is interesting that this still works after you reverted your patch.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:3916
 std::pair<SDValue, SDValue>
 ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
                                  SDValue &ARMcc) const {
----------------
rogfer01 wrote:
> We could return `std::tuple<SDValue, SDValue, SDValue>` here I think, but better we leave this for a later change.
Yes, that does seem cleaner to me.  I can make a follow-up patch after this lands.

================
Comment at: lib/Target/ARM/ARMISelLowering.cpp:4542-4543
+    // Reverse the condition code.
+    ARMCC::CondCodes CondCode =
+        (ARMCC::CondCodes)cast<const ConstantSDNode>(ARMcc)->getZExtValue();
+    CondCode = ARMCC::getOppositeCondition(CondCode);
----------------
rogfer01 wrote:
> jgalenson wrote:
> > rogfer01 wrote:
> > > Apologies if this question looks a bit clueless about all the transformation, but why you need to reverse the condition code here?
> > No, it's a good question, and I'm a bit confused about this myself.
> > 
> > The getARMXALUOOp function seems to return the condition for whether there is not an overflow.  It doesn't seem to be documented anywhere, but for example, in its SADDO case, it uses the VC flag, which is the "no overflow" case.
> > 
> > Assuming that's right, then since these branches are branching when an overflow occurs, I need to invert the condition.
> > 
> > Does that logic seem right?
> It is certainly confusing.
> 
> It returns three things: the intended arithmetic code (in `Value`), the comparison of the overflow (in `OverflowCmp`) and a condition code related to `OverflowCmp` (in `ARMcc`). For  unsigned addition (`UADDO`) it does the following
> 
> ```
> Value: add z, x, y
> OverflowCmp: cmp z, x
> ARMcc: HS
> ```
> 
> The comparison will compute `z - x` and update the flags. Using the `HS` condition code means `z >= x` and this is exactly *no overflow* (because `z ← x + y`).
> 
> Similarly when doing a signed addition (`SADDO`)
> ```
> Value: add z, x, y
> OverflowCmp: cmp z, x
> ARMcc: VC
> ```
> 
> This case is less obvious but if there is no overflow in this case there must not be overflow in the comparison (`VC`): if there were no overflow in the `add`, then `cmp` because it does `z - x` to update the flags would compute `y` which does not overflow.
> 
> Similarly for `USUBO` and `SSUBO` but here the comparison is `cmp x, y` (instead of `cmp z, x`) which is (suprisingly) more intuitive because unsigned `x - y` will not overflow if `x >= y` and trivially for the signed case as not overflowing is exactly `VC`.
> 
> So after this wall of text, I believe your logic is sensible here.
> 
> 
> Given that `getARMXALUOOp` is now going to be used in three places after your change, would it be possible to add a comment to that function like the one below (or an improved wording if you come up with a better one!)
> 
> ```lang=cpp
> // This function returns three things: the arithmetic computation itself
> // (Value), a comparison (OverflowCmp) and a condition code (ARMcc).  The
> // comparison and the condition code define the case in which the arithmetic
> // computation *does not* overflow.
> std::pair<SDValue, SDValue>
> ARMTargetLowering::getARMXALUOOp(SDValue Op, SelectionDAG &DAG,
>                                  SDValue &ARMcc) const {
> ...
> ```
> 
> 
Thanks for the long double-checking!  That comment looks good to me, so I'll add it.

https://reviews.llvm.org/D35635