[PATCH] D38160: [AArch64] Improve codegen for inverted overflow checking intrinsics

Thu Oct 5 12:07:58 PDT 2017

kristof.beyls added inline comments.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:1980-2012
+  ConstantSDNode *COp1 = dyn_cast<ConstantSDNode>(Other);
+  unsigned Opc = Sel.getOpcode();
+  // If the operand is an overflow checking operation, invert the condition
+  // code and kill the xor(op, 1).
+  if (Sel.getResNo() == 1 &&
+      (Opc == ISD::SADDO || Opc == ISD::UADDO || Opc == ISD::SSUBO ||
+       Opc == ISD::USUBO || Opc == ISD::SMULO || Opc == ISD::UMULO) &&
----------------
aemerson wrote:
> kristof.beyls wrote:
> > aemerson wrote:
> > > kristof.beyls wrote:
> > > > Hi Amara,
> > > > 
> > > > I'm only vaguely familiar with this area of the code base.
> > > > My understanding is that you're aiming to have one more pattern apply involving an xor node.
> > > > I think it'd be nice to write out the pattern in a comment just like is available for the pattern matched in the existing code in LowerXOR.
> > > > 
> > > > Apart from that, with my unfamiliarity of this code base, I wonder why this pattern matching optimization is done during lowering.
> > > > Are there good reasons this optimization isn't done elsewhere, e.g. described by a tablegen pattern or during DAGCombine (e.g. in performXorCombine in this same source file)?
> > > > Apologies if the answer is blatantly obvious to people more experienced in this area.
> > > > 
> > > > Thanks,
> > > > 
> > > > Kristof
> > > Thanks for taking a look. This is part of lowering because I want to re-use the code for detecting the overflow arithmetic nodes in getAArch64XALUOOp(). That helper function is used to recognize the patterns in a few other places like LowerBR_CC and LowerSelect for example. If we leave this combine until later we can't re-use that as the pattern is destroyed.
> > > 
> > > I'll improve the comment to better explain the transformation.
> > Thanks Amara, that makes sense to me.
> > Now that I've looked at getAArch64XALUOOp() and where it's used, I couldn't help but notice that almost everywhere it's used, the same boiler-plate kind-of code is present around it:
> > 
> > ```
> > if (Sel.getResNo() == 1 &&
> >       (Opc == ISD::SADDO || Opc == ISD::UADDO || Opc == ISD::SSUBO ||
> >        Opc == ISD::USUBO || Opc == ISD::SMULO || Opc == ISD::UMULO) && ...) {
> >       // Only lower legal XALUO ops.
> >     if (!DAG.getTargetLoweringInfo().isTypeLegal(LHS->getValueType(0)))
> >       return SDValue();
> > ...
> >     AArch64CC::CondCode OFCC;
> >     SDValue Value, Overflow;
> >     std::tie(Value, Overflow) = getAArch64XALUOOp(OFCC, CCVal.getValue(0), DAG);
> >     SDValue CCVal = DAG.getConstant(OFCC, DL, MVT::i32);
> > ...
> >   return DAG.getNode(AArch64ISD::CSEL, DL, Op.getValueType(), TVal, FVal,
> >                    CCVal, Overflow);
> > }
> > 
> > ```
> > 
> > Couldn't that code be factored out somehow instead of having it copy pasted a few times?
> > I think that could improve the readability of the code and result in the advantages that the DRY-principle brings.
> > 
> Yeah, I think this is a borderline case. There's not a way I can see that can factor things out in a nice way due to the differences in the initial if-condition as well as how the AArch64 CC is mutated in two of the cases (while maintaining readability). I can split out the Opc checks though as they're the same in all cases?
Right.
I hope factoring out something like

```
bool matchesLegalSuAddSubMulWithOverflow(SDValue Op, SelectionDAG &DAG)
{
  unsigned Opc = Op.getOpcode();
  if (Op.getResNo() == 1 &&
      (Opc == ISD::SADDO || Opc == ISD::UADDO || Opc == ISD::SSUBO ||
       Opc == ISD::USUBO || Opc == ISD::SMULO || Opc == ISD::UMULO))
    if (DAG.getTargetLoweringInfo().isTypeLegal(Op->getValueType(0)))
    return true;
  return false;
}
```
and using that would already reduce the copy paste a lot; and the actual logic of the code might become a bit more simple and obvious?
But it's debatable, so I'll leave the decision on whether to factor this out or not up to you.

Repository:
  rL LLVM

https://reviews.llvm.org/D38160