[PATCH] D58015: [SelectionDAG][AArch64] Legalize VECREDUCE

Tue Feb 26 04:04:57 PST 2019

sdesmalen added a comment.

Thanks for this patch @nikic! Please find some comments inline.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:3972
+  case ISD::VECREDUCE_FMIN:
+    Results.push_back(TLI.expandVecReduce(Node, DAG));
+    break;
----------------
Although the code in SelectionDAGLegalize::LegalizeOp() will fallthrough and get into ConvertNodeToLibcall(), should this conceptually not be done in SelectionDAGLegalize::ExpandNode() ? It is not creating libcalls, but rather expanding the operation into scalar operations on the vector's elements.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1538
+  if (EltVT.bitsGT(VT))
+    VT = EltVT;
+  return DAG.getNode(N->getOpcode(), SDLoc(N), VT, Op);
----------------
This promotes the result type of the operation as well as the operand. As far as I understand it, the result type is legal at this point. If bits(EltVT) > bits(VT), should this promote the operation to work on a bigger type, and finally truncate the result?

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:3245
+  // both halves independently.
+  SDValue Res = TLI.expandVecReduce(N, DAG);
+  SplitInteger(Res, Lo, Hi);
----------------
Do we need to expand the entire operation here if only the result needs expansion (and not the input vector itself)?

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:4264
+  unsigned OldElts = N->getOperand(0).getValueType().getVectorNumElements();
+  return TLI.expandVecReduce(Reduce.getNode(), DAG, OldElts);
+}
----------------
Do we want to expand the operation here, or just widen the input operand (inserted with zeros as you suggest in the comment)?
If a target just wants the operation to be widened, but does have support for reductions on the widened vectors, we probably prefer that over the default lowering that expandVecReduce provides.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:5640
+    SDNode *Node, SelectionDAG &DAG, unsigned Count) const {
+  SDLoc dl(Node);
+  bool NoNaN = Node->getFlags().hasNoNaNs();
----------------
Since all the VECREDUCE_<OP> are non-strict, can we split the vector in a low- and high half, and do the reduction on both halves? This would eventually boil down to using ScalarizeVecOp_VECREDUCE that transforms <1 x i32> => i32, and would allow more parallelised execution of the reduction. Is this maybe something you already tried?

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:5674
+  SDValue Res = Ops[0];
+  for (unsigned i = 1; i < Count; i++) {
+    Res = DAG.getNode(BaseOpcode, dl, EltVT, Res, Ops[i], Node->getFlags());
----------------
nit: unnecessary curly braces

================
Comment at: lib/CodeGen/TargetLoweringBase.cpp:670
+    // Vector reduction default to expand.
+    setOperationAction(ISD::VECREDUCE_FADD, VT, Expand);
+    setOperationAction(ISD::VECREDUCE_FMUL, VT, Expand);
----------------
This patch has AArch64 tests for VECREDUCE_ADD,  VECREDUCE_FMAX and VECREDUCE_UMAX, which all have custom lowering but actually adds lots of generic support for VECREDUCE operations that don't have custom lowering. I think we'll also want to add some tests for those cases.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58015/new/

https://reviews.llvm.org/D58015