[PATCH] D58015: [SelectionDAG][AArch64] Legalize VECREDUCE

Fri Mar 1 15:12:18 PST 2019

nikic added inline comments.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:3972
+  case ISD::VECREDUCE_FMIN:
+    Results.push_back(TLI.expandVecReduce(Node, DAG));
+    break;
----------------
sdesmalen wrote:
> Although the code in SelectionDAGLegalize::LegalizeOp() will fallthrough and get into ConvertNodeToLibcall(), should this conceptually not be done in SelectionDAGLegalize::ExpandNode() ? It is not creating libcalls, but rather expanding the operation into scalar operations on the vector's elements.
Thanks for catching this, I didn't intend to put it in here. I scrolled too far while trying to find the end of the switch :)

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1538
+  if (EltVT.bitsGT(VT))
+    VT = EltVT;
+  return DAG.getNode(N->getOpcode(), SDLoc(N), VT, Op);
----------------
sdesmalen wrote:
> This promotes the result type of the operation as well as the operand. As far as I understand it, the result type is legal at this point. If bits(EltVT) > bits(VT), should this promote the operation to work on a bigger type, and finally truncate the result?
Right, this was also promoting the result type (if necessary), but missed the truncate. I've added it now.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:3245
+  // both halves independently.
+  SDValue Res = TLI.expandVecReduce(N, DAG);
+  SplitInteger(Res, Lo, Hi);
----------------
sdesmalen wrote:
> Do we need to expand the entire operation here if only the result needs expansion (and not the input vector itself)?
I think so. Expanding the result without expanding the operand would need something like a hi/lo reduce, which can compute the hi/lo half of the reduction result.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:4264
+  unsigned OldElts = N->getOperand(0).getValueType().getVectorNumElements();
+  return TLI.expandVecReduce(Reduce.getNode(), DAG, OldElts);
+}
----------------
sdesmalen wrote:
> Do we want to expand the operation here, or just widen the input operand (inserted with zeros as you suggest in the comment)?
> If a target just wants the operation to be widened, but does have support for reductions on the widened vectors, we probably prefer that over the default lowering that expandVecReduce provides.
I agree that this is the better approach in general and have implemented it now. It's slightly more complicated than just padding with zeros, as the neutral element is different for the different reductions.

Doing this gives some pretty bad code for cases like `v1i8`, where we end up inserting 7 zero elements before performing the reduction. To avoid this, I've added a DAGCombine to convert a VECREDUCE of one element into an EXTRACT_VECTOR_ELT, but generating good code for cases like `v9i8` would need more work.

================
Comment at: lib/CodeGen/SelectionDAG/TargetLowering.cpp:5640
+    SDNode *Node, SelectionDAG &DAG, unsigned Count) const {
+  SDLoc dl(Node);
+  bool NoNaN = Node->getFlags().hasNoNaNs();
----------------
sdesmalen wrote:
> Since all the VECREDUCE_<OP> are non-strict, can we split the vector in a low- and high half, and do the reduction on both halves? This would eventually boil down to using ScalarizeVecOp_VECREDUCE that transforms <1 x i32> => i32, and would allow more parallelised execution of the reduction. Is this maybe something you already tried?
This mostly already happens as part of the SplitVecOp legalization. Mostly, because it only goes down to a legal type. I've used a naive implementation here, because this expansion code is intended only for legalization purposes, not to produce particularly good code for unsupported general reductions. If a target doesn't support a reduction, it should request it to be expanded prior to SDAG construction, which will produce a shuffle reduction if possible. AArch64 currently always opts out of expansion, even though it does not support all reductions -- I plan to change that in a separate patch.

================
Comment at: lib/CodeGen/TargetLoweringBase.cpp:670
+    // Vector reduction default to expand.
+    setOperationAction(ISD::VECREDUCE_FADD, VT, Expand);
+    setOperationAction(ISD::VECREDUCE_FMUL, VT, Expand);
----------------
sdesmalen wrote:
> This patch has AArch64 tests for VECREDUCE_ADD,  VECREDUCE_FMAX and VECREDUCE_UMAX, which all have custom lowering but actually adds lots of generic support for VECREDUCE operations that don't have custom lowering. I think we'll also want to add some tests for those cases.
I've added two more tests for AND and FADD. As mentioned in another comment, I plan to make these go through the pre-SDAG expansion code in the future though.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58015/new/

https://reviews.llvm.org/D58015