[PATCH] D89382: [SVE][CodeGen] Lower scalable integer vector reductions

Fri Oct 23 05:12:45 PDT 2020

paulwalker-arm added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3324
     const unsigned EltBitWidth = VecVT.getScalarSizeInBits();
-    const unsigned NumSrcElts = VecVT.getVectorNumElements();
+    const unsigned NumSrcElts = VecVT.getVectorMinNumElements();

----------------
I think you want to just reject extracts from scalable vectors here.  There's code at the top of this function which normally does the job, but it relies on the result type and so EXTRACT_VECTOR-ELT slips through.  FYI: Eli has the same fix in D87651 so I'd match that and crown whoever lands first the winner.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9787
     case ISD::VECREDUCE_AND:
-      return LowerFixedLengthReductionToSVE(AArch64ISD::ANDV_PRED, Op, DAG);
+      if (SrcVT.getVectorElementType() == MVT::i1)
+        return LowerPredReductionToSVE(Op, DAG);
----------------
Given LowerPredReductionToSVE has it's own switch block can this be done universally regardless of the opcode. i.e.
```
if (SrcVT.isScalableVector() ||
      useSVEForFixedLengthVectorVT(SrcVT, OverrideNEON)) {

  if (SrcVT.getVectorElementType() == MVT::i1)
    return LowerPredReductionToSVE(Op, DAG);

  switch (Op.getOpcode()) {
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16325
+
   SDValue Pg = getPredicateForVector(DAG, DL, SrcVT);

----------------
As you're moving this, can you put it immediately before the `SDValue Rdx` line so it has some company?

================
Comment at: llvm/test/CodeGen/AArch64/vecreduce-umax-legalization.ll:145-152
+; CHECK-NEXT:    // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT:    umov w10, v0.h[1]
+; CHECK-NEXT:    umov w11, v0.h[0]
+; CHECK-NEXT:    umov w9, v0.h[2]
+; CHECK-NEXT:    orr w10, w11, w10
+; CHECK-NEXT:    umov w8, v0.h[3]
+; CHECK-NEXT:    orr w9, w10, w9
----------------
Presumably this is down to the canonicalisation?  I'm not sure there's any immediate performance concerns here, and I stand behind canonicalisation being the correct solution.  I guess we can see if anybody disagrees with this.

The worst that can happen is more custom lowering for NEON is required but because the existing lowering doesn't do anything special for OR, and MIN/MAX operations on i1 is weird anyways, I'd rather not do that in this patch unless we really have to.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89382/new/

https://reviews.llvm.org/D89382