[PATCH] D89576: [SVE][CodeGen] Lower scalable masked scatters

Thu Oct 29 07:11:40 PDT 2020

sdesmalen added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:9337-9338
+
+  if (Index.getOpcode() == ISD::ZERO_EXTEND &&
+      TLI.isLegalMaskedGSIndexType(Op.getValueType())) {
+    Index = Op;
----------------
This condition isn't correct, I think you want to always set the index type to unsigned when the index is zero-extended. Only the setting of `Index` to be the source operand of the zero-extend needs to be guarded by `TLI.isLegalMaskedGSIndexType`. (same for the sign-extend case below).

================
Comment at: llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7382
          "Vector width mismatch between mask and data");
-  assert(N->getIndex().getValueType().getVectorNumElements() >=
-             N->getValue().getValueType().getVectorNumElements() &&
+  assert(ElementCount::isKnownGE(
+           N->getIndex().getValueType().getVectorElementCount(),
----------------
Can you also add a check that the scalable flags match, i.e. that

  N->getIndex().getValueType().getVectorElementCount().isScalable() == 
  N->getValue().getValueType().getVectorElementCount().isScalable()

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3688
+bool AArch64TargetLowering::isLegalMaskedGSIndexType(EVT VT) const {
+ if (VT == MVT::nxv2i32 || VT == MVT::nxv4i32 || VT == MVT::nxv2i64)
+   return true;
----------------
Is that right though? nxv2i32 is not a legal type.

I feel like the interface is actually asking the wrong question. I think the question the DAGCombiner wants to ask is: "should I remove the sign/zero-extend for the gather/scatter index". For SVE, that is true for indices that are extended like:
  nxv4i32 -> nxv4i64
  nxv8i32 -> nxv8i64
  ..
because the instruction can do this. It adds little value to add this for:
  nxv2i32
because the type itself needs to be promoted to nxv2i64 anyway.

If it asks the right question and you make it work for nxv8i32, then codegen for the following test would much improve:
```define void @masked_scatter_nxv8f32_zext(<vscale x 8 x float> %data, float* %base, <vscale x 8 x i32> %indexes, <vscale x 8 x i1> %masks) nounwind #0 {
  %ext = zext <vscale x 8 x i32> %indexes to <vscale x 8 x i64>
  %ptrs = getelementptr float, float* %base, <vscale x 8 x i64> %ext
  call void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float> %data, <vscale x 8 x float*> %ptrs, i32 0, <vscale x 8 x i1> %masks)
  ret void
}
declare void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float*>, i32, <vscale x 8 x i1>)
```
(for which tests are currently missing in the patch) 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89576/new/

https://reviews.llvm.org/D89576