[PATCH] D106028: [AArch64][SelectionDAG] Add legalization for widening LOAD/MLOAD.

Tue Aug 3 13:40:26 PDT 2021

efriedma added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:4881-4883
+    EVT ConcatOpVT = EVT::getVectorVT(
+        *DAG.getContext(), NOutVT.getVectorElementType(),
+        N->getOperand(0).getValueType().getVectorElementCount());
----------------
sdesmalen wrote:
> nit: Is/would this be the same as:
> 
>   SDValue Op0 = N->getOperand(0);
>   TLI.getTypeToTransformTo(*DAG.getContext(), Op0.getValueType());
> 
> ?
No, it's not the same.

We need to extend the element VT to the element VT of the result type.  For SVE, the resulting vector type is usually promotable.  If it is, we deal with it in PromoteIntOp_CONCAT_VECTORS.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:4048
+
+  // FIXME: Figure out how to replace constant "2".
+  if (VT.isScalableVector() && VT.getVectorMinNumElements() % 2 != 0) {
----------------
sdesmalen wrote:
> efriedma wrote:
> > sdesmalen wrote:
> > > Do we also want to do this for other element-counts that are not a power of 2? (e.g. `<vscale x 6 x i8>`)
> > > 
> > > Was this code intended to work for `<vscale x 1 x eltty>` ? (I tried that a few days ago with this patch and ran into some failures. I didn't really investigate further though)
> > Should work for 1, I guess?  Not sure I actually tried it.  But note we only support a very limited set of operations.  See also D105591.
> > 
> > For `<vscale x 6 x i8>`, we can use a different sequence: generate extloads for `<vscale x 2 x i64>` and `<vscale x 4 x i32>`, and merge them together.  GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.
> > 
> > I'll add more testcases, in any case.
> Is the FIXME still relevant?
> 
> > For <vscale x 6 x i8>, we can use a different sequence: generate extloads for <vscale x 2 x i64> and <vscale x 4 x i32>, and merge them together. GenWidenVectorLoads should handle that in theory; not sure how well it works in practice.
> > I'll add more testcases, in any case.
> Nice. From what I can see from the test you've added, <vscale x 6 x i16> is handled correctly by GenWidenVectorLoads.
Yes, it's still an issue.  The constant "2" is specific to SVE.

The question we need to figure out here is, will GenWidenVectorLoads succeed (and be profitable)?  For SVE, it should succeed for any multiple of 2, because we can use 64-bit extending loads.  Not sure about profitability for SVE; the generated code is a bit messy, but the messy bits are all invariant (and probably less messy once we start using whilelo optimally).

Other vector instruction sets might have different restrictions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106028/new/

https://reviews.llvm.org/D106028