[PATCH] D71215: [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.

Wed Dec 11 17:43:44 PST 2019

efriedma added inline comments.

================
Comment at: llvm/lib/Analysis/Loads.cpp:144
+  // For unsized types or scalable vectors we don't know exactly how many bytes
+  // are dereferenceable, so bail out.
+  if (!Ty->isSized() || (Ty->isVectorTy() && Ty->getVectorIsScalable()))
----------------
"how many bytes are dereferenced".

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1326
+    Base = N;
+    OffImm = CurDAG->getTargetConstant(0, dl, MVT::i64);
+    return true;
----------------
sdesmalen wrote:
> efriedma wrote:
> > This is sort of weird for a method named "SelectAddrModeFrameIndexSVE"; should it not just fail?
> Agreed, that should not have been there. Fixed.
I'm not sure how you're proving that "N" is a FrameIndexSDNode here?

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2237
+    // Width = mbytes * elements
+    Scale = 16;
+    Width = SVEMaxBytesPerVector;
----------------
sdesmalen wrote:
> efriedma wrote:
> > This seems sort of confusing. "Scale" here is implicitly multiplied by vl, and there isn't any way for the caller to tell except by checking the opcode.
> I'm not sure if this is an actual issue in practice though. Are you suggesting to make Scale a `TypeSize` instead of an `unsigned`?
Yes, that would force the callers to explicitly handle scalable types.  It looks like some of them don't.
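
To illustrate the point, here is a minimal sketch of why a size type that carries a "scalable" flag forces callers to deal with the scalable case. It does not use LLVM's actual `TypeSize` API; `ScaledSize` and `getFixedByteOffset` are hypothetical names invented for this example, and the real callers in AArch64InstrInfo.cpp may need different handling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a TypeSize-like value (not LLVM's real class).
// The actual size is MinValue bytes, or MinValue * vscale when Scalable is set.
struct ScaledSize {
  uint64_t MinValue;
  bool Scalable;

  static ScaledSize fixed(uint64_t V) { return {V, false}; }
  static ScaledSize scalable(uint64_t V) { return {V, true}; }

  // Asking for a plain integer is only legal for fixed sizes, so a caller
  // that forgets about scalable vectors trips the assertion instead of
  // silently using an offset that is wrong by a factor of vscale.
  uint64_t getFixedValue() const {
    assert(!Scalable && "caller must handle scalable sizes explicitly");
    return MinValue;
  }
};

// Hypothetical caller: computes the byte offset for an immediate, but only
// when that offset is known at compile time. Returns false for scalable
// scales, leaving Out untouched.
bool getFixedByteOffset(ScaledSize Scale, int64_t Imm, int64_t &Out) {
  if (Scale.Scalable)
    return false; // offset depends on the runtime vector length
  Out = Imm * static_cast<int64_t>(Scale.getFixedValue());
  return true;
}
```

With a plain `unsigned` Scale, the equivalent of the early return above is easy to leave out, since nothing in the type distinguishes 16 bytes from 16 * vl bytes.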

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1178
+
+  defm Pat_ST1B        : unpred_store<nxv16i8, ST1B_IMM, PTRUE_B>;
+  defm Pat_ST1H        : unpred_store<nxv8i16, ST1H_IMM, PTRUE_H>;
----------------
sdesmalen wrote:
> efriedma wrote:
> > Should we always use PTRUE_B, even for non-byte element sizes, to encourage CSE?
> > 
> > Should we prefer to use ldr/str where legal, to take advantage of the larger immediate offset?
> > Should we always use PTRUE_B, even for non-byte element sizes, to encourage CSE?
> Our experience is that vectorized loops have most predicates CSEd anyway. For a loop that operates on two lanes, often a predicate is already available and there is no need to introduce an extra `ptrue_b`. If a loop using floats is vectorized with VF=2, we don't want operations on `<vscale x 2 x float>` to use `ptrue.b` because that would enable operations on all (vscale x) 4 lanes, which may not be valid.
> 
> > Should we prefer to use ldr/str where legal, to take advantage of the larger immediate offset?
> That would not be endian safe, hence the preference to use ST1 (note that the order is dictated by the AAPCS when passing vectors by reference). Saving/restoring to/from the stack like this is pretty rare. Normal spills and fills will indeed use the STR/LDR instructions. And normal load/store vector instructions that are not storing to a local will likely use other addressing modes like reg+reg.
Okay, that makes sense.  For the CSE thing, we could maybe add an optimization pass after isel if it's necessary.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71215/new/

https://reviews.llvm.org/D71215