[PATCH] D71215: [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.

Wed Jan 22 13:55:03 PST 2020

efriedma added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1267
+  defm Pat_Load_P4  : unpred_load_predicate<nxv4i1, LDR_PXI>;
+  defm Pat_Load_P2  : unpred_load_predicate<nxv2i1, LDR_PXI>;
 }
----------------
sdesmalen wrote:
> sdesmalen wrote:
> > efriedma wrote:
> > > nxv2i1 has the same memory layout as nxv16i1?  I guess that makes sense given the available instructions.  We might need to modify the datalayout to make that work properly; I think, without any explicit guidance from the layout string, it will assume a nxv2i1 load reads "vscale" bytes, not "vscale * 2" bytes.
> > > 
> > > Not something to change in this patch, of course.
> > DataLayout assumes that each boolean has a memory size of `i8` as each predicate needs to be individually addressable, which leads to `storesize(<vscale x 2 x i1>) == storesize(<vscale x 2 x i8>)`.
> > This also means that alloca's of predicates may allocate too much stack space depending on the number of elements. The generated code is correct because all the offsets scale accordingly; you can see this for example in `spill_nxv16i1` and `spill_nxv2i1`. The former (nxv16i1) allocates the sizeof two nxv16i8 vectors and loads the second predicate from offset `8 [* sizeof(predicate)]`, where the latter allocates the sizeof one nxv16i8 vector, and loads the second predicate from offset `2 [* sizeof(predicate)]`. (`sizeof(predicate) = (vscale * 2 bytes)`)
> > This is different from spills introduced by e.g. the register allocator, where LLVM allocates space for the size of an (otherwise opaque) predicate register, set to 2 bytes.
> I should probably also point out that there is no other interface for users to read/write these predicates other than through `svbool_t`, which is an opaque type, so I don't think there is any need to expand the store of `nxv16i1` to a store of `nxv16i8`.
> DataLayout assumes that each boolean has a memory size of i8 as each predicate needs to be individually addressable

The whole area is still messy, unfortunately.

Like I stated before, the "store size" for vectors assumes the bits are tightly packed.  For non-scalable vectors, SelectionDAG legalization assumes the bits are tightly packed. (I think we fixed all the legalization routines to be consistent with this.)  And for AVX-512, loads and stores of `<16 x i1>` etc. are lowered to bit-packed operations (kmovw).

I just did some quick tests, though, and unfortunately, it looks like the alignment (and therefore the allocation size) is messed up.  The alignment of vectors is currently based on the alignment of the element type, not the size of the vector, so it's much larger than the store size for `<N x i1>`.  Unless the store size is exactly 64 or 128 bits wide, in which case the alignment is 64/128 bits respectively.

Probably someone needs to spend more time in this area at some point.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71215/new/

https://reviews.llvm.org/D71215