[PATCH] D71074: [Aarch64][SVE] Add intrinsics for scatter stores

Mon Dec 9 14:17:36 PST 2019

efriedma added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11957
+  if (Dst.getValueType().isFloatingPoint())
+    DstNew = DAG.getNode(ISD::BITCAST, DL, HwRetVt, Dst);
+  else
----------------
andwar wrote:
> efriedma wrote:
> > Does this work correctly for nxv2f32?  It looks like we're missing test coverage, also.
> That's a good catch, thanks! For scatter stores ACLE only supports _packed_  vectors of single or double precision floats. I will add a check for this.
> 
> I also missed this when implementing `performLD1GatherLoad` (which is very similar to this method, but it felt worthwhile to keep them seperate). When this patch is approved I'll prepare an update for `performLD1GatherLoad`.
We could support this in the backend even if clang can't generate it...  but I guess it's fine to leave it out.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-scaled-offsets.ll:157
+                                                                     double* %base,
+                                                                     <vscale x 2 x i64> %indices)
+  ret void
----------------
andwar wrote:
> efriedma wrote:
> > Why are the offsets here using `<vscale x 2 x i64>`?  `<vscale x 2 x i32>` seems more natural.
> * Although these _are_ 32-bit wide offsets, they are stored in 64-bit registers (and are implicitly sign or zero-extended) 
> * By using `<vscale x 2 x i64>` we can rely on `LLVMScalarOrSameVectorWidth` to capture error (i.e. `incorrect argument type`)
> * This is consistent with the implementation for gather loads
> 
> Do you see any disadvantages of using `<vscale x 2 x i64>` instead of `<vscale x 2 x i32>`?
The biggest advantage of using `<vscale x 2 x i32>` is that it's obvious the high bits are unused, so you don't end up with extra mask/64-bit multiply/etc.  That said, we can recover this in other ways.

I'm mostly concerned that we should try to be consistent here.  We use `<vscale x 2 x i32>` for the stored value in `llvm.aarch64.sve.st1.scatter.index.nxv2i32`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71074/new/

https://reviews.llvm.org/D71074