[PATCH] D71074: [Aarch64][SVE] Add intrinsics for scatter stores

Mon Dec 9 05:35:59 PST 2019

andwar marked 5 inline comments as done.
andwar added a subscriber: eli.friedman.
andwar added a comment.

Thank you for taking a look @eli.friedman!

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11929
+      MaxVT.getSizeInBits().getKnownMinSize())
+    return SDValue();
+
----------------
efriedma wrote:
> Is the getKnownMinSize comparison trying to reject illegal types?  Or something else?
It's meant to reject illegal types that won't fit into an SVE register (i.e. `nxv8i32`). I've added a comment to clarify this.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11957
+  if (Dst.getValueType().isFloatingPoint())
+    DstNew = DAG.getNode(ISD::BITCAST, DL, HwRetVt, Dst);
+  else
----------------
efriedma wrote:
> Does this work correctly for nxv2f32?  It looks like we're missing test coverage, also.
That's a good catch, thanks! For scatter stores ACLE only supports _packed_  vectors of single or double precision floats. I will add a check for this.

I also missed this when implementing `performLD1GatherLoad` (which is very similar to this method, but it felt worthwhile to keep them seperate). When this patch is approved I'll prepare an update for `performLD1GatherLoad`.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-scaled-offsets.ll:157
+                                                                     double* %base,
+                                                                     <vscale x 2 x i64> %indices)
+  ret void
----------------
efriedma wrote:
> Why are the offsets here using `<vscale x 2 x i64>`?  `<vscale x 2 x i32>` seems more natural.
* Although these _are_ 32-bit wide offsets, they are stored in 64-bit registers (and are implicitly sign or zero-extended) 
* By using `<vscale x 2 x i64>` we can rely on `LLVMScalarOrSameVectorWidth` to capture error (i.e. `incorrect argument type`)
* This is consistent with the implementation for gather loads

Do you see any disadvantages of using `<vscale x 2 x i64>` instead of `<vscale x 2 x i32>`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71074/new/

https://reviews.llvm.org/D71074