[PATCH] D71074: [Aarch64][SVE] Add intrinsics for scatter stores

Thu Dec 12 06:42:37 PST 2019

andwar marked 6 inline comments as done.
andwar added inline comments.

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:990
+      : Intrinsic<
+            [],
+            [
----------------
sdesmalen wrote:
> nit: The formatting for these intrinsics is odd.
Fixed - I try to use clang-format (which IMHO does a good job here), but sadly the result is inconsistent with the rest of the file. 

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11851
+                                        unsigned Opcode) {
+  EVT SrcVT = N->getOperand(2)->getValueType(0);
+  assert(SrcVT.isScalableVector() &&
----------------
sdesmalen wrote:
> It is unclear what `N->getOperand(2)` is:
> ```EVT SrcVT = N->getOperand(2)->getValueType(0);
> const SDValue Dst = N->getOperand(2);```
That's a copy & paste leftover, sorry! I've renamed `Dst` as well as few other variables for consistency.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-scaled-offsets.ll:157
+                                                                     double* %base,
+                                                                     <vscale x 2 x i64> %indices)
+  ret void
----------------
sdesmalen wrote:
> efriedma wrote:
> > andwar wrote:
> > > efriedma wrote:
> > > > Why are the offsets here using `<vscale x 2 x i64>`?  `<vscale x 2 x i32>` seems more natural.
> > > * Although these _are_ 32-bit wide offsets, they are stored in 64-bit registers (and are implicitly sign or zero-extended) 
> > > * By using `<vscale x 2 x i64>` we can rely on `LLVMScalarOrSameVectorWidth` to capture error (i.e. `incorrect argument type`)
> > > * This is consistent with the implementation for gather loads
> > > 
> > > Do you see any disadvantages of using `<vscale x 2 x i64>` instead of `<vscale x 2 x i32>`?
> > The biggest advantage of using `<vscale x 2 x i32>` is that it's obvious the high bits are unused, so you don't end up with extra mask/64-bit multiply/etc.  That said, we can recover this in other ways.
> > 
> > I'm mostly concerned that we should try to be consistent here.  We use `<vscale x 2 x i32>` for the stored value in `llvm.aarch64.sve.st1.scatter.index.nxv2i32`.
> We envisioned the type for the indices generated by Clang to be `<vscale x 2 x i64>`, since the ACLE does not have any knowledge of unpacked types (like `<vscale x 2 x i32>`). The uxtw in the intrinsic would make it clear that the indices will need to be zero-extended from from word to double-word.
> 
> However, I agree this isn't very clear when reading/writing the intrinsics in IR and it should be simple to change the patch to take `<vscale x 2 x i32>` indices instead and generating a `truncate` in Clang. If the `scatterSt1Combine` then ANY_EXTENDs the data, the truncate will likely fall away.
> 
> If we change this, we should be consistent and update the loads as well.
Thank you both, I'll update accordingly. I suggest that gather loads are updated in a separate patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71074/new/

https://reviews.llvm.org/D71074