[PATCH] D71074: [Aarch64][SVE] Add intrinsics for scatter stores

Wed Dec 11 07:31:27 PST 2019

sdesmalen added inline comments.

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:990
+      : Intrinsic<
+            [],
+            [
----------------
nit: The formatting for these intrinsics is odd.

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:1445
 def int_aarch64_sve_flogb : AdvSIMD_SVE_LOGB_Intrinsic;
+
 }
----------------
nit: unrelated whitespace change.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11851
+                                        unsigned Opcode) {
+  EVT SrcVT = N->getOperand(2)->getValueType(0);
+  assert(SrcVT.isScalableVector() &&
----------------
It is unclear what `N->getOperand(2)` is:
```EVT SrcVT = N->getOperand(2)->getValueType(0);
const SDValue Dst = N->getOperand(2);```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11861
+  EVT MaxVT = llvm::MVT::getScalableVectorVT(SrcElVT, NumElements);
+  if (SrcVT.getSizeInBits().getKnownMinSize() >
+      MaxVT.getSizeInBits().getKnownMinSize())
----------------
Should we simply compare:
```SrcVT.getSizeInBits().getScalableSize() > AArch64::SVEBitsPerBlock```
?

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-scatter-stores-32bit-scaled-offsets.ll:157
+                                                                     double* %base,
+                                                                     <vscale x 2 x i64> %indices)
+  ret void
----------------
efriedma wrote:
> andwar wrote:
> > efriedma wrote:
> > > Why are the offsets here using `<vscale x 2 x i64>`?  `<vscale x 2 x i32>` seems more natural.
> > * Although these _are_ 32-bit wide offsets, they are stored in 64-bit registers (and are implicitly sign or zero-extended) 
> > * By using `<vscale x 2 x i64>` we can rely on `LLVMScalarOrSameVectorWidth` to capture error (i.e. `incorrect argument type`)
> > * This is consistent with the implementation for gather loads
> > 
> > Do you see any disadvantages of using `<vscale x 2 x i64>` instead of `<vscale x 2 x i32>`?
> The biggest advantage of using `<vscale x 2 x i32>` is that it's obvious the high bits are unused, so you don't end up with extra mask/64-bit multiply/etc.  That said, we can recover this in other ways.
> 
> I'm mostly concerned that we should try to be consistent here.  We use `<vscale x 2 x i32>` for the stored value in `llvm.aarch64.sve.st1.scatter.index.nxv2i32`.
We envisioned the type for the indices generated by Clang to be `<vscale x 2 x i64>`, since the ACLE does not have any knowledge of unpacked types (like `<vscale x 2 x i32>`). The uxtw in the intrinsic would make it clear that the indices will need to be zero-extended from from word to double-word.

However, I agree this isn't very clear when reading/writing the intrinsics in IR and it should be simple to change the patch to take `<vscale x 2 x i32>` indices instead and generating a `truncate` in Clang. If the `scatterSt1Combine` then ANY_EXTENDs the data, the truncate will likely fall away.

If we change this, we should be consistent and update the loads as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71074/new/

https://reviews.llvm.org/D71074