[PATCH] D70542: [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets

Andrzej Warzynski via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 25 07:43:13 PST 2019


andwar added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11910-11913
+    case Intrinsic::aarch64_sve_ld1_gather:
+      return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
+    case Intrinsic::aarch64_sve_ld1_gather_index:
+      return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
----------------
fpetrogalli wrote:
> Nit: the prevailing style in this file seems to be a single invocation of a shared function for the two intrinsic cases, with the ISD node selection done inside that function.
> 
> ```
> static SDValue performLD1GatherCombine(SDNode *N, SelectionDAG &DAG) {
>   unsigned Opcode;
>   // N is an ISD::INTRINSIC_W_CHAIN node, so dispatch on the intrinsic
>   // ID (operand 1) rather than on N->getOpcode().
>   switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
>   default:
>     llvm_unreachable("unexpected intrinsic"); // <- this would guarantee that the function is not invoked on something it cannot handle yet?
>   case Intrinsic::aarch64_sve_ld1_gather:
>     Opcode = AArch64ISD::GLD1;
>     break;
>   case Intrinsic::aarch64_sve_ld1_gather_index:
>     Opcode = AArch64ISD::GLD1_SCALED;
>     break;
>   }
>   EVT RetVT = N->getValueType(0);
>   assert(RetVT.isScalableVector() &&
>          "Gather loads are only possible for SVE vectors");
>   // ... build the gather node using Opcode ...
> }
> 
> 
> //...
>     case Intrinsic::aarch64_sve_ld1_gather:
>     case Intrinsic::aarch64_sve_ld1_gather_index:
>       return performLD1GatherCombine(N, DAG);
> // ...
> ```
Good point! However, that would lead to two separate switch statements with similar cases (i.e. code duplication), so neither option is ideal. I would like to keep the current implementation for now.
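
For reference, a minimal sketch of the shape kept in the patch, with the helper taking the pre-selected node type as a parameter (the signature follows the lines quoted above; the body is illustrative only, not the actual implementation):

```
static SDValue performLD1GatherCombine(SDNode *N, SelectionDAG &DAG,
                                       unsigned Opcode) {
  // The caller has already mapped the intrinsic to an AArch64ISD node,
  // so no second switch over the intrinsic IDs is needed here.
  EVT RetVT = N->getValueType(0);
  assert(RetVT.isScalableVector() &&
         "Gather loads are only possible for SVE vectors");
  // ... build and return the GLD1/GLD1_SCALED node from Opcode ...
  return SDValue();
}
```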


================
Comment at: llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h:654
+// <n x (M*P) x t> vector (such as index 1) are undefined.
+const unsigned SVEBitsPerBlock = 128;
+} // end namespace AArch64
----------------
fpetrogalli wrote:
> `static constexpr unsigned` should make sure that we don't run into duplicate variable definitions if the header gets included somewhere else (admittedly, an unlikely situation in this specific case).
Good point, updated!
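
For clarity, a minimal sketch of the suggested form, assuming the surrounding namespace from the quoted context (the comment is mine, summarizing the linkage argument):

```
namespace AArch64 {
// constexpr implies const, and const namespace-scope variables have
// internal linkage in C++, so each translation unit that includes this
// header gets its own copy rather than a duplicate-symbol link error.
static constexpr unsigned SVEBitsPerBlock = 128;
} // end namespace AArch64
```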


================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-offset.ll:15
+; CHECK-NEXT: ret
+  %load = call <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(<vscale x 2 x i1> %pg,
+                                                                     i8* %base,
----------------
efriedma wrote:
> This doesn't match the way the corresponding C intrinsics are defined in the ACLE spec.  Are you intentionally diverging here?
Thank you for taking a look @efriedma! Yes, this is intentional. Well spotted!

If we used `nxv2i64` as the return type here then we wouldn't need the `zext`. However, we'd need some other way to differentiate between `ld1b` and `ld1sb` later, which would basically double the number of intrinsics. We felt that leaving the sign/zero extension here (to be folded away by a simple DAGCombine) is a good compromise. I will be upstreaming that code shortly.
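
To illustrate the compromise, a hypothetical IR sketch based on the intrinsic signature from the test above (`%offsets` and the other value names are made up; only the zero-extending form appears in this patch's tests):

```
%load = call <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(
                                              <vscale x 2 x i1> %pg,
                                              i8* %base,
                                              <vscale x 2 x i64> %offsets)
; A zero-extending user marks the load as unsigned:
%res = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>    ; -> ld1b
; whereas a sign-extending user would select the signed form instead:
;   %res = sext <vscale x 2 x i8> %load to <vscale x 2 x i64> ; -> ld1sb
```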

I should also point out that we have more intrinsics like this to upstream - this patch covers only two addressing modes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70542/new/

https://reviews.llvm.org/D70542
