[PATCH] D70542: [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets
Andrzej Warzynski via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 25 07:43:13 PST 2019
andwar added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:11910-11913
+ case Intrinsic::aarch64_sve_ld1_gather:
+ return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
+ case Intrinsic::aarch64_sve_ld1_gather_index:
+ return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
----------------
fpetrogalli wrote:
> Nit: the prevailing style of the file seems to be a single invocation of the function, shared by the two intrinsic cases, with the ISD node selection done inside the function:
>
> ```
> static SDValue performLD1GatherCombine(SDNode *N, SelectionDAG &DAG) {
>   unsigned Opcode;
>   // The intrinsic ID is operand 1 of the INTRINSIC_W_CHAIN node.
>   switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
>   default:
>     // <- this would guarantee that the function is not invoked on
>     //    something that it cannot handle yet?
>     llvm_unreachable("unexpected intrinsic in LD1 gather combine");
>   case Intrinsic::aarch64_sve_ld1_gather:
>     Opcode = AArch64ISD::GLD1;
>     break;
>   case ...
>   }
>   EVT RetVT = N->getValueType(0);
>   assert(RetVT.isScalableVector() &&
>          "Gather loads are only possible for SVE vectors");
>   // ...
> }
>
> // ...
> case Intrinsic::aarch64_sve_ld1_gather:
> case Intrinsic::aarch64_sve_ld1_gather_index:
>   return performLD1GatherCombine(N, DAG);
> // ...
> ```
Good point! However, that would lead to two separate switch statements with very similar cases, i.e. code duplication, so neither option is ideal. I would like to keep the current implementation for now, which has roughly the shape sketched below.
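For reference, a rough sketch of the shape being kept (the helper body is elided, and the signature is only inferred from the calls in the diff above, so treat it as illustrative):

```
static SDValue performLD1GatherCombine(SDNode *N, SelectionDAG &DAG,
                                       unsigned Opcode) {
  EVT RetVT = N->getValueType(0);
  assert(RetVT.isScalableVector() &&
         "Gather loads are only possible for SVE vectors");
  // ... build the AArch64ISD::GLD1/GLD1_SCALED node using Opcode ...
}

// In the existing switch over the intrinsic ID:
case Intrinsic::aarch64_sve_ld1_gather:
  return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1);
case Intrinsic::aarch64_sve_ld1_gather_index:
  return performLD1GatherCombine(N, DAG, AArch64ISD::GLD1_SCALED);
```

The opcode-as-parameter form keeps the intrinsic-to-ISD-node mapping in a single place (the caller's switch) rather than repeating it inside the helper.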
================
Comment at: llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h:654
+// <n x (M*P) x t> vector (such as index 1) are undefined.
+const unsigned SVEBitsPerBlock = 128;
+} // end namespace AArch64
----------------
fpetrogalli wrote:
> `static constexpr unsigned` should make sure that we don't run into duplicate variable definitions if the header gets included somewhere else (admittedly, an unlikely situation in this specific case).
Good point, updated!
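For context, the updated declaration presumably ends up as something like the following (the exact wording of the surrounding comment is not shown in the snippet above):

```
namespace AArch64 {
// SVE vector registers are a multiple of 128 bits (one "block") in size.
static constexpr unsigned SVEBitsPerBlock = 128;
} // end namespace AArch64
```

A namespace-scope `constexpr` variable has internal linkage, so each translation unit that includes the header gets its own copy and no multiple-definition errors can arise.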
================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-gather-loads-64bit-offset.ll:15
+; CHECK-NEXT: ret
+ %load = call <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(<vscale x 2 x i1> %pg,
+ i8* %base,
----------------
efriedma wrote:
> This doesn't match the way the corresponding C intrinsics are defined in the ACLE spec. Are you intentionally diverging here?
Thank you for taking a look @efriedma! Yes, this is intentional. Well spotted!
If we used `<nxv2i64>` as the return type here, then we wouldn't need the `zext`. However, we'd need some other way to differentiate between `ld1b` and `ld1sb` later, which would basically double the number of intrinsics. We felt that leaving the sign/zero extension here (to be folded away by a simple DAGCombine) is a good compromise; a rough sketch of the intended pattern is below. I will be upstreaming that code shortly.
I should also point out that we have more intrinsics like this to upstream - this patch covers only two addressing modes.
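A rough illustration (not code from the patch; the type of the offset operand is assumed to be `<vscale x 2 x i64>` based on the patch title) of how a single intrinsic covers both instructions - the extension chosen by the caller carries the signedness, and the follow-up DAGCombine is expected to fold it into the load:

```
declare <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(<vscale x 2 x i1>, i8*, <vscale x 2 x i64>)

define <vscale x 2 x i64> @ld1b_gather(<vscale x 2 x i1> %pg, i8* %base, <vscale x 2 x i64> %offsets) {
  %load = call <vscale x 2 x i8> @llvm.aarch64.sve.ld1.gather.nxv2i8(<vscale x 2 x i1> %pg, i8* %base, <vscale x 2 x i64> %offsets)
  ; zext selects ld1b; using sext here instead would select ld1sb
  %ext = zext <vscale x 2 x i8> %load to <vscale x 2 x i64>
  ret <vscale x 2 x i64> %ext
}
```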
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D70542/new/
https://reviews.llvm.org/D70542