[PATCH] D75580: [llvm][CodeGen][SVE] Implement IR intrinsics for gather prefetch.

Thu Mar 12 15:45:25 PDT 2020

fpetrogalli marked 21 inline comments as done.
fpetrogalli added inline comments.

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:1263
+                  llvm_anyvector_ty, // Offsets
+                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
+                  llvm_i32_ty // Prfop
----------------
sdesmalen wrote:
> I realise this only now, but having the predicate as the third operand is different from the gather loads. Can you make this to be the first operand, rather than the third, to streamline their definition with the gather loads?
Good point. I have redefined the class as 

```
class SVE_gather_prf_scalar_base_vector_offset_scaled
    : Intrinsic<[],
                [
                  LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, // Predicate
                  llvm_ptr_ty, // Base address
                  llvm_anyvector_ty, // Offsets
                  llvm_i32_ty // Prfop
		],
                [IntrInaccessibleMemOrArgMemOnly, NoCapture<0>, ImmArg<3>]>;
```

and consequently redefined the intrinsics from, say, `declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nx4vi32(i8* %base, <vscale x 4 x i32> %offset, <vscale x 4 x i1> %Pg, i32 %prfop)`, to `declare void @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nx4vi32(<vscale x 4 x i1> %Pg, i8* %base, <vscale x 4 x i32> %offset, i32 %prfop)`, but I get `llc` runtime failures as the following:

```
Wrong types for attribute: inalloca nest noalias nocapture nonnull readnone readonly signext sret zeroext byval dereferenceable(1) dereferenceable_or_null(1)
void (<vscale x 4 x i1>, i8*, <vscale x 4 x i32>, i32)* @llvm.aarch64.sve.gather.prfb.scaled.uxtw.nxv4i32
```

I cannot figure out what I am doing wrong though... Am I right assuming that `llvm_anyvec_ty` is still the overloaded type that needs to be mentioned in the name of the intrinsic in IR? (`nxv4i32` for the example reported here).

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12914

+/// Combine the gather prefetch (scalar + vector addressing mode) when
+/// the offset vector is an unpacked 32-bit scalable vector.
----------------
andwar wrote:
> sdesmalen wrote:
> > nit: this comment is a bit confusing. It doesn't so much 'combine' anything, but it rather 'legalizes the offset'.
> This method is only needed for unpacked offset, right? Maybe worth stating that for all other cases the input node is already OK? Otherwise I read this as: 
> * do something for unpacked 32-bit offsets, in all other cases "I don't know".
> 
>  And in reality it's more like: 
> * do something for unpacked 32-bit offsets, in all other cases it's already OK
I am not sure what you want me to do here. If it helps, I have updated the comment of the method to the following:

```
/// Legalize the gather prefetch (scalar + vector addressing mode) when the
/// offset vector is an unpacked 32-bit scalable vector. The other cases (Offset
/// != nxv2i32) do not need legalization.
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12918
+                                                            SelectionDAG &DAG) {
+  const SDValue Offset = N->getOperand(3);
+
----------------
sdesmalen wrote:
> nit: unnecessary use of `const` (same for other places in this patch).
Goodbye, beloved `const`. :)

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12965
+
+/// Combines a node carrying the intrinsic `aarch64_sve_gather_prf<T>`
+/// into a node that uses
----------------
andwar wrote:
> Hm, it's a bit more like: Check whether we need to remap? Yes - remap, No - all good! The way it's written now suggests that this method will always remap.
I have updated the comment of the method to the following, let me know if it is still unclear.

```
/// Combines a node carrying the intrinsic `aarch64_sve_gather_prf<T>` into a
/// node that uses `aarch64_sve_gather_prf<T>_scaled_uxtw` when the scalar
/// offset passed to `aarch64_sve_gather_prf<T>` is not a valid immediate for
/// the sve gather prefetch instruction with vector plus immediate addressing
/// mode.
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12973
+static SDValue performSVEPrefetchVecBaseImmOffCombine(
+    SDNode *N, SelectionDAG &DAG, unsigned NewIID, unsigned ScalarSizeInBytes) {
+  // No need to combine the node if the immediate is valid...
----------------
andwar wrote:
> Broken indentation.
Hum, CI would have caught it. I usually run `git clang-format` to my patches before submitting them. I'll double check if I haven't miss this one.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:12984
+  SDValue Ops[] = {
+      N->getOperand(0),                      // Chain
+      DAG.getConstant(NewIID, DL, MVT::i64), // Intrinsic ID
----------------
sdesmalen wrote:
> sdesmalen wrote:
> > Given that you've gone the route of doing this in ISelLowering rather than having ComplexPatterns, it's probably better to create ISD nodes rather than passing the intrinsics around. This means we can later reuse them if there is ever a llvm.gather.prefetch, but also to streamline the implementation of prefetches with that of the LD1 gathers. It would also do away with having to pass the operands explicitly like this and thus simplify these combines.
> We discussed this offline and decided it's fine for now to use the intrinsics directly. We may revisit this later when we clean up the logic in this file for mapping the gathers/scatters/prefetches.
Thank you!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75580/new/

https://reviews.llvm.org/D75580