[PATCH] D75751: [AArch64][SVE] Implement structured load intrinsics

Thu Mar 19 03:11:39 PDT 2020

c-rhodes added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll:23
+                                                               <vscale x 16 x i8>* %addr)
+  %v2 = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %res, i32 1)
+  ret <vscale x 16 x i8> %v2
----------------
fpetrogalli wrote:
> I just realized... do we need these tests? I might have missed something, but it seems to me that none of the code you have added here is related to the interaction between `ld<N>` and `tuple.get`. For example, the fact that in this specific test the load is loading memory in `z31` and `z0`, in this order, seems to me more of a register allocation problem other than anything related to the fact that `%res` is coming from a `ld<N>` intrinsic.
> 
> I think that we can simplify these tests by removing the `ld<N>` and just writing tests that involve using the `tuple.get` on the input parameter of the test. For example, in the specific case of `@ld2b_i8_1`, I think you want to run `llc` on this code:
> 
> ```
> define <vscale x 16 x i8> @ld2b_i8_1(<vscale x 32 x i8> %res) {
> ; CHECK-LABEL: ld2b_i8_1:
> ; CHECK: mov z0.d, z1.d
> ; CHECK-NEXT: ret
>   %v2 = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %res, i32 1)
>   ret <vscale x 16 x i8> %v2
> ```
> 
> What you would be testing here is that whatever register is assigned to the input parameter, it is shuffled around by the intrinsic to be returned by the correct output register.
> 
> If you agree on this approach though, I think this deserves a separate patch.
> 
> One last (minor) extra reason for splitting things like I suggested: if we make these tests `ld<N>`-free, we won't have to bother updating/extending them when we need to do any work around the `ld<N>`.
> 
I implemented these tests before we had calling convention support for passing and returning tuples of scalable vectors to/from functions (downstream) so the extract was necessary. There are already tests similar to what you suggested in D75674 covering `tuple.get` that should be sufficient (?), I can remove these if they're not adding anything.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75751/new/

https://reviews.llvm.org/D75751