[PATCH] D75751: [AArch64][SVE] Implement structured load intrinsics

Thu Mar 19 05:21:54 PDT 2020

fpetrogalli added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-loads-with-extract.ll:23
+                                                               <vscale x 16 x i8>* %addr)
+  %v2 = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %res, i32 1)
+  ret <vscale x 16 x i8> %v2
----------------
c-rhodes wrote:
> fpetrogalli wrote:
> > I just realized... do we need these tests? I might have missed something, but it seems to me that none of the code you have added here is related to the interaction between `ld<N>` and `tuple.get`. For example, the fact that in this specific test the load is loading memory in `z31` and `z0`, in this order, seems to me more of a register allocation problem other than anything related to the fact that `%res` is coming from a `ld<N>` intrinsic.
> > 
> > I think that we can simplify these tests by removing the `ld<N>` and just writing tests that involve using the `tuple.get` on the input parameter of the test. For example, in the specific case of `@ld2b_i8_1`, I think you want to run `llc` on this code:
> > 
> > ```
> > define <vscale x 16 x i8> @ld2b_i8_1(<vscale x 32 x i8> %res) {
> > ; CHECK-LABEL: ld2b_i8_1:
> > ; CHECK: mov z0.d, z1.d
> > ; CHECK-NEXT: ret
> >   %v2 = call <vscale x 16 x i8> @llvm.aarch64.sve.tuple.get.nxv32i8(<vscale x 32 x i8> %res, i32 1)
> >   ret <vscale x 16 x i8> %v2
> > ```
> > 
> > What you would be testing here is that whatever register is assigned to the input parameter, it is shuffled around by the intrinsic to be returned by the correct output register.
> > 
> > If you agree on this approach though, I think this deserves a separate patch.
> > 
> > One last (minor) extra reason for splitting things like I suggested: if we make these tests `ld<N>`-free, we won't have to bother updating/extending them when we need to do any work around the `ld<N>`.
> > 
> I implemented these tests before we had calling convention support for passing and returning tuples of scalable vectors to/from functions (downstream) so the extract was necessary. There are already tests similar to what you suggested in D75674 covering `tuple.get` that should be sufficient (?), I can remove these if they're not adding anything.
I think you can safely remove them. As you said, they were written when we didn't have the tuple calling conventions, but now we have it, and it is tested in https://reviews.llvm.org/D75674.

Nit: it might not be a problem, but in https://reviews.llvm.org/D75674 I cannot see tests that check when a tuple like <vscale x 8 x i64> is passed to a function definition. I see tests that are getting multiple "one -register" parameters in input, building a tuple with an intrinsic and passing it to a `tuple.get` intrinsic or to a function call, but not a test like the one I extracted in which the tuple is in input to the define. I'll leave a note there and drag @sdesmalen attention to see whether we should add them.

Thanks,

Francesco

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75751/new/

https://reviews.llvm.org/D75751