[PATCH] D119338: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Thu Feb 10 09:14:42 PST 2022

sdesmalen added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:317
+  %alloca1 = alloca <vscale x 2 x double>, i32 13
+  %alloca2 = alloca <vscale x 4 x i32>, i32 2
+  %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1, i64 9, i64 0
----------------
david-arm wrote:
> sdesmalen wrote:
> > alloca2 is unused? (also true for other cases)
> This is something I asked @kmclaughlin to do because it's the only way to expose some of the code changes in this patch. All the tests ending _valid_imm do this for that reason. If you look at `isAArch64FrameOffsetLegal` we return a StackOffset, which is always zero for all tests in this file except ones like this. Having a non-zero StackOffset helped to ensure we were calculating the remainder/offset correctly using the Scale property set in `getMemOpInfo`. We can remove the test, but I'm worried we're not fully testing the changes that's all.
> 
> For example, in `ld3b_f32_valid_imm` you'll notice the addvl just before the ld3b, which happens precisely because StackOffset is non-zero.
I assumed that was what the `gep` was for. Maybe the second alloca is only needed because of how the test is written. If you write:

  %alloca1 = alloca <vscale x 64 x double>, align 4
  %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*
  %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 28, i64 0
  %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)

Then that results in:

  ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [sp, #28, mul vl]

Whereas

  %alloca1 = alloca <vscale x 64 x double>, align 4
  %alloca1.bc = bitcast <vscale x 64 x double>* %alloca1 to <vscale x 2 x double>*
  %base = getelementptr <vscale x 2 x double>, <vscale x 2 x double>* %alloca1.bc, i64 32, i64 0
  %ld4 = call <vscale x 8 x double> @llvm.aarch64.sve.ld4.nxv8f64(<vscale x 2 x i1> %pg, double* %base)

Results in:

  <x8 = calculations for sp + 32 * sizeof(VL)>
  ld4d    { z0.d, z1.d, z2.d, z3.d }, p0/z, [x8]
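
The difference between the two cases comes from the immediate range of the structured loads: for `ld4d` the `mul vl` immediate has to be a multiple of 4 in the range [-32, 28], so 28 can be folded into the instruction and 32 cannot. A minimal sketch of that check follows. `fitsStructuredImm` is a hypothetical helper written for illustration only; it is not a function in LLVM and not how `isAArch64FrameOffsetLegal` is implemented.

```cpp
#include <cstdint>

// Hypothetical helper, not part of LLVM.
// Returns true when an offset, counted in whole SVE vectors, can be encoded
// in the "[xN, #imm, mul vl]" form of an LD<N>/ST<N> instruction that
// transfers NumVecs registers (2, 3 or 4). The instruction encodes a signed
// 4-bit value that is scaled by NumVecs.
bool fitsStructuredImm(int64_t OffsetInVectors, int64_t NumVecs) {
  // The offset must be a whole multiple of the number of registers.
  if (OffsetInVectors % NumVecs != 0)
    return false;
  // The scaled-down value must fit in a signed 4-bit field.
  int64_t Encoded = OffsetInVectors / NumVecs;
  return Encoded >= -8 && Encoded <= 7;
}
```

With this sketch, `fitsStructuredImm(28, 4)` holds and `fitsStructuredImm(32, 4)` does not, which matches the two outputs above.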

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119338/new/

https://reviews.llvm.org/D119338