[PATCH] D119338: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 9 07:46:50 PST 2022


david-arm added a comment.

Hi @kmclaughlin, this looks like a nice fix! I've got a few comments so far about the load tests. Some of the comments about choosing the right size for the alloca probably apply to the store tests too.



================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:2935
+  case AArch64::ST2D_IMM:
+    Scale = TypeSize::Scalable(16) * 2;
+    Width = SVEMaxBytesPerVector * 2;
----------------
nit: I think you can just write TypeSize::Scalable(32) and similarly for the others.


================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:11
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    ld2b { z0.b, z1.b }, p0/z, [sp]
+; CHECK-NEXT:    add sp, sp, #16
----------------
I'm not sure this is really exercising the code path you've changed, because the stack pointer looks wrong here, i.e. we've not actually allocated anything and are just loading random data. I think that's because nothing got stored into the alloca, so it was optimised away. It might be worth adding some kind of normal store before the load, e.g. something like

  store [16 x i8] zeroinitializer, [16 x i8]* %alloc, align 16
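
Combined with the alloca-sizing suggestion below, here is a rough sketch of what I mean for this first test (the function and value names are just placeholders, and the intrinsic signature is the one the test already uses):

  declare <vscale x 32 x i8> @llvm.aarch64.sve.ld2.nxv32i8(<vscale x 16 x i1>, i8*)

  define <vscale x 32 x i8> @ld2b_i8(<vscale x 16 x i1> %pg) #0 {
    ; 2 vectors x 32 bytes each (given vscale_range(2,2)), so a 64-byte alloca.
    %alloc = alloca [64 x i8], i32 1, align 16
    ; Store something first so the alloca isn't optimised away.
    store [64 x i8] zeroinitializer, [64 x i8]* %alloc, align 16
    %ptr = bitcast [64 x i8]* %alloc to i8*
    %ld2 = call <vscale x 32 x i8> @llvm.aarch64.sve.ld2.nxv32i8(<vscale x 16 x i1> %pg, i8* %ptr)
    ret <vscale x 32 x i8> %ld2
  }

  attributes #0 = { "target-features"="+sve" vscale_range(2,2) }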


================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:28
+; CHECK-NEXT:    ret
+  %alloc = alloca [4 x i16], i32 1, align 16
+  %ptr = bitcast [4 x i16]* %alloc to i16*
----------------
I think all the allocas here have to be at least as big as the data you're loading in. The test above has the same problem too. For all these tests you can just do:

  %alloc = alloca [NumVecs * 32 x i8], i32 1, align 16

I chose '32' here because you've set vscale_range to (2,2), so each individual vector is at most 2 x 16 = 32 bytes. For ld2 NumVecs=2, for ld3 NumVecs=3, etc.
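
For example, for the i16 test here (an ld2, so NumVecs=2) that would just be, roughly:

  %alloc = alloca [64 x i8], i32 1, align 16
  %ptr = bitcast [64 x i8]* %alloc to i16*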


================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:72
+  %bc = bitcast [16 x i8]* %alloc to <vscale x 32 x i8>*
+  %gep = getelementptr inbounds <vscale x 32 x i8>, <vscale x 32 x i8>* %bc, i64 2, i64 0
+  %ld2 = call <vscale x 32 x i8> @llvm.aarch64.sve.ld2.nxv32i8(<vscale x 16 x i1> %pg, i8* %gep)
----------------
Again, here you're indexing into something bigger than the alloca. Since the gep index is 2, you need to allocate at least 3 x the space required for <vscale x 32 x i8> here.
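
With vscale_range(2,2) that's 3 x 64 = 192 bytes, so roughly something like:

  %alloc = alloca [192 x i8], i32 1, align 16
  store [192 x i8] zeroinitializer, [192 x i8]* %alloc, align 16
  %bc = bitcast [192 x i8]* %alloc to <vscale x 32 x i8>*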


================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:170
+
+define <vscale x 12 x float> @ld3b_f32_valid_imm(<vscale x 4 x i1> %pg) #0 {
+; CHECK-LABEL: ld3b_f32_valid_imm:
----------------
I think this is a valid test, but perhaps worth adding a comment explaining why it's useful? It's because this exercises the path where isAArch64FrameOffsetLegal returns a non-zero stack offset due to the extra alloca.
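
e.g. something like:

  ; This exercises the path where isAArch64FrameOffsetLegal returns a
  ; non-zero stack offset, due to the extra alloca.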


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119338/new/

https://reviews.llvm.org/D119338


