[PATCH] D119338: [AArch64][SVE] Add structured load/store opcodes to getMemOpInfo

Thu Feb 10 07:41:26 PST 2022

david-arm added a comment.

This is looking good now @kmclaughlin! I just had a few more minor comments on the tests ...

================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:79
+; CHECK-NEXT:    ret
+  %alloc = alloca [64 x i8], i32 1
+  %bc = bitcast [64 x i8]* %alloc to <vscale x 32 x i8>*
----------------
Hi @kmclaughlin, I think this needs to be:

  %alloc = alloca [64 x i8], i32 3

because of the GEP.

================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:183
+; CHECK-NEXT:    ret
+  %alloc = alloca [48 x i16], i32 1
+  %bc = bitcast [48 x i16]* %alloc to <vscale x 24 x i16>*
----------------
I think this also needs to be:

  %alloc = alloca [48 x i16], i32 4

================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:204
+; CHECK-NEXT:    ret
+  %alloca1 = alloca <vscale x 4 x float>, i32 2
+  %alloca2 = alloca <vscale x 4 x half>, i32 1
----------------
This needs to be:

  %alloca1 = alloca <vscale x 4 x float>, i32 12

due to the GEP index (9) + the fact you're actually loading <vscale x 12 x float> worth of data.

================
Comment at: llvm/test/CodeGen/AArch64/ldN-reg-imm-alloca.ll:313
+; CHECK-NEXT:    ret
+  %alloca1 = alloca <vscale x 2 x double>, i32 5
+  %alloca2 = alloca <vscale x 4 x i32>, i32 2
----------------
This needs to be

  %alloca1 = alloca <vscale x 2 x double>, i32 13

due to the GEP offset (9) + the data size loaded.

================
Comment at: llvm/test/CodeGen/AArch64/stN-reg-imm-alloca.ll:18
+  %ptr = bitcast [64 x i8]* %alloc to i8*
+  store i8 zeroinitializer, i8* %ptr, align 16
+  call void @llvm.aarch64.sve.st2.nxv16i8(<vscale x 16 x i8> %in, <vscale x 16 x i8> %in, <vscale x 16 x i1> %pg, i8* %ptr)
----------------
Hi @kmclaughlin, I'm afraid I may have sent you down a pointless path here and that's my fault for not explaining this more clearly! For the structured stores you don't actually need the extra store here because the alloca cannot be folded away, since the structured store already prevented that.

If it's not too much trouble, I think it's worth removing these extra stores to make the tests simpler.

NOTE: The only exception to this is where you have two allocas in the test, in which case having a store for each alloca is right!

================
Comment at: llvm/test/CodeGen/AArch64/stN-reg-imm-alloca.ll:88
+; CHECK-NEXT:    ret
+  %alloc = alloca [8 x i64], i32 2
+  %bc = bitcast [8 x i64]* %alloc to <vscale x 2 x i64>*
----------------
I think this needs to be:

  %alloc = alloca [8 x i64], i32 4

to account for the GEP offset + store data.

================
Comment at: llvm/test/CodeGen/AArch64/stN-reg-imm-alloca.ll:344
+; CHECK-NEXT:    ret
+  %alloc = alloca [32 x i32], i32 1
+  %bc = bitcast [32 x i32]* %alloc to <vscale x 4 x i32>*
----------------
I think this needs to be:

  %alloc = alloca [32 x i32], i32 2

================
Comment at: llvm/test/CodeGen/AArch64/stN-reg-imm-alloca.ll:372
+; CHECK-NEXT:    ret
+  %alloca1 = alloca <vscale x 2 x double>, i32 4
+  %alloca2 = alloca <vscale x 2 x double>, i32 1
----------------
I think this needs to be:

  %alloca1 = alloca <vscale x 2 x double>, i32 8

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119338/new/

https://reviews.llvm.org/D119338